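I have a requirement: I need to compare two files column by column and write the differences to another file, along with some marker showing which columns do not match. Identifying the mismatched columns is my main problem. For example, suppose we have these files: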
File 1
1|piyush|bangalore|dev
1|piyush|bangalore|QA
2|pankaj|bangalore|dev
3|rohit|delhi|QA
File 2
1|piyush|bangalore|QA
1|piyush|bangalore|QA
2|pankaj|bangalore|dev
3|rohit|bangalore|dev
The expected output file would look something like this:
File 1
1|piyush|bangalore|**dev**
File 2
1|piyush|bangalore|**QA**
File 1
3|rohit|**delhi**|**QA**
File 2
3|rohit|**bangalore**|**dev**
I want to achieve something like this, where I can see the mismatched columns as well as the mismatched rows. I tried
diff File1 File2 > Diff_File
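but that only gives me the mismatched records (lines), and I have no way to identify the mismatched columns. Please help if this can be done with a shell script or an awk command, as I am new to this. Thanks in advance.
Answer 1
Python 3.x solution:
The diff_marked.py script: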
import sys

file1_name = sys.argv[1]
file2_name = sys.argv[2]

with open(file1_name, 'r') as f1, open(file2_name, 'r') as f2:
    f1_lines = f1.readlines()    # list of lines of File1
    f2_lines = f2.readlines()    # list of lines of File2

for k, l in enumerate(f1_lines):
    f1_fields = l.strip().split('|')    # split a line into fields by the separator '|'
    if k < len(f2_lines) and f2_lines[k]:
        has_diff = False
        f2_fields = f2_lines[k].strip().split('|')
        for i, f in enumerate(f1_fields):
            if f != f2_fields[i]:    # compare the lines field by field between the two files
                f1_fields[i] = '**' + f + '**'    # wrap differing fields
                f2_fields[i] = '**' + f2_fields[i] + '**'
                has_diff = True
        if has_diff:
            print(f1.name)    # print file name
            print('|'.join(f1_fields))
            print(f2.name)
            print('|'.join(f2_fields))
Usage (you may have a different Python version; this was tested with Python 3.5):
python3.5 diff_marked.py File1 File2 > diff_output
Contents of diff_output:
File1
1|piyush|bangalore|**dev**
File2
1|piyush|bangalore|**QA**
File1
3|rohit|**delhi**|**QA**
File2
3|rohit|**bangalore**|**dev**
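The script above appears to assume that paired rows have the same number of fields and that File1 is not longer than File2. If that does not hold for your data, the following is a minimal sketch of one way to relax it using itertools.zip_longest. The function names mark_row and diff_marked are hypothetical and not part of the original answer, and treating a missing field or a missing line as a difference is an assumption about the desired behavior. Unlike the script above, it strips only the line ending, not surrounding whitespace.

```python
from itertools import zip_longest


def mark_row(line1, line2, sep="|", mark="**"):
    """Return (marked1, marked2) if the two rows differ, otherwise None.

    A line given as None (missing in one file) is treated as having no fields.
    """
    fields1 = [] if line1 is None else line1.rstrip("\r\n").split(sep)
    fields2 = [] if line2 is None else line2.rstrip("\r\n").split(sep)
    out1, out2, differs = [], [], False
    # zip_longest pads the shorter row with None, so a field that exists
    # on only one side is reported as a difference (assumption).
    for a, b in zip_longest(fields1, fields2):
        if a == b:
            out1.append(a)
            out2.append(b)
        else:
            differs = True
            if a is not None:
                out1.append(mark + a + mark)
            if b is not None:
                out2.append(mark + b + mark)
    if not differs:
        return None
    return sep.join(out1), sep.join(out2)


def diff_marked(lines1, lines2, name1="File1", name2="File2"):
    """Collect the report lines for every pair of rows that differs."""
    report = []
    # Pair rows by position; a file that runs out of lines yields None.
    for l1, l2 in zip_longest(lines1, lines2):
        marked = mark_row(l1, l2)
        if marked is not None:
            report += [name1, marked[0], name2, marked[1]]
    return report


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
            for out_line in diff_marked(f1.readlines(), f2.readlines(),
                                        sys.argv[1], sys.argv[2]):
                print(out_line)
```

Saved under a file name of your choice, it can be run the same way as diff_marked.py, with the two file names as arguments and the output redirected to a file. Rows are still paired by position, so an inserted or deleted line in one file will make every later row appear different.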