Compare two files by one column: lines that exist in one file but not in the other

Answer 1

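join requires its input files to be sorted on the join field, which is why both files are passed through sort in the arguments of the example below. If you need the output to keep the original line order of the file, a different approach is needed. Note that join does not try to preserve the width of the original spacing between fields. By default it also prints the join field first, so -o is used here to keep the original column order (assuming three columns, as in the output shown).

join -1 2 -2 2 -v 1 -o 1.1,1.2,1.3 <(sort -k 2b,2 file1) <(sort -k 2b,2 file2)

Output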

21 12342 2
21 12349 7
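
To see why join needs sorted input, here is a minimal Python sketch of the kind of merge-style pass that an anti-join such as join -v 1 can make over two inputs already sorted on the key. The function name anti_join_sorted and the sample rows are hypothetical, made up for illustration; this is not join's actual implementation.

```python
def anti_join_sorted(rows1, rows2, key=1):
    """Return rows of rows1 whose key field has no match in rows2.

    Both inputs must already be sorted on the key field (0-based index),
    mirroring the precondition that join places on its input files.
    """
    unmatched = []
    j = 0
    for row in rows1:
        # Advance through rows2 until its key is no longer smaller.
        while j < len(rows2) and rows2[j][key] < row[key]:
            j += 1
        # No equal key at the current position means the row is unpaired.
        if j == len(rows2) or rows2[j][key] != row[key]:
            unmatched.append(row)
    return unmatched


# Hypothetical sample data, split into fields and sorted on the second field.
file1_rows = [line.split() for line in ["21 12341 5", "21 12342 2", "21 12349 7"]]
file2_rows = [line.split() for line in ["30 12341 9", "30 12345 1"]]

for row in anti_join_sorted(file1_rows, file2_rows):
    print(" ".join(row))
```

Because the pass only ever moves forward through both inputs, an unsorted input would cause matches to be missed, which is the reason for the sort step in the command above.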

Answer 2

An awk solution:

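awk '
    # First file on the command line (file2): remember every key in column 2.
    FNR == NR {
        data[ $2 ] = 1;
        next;
    }
    # Second file (file1): print lines whose column 2 was not seen in file2.
    FNR < NR {
        if ( ! ($2 in data) ) {
            print $0;
        }
    }
' file2 file1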

Result:

21  12342   2
21  12349   7

Answer 3

Using Python from a bash shell:

paddy$ python -c 'import sys
with open(sys.argv[2]) as f: file2col2 = {line.split()[1] for line in f}
with open(sys.argv[1]) as f: print("".join(line for line in f 
                                           if line.split()[1] not in file2col2))
' file1.tmp file2.tmp
21  12342   2
21  12349   7

paddy$
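
The same idea can be written as a small reusable function, which makes it easier to skip lines that have too few fields (the one-liner above raises IndexError on such lines). This is a minimal sketch; the function name unmatched_lines, the col parameter and the sample lines are assumptions for illustration, not part of the original answer.

```python
def unmatched_lines(lines1, lines2, col=1):
    """Yield lines from lines1 whose field `col` (0-based) is absent from lines2.

    Lines with too few fields are skipped. Input order and the original
    spacing of each line are preserved.
    """
    # Collect every key that appears in the chosen column of lines2.
    keys = set()
    for line in lines2:
        fields = line.split()
        if len(fields) > col:
            keys.add(fields[col])
    # Emit the lines of lines1 whose key was never seen.
    for line in lines1:
        fields = line.split()
        if len(fields) > col and fields[col] not in keys:
            yield line


# Hypothetical sample input, standing in for the contents of the two files.
sample1 = ["21  12349   7\n", "\n", "21  12341   5\n", "21  12342   2\n"]
sample2 = ["30  12341   9\n"]

result = list(unmatched_lines(sample1, sample2))
print("".join(result), end="")
```

With real files, open file objects can be passed in place of the sample lists, since the function only iterates over its arguments.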

Answer 4

Using egrep and awk:

egrep -v -f <(awk '{printf "^%s[ ]+%s[ ]+\n", $1, $2}' file2) file1

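The awk command inside <() generates one pattern per line of file2, anchored to the start of the line and built from its first two columns. egrep then matches these patterns against the lines of file1, and -v inverts the match so that only the non-matching lines are printed. Note that this compares the first and second columns together rather than the second column alone, and that any regex metacharacters in those columns are interpreted as such.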
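
A rough Python sketch of the same pattern-based filtering follows, mainly to make explicit that each pattern keys on the first two columns together. It is an analogue rather than a translation of egrep's matching; the helper name build_patterns and the sample lines are hypothetical, and re.escape is added so that regex metacharacters in the data are treated literally, which the shell version does not do.

```python
import re


def build_patterns(lines):
    """Build one anchored regex per line from its first two fields."""
    patterns = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            # Same shape as the awk printf: ^col1[ ]+col2[ ]+
            patterns.append(
                re.compile("^%s[ ]+%s[ ]+" % (re.escape(fields[0]), re.escape(fields[1])))
            )
    return patterns


# Hypothetical sample lines standing in for file1 and file2.
file1_lines = ["21  12341   5", "21  12342   2", "21  12349   7"]
file2_lines = ["21  12341   9"]

patterns = build_patterns(file2_lines)
# Keep only the lines that no pattern matches, like grep -v.
kept = [line for line in file1_lines if not any(p.search(line) for p in patterns)]
print("\n".join(kept))
```

Testing every line against every pattern grows with the product of the two file sizes, so for large inputs a set lookup on the key column is likely the better fit.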
