Find the new lines in one file compared with another file

Question 1

start cmd:> awk 'FNR == NR { oldfile[$0]=1; }; 
  FNR != NR { if(oldfile[$0]==0) print; }' file1 file2
delta
omega
rho
phi
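
The two-pass idiom above can be tried on throwaway input. The following is a minimal sketch, assuming hypothetical contents for file1 and file2 (the original inputs are not shown here), chosen so that the new lines match the output above:

```shell
# Hypothetical sample data; the real file1/file2 are not shown in the thread.
tmp=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$tmp/file1"
printf '%s\n' alpha delta beta omega rho gamma phi > "$tmp/file2"

# First pass (FNR == NR): remember every line of file1.
# Second pass (FNR != NR): print lines of file2 that were never seen in file1.
new_lines=$(awk 'FNR == NR { oldfile[$0] = 1 }
                 FNR != NR { if (oldfile[$0] == 0) print }' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$new_lines"
rm -rf "$tmp"
```

With this sample data the command prints delta, omega, rho and phi, one per line, in the order they appear in file2.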

Question 2

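I would use grep:

grep -Fxvf oldfile newfile

-F: treat the patterns as fixed strings (no regex metacharacters)

-x: match whole lines only (not substrings)

-f oldfile: read the patterns to match from oldfile

-v: invert the match, i.e. print the lines that are not found in oldfile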
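
To see why -F and -x matter, here is a small sketch with hypothetical sample files (the names and contents are made up for illustration):

```shell
# Hypothetical sample data for illustration.
tmp=$(mktemp -d)
printf '%s\n' 'a.c' 'foo' > "$tmp/oldfile"
printf '%s\n' 'a.c' 'abc' 'foo' 'foobar' > "$tmp/newfile"

# -F: 'a.c' is taken literally, so it does not match 'abc'.
# -x: 'foo' must match a whole line, so 'foobar' still counts as new.
added=$(grep -Fxvf "$tmp/oldfile" "$tmp/newfile")
printf '%s\n' "$added"
rm -rf "$tmp"
```

This prints abc and foobar. Without -F the pattern a.c would be read as a regular expression and would also match abc, and without -x the pattern foo would also match foobar, so both lines would be wrongly treated as already present.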

Question 3

A shorter awk command:

awk 'NR==FNR{a[$0];next}!($0 in a)' file1 file2

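If file1 can be empty, replace NR==FNR with FILENAME==ARGV[1]. With an empty file1, NR==FNR stays true while file2 is being read, so every line would be stored and nothing would be printed.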
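
The empty-file case can be reproduced with a short sketch. The file contents and variable names below are hypothetical and only serve the demonstration:

```shell
# Hypothetical sample data: an empty old file and a two-line new file.
tmp=$(mktemp -d)
: > "$tmp/file1"
printf '%s\n' one two > "$tmp/file2"

# With an empty file1, NR==FNR is still true while file2 is read,
# so every line is stored and skipped: nothing is printed.
by_nr=$(awk 'NR==FNR{a[$0];next}!($0 in a)' "$tmp/file1" "$tmp/file2")

# Comparing the file name instead tells the two inputs apart.
by_name=$(awk 'FILENAME==ARGV[1]{a[$0];next}!($0 in a)' "$tmp/file1" "$tmp/file2")

printf 'NR==FNR: [%s]\n' "$by_nr"
printf 'FILENAME==ARGV[1]:\n%s\n' "$by_name"
rm -rf "$tmp"
```

The first variant yields empty output, while the second prints one and two as expected.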

grep -Fxvf file2 file1 is slow for large files:

$ jot -r 10000 1 100000 >file1;jot -r 10000 1 100000 >file2
$ time awk 'NR==FNR{a[$0];next}!($0 in a)' file1 file2 >/dev/null
0.015
$ time grep -Fxvf file2 file1 >/dev/null
36.758
$ time comm -13 <(sort file1) <(sort file2)>/dev/null
0.173

If duplicate lines also need to be removed, use

awk 'NR==FNR{a[$0];next}!b[$0]++&&!($0 in a)' file1 file2

or

comm -13 <(sort file1) <(sort -u file2)
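
The two deduplicating variants do not produce output in the same order. Here is a small sketch on hypothetical data; temporary sorted files stand in for the `<(...)` process substitutions so that it also runs in shells without that feature, and `LC_ALL=C` is an added assumption to keep the sort order predictable:

```shell
# Hypothetical sample data: file2 contains duplicates and one old line (b).
tmp=$(mktemp -d)
printf '%s\n' b > "$tmp/file1"
printf '%s\n' d a d b a > "$tmp/file2"

# awk: !b[$0]++ is true only the first time a line is seen,
# so duplicates are dropped and the order of file2 is preserved.
awk_out=$(awk 'NR==FNR{a[$0];next}!b[$0]++&&!($0 in a)' "$tmp/file1" "$tmp/file2")

# comm needs sorted input, so its result comes out in sorted order.
LC_ALL=C sort "$tmp/file1" > "$tmp/old.sorted"
LC_ALL=C sort -u "$tmp/file2" > "$tmp/new.sorted"
comm_out=$(LC_ALL=C comm -13 "$tmp/old.sorted" "$tmp/new.sorted")

printf 'awk:\n%s\ncomm:\n%s\n' "$awk_out" "$comm_out"
rm -rf "$tmp"
```

Here awk prints d then a (first-seen order in file2), whereas comm prints a then d (sorted order). If the original order of the new lines matters, the awk form is the one to use.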

Answer