Compare 2 files and store the output as file1_value,file2_value,Match/NoMatch

Answer 1

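Use comm on the sorted data (comm expects its input in lexical order, so sort -n is safe here only because every value is a single digit; use plain sort for multi-digit values):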

$ comm <( sort -n file1 ) <( sort -n file2 )
                1
                2
2
                3
5
        6

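This output is tab-delimited: column 1 holds lines found only in file1, column 2 lines found only in file2, and column 3 lines found in both. We can mark everything in columns 1 and 2 as "NoMatch" and everything in column 3 as "Match" using awk: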

$ comm <( sort -n file1 ) <( sort -n file2 ) |
  awk -F$'\t' 'BEGIN { OFS="," } $3 != "" { print $3, $3, "Match"; next } { print $1, $2, "NoMatch" }'
1,1,Match
2,2,Match
2,,NoMatch
3,3,Match
5,,NoMatch
,6,NoMatch

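The awk script reads tab-delimited input (-F$'\t') and uses a comma as the output field separator (OFS=","). If field 3 is non-empty, it prints that value twice followed by Match in the third field, then moves on to the next line. Otherwise it prints fields 1 and 2 from the input followed by NoMatch in the third field.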
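
As a rough sketch of the same idea without process substitution (so it also runs under plain sh), the pipeline can be wrapped in a function. Note the assumptions: compare_files is a hypothetical helper name, the sample files are made up to reproduce the output above, and the sort is lexical in the C locale rather than sort -n, which keeps comm happy with multi-digit values at the cost of numeric ordering in the result.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not from the original answer).
compare_files() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # Sort both inputs lexically in the C locale, the order comm expects.
    LC_ALL=C sort "$1" > "$tmp1"
    LC_ALL=C sort "$2" > "$tmp2"
    # Column 3 (lines in both files) becomes Match, columns 1 and 2 NoMatch.
    LC_ALL=C comm "$tmp1" "$tmp2" |
        awk -F'\t' 'BEGIN { OFS = "," }
            $3 != "" { print $3, $3, "Match"; next }
            { print $1, $2, "NoMatch" }'
    rm -f "$tmp1" "$tmp2"
}

# Sample data assumed for illustration only.
dir=$(mktemp -d)
printf '%s\n' 1 2 2 3 5 > "$dir/file1"
printf '%s\n' 1 2 3 6   > "$dir/file2"
compare_files "$dir/file1" "$dir/file2"
```

With these sample files the function prints the same six lines as the awk pipeline above.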

Answer 2

Save this perl script as a file named xxx and run it with perl xxx file1 file2. It compares the two files line by line, by position:

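#!/usr/bin/perl
use strict;
use warnings;

# save the two file names first: reading with <> empties @ARGV
my ($f1, $f2) = @ARGV;

# build a hash of arrays of lines from all files,
# with the filename as key ($ARGV is the file <> is currently reading)
my %hash;
while (<>) { chomp; push @{ $hash{$ARGV} }, $_ }

# compare line $x of both files until both files are exhausted
# the map is a short expression for
# $l1 = $hash{$f1}->[$x]
# $l2 = $hash{$f2}->[$x]
for (my $x = 0; ; $x++) {
   my ($l1, $l2) = map { $hash{$_}[$x] } $f1, $f2;
   # test with defined so that a "0" or empty line does not end the loop early
   last unless defined $l1 or defined $l2;
   $_ //= '' for $l1, $l2;   # a file that ran out contributes an empty field
   printf "%s,%s,%s\n", $l1, $l2, $l1 eq $l2 ? 'Match' : 'NoMatch';
}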
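
For comparison, the same position-by-position check could be sketched in shell with paste and awk. This is only an illustration and not part of the Perl answer: compare_by_line is a hypothetical helper name, the sample data is made up, and the sketch assumes that no line contains a tab.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative): compare line N of $1 with line N of $2.
compare_by_line() {
    # paste joins corresponding lines with a tab and pads the shorter file
    # with empty fields; this assumes the data itself contains no tabs.
    paste "$1" "$2" |
        awk -F'\t' 'BEGIN { OFS = "," }
            # Appending "" forces a string comparison, like eq in Perl.
            { print $1, $2, (($1 "") == ($2 "") ? "Match" : "NoMatch") }'
}

# Made-up sample data for illustration.
dir=$(mktemp -d)
printf '%s\n' 1 2 3   > "$dir/file1"
printf '%s\n' 1 5 3 6 > "$dir/file2"
compare_by_line "$dir/file1" "$dir/file2"
```

Unlike the comm approach, this keeps the original line order and needs no sorting, but it reports NoMatch whenever the two files drift out of alignment.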
