Match a column based on partial matches between file1 and file2 and print the non-matching lines

Question 1

Using GNU awk for its multi-character RS:

$ awk '
    NR==FNR { a[$0]; next }
    { split($5,v,"|"); for (i in v) if (v[i] in a) next; print }
' FS='\t' RS='[[:space:]|]+' file1 RS='\n' file2
c1 c1 c3 c4 ZZ16|YY15 c6
c1 c1 c3 c4 ZZ16 c6
c1 c1 c3 c4 AB1 c6

Or using any awk:

$ awk '
    NR==FNR { for (i=1; i<=NF; i++) a[$i]; next }
    { split($5,v,"|"); for (i in v) if (v[i] in a) next; print }
' FS='|' file1 FS='\t' file2
c1 c1 c3 c4 ZZ16|YY15 c6
c1 c1 c3 c4 ZZ16 c6
c1 c1 c3 c4 AB1 c6
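
To try the portable version without the original inputs, here is a minimal sketch that builds two small sample files first. The file contents (AA11, BB12, CC13 and the four tab-separated rows) are made up for illustration and are not the asker's data; only the awk program itself comes from the answer above.

```shell
# Hypothetical sample data: the real file1/file2 from the question are not shown here.
tmp=$(mktemp -d)
printf 'AA11|BB12\nCC13\n' > "$tmp/file1"
printf 'c1\tc1\tc3\tc4\tZZ16|YY15\tc6\n'  > "$tmp/file2"
printf 'c1\tc1\tc3\tc4\tAA11|XX99\tc6\n' >> "$tmp/file2"
printf 'c1\tc1\tc3\tc4\tCC13\tc6\n'      >> "$tmp/file2"
printf 'c1\tc1\tc3\tc4\tAB1\tc6\n'       >> "$tmp/file2"

# Same program as the "any awk" version: collect every |-separated value of
# file1, then skip each file2 line whose fifth field contains one of them.
result=$(awk '
    NR==FNR { for (i=1; i<=NF; i++) a[$i]; next }
    { split($5,v,"|"); for (i in v) if (v[i] in a) next; print }
' FS='|' "$tmp/file1" FS='\t' "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data the rows containing AA11 and CC13 are dropped, and the ZZ16|YY15 and AB1 rows are printed.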

Question 2

Compatible with any awk.

awk 'BEGIN{FS=OFS="\t"}NR==FNR{split($0,vals,"|");for(i in vals){v[vals[i]]}}NR!=FNR{hide=0;for(j in v){if($5~j){hide=1}};if(!hide){print}}' ./file1 ./file2

My result is:

c1 c1 c3 c4 ZZ16|YY15 c6
c1 c1 c3 c4 ZZ16 c6
c1 c1 c3 c4 AB1 c6

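Explanation:

NR==FNR: NR and FNR are equal only while the first file is being read.

{v[vals[i]]}: builds an associative array of the disallowed values.

if($5~j){hide=1}: if the fifth field matches a disallowed value, mark the line as hidden.

hide=0: resets the hidden state for each new line.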
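
One detail of the $5~j test is that ~ is a regular-expression match, so a disallowed value also hides lines whose fifth field merely contains it as a substring. The sketch below contrasts that with an exact comparison of the |-separated elements. The value ZZ1 and the variable names (bad, parts, regex_result, exact_result) are hypothetical, chosen only to show the difference.

```shell
# Hypothetical data: "ZZ1" is the disallowed value, the fifth field is "ZZ16|YY15".
line=$(printf 'c1\tc1\tc3\tc4\tZZ16|YY15\tc6')

# Regex test, as in the one-liner above: "ZZ1" matches inside "ZZ16",
# so the line is suppressed and nothing is printed.
regex_result=$(printf '%s\n' "$line" | awk -F'\t' -v bad='ZZ1' '$5 !~ bad')

# Exact test: split the field on "|" and compare whole elements as strings.
exact_result=$(printf '%s\n' "$line" | awk -F'\t' -v bad='ZZ1' '
    { n = split($5, parts, "|")
      for (k = 1; k <= n; k++) if ((parts[k] "") == (bad "")) next
      print }')

printf 'regex: [%s]\nexact: [%s]\n' "$regex_result" "$exact_result"
```

Which behaviour is wanted depends on what "partial match" means for the data at hand; the regex form is also sensitive to metacharacters in the values from file1.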

Question 3

$ perl -lane '
    # is this the first file? ($fc is file counter)
    if ($fc == 0) {
      # split first field on pipe chars                         
      my @p = split /\|/, $F[0];
      # use as keys for %patterns hash
      foreach my $p (@p) { $patterns{$p} = 1 };
    } else {
     print unless $F[4] =~ /$regex/;
    };

    if (eof) { # end of file
      if ($fc == 0) { # is this still the first (zeroth) file?
        # use keys of %patterns to build a regular expression
        $regex = join "|", keys %patterns;
      };
      $fc++;
    }' file1 file2
c1      c1      c3      c4      ZZ16|YY15       c6
c1      c1      c3      c4      ZZ16    c6
c1      c1      c3      c4      AB1     c6

BTW, here's a shorter version with fewer intermediate variables and no comments:

perl -lane '
  if ($fc == 0) {
    foreach (split /\|/, $F[0]) { $patterns{$_} = 1 };
  } else {
   print unless $F[4] =~ /$regex/;
  };

  if (eof) {
    $regex = join "|", keys %patterns if ($fc == 0);
    $fc++;
  }' file1 file2

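If you want to make it unreadable, you can shorten the variable names, replace the ($c==0) tests with the shorter but equivalent (!$c) (harder for novices to understand, so bonus!), squash it all onto one line, and get rid of the extraneous whitespace and semicolons without changing how it runs. Some people prefer this: masochistic cargo-culting FTW!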

perl -lane 'if(!$c){foreach(split/\|/,$F[0]){$p{$_}=1}}else{print unless $F[4]=~/$regex/};if(eof){$regex=join"|",keys %p if(!$c);$c++}' file1 file2

Question 4

In the second file, the fifth field already contains the logical OR operator (the | works as regex alternation):

awk '
NR==FNR {A[$1]; next}
        {for(i in A)
                if(i ~ "^("$5")$") next
        print}
' RS='[\n|]' file1 RS='\n' file2

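All that remains is to add the start-of-line and end-of-line anchors and the parentheses in the conditional expression. Note that a regex RS such as '[\n|]' needs an awk that supports it, for example GNU awk.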
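
To see why both the anchors and the parentheses matter, here is a small sketch that evaluates the same kind of expression on fixed strings. The values YY15 and XYY15 and the variable names are made up for illustration.

```shell
# Hypothetical values: the fifth field "ZZ16|YY15" is used directly as a regex.
anchoring=$(awk 'BEGIN {
    f5 = "ZZ16|YY15"
    whole    = ("YY15"  ~ ("^(" f5 ")$"))  # 1: a whole alternative matches
    partial  = ("XYY15" ~ ("^(" f5 ")$"))  # 0: the anchors reject a partial match
    noparens = ("XYY15" ~ ("^" f5 "$"))    # 1: parses as ^ZZ16 or YY15$, so it matches
    print whole, partial, noparens
}')
printf '%s\n' "$anchoring"   # prints: 1 0 1
```

Without the parentheses each anchor binds to only one alternative, which is why the third test still matches. As with any dynamic regex, metacharacters in the field would be interpreted rather than matched literally.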

