Print lines only if, for the same first field, the second field has a single value on all lines

Answer 1

Here is one way with awk:

awk 'NR==FNR{if (x[$1]++){if ($2!=t){z[$1]++}} else {t=$2};
next}!($1 in z)' infile infile

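This processes the file twice. The first pass checks whether the second field takes different values while the first field stays the same; if it does, $1 is used as an index into the array z. The second pass prints a line only if its first field is not an index of that array. Note that t is a single variable shared by all keys, so this relies on lines with the same first field being adjacent (grouped or sorted input).
Or, if you don't mind using sort with awk: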

sort -u infile | awk 'NR==FNR{seen[$1]++;next}seen[$1]==1' - infile

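sort -u removes duplicate lines from the file and pipes the result to awk, which counts how many times each first field occurs. awk then processes the whole file again and prints a line if the count for its first field is 1.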
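
As a rough sketch of the two-pass idea described above, the variant below stores the first value of the second field per key instead of in one shared variable, so it should not depend on the input being grouped. The sample data is an assumption, chosen to match the grouped output shown in the datamash answer on this page; the array names first and mixed are illustrative only.

```shell
# Assumed sample input (hypothetical, reconstructed from the datamash output on this page).
infile=$(mktemp)
printf '%s\n' 'A T' 'A T' 'A T' 'B T' 'B T' 'B F' 'C F' 'C F' 'D F' 'D T' 'D F' > "$infile"

# Pass 1 (NR == FNR): remember the first $2 seen for each key and
# flag the key in "mixed" as soon as a different $2 shows up.
# Pass 2: print only the lines whose key was never flagged.
result=$(awk '
  NR == FNR {
    if (!($1 in first)) { first[$1] = $2 }
    else if ($2 != first[$1]) { mixed[$1] = 1 }
    next
  }
  !($1 in mixed)
' "$infile" "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

With this input, only the A and C lines are printed, since B and D each have more than one distinct second field.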

Answer 2

sed -e '
   # This is a do-while loop that keeps collecting lines for as long as
   # the first field remains the same. We break out of the loop when we see
   # a line whose 1st field != the previous 1st field **OR** we hit EOF.
  :a
     $bb
     N
  /^\(\S\+\) .\(\n\1 .\)*$/ba

  :b

  # all equal
  # **Action:** Print and quit

  /^\(\S\+ .\)\(\n\1\)*$/q


  # all lines share the 1st field, but the lines are not all equal
  # (otherwise the pattern above would have matched)
  # **Action:** Drop the whole block as it is uninteresting

  /^\(\S\+\) .\(\n\1 .\)*$/d


  # all equal, and the trailing line belongs to the next block
  # **Action:** Display up to the last newline and restart
  # with the trailing portion

  /^\(\(\S\+ .\)\(\n\2\)*\)\n[^\n]*$/{
     h
     s//\1/p   
     g
  }


  # of same 1st fld but some lines unequal, and trailing portion has
  # next line
  # **Action:** strip till the last newline, and restart over with the
  # trailing part

  s/.*\(\n\)/\1/
  D
' yourfile

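This is a very interesting problem to solve with sed. What I find missing, however, is a better, or should I say larger, input set from the OP on SE. My suggestion is that test cases of realistic size and variety be placed on a pastebin site, which is very useful for this sort of thing.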

Answer 3

If you have access to GNU datamash, you can collapse the data as follows:

datamash -W groupby 1 countunique 2 collapse 2 < file 
A   1   T,T,T
B   2   T,T,F
C   1   F,F
D   2   F,T,F

This makes post-processing simple, for example with awk:

datamash -W groupby 1 countunique 2 collapse 2 < file | 
  awk '$2==1 {n = split($3,a,","); for (i=1;i<=n;i++) print $1, a[i]}'
A T
A T
A T
C F
C F
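
If datamash is not available, the collapsed table can be approximated in plain awk. The following is only a sketch of what the groupby / countunique / collapse step computes, not how datamash itself works; the sample input is assumed (reconstructed from the output above) and the array names order, list, uniq and seen are illustrative.

```shell
# Assumed sample input (hypothetical, reconstructed from the collapsed output above).
file=$(mktemp)
printf '%s\n' 'A T' 'A T' 'A T' 'B T' 'B T' 'B F' 'C F' 'C F' 'D F' 'D T' 'D F' > "$file"

collapsed=$(awk '
  # First time a key is seen: record its order, start its list and its unique count.
  !($1 in list) { order[++n] = $1; list[$1] = $2; uniq[$1] = 1; seen[$1, $2] = 1; next }
  # Later lines: append $2 to the list, and count it if it is new for this key.
  {
    list[$1] = list[$1] "," $2
    if (!(($1, $2) in seen)) { seen[$1, $2] = 1; uniq[$1]++ }
  }
  # Emit key, number of distinct values, and the comma-joined values, tab-separated.
  END { for (i = 1; i <= n; i++) print order[i] "\t" uniq[order[i]] "\t" list[order[i]] }
' "$file")

printf '%s\n' "$collapsed"
rm -f "$file"
```

The same $2==1 filter and split loop shown above can then be applied to this output.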

Answer 4

sed '
    /\n/D
    :1
    $! {
        N
        /^\(\S\+\s\).*\n\1[^\n]\+$/ b1
    }
    /^\([^\n]\+\n\)\(\1\)\+[^\n]\+$/! D
    h
    s/\n[^\n]\+$//p
    g
    s/.*\n/\n/
    D
    ' file
