Delete the first line of each group of duplicates in the first column

Answer 1

If no line in the file has a unique first column, try the following awk command:

awk -F, 'pre==$1 { print; next }{ pre=$1 }' infile
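
To illustrate why the short form is only safe when no first-column value is unique, here is a minimal sketch. The function name `drop_first_simple` and the sample data are assumptions made up for this illustration; the awk program itself is the one-liner above, reading standard input instead of `infile`.

```shell
# drop_first_simple: hypothetical wrapper around the one-liner above (reads stdin).
drop_first_simple() {
    awk -F, 'pre==$1 { print; next } { pre=$1 }'
}

# Every first-column value occurs at least twice:
# only the first line of each group is dropped.
printf '%s\n' 1,a 1,b 2,c 2,d 2,e | drop_first_simple
# prints 1,b 2,d 2,e (one per line)

# 3,x is unique in its first column, and the short form drops it as well.
printf '%s\n' 1,a 1,b 3,x | drop_first_simple
# prints only 1,b
```

The second call shows the limitation: a line is printed only when its first field matches the previous line, so a unique line never gets printed.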

Or, for the general case where some lines may be unique, use this instead:

awk -F, 'pre==$1 { print; is_uniq=0; next }
                 # the current and previous lines share the same first column:
                 # print the current line and clear is_uniq, since a duplicate was found

         is_uniq { print temp }
                 # the previous line (saved in temp) turned out to be unique
                 # by its first column, so print it

                 { pre=$1; temp=$0; is_uniq=1 }
                 # save the first column in pre and the whole line in temp, and set
                 # is_uniq=1 on the assumption that this line may prove to be unique

END{ if(is_uniq) print temp }' infile
    # if the last line of the input file is unique, print it

The same script without comments:

awk -F, 'pre==$1 { print; is_uniq=0; next }
         is_uniq { print temp }
                 { pre=$1; temp=$0; is_uniq=1 }
END{ if(is_uniq) print temp }' infile

Note: this assumes that your input file infile is sorted on its first field. If it is not, pass a sorted copy of the file to awk instead:

awk ... <(sort -t, -k1,1 infile)
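
As a worked example of the sort-then-filter approach, the sketch below feeds unsorted input through the general script. The function name `drop_first_dup` and the sample data are hypothetical. It also adds `-s` (stable sort), which GNU and BSD `sort` support but POSIX does not require; this is an assumption about your `sort`. Without it, lines with equal first fields may be reordered, which changes which line counts as "first". A pipe is used in place of `<(...)` so the sketch does not depend on process substitution.

```shell
# drop_first_dup: hypothetical helper. Reads CSV on stdin, stable-sorts on
# field 1, drops the first line of each duplicate group, and keeps lines
# whose first field is unique.
drop_first_dup() {
    sort -s -t, -k1,1 |
    awk -F, 'pre==$1 { print; is_uniq=0; next }
             is_uniq { print temp }
                     { pre=$1; temp=$0; is_uniq=1 }
             END     { if (is_uniq) print temp }'
}

# Unsorted input: 2,z comes before 2,c in the file, so 2,z is the line to drop.
printf '%s\n' 2,z 1,a 2,c 3,d | drop_first_dup
# prints 1,a 2,c 3,d (one per line)
```

Note that the output comes back in sorted order, not in the original file order.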

Answer 2

Assuming the CSV is simple (no commas or newlines inside quoted fields, no escaped double quotes (""), and so on), you can use the following command:

awk -F ',' 'NR==FNR { seen1[$1]++; next }
            seen1[$1]==1 || seen2[$1]++ { print }' infile infile

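The only way to know whether a line's first field is repeated anywhere in the file is to count how many times that field occurs. The first pass over the file does this with seen1. On the second pass, a line is printed if its count is 1 (no duplicates) or if its first field has already been seen during that pass (tracked with seen2), which means it is not the first line of its group.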

If the file is already sorted by its first field, use @devWeek's solution.
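
The two-pass idea described above can be sketched on unsorted input as follows. The function name `two_pass_drop`, the temporary file, and the sample data are assumptions for illustration only. The file is passed to awk twice because awk has to read it once to count and once to filter.

```shell
# two_pass_drop: hypothetical helper; $1 is the path of a CSV file that is read twice.
two_pass_drop() {
    awk -F, 'NR==FNR { seen1[$1]++; next }          # pass 1: count each first field
             seen1[$1]==1 || seen2[$1]++ { print }  # pass 2: unique, or not first in its group
            ' "$1" "$1"
}

tmp=$(mktemp)
printf '%s\n' 2,b 1,a 3,d 2,c 3,e 3,f > "$tmp"
two_pass_drop "$tmp"
# prints 1,a 2,c 3,e 3,f (one per line)
rm -f "$tmp"
```

Unlike the sort-based approach, this keeps the surviving lines in their original order, at the cost of holding one counter per distinct first field in memory.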

Answer 3

$ cat file
1,a
2,b
2,c
3,d
3,e
3,f
4,g
4,h
5,i

We want to delete the lines "2,b", "3,d" and "4,g":

perl -F, -anE '
    push $lines{$F[0]}->@*, $_ 
  } END { 
    for $key (sort keys %lines) {
        shift $lines{$key}->@* if (scalar($lines{$key}->@*) > 1); # remove the first
        print join "", $lines{$key}->@*;
    }
' file

1,a
2,c
3,e
3,f
4,h
5,i
