Remove rows that have the same value in a given column

I have an input file (sorted with -t on column 2):

TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,23525255,T0137,0.40,0.30,0.11,24
TOP,23525255,T0138,0.08,0.07,0.01,24
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,75856354,T0137,0.18,0.17,0.01,36
TOP,75856354,T0138,0.18,0.17,0.01,26
TOP,42401990,T0137,0.06,0.05,0.01,24

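I want to delete every pair of rows that share the same value in column 2, so that I end up with only the rows whose value in field 2 is unique. For the example above, the result would be: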

TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24

Answer 1

This should work:

 $ awk -F, '{a[$2]=$0; b[$2]++;} END{for(i in a){if(b[i]==1){print a[i]}}}' file
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24
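
One caveat: awk does not guarantee the traversal order of `for (i in a)`, so the surviving lines may come out in a different order than in the input. If order matters, a two-pass variant is one option. The sketch below assumes the data is in a regular file that can be read twice; the file name sample.csv and the function name unique_by_field2 are hypothetical, chosen only for illustration.

```shell
# Recreate the sample input from the question (sample.csv is a hypothetical name).
cat > sample.csv <<'EOF'
TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,23525255,T0137,0.40,0.30,0.11,24
TOP,23525255,T0138,0.08,0.07,0.01,24
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,75856354,T0137,0.18,0.17,0.01,36
TOP,75856354,T0138,0.18,0.17,0.01,26
TOP,42401990,T0137,0.06,0.05,0.01,24
EOF

# Hypothetical helper: the same file is passed to awk twice.
# Pass 1 (NR==FNR): count how often each value of field 2 occurs.
# Pass 2: print, in input order, the lines whose field-2 value occurred once.
unique_by_field2() {
  awk -F, 'NR==FNR { count[$2]++; next } count[$2] == 1' "$1" "$1"
}

unique_by_field2 sample.csv
```

Unlike the single-pass version, this does not keep whole lines in memory, only the per-key counts, at the cost of reading the file twice.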

Answer 2

A short uniq trick for the current input structure (the first two fields have a fixed length):

uniq -s4 -w8 -u file
  • -s4: skip the first 4 characters (i.e. TOP,)
  • -w8: compare no more than 8 characters per line
  • -u: print only lines that are not repeated

Output:

TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24
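
Note that uniq only compares adjacent lines, so this relies on rows with the same key sitting next to each other. If the input is not already grouped that way, one option is to sort on field 2 first. A minimal sketch, assuming GNU uniq (-w is a GNU extension) and a hypothetical file name shuffled.csv:

```shell
# A small ungrouped sample (shuffled.csv is a hypothetical name):
# the two 25424242 rows are not adjacent here.
cat > shuffled.csv <<'EOF'
TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,42401990,T0137,0.06,0.05,0.01,24
EOF

# Sort on field 2 only so that equal keys become adjacent, then keep the
# lines whose 8-character key (after the 4-character "TOP," prefix) is not repeated.
sort -t, -k2,2 shuffled.csv | uniq -s4 -w8 -u
```

The result comes out in sorted key order rather than in the original input order.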

Answer 3

You can do this with awk, uniq and sed (note that sed -i edits file.txt in place):

for k in $(awk -F, '{print $2}' file.txt | uniq -d); do
  # delete every line whose 2nd field equals the duplicated key
  sed -i "/^[^,]*,$k,/d" file.txt
done

Output:

TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24