I have an input file (sorted on column 2, using -t):
TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,23525255,T0137,0.40,0.30,0.11,24
TOP,23525255,T0138,0.08,0.07,0.01,24
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,75856354,T0137,0.18,0.17,0.01,36
TOP,75856354,T0138,0.18,0.17,0.01,26
TOP,42401990,T0137,0.06,0.05,0.01,24
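I want to delete every pair of lines that share the same value in column 2, so that I end up with only the lines whose value in field 2 is unique. For the example above, that would be: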
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24
Answer 1
This should work:
$ awk -F, '{a[$2]=$0; b[$2]++;} END{for(i in a){if(b[i]==1){print a[i]}}}' file
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24
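One thing to keep in mind with the command above: awk does not guarantee the iteration order of for (i in a), so the surviving lines may not come out in input order. If order matters, a two-pass variant along these lines may help. It is only a sketch: the temporary sample file and the names sample, count and result are my own choices, not part of the original answer.

```shell
# Sample data copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,23525255,T0137,0.40,0.30,0.11,24
TOP,23525255,T0138,0.08,0.07,0.01,24
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,75856354,T0137,0.18,0.17,0.01,36
TOP,75856354,T0138,0.18,0.17,0.01,26
TOP,42401990,T0137,0.06,0.05,0.01,24
EOF

# The same file is read twice.
# Pass 1 (NR == FNR): count how often each field-2 value occurs.
# Pass 2: print only the lines whose field-2 value was counted exactly once.
result=$(awk -F, 'NR == FNR { count[$2]++; next } count[$2] == 1' "$sample" "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Because the second pass walks the file from top to bottom, the output keeps the input order, and the input does not need to be sorted first.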
Answer 2
A short uniq solution
A trick that relies on the current input structure (the first two fields have a fixed length):
uniq -s4 -w8 -u file
-s4 - skip the first 4 characters (i.e. TOP,)
-w8 - compare no more than 8 characters per line
-u - print only unique lines
Output:
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24
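Since the offsets -s4 and -w8 only work when field 1 is always 3 characters and field 2 is always 8, it may be worth checking that before trusting the result. The following is a rough sketch of such a check, not part of the original answer. The temporary file and the names sample, bad and result are assumptions of mine, and -w is, as far as I know, a GNU extension to uniq.

```shell
# Sample data copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,23525255,T0137,0.40,0.30,0.11,24
TOP,23525255,T0138,0.08,0.07,0.01,24
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,75856354,T0137,0.18,0.17,0.01,36
TOP,75856354,T0138,0.18,0.17,0.01,26
TOP,42401990,T0137,0.06,0.05,0.01,24
EOF

# Collect every line that violates the fixed-width assumption:
# field 1 must be 3 characters long and field 2 must be 8.
bad=$(awk -F, 'length($1) != 3 || length($2) != 8' "$sample")

if [ -z "$bad" ]; then
    # Assumption holds: characters 5-12 of each line are exactly field 2.
    result=$(uniq -s4 -w8 -u "$sample")
    printf '%s\n' "$result"
else
    printf 'Lines that break the fixed-width assumption:\n%s\n' "$bad" >&2
fi

rm -f "$sample"
```

Like uniq in general, this only compares adjacent lines, so equal field-2 values still need to sit next to each other in the input.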
Answer 3
You can use awk together with uniq and sed for this:
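# Loop over each field-2 value that appears on more than one adjacent line
for k in $(awk -F, '{print $2}' file.txt | uniq -d); do
    # Delete, in place, only the lines whose 2nd field is exactly that value
    sed -i "/^[^,]*,$k,/d" file.txt
done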
Output (contents of file.txt afterwards):
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24
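Note that the loop above rewrites file.txt in place. If you would rather keep the original file untouched, something like the following sketch might do: it turns each duplicated field-2 value into an anchored pattern and lets a single grep -v drop the matching lines. The temporary files and the names sample, patterns and result are placeholders I made up for the example.

```shell
# Sample data copied from the question, written to a temporary file.
sample=$(mktemp)
patterns=$(mktemp)
cat > "$sample" <<'EOF'
TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,23525255,T0137,0.40,0.30,0.11,24
TOP,23525255,T0138,0.08,0.07,0.01,24
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,75856354,T0137,0.18,0.17,0.01,36
TOP,75856354,T0138,0.18,0.17,0.01,26
TOP,42401990,T0137,0.06,0.05,0.01,24
EOF

# Extract field 2, keep one copy of each value that repeats on adjacent lines,
# and wrap it in a regex anchored to the 2nd field, e.g. ^[^,]*,25424242,
cut -d, -f2 "$sample" | uniq -d | sed 's/.*/^[^,]*,&,/' > "$patterns"

# Print every line that matches none of the patterns; the input file is not modified.
result=$(grep -v -f "$patterns" "$sample")
printf '%s\n' "$result"

rm -f "$sample" "$patterns"
```

Redirect the grep output to a new file if you want to keep the filtered result.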