I have a tab-separated file:
TRINITY_DN42298_c0_g1_i1.p1 NA NA
TRINITY_DN12995_c0_g1_i1.p1 PF06799 NA
TRINITY_DN2326_c0_g1_i4.p3 NA NA
TRINITY_DN6047_c0_g1_i1.p1 PF10585 GO:0008641
TRINITY_DN37780_c0_g1_i3.p2 PF00071 GO:0003924,GO:0005525
TRINITY_DN2787_c0_g1_i2.p1 NA NA
TRINITY_DN29879_c0_g1_i3.p1 PF01657 NA
TRINITY_DN72702_c0_g1_i1.p1 PF00498 GO:0005515
TRINITY_DN24890_c0_g1_i7.p1 PF00854 GO:0016020,GO:0022857,GO:0055085
TRINITY_DN46477_c0_g1_i1.p1 PF00069 GO:0004672,GO:0005524,GO:0006468
I want to delete the rows that have NA in both column 2 and column 3, leaving:
TRINITY_DN12995_c0_g1_i1.p1 PF06799 NA
TRINITY_DN6047_c0_g1_i1.p1 PF10585 GO:0008641
TRINITY_DN37780_c0_g1_i3.p2 PF00071 GO:0003924,GO:0005525
TRINITY_DN29879_c0_g1_i3.p1 PF01657 NA
TRINITY_DN72702_c0_g1_i1.p1 PF00498 GO:0005515
TRINITY_DN24890_c0_g1_i7.p1 PF00854 GO:0016020,GO:0022857,GO:0055085
TRINITY_DN46477_c0_g1_i1.p1 PF00069 GO:0004672,GO:0005524,GO:0006468
I tried:
sed -i '/NA/d' ./file.txt
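Answer 1
You can try awk:
awk -F'\t' '!($2 == "NA" && $3 == "NA")' file
The -F option sets the field separator to \t, which lets awk take the second and third fields and check that they are not both NA. When that is the case, awk prints the line.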
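To try the filter without touching a real file, here is a minimal sketch that rebuilds four of the sample rows with printf and runs the same awk condition on them. The temporary path and the name sample.tsv are placeholders, not part of the original question.

```shell
# Build a small tab-separated sample (sample.tsv is a hypothetical name).
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.tsv"
# printf reuses the format for every group of three arguments.
printf '%s\t%s\t%s\n' \
  TRINITY_DN42298_c0_g1_i1.p1 NA NA \
  TRINITY_DN12995_c0_g1_i1.p1 PF06799 NA \
  TRINITY_DN6047_c0_g1_i1.p1 PF10585 GO:0008641 \
  TRINITY_DN2787_c0_g1_i2.p1 NA NA > "$sample"

# Keep a row unless columns 2 and 3 are both exactly "NA".
kept=$(awk -F'\t' '!($2 == "NA" && $3 == "NA")' "$sample")
printf '%s\n' "$kept"
```

Only the PF06799 and PF10585 rows should remain; the row with a single NA in column 3 is kept because the condition requires both columns to be NA before a row is dropped.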
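Answer 2
Your script also deletes lines that contain only one NA, so just add a second NA, separated from the first by a delimiter (space? tab? let's say [[:space:]]*), and anchor it to the end of the line with $ so that the two NAs are fields 2 and 3:
sed -i '/NA[[:space:]]*NA$/d' file.txt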
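The difference between the two patterns can be seen on a three-row sample. This is only a sketch: the ids id1 to id3 are shortened, made-up stand-ins for the TRINITY names, and sed is run without -i so nothing is modified in place.

```shell
# Three sample rows: both NA, one NA, no NA (ids are hypothetical).
rows=$(printf '%s\t%s\t%s\n' \
  id1 NA NA \
  id2 PF06799 NA \
  id3 PF10585 GO:0008641)

# Original attempt: deletes every line containing NA anywhere.
loose=$(printf '%s\n' "$rows" | sed '/NA/d')

# Anchored pattern: deletes only lines that end in NA, optional whitespace, NA.
strict=$(printf '%s\n' "$rows" | sed '/NA[[:space:]]*NA$/d')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern leaves only id3, while the anchored one keeps id2 and id3. Because * also matches zero whitespace characters, a line whose last field ends in NANA would be deleted as well; nothing like that appears in the sample data.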
Answer 3
You can do this in several ways, as shown below:
$ grep -vP '^(?:(?!\t).)+\tNA\tNA(?=\t|$)' inp.tsv
$ sed -Ee 'h;s/\t/\n/;s/$/\t/;/\n(NA\t)\1/d;g' inp.tsv
$ perl -F'\t' -lane 'print if 2 != grep { /^NA$/ } @F[1,2]' inp.tsv
$ perl -MList::MoreUtils=any -F'\t' -lane 'print if any { ! /^NA$/ } @F[1,2]' inp.tsv
# fs => field separator set to a TAB
# nT => not TAB
# F => consecutive run of non TABs, a field
$ fs="`echo x | tr x '\011'`"; nT="[^${fs}]"; F="$nT$nT*"
$ sed -e "/^$F${fs}NA${fs}NA\$/d" -e "/^$F${fs}NA${fs}NA${fs}/d" inp.tsv
Result:
TRINITY_DN12995_c0_g1_i1.p1 PF06799 NA
TRINITY_DN6047_c0_g1_i1.p1 PF10585 GO:0008641
TRINITY_DN37780_c0_g1_i3.p2 PF00071 GO:0003924,GO:0005525
TRINITY_DN29879_c0_g1_i3.p1 PF01657 NA
TRINITY_DN72702_c0_g1_i1.p1 PF00498 GO:0005515
TRINITY_DN24890_c0_g1_i7.p1 PF00854 GO:0016020,GO:0022857,GO:0055085
TRINITY_DN46477_c0_g1_i1.p1 PF00069 GO:0004672,GO:0005524,GO:0006468
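Assumptions:
- No leading TABs
- Unix-style line endings => newline = \012
- Locale is set to LC_ALL=C
- The input file is readable by the user
- The grep version supports the -P option
- sed supports non-POSIX constructs such as \t, \n on the RHS, and (...)
- The commands are run on the command line under bash|sh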
Answer 4
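I tried the command below and it worked fine. A row is printed when at least one of columns 2 and 3 is not NA.
Command
awk '$2 != "NA" || $3 != "NA" {print $0}' filename
Output
TRINITY_DN12995_c0_g1_i1.p1 PF06799 NA
TRINITY_DN6047_c0_g1_i1.p1 PF10585 GO:0008641
TRINITY_DN37780_c0_g1_i3.p2 PF00071 GO:0003924,GO:0005525
TRINITY_DN29879_c0_g1_i3.p1 PF01657 NA
TRINITY_DN72702_c0_g1_i1.p1 PF00498 GO:0005515
TRINITY_DN24890_c0_g1_i7.p1 PF00854 GO:0016020,GO:0022857,GO:0055085
TRINITY_DN46477_c0_g1_i1.p1 PF00069 GO:0004672,GO:0005524,GO:0006468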
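The choice of logical operator matters here. By De Morgan's law, "not (both are NA)" is the same as "column 2 is not NA or column 3 is not NA", so joining the two != tests with && gives a stricter filter than the question asks for. A small sketch on three made-up rows shows the difference:

```shell
# Three sample rows: both NA, one NA, no NA (ids are hypothetical).
rows=$(printf '%s\t%s\t%s\n' \
  id1 NA NA \
  id2 PF06799 NA \
  id3 PF10585 GO:0008641)

# "&&": keep only rows where neither column is NA (also drops id2).
neither=$(printf '%s\n' "$rows" | awk '$2 != "NA" && $3 != "NA"' | wc -l)

# "||": keep rows where at least one column is not NA (drops only id1).
at_least_one=$(printf '%s\n' "$rows" | awk '$2 != "NA" || $3 != "NA"' | wc -l)

echo "&& keeps $((neither)) row(s), || keeps $((at_least_one)) row(s)"
```

On the full sample file from the question, the same reasoning gives 5 rows for && and 7 rows for ||, and the 7-row result is the one the question asks for.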