How can I use awk to extract the lines that have specific codes in a column?

I have a text file named final.txt with the following contents:

name_00000001   name_000001 -   u   q1:MSTRG.4|MSTRG.4.1|3|0.000000|0.000000|0.000000|3211
name_00000002   name_000001 -   u   q1:MSTRG.4|MSTRG.4.2|2|0.000000|0.000000|0.000000|894
name_00000003   name_000001 -   p   q1:MSTRG.4|MSTRG.4.3|2|0.000000|0.000000|0.000000|522
name_00000004   name_000002 -   p   q1:MSTRG.26|MSTRG.26.1|1|0.000000|0.000000|0.000000|336
name_00000005   name_000003 -   u   q1:MSTRG.27|MSTRG.27.1|5|0.000000|0.000000|0.000000|730
name_00000006   name_000003 -   k   q1:MSTRG.27|MSTRG.27.2|7|0.000000|0.000000|0.000000|3157
name_00000007   name_000003 -   k   q1:MSTRG.27|MSTRG.27.3|6|0.000000|0.000000|0.000000|3665
name_00000008   name_000003 -   u   q1:MSTRG.27|MSTRG.27.4|4|0.000000|0.000000|0.000000|7900
name_00000009   name_000003 -   u   q1:MSTRG.27|MSTRG.27.5|4|0.000000|0.000000|0.000000|4356
name_00000010   name_000003 -   k   q1:MSTRG.27|MSTRG.27.6|4|0.000000|0.000000|0.000000|1842
name_00000011   name_000003 -   u   q1:MSTRG.27|MSTRG.27.7|3|0.000000|0.000000|0.000000|2752
name_00000012   name_000003 -   p   q1:MSTRG.27|MSTRG.27.8|2|0.000000|0.000000|0.000000|300
name_00000013   name_000003 -   u   q1:MSTRG.27|MSTRG.27.9|2|0.000000|0.000000|0.000000|2895
name_00000014   name_000003 -   k   q1:MSTRG.27|MSTRG.27.10|2|0.000000|0.000000|0.000000|696
name_00000015   name_000003 -   u   q1:MSTRG.27|MSTRG.27.11|4|0.000000|0.000000|0.000000|9046
name_00000016   name_000003 -   u   q1:MSTRG.27|MSTRG.27.12|5|0.000000|0.000000|0.000000|9962
name_00000017   name_000003 -   u   q1:MSTRG.27|MSTRG.27.13|3|0.000000|0.000000|0.000000|17753
name_00000018   name_000003 -   l   q1:MSTRG.27|MSTRG.27.14|2|0.000000|0.000000|0.000000|6895
name_00000019   name_000003 -   l   q1:MSTRG.27|MSTRG.27.15|4|0.000000|0.000000|0.000000|1889
name_00000020   name_000003 -   l   q1:MSTRG.27|MSTRG.27.16|4|0.000000|0.000000|0.000000|4712
name_00000021   name_000003 -   u   q1:MSTRG.27|MSTRG.27.17|3|0.000000|0.000000|0.000000|1154
name_00000022   name_000003 -   u   q1:MSTRG.27|MSTRG.27.18|2|0.000000|0.000000|0.000000|511
name_00000023   name_000003 -   x   q1:MSTRG.27|MSTRG.27.19|3|0.000000|0.000000|0.000000|2984
name_00000024   name_000003 -   u   q1:MSTRG.27|MSTRG.27.20|2|0.000000|0.000000|0.000000|4944
name_00000025   name_000003 -   x   q1:MSTRG.32|MSTRG.32.1|1|0.000000|0.000000|0.000000|279
name_00000026   name_000003 -   x   q1:MSTRG.33|MSTRG.33.1|2|0.000000|0.000000|0.000000|543
name_00000027   name_000003 -   u   q1:MSTRG.34|MSTRG.34.1|2|0.000000|0.000000|0.000000|664
name_00000028   name_000003 -   u   q1:MSTRG.35|MSTRG.35.1|1|0.000000|0.000000|0.000000|3875
name_00000029   name_000003 -   o   q1:MSTRG.36|MSTRG.36.1|2|0.000000|0.000000|0.000000|969
name_00000030   name_000003 -   o   q1:MSTRG.27|MSTRG.27.21|2|0.000000|0.000000|0.000000|5750
name_00000031   name_000004 -   t   q1:MSTRG.27|MSTRG.27.22|3|0.000000|0.000000|0.000000|3425
name_00000032   name_000005 -   t   q1:MSTRG.27|MSTRG.27.24|3|0.000000|0.000000|0.000000|3403
name_00000033   name_000006 -   o   q1:MSTRG.27|MSTRG.27.23|3|0.000000|0.000000|0.000000|921
name_00000034   name_000007 -   u   q1:MSTRG.38|MSTRG.38.1|2|0.000000|0.000000|0.000000|222

The fourth column contains different codes, such as u, p, k, l, x, o and t. From this column, I only want to extract the lines whose code is u, o, t, x or p.
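Before filtering, it can help to confirm which codes actually occur in the fourth column. A minimal sketch, assuming a hypothetical file sample_codes.txt that holds a few rows copied from final.txt: count each value of $4 in an awk array, then sort the result so the order is stable.

```shell
# Hypothetical sample: six rows copied from final.txt
cat > sample_codes.txt <<'EOF'
name_00000001   name_000001 -   u   q1:MSTRG.4|MSTRG.4.1|3|0.000000|0.000000|0.000000|3211
name_00000002   name_000001 -   u   q1:MSTRG.4|MSTRG.4.2|2|0.000000|0.000000|0.000000|894
name_00000003   name_000001 -   p   q1:MSTRG.4|MSTRG.4.3|2|0.000000|0.000000|0.000000|522
name_00000006   name_000003 -   k   q1:MSTRG.27|MSTRG.27.2|7|0.000000|0.000000|0.000000|3157
name_00000018   name_000003 -   l   q1:MSTRG.27|MSTRG.27.14|2|0.000000|0.000000|0.000000|6895
name_00000031   name_000004 -   t   q1:MSTRG.27|MSTRG.27.22|3|0.000000|0.000000|0.000000|3425
EOF

# Count how often each code appears in column 4, one "code count" pair per line;
# "for (c in count)" has no guaranteed order, so sort the output
awk '{ count[$4]++ } END { for (c in count) print c, count[c] }' sample_codes.txt | sort > code_counts.txt
cat code_counts.txt
```

Run against the real final.txt, the same command lists every code in use, which makes it easy to check that u, o, t, x and p are the ones you want.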

I tried extracting all lines that have one of these codes in the fourth column like this:

cat final.txt | awk '$4=="u"{print $0}' > new.txt

How can I extract the lines with the other codes in the same command?

Answer 1

You can use a regular expression to match the field:

awk '$4 ~ /^[uotxp]$/' final.txt > new.txt
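To see the filter at work, here is a small sketch that copies a few rows of final.txt into a hypothetical sample.txt and runs the regex command on it. For comparison it also runs an equivalent chain of equality tests joined with ||, which is a direct extension of the command in the question (that form is an addition, not part of this answer). The ^ and $ anchors make the whole field match a single listed character.

```shell
# Hypothetical sample: five rows copied from final.txt (codes u, k, l, x, o)
cat > sample.txt <<'EOF'
name_00000001   name_000001 -   u   q1:MSTRG.4|MSTRG.4.1|3|0.000000|0.000000|0.000000|3211
name_00000006   name_000003 -   k   q1:MSTRG.27|MSTRG.27.2|7|0.000000|0.000000|0.000000|3157
name_00000018   name_000003 -   l   q1:MSTRG.27|MSTRG.27.14|2|0.000000|0.000000|0.000000|6895
name_00000023   name_000003 -   x   q1:MSTRG.27|MSTRG.27.19|3|0.000000|0.000000|0.000000|2984
name_00000029   name_000003 -   o   q1:MSTRG.36|MSTRG.36.1|2|0.000000|0.000000|0.000000|969
EOF

# Regex form: the field must be exactly one of u, o, t, x, p
awk '$4 ~ /^[uotxp]$/' sample.txt > regex.txt

# Equivalent form: separate equality tests joined with ||
awk '$4 == "u" || $4 == "o" || $4 == "t" || $4 == "x" || $4 == "p"' sample.txt > or.txt

# Show the first column of the kept rows:
# name_00000001, name_00000023, name_00000029
awk '{ print $1 }' regex.txt
```

Both files end up identical; the regex is just shorter to type.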

The default action is to print the current record, so you don't need to write { print $0 }.
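If the list of codes changes often, an alternative (not from the answer above) is to pass the codes in with -v and turn them into a lookup set with awk's split(); a line is printed when $4 is a key of that set. This is a sketch: the variable names codes, list and want and the file sample_lookup.txt are hypothetical.

```shell
# Hypothetical sample: six rows copied from final.txt (codes p, k, l, x, t, o)
cat > sample_lookup.txt <<'EOF'
name_00000003   name_000001 -   p   q1:MSTRG.4|MSTRG.4.3|2|0.000000|0.000000|0.000000|522
name_00000006   name_000003 -   k   q1:MSTRG.27|MSTRG.27.2|7|0.000000|0.000000|0.000000|3157
name_00000018   name_000003 -   l   q1:MSTRG.27|MSTRG.27.14|2|0.000000|0.000000|0.000000|6895
name_00000025   name_000003 -   x   q1:MSTRG.32|MSTRG.32.1|1|0.000000|0.000000|0.000000|279
name_00000031   name_000004 -   t   q1:MSTRG.27|MSTRG.27.22|3|0.000000|0.000000|0.000000|3425
name_00000033   name_000006 -   o   q1:MSTRG.27|MSTRG.27.23|3|0.000000|0.000000|0.000000|921
EOF

# Pass the wanted codes as one comma-separated string
awk -v codes="u,o,t,x,p" '
    BEGIN {
        n = split(codes, list, ",")                  # list[1]="u", list[2]="o", ...
        for (i = 1; i <= n; i++) want[list[i]] = 1   # build a set keyed by code
    }
    $4 in want                                       # pattern with no action: print the line
' sample_lookup.txt > lookup.txt
cat lookup.txt
```

Unlike the bracket expression [uotxp], this lookup also works if a code is ever longer than one character.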
