How do I print only the lines where two fields have the same value?

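I'm new to Unix and have a question about subsetting data; I'd appreciate any help. I have a 23G input file with millions of lines, but I only want to keep the lines where the first and fourth columns (the scaffold names) are the same. Here are the first few lines of my data set: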

tscaffold94_798049_802097   999 NA tscaffold94_798049_802097   999 NA 1
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1029 NA 1
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1044 NA -0.0463767871013283
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1045 NA -0.939576278422824
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1130 NA -0.0831304705346077
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1180 NA -0.931681175211672
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1187 NA -0.940010336852543
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1202 NA 1
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1224 NA 1
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1269 NA 1
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1313 NA -0.201478578143067
tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1384 NA 1
tscaffold94_798049_802097   999 NA tscaffold94_878564_884314  3259 NA -0.595441932439136
tscaffold94_798049_802097   999 NA tscaffold94_878564_884314  3304 NA 0.745699172241005
tscaffold94_798049_802097   999 NA tscaffold94_878564_884314  3319 NA -0.570318634275133
tscaffold94_798049_802097   999 NA tscaffold94_878564_884314  3588 NA -0.60363963711489

Answer 1

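awk is your friend here; the columns become variables in the awk script ($1 for the first field, $4 for the fourth, and so on), so it is easy to test them for equality and act on the result, for example by printing (print with no argument outputs the current line).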

awk '{if($1 == $4) print}' < input
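The same filter can be written more compactly, because awk prints the current line whenever a pattern with no action is true. The following is a minimal sketch on two rows copied from the question; the file names sample.txt and filtered.txt are placeholders chosen for illustration, not names from the original post.

```shell
# Build a tiny sample (hypothetical file name) from two rows of the question's data:
# the first row has matching scaffold names in columns 1 and 4, the second does not.
printf '%s\n' \
  'tscaffold94_798049_802097   999 NA tscaffold94_798049_802097  1029 NA 1' \
  'tscaffold94_798049_802097   999 NA tscaffold94_878564_884314  3259 NA -0.595441932439136' \
  > sample.txt

# A bare pattern with no action block prints every line for which it is true.
# awk splits on runs of whitespace by default, so the uneven spacing is not a problem.
awk '$1 == $4' sample.txt > filtered.txt

# Show the result: only the first row is kept.
cat filtered.txt
```

For the real 23G file, redirecting the output to a new file in the same way (awk '$1 == $4' input > output) avoids flooding the terminal. awk reads the input one line at a time, so the file size should not be a memory concern. Note that awk compares two fields numerically when both look like numbers; the scaffold names here are not numeric, so they are compared as strings.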
