I have a table
A 1
A 1
A 1
A 1
A 1
A 1
A 2
B 1
B 1
B 1
B 2
B 1
B 1
B 1
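I want to print the column 1 value of the rows whose column 2 value is at least 2 times larger than both the value 3 rows above and the value 3 rows below in the same column. However, only rows with the same name in column 1 should be considered.
So the output should be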
B
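I would like to modify this script written by Stéphane Chazelas so that it meets the additional requirement stated above (only rows with the same name in column 1 are considered).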
awk -v key=1 -v value=2 '
NR > 6 {
x = saved_value[NR%6]; y = saved_value[(NR - 3) % 6]; z = $value
if (y >= 2*x && y >= 2*z) print saved_key[(NR - 3) % 6]
}
{saved_key[NR % 6] = $key; saved_value[NR % 6] = $value}' < file
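To illustrate why the extra key check is needed, here is a small sketch that runs the script above unchanged on the sample table. The function name run_original and the here-document (used in place of file) are assumptions made for this demo only.

```shell
# Hypothetical helper: feed the sample table to the unmodified script.
run_original() {
  awk -v key=1 -v value=2 '
    NR > 6 {
      x = saved_value[NR % 6]; y = saved_value[(NR - 3) % 6]; z = $value
      if (y >= 2*x && y >= 2*z) print saved_key[(NR - 3) % 6]
    }
    {saved_key[NR % 6] = $key; saved_value[NR % 6] = $value}' <<'EOF'
A 1
A 1
A 1
A 1
A 1
A 1
A 2
B 1
B 1
B 1
B 2
B 1
B 1
B 1
EOF
}
run_original
```

On this input the script prints A and then B. The A 2 row is reported because it is compared with the row 3 lines below it, which already belongs to the B group.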
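(This is actually a follow-up to a post here. Since the situation is more complicated, I wanted to explain it better here.)
...
Update 20171010:
I am now modifying the script written by Stéphane Chazelas, but this time the rows I want to select are those whose value is at least 2 times smaller than both the third value above and the third value below. I had simplified the example earlier so that I could understand and modify the script myself (v2 <= v1/2 && v2 <= v3/2), but I failed again... To make things more straightforward, I now provide the real file below, where the values in the second column are not used and the values in the third column are the ones to compare: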
K00188:14:H2LMFBBXX:6:1101:27440:1668 1 2
K00188:14:H2LMFBBXX:6:1101:27440:1668 2 2
K00188:14:H2LMFBBXX:6:1101:27440:1668 3 2
K00188:14:H2LMFBBXX:6:1101:27440:1668 4 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 5 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 6 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 7 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 8 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 9 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 10 1
K00188:14:H2LMFBBXX:6:1101:6501:1686 1 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 2 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 3 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 4 1
K00188:14:H2LMFBBXX:6:1101:6501:1686 5 1
K00188:14:H2LMFBBXX:6:1101:6501:1686 6 1
K00188:14:H2LMFBBXX:6:1101:6501:1686 7 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 8 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 9 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 10 2
If the whole line is printed, the expected output is:
K00188:14:H2LMFBBXX:6:1101:6501:1686 4 1
K00188:14:H2LMFBBXX:6:1101:6501:1686 5 1
K00188:14:H2LMFBBXX:6:1101:6501:1686 6 1
Here is my failed modification:
awk -v key=1 -v value=3 '
NR > 6 {
k1 = saved_key[NR%6]; k2 = saved_key[(NR - 3) % 6]; k3 = $key
v1 = saved_value[NR%6]; v2 = saved_value[(NR - 3) % 6]; v3 = $value
if (k1 == k2 && k2 == k3 && v2 <= v1/2 && v2 <= v3/2) print $0
}
{saved_key[NR % 6] = $key; saved_value[NR % 6] = $value}' < test
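How can I correct it?
...
Update 20171011:
How can I add an extra key, so that I can compare the value in column 3 with the third values above and below in column 4 (i.e. a different column)? Please refer to the 20171011 update. Thanks again!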
K00188:14:H2LMFBBXX:6:1101:27440:1668 1 0 2
K00188:14:H2LMFBBXX:6:1101:27440:1668 2 0 2
K00188:14:H2LMFBBXX:6:1101:27440:1668 3 0 2
K00188:14:H2LMFBBXX:6:1101:27440:1668 4 1 0
K00188:14:H2LMFBBXX:6:1101:27440:1668 5 1 0
K00188:14:H2LMFBBXX:6:1101:27440:1668 6 1 0
K00188:14:H2LMFBBXX:6:1101:27440:1668 7 1 0
K00188:14:H2LMFBBXX:6:1101:27440:1668 8 1 0
K00188:14:H2LMFBBXX:6:1101:27440:1668 9 1 0
K00188:14:H2LMFBBXX:6:1101:27440:1668 10 1 0
K00188:14:H2LMFBBXX:6:1101:6501:1686 1 0 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 2 0 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 3 0 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 4 1 0
K00188:14:H2LMFBBXX:6:1101:6501:1686 5 1 0
K00188:14:H2LMFBBXX:6:1101:6501:1686 6 1 0
K00188:14:H2LMFBBXX:6:1101:6501:1686 7 0 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 8 0 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 9 0 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 10 0 2
If the whole line is printed, the expected output is:
K00188:14:H2LMFBBXX:6:1101:6501:1686 4 1 0
K00188:14:H2LMFBBXX:6:1101:6501:1686 5 1 0
K00188:14:H2LMFBBXX:6:1101:6501:1686 6 1 0
Here is my attempt:
awk -v key1=1 -v key2=2 -v value1=3 -v value2=4 '
{
k1 = saved_key1[NR%6]; k2 = saved_key1[(NR - 3) % 6]; k3 = $key1
k4 = saved_key2[NR%6]; k5 = saved_key2[(NR - 3) % 6]; k6 = $key2
v1 = saved_value1[NR%6]; v2 = saved_value1[(NR - 3) % 6]; v3 = $value1
v4 = saved_value2[NR%6]; v5 = saved_value2[(NR - 3) % 6]; v6 = $value2
if (k1 == k2 && k2 == k3 && v2 <= v4/2 && v2 <= v6/2) print saved_record[(NR-3)%6]
}
{saved_key1[NR % 6] = $key1; saved_value1[NR % 6] = $value1}' < file
Answer 1
That would then be:
awk -v key=1 -v value=2 '
NR > 6 { # for 7th record and over only
k1 = saved_key[NR%6]; k2 = saved_key[(NR - 3) % 6]; k3 = $key
v1 = saved_value[NR%6]; v2 = saved_value[(NR - 3) % 6]; v3 = $value
if (k1 == k2 && k2 == k3 && v2 >= 2*v1 && v2 >= 2*v3) print k2
}
# for every record, save key and value in ring buffers:
{saved_key[NR % 6] = $key; saved_value[NR % 6] = $value}'
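As a quick check, the script can be run on the sample table from the question. This is only a sketch: the function name run_with_key_check and the here-document standing in for the input file are assumptions for the demo.

```shell
# Hypothetical helper: run the key-aware script on the sample table.
run_with_key_check() {
  awk -v key=1 -v value=2 '
    NR > 6 { # for 7th record and over only
      k1 = saved_key[NR % 6]; k2 = saved_key[(NR - 3) % 6]; k3 = $key
      v1 = saved_value[NR % 6]; v2 = saved_value[(NR - 3) % 6]; v3 = $value
      if (k1 == k2 && k2 == k3 && v2 >= 2*v1 && v2 >= 2*v3) print k2
    }
    # for every record, save key and value in ring buffers:
    {saved_key[NR % 6] = $key; saved_value[NR % 6] = $value}' <<'EOF'
A 1
A 1
A 1
A 1
A 1
A 1
A 2
B 1
B 1
B 1
B 2
B 1
B 1
B 1
EOF
}
run_with_key_check
```

With the key comparison in place only B is printed: the A 2 row is skipped because the row 3 lines below it has a different key.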
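Note that the k1 == k2 and k2 == k3 comparisons will be numeric if the values look like numbers (so 00 will be treated the same as 0), and textual otherwise. Change to k1 "" == k2 to force a textual comparison.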
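The difference between the two kinds of comparison can be seen with a one-line input. This is a minimal sketch; the function name compare_keys and the variable names numeric and textual are made up for the illustration.

```shell
# Fields "00" and "0" both look like numbers, so awk compares them
# numerically unless one side is made a string by concatenating "".
compare_keys() {
  printf '00 0\n' | awk '{
    numeric = ($1 == $2)      # both sides look numeric: numeric comparison
    textual = ($1 "" == $2)   # left side is now a string: text comparison
    print numeric, textual
  }'
}
compare_keys
```

This prints 1 0: the plain comparison treats 00 and 0 as equal, while the version with "" does not.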
Or save the whole record and split it again when checking. For instance, for your 20171010 update:
awk -v key=1 -v value=3 '
NR > 6 {
# "above" is an array with the fields of 6th last record
split(saved_record[NR%6], above)
# "text" is the 3rd last record and the one we will be looking at
text = saved_record[(NR - 3) % 6]
# "text" fields split into the "here" array.
split(text, here)
# $0 contains the current record (the one 3 lines below "here")
# and $1, $2, $3... the fields of that record.
if (above[key] == here[key] && here[key] == $key && \
here[value] <= above[value] / 2 && here[value] <= $value / 2)
print text
}
{saved_record[NR % 6] = $0}'
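Here is a sketch of that script applied to the data from the 20171010 update. The function name find_dips and the here-document replacing the input file are assumptions for the demo. The line continuation backslash is left out, since awk accepts a newline after &&.

```shell
# Hypothetical helper: run the record-saving script on the 20171010 data.
find_dips() {
  awk -v key=1 -v value=3 '
    NR > 6 {
      split(saved_record[NR % 6], above)   # record 6 lines back
      text = saved_record[(NR - 3) % 6]    # record 3 lines back (candidate)
      split(text, here)
      if (above[key] == here[key] && here[key] == $key &&
          here[value] <= above[value] / 2 && here[value] <= $value / 2)
        print text
    }
    {saved_record[NR % 6] = $0}' <<'EOF'
K00188:14:H2LMFBBXX:6:1101:27440:1668 1 2
K00188:14:H2LMFBBXX:6:1101:27440:1668 2 2
K00188:14:H2LMFBBXX:6:1101:27440:1668 3 2
K00188:14:H2LMFBBXX:6:1101:27440:1668 4 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 5 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 6 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 7 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 8 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 9 1
K00188:14:H2LMFBBXX:6:1101:27440:1668 10 1
K00188:14:H2LMFBBXX:6:1101:6501:1686 1 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 2 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 3 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 4 1
K00188:14:H2LMFBBXX:6:1101:6501:1686 5 1
K00188:14:H2LMFBBXX:6:1101:6501:1686 6 1
K00188:14:H2LMFBBXX:6:1101:6501:1686 7 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 8 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 9 2
K00188:14:H2LMFBBXX:6:1101:6501:1686 10 2
EOF
}
find_dips
```

On this input it prints the three expected lines (rows 4, 5 and 6 of the 6501:1686 group). The difference from the failed modification in the question is that the record from 3 lines back is printed, whereas print $0 prints the current record.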
Answer 2
A relatively short GNU datamash + awk solution:
datamash -W -g1 count 2 collapse 2 <file | awk '$2==7{ split($3,a,","); k=a[4];
delete a[4]; if(k>=a[7]*2) print $1 }'