Check whether each "block" contains fewer than 5 lines matching a specific string

I have a file containing 5 "blocks", as shown below:

AACP_AGRFC  Agrobacterium fabrum    A9CHM9  PDB; 2JQ4; NMR; -; A=1-83.
                    PDB; 4H2W_5GP.pdb; X-ray; 1.95 A; C/D=1-83.
                    PDB; 4H2X_G5A.pdb; X-ray; 2.15 A; C/D=1-83.
                    PDB; 4H2Y; X-ray; 2.10 A; C/D=1-83.

AADB1_KLEPN Klebsiella pneumoniae.  P0AE05  PDB; 4WQK_GOL.pdb; X-ray; 1.48 A; A=1-177.
                    PDB; 4WQL_GOL.pdb; X-ray; 1.73 A; A=1-177.
                    PDB; 5KQJ; NMR; -; A=1-177.

AAKB2_RAT   Rattus norvegicus   Q9QZH4  PDB; 2LU3; NMR; -; A=67-163.
                    PDB; 2LU4; NMR; -; A=67-163.
                    PDB; 4Y0G_GOL.pdb; X-ray; 1.60 A; A/B=74-155.
                    PDB; 4YEE_GOL.pdb; X-ray; 2.00 A; A/B/C/D/E/F/G/H/I/J/K/L/M/N/O/P/Q/R=74-155.

AAPK2_HUMAN Homo sapiens    P54646  PDB; 2H6D; X-ray; 1.85 A; A=6-279.
                    PDB; 2LTU; NMR; -; A=282-339.
                    PDB; 2YZA; X-ray; 3.02 A; A=6-279.
                    PDB; 3AQV_TAK.pdb; X-ray; 2.08 A; A=6-279.
                    PDB; 4CFE; X-ray; 3.02 A; A/C=1-552.
                    PDB; 4CFF; X-ray; 3.92 A; A/C=1-552.
                    PDB; 4ZHX_4O7_C1V_C2Z.pdb; X-ray; 2.99 A; A/C=2-552.
                    PDB; 5EZV_C1V_C2Z_STU.pdb; X-ray; 2.99 A; A/C=2-347, A/C=397-552.
                    PDB; 5ISO_992_STU.pdb; X-ray; 2.63 A; A/C=1-552.

ABC3B_HUMAN Homo sapiens    Q9UH17  PDB; 2NBQ; NMR; -; A=187-382.
                    PDB; 5CQD_GOL.pdb; X-ray; 2.08 A; A/C=187-378.
                    PDB; 5CQH; X-ray; 1.73 A; A=187-378.
                    PDB; 5CQI; X-ray; 1.68 A; A=187-378.
                    PDB; 5CQK_GOL_PGE.pdb; X-ray; 1.88 A; A=187-378.
                    PDB; 5TD5; X-ray; 1.72 A; A=187-378.
                    PDB; 5TKM; X-ray; 1.90 A; A/B=1-191.

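The lines differ in length, but we only care about one specific column: the one containing X-ray or NMR (it is always in the same position). For each "block", we want to check whether it has >= 5 lines with X-ray in that column. If it does, we want to print the block. If it does not, we want to remove the whole block. So the expected result would look like this: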

AAPK2_HUMAN Homo sapiens    P54646  PDB; 2H6D; X-ray; 1.85 A; A=6-279.
                    PDB; 2LTU; NMR; -; A=282-339.
                    PDB; 2YZA; X-ray; 3.02 A; A=6-279.
                    PDB; 3AQV_TAK.pdb; X-ray; 2.08 A; A=6-279.
                    PDB; 4CFE; X-ray; 3.02 A; A/C=1-552.
                    PDB; 4CFF; X-ray; 3.92 A; A/C=1-552.
                    PDB; 4ZHX_4O7_C1V_C2Z.pdb; X-ray; 2.99 A; A/C=2-552.
                    PDB; 5EZV_C1V_C2Z_STU.pdb; X-ray; 2.99 A; A/C=2-347, A/C=397-552.
                    PDB; 5ISO_992_STU.pdb; X-ray; 2.63 A; A/C=1-552.

ABC3B_HUMAN Homo sapiens    Q9UH17  PDB; 2NBQ; NMR; -; A=187-382.
                    PDB; 5CQD_GOL.pdb; X-ray; 2.08 A; A/C=187-378.
                    PDB; 5CQH; X-ray; 1.73 A; A=187-378.
                    PDB; 5CQI; X-ray; 1.68 A; A=187-378.
                    PDB; 5CQK_GOL_PGE.pdb; X-ray; 1.88 A; A=187-378.
                    PDB; 5TD5; X-ray; 1.72 A; A=187-378.
                    PDB; 5TKM; X-ray; 1.90 A; A/B=1-191.

PS. We can't use ; as the column separator, but we know that the X-ray/NMR column always appears in the form PDB; XXXX(.pdb); X-ray or NMR

Does anyone know how to do this in bash? Thanks

Answer 1

Assuming your criterion can be expressed as the number of lines matching the regular expression /PDB; [^;]*; X-ray/, you could do

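awk -v RS= -v ORS='\n\n' -F'\n' '
  # RS= reads blank-line-separated blocks as records; ORS restores the blank line between printed blocks
  {c=0; for(i=1;i<=NF;i++) c += $i ~ /PDB; [^;]*; X-ray/ ? 1 : 0} c >= 5
' file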

or (slightly more concise, IMO)

perl -F'\n' -00ne 'print unless (grep { /PDB; [^;]*; X-ray/ } @F) < 5'

Ex.

$ perl -F'\n' -00ne 'print unless (grep { /PDB; [^;]*; X-ray/ } @F) < 5' file
AAPK2_HUMAN Homo sapiens    P54646  PDB; 2H6D; X-ray; 1.85 A; A=6-279.
                    PDB; 2LTU; NMR; -; A=282-339.
                    PDB; 2YZA; X-ray; 3.02 A; A=6-279.
                    PDB; 3AQV_TAK.pdb; X-ray; 2.08 A; A=6-279.
                    PDB; 4CFE; X-ray; 3.02 A; A/C=1-552.
                    PDB; 4CFF; X-ray; 3.92 A; A/C=1-552.
                    PDB; 4ZHX_4O7_C1V_C2Z.pdb; X-ray; 2.99 A; A/C=2-552.
                    PDB; 5EZV_C1V_C2Z_STU.pdb; X-ray; 2.99 A; A/C=2-347, A/C=397-552.
                    PDB; 5ISO_992_STU.pdb; X-ray; 2.63 A; A/C=1-552.

ABC3B_HUMAN Homo sapiens    Q9UH17  PDB; 2NBQ; NMR; -; A=187-382.
                    PDB; 5CQD_GOL.pdb; X-ray; 2.08 A; A/C=187-378.
                    PDB; 5CQH; X-ray; 1.73 A; A=187-378.
                    PDB; 5CQI; X-ray; 1.68 A; A=187-378.
                    PDB; 5CQK_GOL_PGE.pdb; X-ray; 1.88 A; A=187-378.
                    PDB; 5TD5; X-ray; 1.72 A; A=187-378.
                    PDB; 5TKM; X-ray; 1.90 A; A/B=1-191.
