I have a file with multiple "blocks" (three in this example), as shown below:
A4_RAT Amyloid-beta A4 protein; P08592 PDB; 1M7E; X-ray; 2.45 A; D/E/F=755-763.
PDB; 1NMJ; NMR; -; A=672-699.
PDB; 1OQN_I3P.pdb; X-ray; 2.30 A; C/D=755-763.
PDB; 2LI9; NMR; -; A/B=672-687.

AACP_AGRFC Aminoacyl carrier protein; A9CHM9 PDB; 2JQ4; NMR; -; A=1-83.
PDB; 4H2W_5GP.pdb; X-ray; 1.95 A; C/D=1-83.
PDB; 4H2X_G5A.pdb; X-ray; 2.15 A; C/D=1-83.
PDB; 4H2Y; X-ray; 2.10 A; C/D=1-83.

AADB1_KLEPN 2''-aminoglycoside nucleotidyltransferase; P0AE05 PDB; 4WQK; X-ray; 1.48 A; A=1-177.
PDB; 4WQL; X-ray; 1.73 A; A=1-177.
PDB; 5KQJ_GOL.pdb; NMR; -; A=1-177.
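Each "block" has the columns "PDB;", "XXXX;" or "XXXX_XXX.pdb;", and "X-ray;" or "NMR;". From the last two of these columns ("XXXX;" or "XXXX_XXX.pdb;", and "X-ray;" or "NMR;"), I find several combinations in each block: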
XXXX; X-ray;
XXXX_XXX.pdb; X-ray;
XXXX; NMR;
XXXX_XXX.pdb; NMR;
I am trying to select only the blocks that have only "XXXX_XXX.pdb; X-ray;", and, separately, only the blocks that have only "XXXX_XXX.pdb; NMR;".
In this example, if I search for the blocks that have only "XXXX_XXX.pdb; X-ray;", I expect the result to be:
A4_RAT Amyloid-beta A4 protein; P08592 PDB; 1M7E; X-ray; 2.45 A; D/E/F=755-763.
PDB; 1NMJ; NMR; -; A=672-699.
PDB; 1OQN_I3P.pdb; X-ray; 2.30 A; C/D=755-763.
PDB; 2LI9; NMR; -; A/B=672-687.

AACP_AGRFC Aminoacyl carrier protein; A9CHM9 PDB; 2JQ4; NMR; -; A=1-83.
PDB; 4H2W_5GP.pdb; X-ray; 1.95 A; C/D=1-83.
PDB; 4H2X_G5A.pdb; X-ray; 2.15 A; C/D=1-83.
PDB; 4H2Y; X-ray; 2.10 A; C/D=1-83.
On the other hand, if I search for the blocks that have only "XXXX_XXX.pdb; NMR;", I expect the result to be:
AADB1_KLEPN 2''-aminoglycoside nucleotidyltransferase; P0AE05 PDB; 4WQK; X-ray; 1.48 A; A=1-177.
PDB; 4WQL; X-ray; 1.73 A; A=1-177.
PDB; 5KQJ_GOL.pdb; NMR; -; A=1-177.
Does anyone know how to do this in bash?
Answer 1
Assuming there is a blank line between the blocks, as your question shows:
$ awk -v RS='\n\n' '/...._...\.pdb; NMR;/' RS= infile
AADB1_KLEPN 2''-aminoglycoside nucleotidyltransferase; P0AE05 PDB; 4WQK; X-ray; 1.48 A; A=1-177.
PDB; 4WQL; X-ray; 1.73 A; A=1-177.
PDB; 5KQJ_GOL.pdb; NMR; -; A=1-177.
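The command above works because of awk's paragraph mode: the RS= operand is applied just before infile is read, so it overrides -v RS='\n\n' and sets RS to the empty string, which makes each run of non-blank lines a single record. A minimal sketch to see this, assuming the sample data is saved as infile with a blank line between the blocks:

```shell
# Recreate the question's sample input, with a blank line between blocks.
cat > infile <<'EOF'
A4_RAT Amyloid-beta A4 protein; P08592 PDB; 1M7E; X-ray; 2.45 A; D/E/F=755-763.
PDB; 1NMJ; NMR; -; A=672-699.
PDB; 1OQN_I3P.pdb; X-ray; 2.30 A; C/D=755-763.
PDB; 2LI9; NMR; -; A/B=672-687.

AACP_AGRFC Aminoacyl carrier protein; A9CHM9 PDB; 2JQ4; NMR; -; A=1-83.
PDB; 4H2W_5GP.pdb; X-ray; 1.95 A; C/D=1-83.
PDB; 4H2X_G5A.pdb; X-ray; 2.15 A; C/D=1-83.
PDB; 4H2Y; X-ray; 2.10 A; C/D=1-83.

AADB1_KLEPN 2''-aminoglycoside nucleotidyltransferase; P0AE05 PDB; 4WQK; X-ray; 1.48 A; A=1-177.
PDB; 4WQL; X-ray; 1.73 A; A=1-177.
PDB; 5KQJ_GOL.pdb; NMR; -; A=1-177.
EOF

# RS= (empty string) is POSIX paragraph mode: each blank-line-separated
# block is one record, so NR counts blocks rather than lines.
# split($0, lines, "\n") counts the lines inside the current block,
# and $1 is the first word of the block.
awk -v RS= '{ n = split($0, lines, "\n"); print NR ": " n " lines, starts with " $1 }' infile
# 1: 4 lines, starts with A4_RAT
# 2: 4 lines, starts with AACP_AGRFC
# 3: 3 lines, starts with AADB1_KLEPN
```

Because a whole block is one record, a regex such as /...._...\.pdb; NMR;/ selects the entire block as soon as any of its lines matches.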
$ awk -v RS='\n\n' '/...._...\.pdb; X-ray;/' RS= infile
A4_RAT Amyloid-beta A4 protein; P08592 PDB; 1M7E; X-ray; 2.45 A; D/E/F=755-763.
PDB; 1NMJ; NMR; -; A=672-699.
PDB; 1OQN_I3P.pdb; X-ray; 2.30 A; C/D=755-763.
PDB; 2LI9; NMR; -; A/B=672-687.
AACP_AGRFC Aminoacyl carrier protein; A9CHM9 PDB; 2JQ4; NMR; -; A=1-83.
PDB; 4H2W_5GP.pdb; X-ray; 1.95 A; C/D=1-83.
PDB; 4H2X_G5A.pdb; X-ray; 2.15 A; C/D=1-83.
PDB; 4H2Y; X-ray; 2.10 A; C/D=1-83.
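These commands print every block that contains at least one matching line, so a block with both an X-ray .pdb entry and an NMR .pdb entry would appear in both searches. If "only" is meant strictly, one possible reading (my interpretation, not part of the original answer) is to also exclude blocks containing the other method. The sketch below uses a hypothetical file mixed.txt with made-up IDs, where BLOCK_C mixes both methods:

```shell
# Hypothetical test input (IDs made up for illustration):
# BLOCK_A has only an X-ray .pdb entry, BLOCK_B only an NMR .pdb entry,
# BLOCK_C has one of each.
cat > mixed.txt <<'EOF'
BLOCK_A PDB; 1AAA; NMR; -; A=1-10.
PDB; 1AAB_LIG.pdb; X-ray; 2.00 A; A=1-10.

BLOCK_B PDB; 2BBB; X-ray; 1.50 A; A=1-20.
PDB; 2BBC_LIG.pdb; NMR; -; A=1-20.

BLOCK_C PDB; 3CCC_LIG.pdb; X-ray; 1.80 A; A=1-30.
PDB; 3CCD_LIG.pdb; NMR; -; A=1-30.
EOF

# Blocks with an X-ray .pdb entry and no NMR .pdb entry; prints: BLOCK_A
awk -v RS= '/...._...\.pdb; X-ray;/ && !/...._...\.pdb; NMR;/ { print $1 }' mixed.txt

# Blocks with an NMR .pdb entry and no X-ray .pdb entry; prints: BLOCK_B
awk -v RS= '/...._...\.pdb; NMR;/ && !/...._...\.pdb; X-ray;/ { print $1 }' mixed.txt
```

Replace { print $1 } with nothing to print whole blocks. On the question's sample this selects the same blocks as the commands above, since no block there mixes the two methods.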
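To keep the blank lines between the blocks, remove the trailing RS= operand (so that the -v RS='\n\n' setting takes effect; a multi-character RS like this requires GNU awk or another awk that supports it) and add the action {print $0"\n"}: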
$ awk -v RS='\n\n' '/...._...\.pdb; NMR;/{print $0"\n"}' infile
$ awk -v RS='\n\n' '/...._...\.pdb; X-ray;/{print $0"\n"}' infile
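If you want the blank lines without relying on a multi-character RS, one portable alternative (a sketch, not from the original answer) is to stay in paragraph mode and set ORS to two newlines. The helper name pdb_blocks and the file sample.txt with its made-up IDs are hypothetical; the method name is passed in with -v and spliced into a dynamic regex.

```shell
# pdb_blocks METHOD FILE  (hypothetical helper name)
# Print each blank-line-separated block of FILE that contains an
# "XXXX_XXX.pdb; METHOD;" entry (METHOD e.g. X-ray or NMR; it is used
# as part of a regex), with a blank line after every printed block.
pdb_blocks() {
    # RS=        : POSIX paragraph mode, one record per block.
    # ORS='\n\n' : -v expands escapes, so each block is followed by a blank line.
    # "\\." in the awk string becomes \. in the regex, i.e. a literal dot.
    awk -v RS= -v ORS='\n\n' -v m="$1" '$0 ~ ("...._...\\.pdb; " m ";")' "$2"
}

# Hypothetical input with made-up IDs.
cat > sample.txt <<'EOF'
ID_1 PDB; 1AAA_LIG.pdb; X-ray; 2.00 A; A=1-10.

ID_2 PDB; 2BBB_LIG.pdb; NMR; -; A=1-20.

ID_3 PDB; 3CCC_LIG.pdb; X-ray; 1.80 A; A=1-30.
EOF

pdb_blocks X-ray sample.txt   # ID_1 and ID_3 blocks, each followed by a blank line
pdb_blocks NMR sample.txt     # ID_2 block only
```

With the question's data, pdb_blocks X-ray infile would print the A4_RAT and AACP_AGRFC blocks separated by a blank line. Note that this variant also leaves one blank line after the last printed block.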