I have a file that looks like this:
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
PDB; 5LZW; EM; 3.53 A; ii=1-385.
PDB; 5LZX; EM; 3.67 A; ii=1-385.
PDB; 5LZY; EM; 3.99 A; ii=1-385.
PDB; 5LZZ; EM; 3.47 A; ii=1-385.
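In this file, I want to match the element that comes after "PDB; (four-letter code);". In that column you can find "X-ray;", "NMR;" or "EM;". I want to delete the lines that have "EM;". Is there a bash command that can match these elements and delete those lines?
It is important to put a space in front of "EM" when matching, i.e. to match " EM;" including the leading space (otherwise a code such as "5GEM;" would also match).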
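Because the method always sits right after "PDB; (four-letter code);", one way to do this is a single grep -v on that whole pattern. This is a minimal sketch, assuming a grep that supports -E and POSIX character classes; drop_em is a hypothetical helper name, not an existing command.

```shell
# drop_em (hypothetical helper): print every line of the given file except
# those whose method field, right after "PDB; <4-char code>;", is "EM;".
# The space before "EM;" in the pattern keeps codes like "5GEM;" from matching.
drop_em() {
    grep -Ev 'PDB; [[:alnum:]]{4}; EM;' "$1"
}
```

Usage: drop_em file.txt > filtered.txt, where file.txt stands for your input file. For the sample above, the shorter grep -v ' EM;' file.txt gives the same result, since no non-EM line contains " EM;" with the leading space.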
The expected result is:
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
Answer 1
awk can do this:
awk '{if(!($1=="PDB;"&&$3=="EM;")){print}}' <yourfile
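This tests whether the first field of the current line (fields are whitespace-separated by default) is "PDB;" and the third field is "EM;", and prints the line unless both conditions hold.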
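The condition above only looks at $1 and $3, so it handles continuation lines that start with "PDB;", but it would keep a header line (one starting with an entry name such as "PELO_HUMAN") if that line's PDB entry were EM. No header line in the sample is, so the output is unaffected. A sketch that also covers that case, using the hypothetical helper drop_em_any_column, scans every field for "PDB;" and checks the field two positions later:

```shell
# drop_em_any_column (hypothetical helper): like the awk answer, but finds
# "PDB;" in any field, so header lines are checked as well.
drop_em_any_column() {
    awk '{
        for (i = 1; i <= NF - 2; i++)
            if ($i == "PDB;" && $(i + 2) == "EM;")
                next        # method field is EM;: skip this line
        print               # no EM entry found: keep the line
    }' "$1"
}
```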
Output:
$ awk '{if(!($1=="PDB;"&&$3=="EM;")){print}}' <test
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
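awk only writes the kept lines to standard output; the file itself is not changed. A sketch for actually removing the lines from the file, assuming mktemp is available (as on common Linux and macOS systems); filter_in_place is a hypothetical name. The pattern !(...) with no action is awk shorthand for the if/print form above, since a pattern without an action prints the matching line.

```shell
# filter_in_place (hypothetical helper): run the Answer 1 filter and write the
# result back over the input file. The output goes to a temporary file first,
# and the original is overwritten only if awk succeeded.
filter_in_place() {
    filtered=$(mktemp) || return 1
    if awk '!($1 == "PDB;" && $3 == "EM;")' "$1" > "$filtered"; then
        cat "$filtered" > "$1"   # copy back, keeping the original file's permissions
        rm -f "$filtered"
    else
        rm -f "$filtered"
        return 1
    fi
}
```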
Answer 2
You could do something like this, using Perl's paragraph mode:
$ perl -F'\n' -00le 'print join "\n", grep { !/PDB; ....; EM;/ } @F' file
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
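The -00 switch reads blank-line-separated paragraphs; since the sample has no blank lines, the whole file becomes one record, which -F'\n' splits into lines in @F before grep filters them. Each line is still judged on its own, so an ordinary line-by-line loop with -n gives the same result. A sketch, where drop_em_perl is a hypothetical helper and \S{4} assumes PDB codes are always four non-space characters:

```shell
# drop_em_perl (hypothetical helper): line-by-line version of the Perl answer.
# -n wraps the code in a loop over input lines, so paragraph mode is not needed.
drop_em_perl() {
    perl -ne 'print unless /PDB; \S{4}; EM;/' "$1"
}
```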