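I have a large .log file containing a lot of information. I want to extract only a few parts of it and write each part to a separate output file.
Partial example of the .log file: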
.....
New Water Solv 104: solv= 1.635
Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb
Water: 1 AtId: 3021 ResId: 316 OH2n: -8.922 OH2s: -6.900 CRY: -0.640 ENTR: -0.321 SOLV: 0.000 DG: 6.636 CLASS: 1
Water: 2 AtId: 3013 ResId: 308 OH2n: -8.331 OH2s: -7.364 CRY: -0.885 ENTR: -0.321 SOLV: 0.000 DG: 6.453 CLASS: 1
Water: 3 AtId: 3009 ResId: 304 OH2n: -7.424 OH2s: -7.321 CRY: 5.000 ENTR: -0.036 SOLV: 0.577 DG: 5.450 CLASS: 1
Water: 4 AtId: 3064 ResId: 359 OH2n: -9.779 OH2s: -8.778 CRY: -1.187 ENTR: -0.804 SOLV: 0.000 DG: 3.279 CLASS: 1
Water: 103 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 SOLV: 0.962 DG: -0.849 CLASS: 5
Water: 104 AtId: 3004 ResId: 299 OH2n: -14.237 OH2s: -11.215 CRY: -1.197 ENTR: -0.500 SOLV: 1.635 DG: -1.185 CLASS: 5
Water Network Score Contributions:
Total OH2n: -731.606 OH2s: -368.197 CRY: -30.908 ENTR: -94.714 DG: 28.882
Average OH2n: -12.835 OH2s: -6.460 CRY: -0.542 ENTR: -1.662 DG: 0.507
Summary: 28.882 ( -10.345 39.228 )
Saved WATERFLAP_REFINED2_SCORED_OH2s_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_OH2n_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_DRY_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_CRY_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_ENTROPY_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_DG_WAT_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_CLASS_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_CLASS_COMPLEX.pdb
Saved WATERFLAP_REFINED2_SCORED.PDB
Saved WATERFLAP_REFINED2_SCORED_DG_WAT_H2O_ele.pdb
Saved WATERFLAP_REFINED2_SCORED_CLASS_H2O_ele.pdb
---------------------------
WaterFLAP summary of delta DG between apo and complex
Water: 1 AtId: 2994 ResId: 289 DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290 DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291 DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983 OH2s: -15.953 CRY: -1.934 ENTR: -0.250 DG: -7.026 CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808 OH2s: -11.344 CRY: -0.291 ENTR: -0.411 DG: -2.014 CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 DG: -0.849 CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971 OH2s: -14.678 CRY: -2.085 ENTR: -0.375 DG: -4.170 CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438 OH2n: 5.000 OH2s: -0.110 CRY: -0.064 ENTR: -4.875 DG: 0.000 CLASS: 5
Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb
Saved WATERFLAP_Delta_DG_DG_WAT_H2O_ele.pdb
Saved WATERFLAP_Delta_DG_CLASS_H2O.pdb
Saved WATERFLAP_Delta_DG_DG_WAT_H2O.pdb
---------------------------
Apo: DG: 35.441 DH: -3.791 -TDS: 39.232
Complex: DG: 28.882 DH: -10.345 -TDS: 39.228
-------------
Net: DG: -6.559 DH: -6.555 -TDS: -0.004
---------------------------
DG Displaced: 13.760
DDG Perturbed: 6.605
DG Disp-Pert: 7.155
---------------------------
WARNING: Setting ATOM parms from HETATM table
Atm: CA Q: 0.08
WARNING: Setting ATOM parms from HETATM table
Atm: CA Q: 0.08
....
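The structure of the file is always the same, but the number of lines, the IDs, and the numeric values can change.
From this I would like to obtain 4 different outputs (perhaps using constant strings that are always present, such as "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb", "Water Network Score Contributions", "WaterFLAP summary of delta DG between apo and complex", "Water Network Score Contributions:" ???)
Output1: (between "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" and "Water Network Score Contributions")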
Water: 1 AtId: 3021 ResId: 316 OH2n: -8.922 OH2s: -6.900 CRY: -0.640 ENTR: -0.321 SOLV: 0.000 DG: 6.636 CLASS: 1
Water: 2 AtId: 3013 ResId: 308 OH2n: -8.331 OH2s: -7.364 CRY: -0.885 ENTR: -0.321 SOLV: 0.000 DG: 6.453 CLASS: 1
Water: 3 AtId: 3009 ResId: 304 OH2n: -7.424 OH2s: -7.321 CRY: 5.000 ENTR: -0.036 SOLV: 0.577 DG: 5.450 CLASS: 1
Water: 4 AtId: 3064 ResId: 359 OH2n: -9.779 OH2s: -8.778 CRY: -1.187 ENTR: -0.804 SOLV: 0.000 DG: 3.279 CLASS: 1
Water: 103 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 SOLV: 0.962 DG: -0.849 CLASS: 5
Water: 104 AtId: 3004 ResId: 299 OH2n: -14.237 OH2s: -11.215 CRY: -1.197 ENTR: -0.500 SOLV: 1.635 DG: -1.185 CLASS: 5
Output2: (between "WaterFLAP summary of delta DG between apo and complex" and the first line starting with Water_USED or Water_BOUNDARY)
Water: 1 AtId: 2994 ResId: 289 DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290 DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291 DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
Output3: (starting from the first line containing Water_USED or Water_BOUNDARY and finishing before "Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb")
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983 OH2s: -15.953 CRY: -1.934 ENTR: -0.250 DG: -7.026 CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808 OH2s: -11.344 CRY: -0.291 ENTR: -0.411 DG: -2.014 CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 DG: -0.849 CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971 OH2s: -14.678 CRY: -2.085 ENTR: -0.375 DG: -4.170 CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438 OH2n: 5.000 OH2s: -0.110 CRY: -0.064 ENTR: -4.875 DG: 0.000 CLASS: 5
Output4:
Apo: DG: 35.441 DH: -3.791 -TDS: 39.232
Complex: DG: 28.882 DH: -10.345 -TDS: 39.228
Net: DG: -6.559 DH: -6.555 -TDS: -0.004
DG Displaced: 13.760
DDG Perturbed: 6.605
DG Disp-Pert: 7.155
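All outputs should be .txt files, and the separator between columns (a space in the input file) should be ",", or a space as in the input if that is simpler.
I have no idea how to do this. Can someone help me with this most difficult challenge?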
Answer 1
$ cat tst.awk
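# Start output1 after the SOLV_H2O "Saved" line (the marker line itself is skipped).
/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/ { out = "output" 1; next }
# Stop output1 at the score-contributions header.
/^Water Network Score Contributions/ { out = ""; next }
# Start output2 after the delta-DG summary header.
/^WaterFLAP summary of delta DG/ { out = "output" 2; next }
# Water_USED / Water_BOUNDARY lines end output2 and go to output3.
/^Water_(USED|BOUNDARY)/ { out = ""; print > ("output" 3) }
# Apo/Complex/Net and DG/DDG summary lines go to output4.
/^(Apo|Complex|Net|DD?G)/ { print > ("output" 4) }
# While a section is active, write its non-empty lines to the current file.
out && NF { print > out }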
$ awk -f tst.awk file
$ head out*
==> output1 <==
Water: 1 AtId: 3021 ResId: 316 OH2n: -8.922 OH2s: -6.900 CRY: -0.640 ENTR: -0.321 SOLV: 0.000 DG: 6.636 CLASS: 1
Water: 2 AtId: 3013 ResId: 308 OH2n: -8.331 OH2s: -7.364 CRY: -0.885 ENTR: -0.321 SOLV: 0.000 DG: 6.453 CLASS: 1
Water: 3 AtId: 3009 ResId: 304 OH2n: -7.424 OH2s: -7.321 CRY: 5.000 ENTR: -0.036 SOLV: 0.577 DG: 5.450 CLASS: 1
Water: 4 AtId: 3064 ResId: 359 OH2n: -9.779 OH2s: -8.778 CRY: -1.187 ENTR: -0.804 SOLV: 0.000 DG: 3.279 CLASS: 1
Water: 103 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 SOLV: 0.962 DG: -0.849 CLASS: 5
Water: 104 AtId: 3004 ResId: 299 OH2n: -14.237 OH2s: -11.215 CRY: -1.197 ENTR: -0.500 SOLV: 1.635 DG: -1.185 CLASS: 5
==> output2 <==
Water: 1 AtId: 2994 ResId: 289 DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290 DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291 DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
==> output3 <==
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983 OH2s: -15.953 CRY: -1.934 ENTR: -0.250 DG: -7.026 CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808 OH2s: -11.344 CRY: -0.291 ENTR: -0.411 DG: -2.014 CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 DG: -0.849 CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971 OH2s: -14.678 CRY: -2.085 ENTR: -0.375 DG: -4.170 CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438 OH2n: 5.000 OH2s: -0.110 CRY: -0.064 ENTR: -4.875 DG: 0.000 CLASS: 5
==> output4 <==
Apo: DG: 35.441 DH: -3.791 -TDS: 39.232
Complex: DG: 28.882 DH: -10.345 -TDS: 39.228
Net: DG: -6.559 DH: -6.555 -TDS: -0.004
DG Displaced: 13.760
DDG Perturbed: 6.605
DG Disp-Pert: 7.155
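The script above keeps the input's space-separated columns. For the comma-separated variant the question mentions, here is a minimal sketch, assuming a POSIX awk. The file names sample.log and output1.txt are placeholders of mine, the sample is just a few lines copied from the log in the question, and only the output1 section is shown; the other sections would follow the same pattern. Assigning $1 = $1 makes awk rebuild each record using OFS, so every space-separated field (labels such as "Water:" included) ends up separated by a comma.

```shell
# Tiny stand-in for the real log: start marker, two data lines, end marker.
printf '%s\n' \
  'Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb' \
  'Water: 1 AtId: 3021 ResId: 316 OH2n: -8.922 OH2s: -6.900 CRY: -0.640 ENTR: -0.321 SOLV: 0.000 DG: 6.636 CLASS: 1' \
  'Water: 2 AtId: 3013 ResId: 308 OH2n: -8.331 OH2s: -7.364 CRY: -0.885 ENTR: -0.321 SOLV: 0.000 DG: 6.453 CLASS: 1' \
  'Water Network Score Contributions:' > sample.log

awk -v OFS=',' '
  # Start collecting after the start marker; stop at the end marker.
  /^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/ { out = "output1.txt"; next }
  /^Water Network Score Contributions/             { out = ""; next }
  # $1 = $1 forces awk to re-join the fields with OFS (a comma here).
  out && NF { $1 = $1; print > out }
' sample.log

cat output1.txt
```

If only the numbers are wanted, the label fields could be dropped as well, but that depends on how the columns are consumed downstream.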
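Answer 2
grep "string here" file.log
For example:
grep "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log
If you also want lines above or below the keyword:
-A NUM for lines after, -B NUM for lines before, -C NUM for lines both before and after
If you want the result to go to a file, use ">" to redirect stdout to a text file.
For example:
grep -A 5 "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log > text.txt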
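One caveat: -A takes a fixed line count, while the question says the number of lines varies between files. The sketch below runs the grep approach on a tiny sample and then, as an alternative that is not part of the suggestion above, uses a sed address range that stops at the end marker regardless of how many lines lie in between. The file names big_sample.log, grep_out.txt and range_out.txt are hypothetical, and -A is assumed to be available (it is in GNU and BSD grep).

```shell
# Tiny stand-in for the real log.
printf '%s\n' \
  'New Water Solv 104: solv= 1.635' \
  'Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb' \
  'Water: 1 AtId: 3021 ResId: 316 OH2n: -8.922 OH2s: -6.900 CRY: -0.640 ENTR: -0.321 SOLV: 0.000 DG: 6.636 CLASS: 1' \
  'Water: 2 AtId: 3013 ResId: 308 OH2n: -8.331 OH2s: -7.364 CRY: -0.885 ENTR: -0.321 SOLV: 0.000 DG: 6.453 CLASS: 1' \
  'Water Network Score Contributions:' > big_sample.log

# Fixed count: the matching line plus the 2 lines after it.
# -F treats the pattern as a literal string, so the "." is not a wildcard.
grep -F -A 2 'Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb' big_sample.log > grep_out.txt

# Variable count: print from the start marker through the end marker,
# then delete the first and last lines (the two markers themselves).
sed -n '/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/,/^Water Network Score Contributions/p' big_sample.log \
  | sed '1d;$d' > range_out.txt
```

With grep the marker line itself is included in the output and the count has to be known in advance; the range version adapts to the section length but relies on both marker lines being present.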