Extracting information from a complex .log file

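I have a large .log file containing a lot of information, and I want to extract only a small part of it and write that part to several different output files.

Partial example of the .log file: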

.....
New Water Solv 104: solv=  1.635

Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb


Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    OH2n: -8.331    OH2s: -7.364    CRY: -0.885 ENTR: -0.321    SOLV:  0.000    DG:  6.453  CLASS: 1
Water:   3 AtId: 3009 ResId: 304    OH2n: -7.424    OH2s: -7.321    CRY:  5.000 ENTR: -0.036    SOLV:  0.577    DG:  5.450  CLASS: 1
Water:   4 AtId: 3064 ResId: 359    OH2n: -9.779    OH2s: -8.778    CRY: -1.187 ENTR: -0.804    SOLV:  0.000    DG:  3.279  CLASS: 1
Water: 103 AtId: 2996 ResId: 291    OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    SOLV:  0.962    DG: -0.849  CLASS: 5
Water: 104 AtId: 3004 ResId: 299    OH2n: -14.237   OH2s: -11.215   CRY: -1.197 ENTR: -0.500    SOLV:  1.635    DG: -1.185  CLASS: 5

Water Network Score Contributions:

Total       OH2n: -731.606  OH2s: -368.197  CRY: -30.908    ENTR: -94.714   DG:  28.882
Average     OH2n: -12.835   OH2s: -6.460    CRY: -0.542 ENTR: -1.662    DG:  0.507
Summary:     28.882 ( -10.345  39.228 )


Saved WATERFLAP_REFINED2_SCORED_OH2s_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_OH2n_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_DRY_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_CRY_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_ENTROPY_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_DG_WAT_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_CLASS_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_CLASS_COMPLEX.pdb

Saved WATERFLAP_REFINED2_SCORED.PDB

Saved WATERFLAP_REFINED2_SCORED_DG_WAT_H2O_ele.pdb

Saved WATERFLAP_REFINED2_SCORED_CLASS_H2O_ele.pdb

---------------------------
WaterFLAP summary of delta DG between apo and complex

Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290  DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291  DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   OH2s: -15.953   CRY: -1.934 ENTR: -0.250    DG: -7.026  CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808   OH2s: -11.344   CRY: -0.291 ENTR: -0.411    DG: -2.014  CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    DG: -0.849  CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971   OH2s: -14.678   CRY: -2.085 ENTR: -0.375    DG: -4.170  CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438   OH2n:  5.000    OH2s: -0.110    CRY: -0.064 ENTR: -4.875    DG:  0.000  CLASS: 5

Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb

Saved WATERFLAP_Delta_DG_DG_WAT_H2O_ele.pdb

Saved WATERFLAP_Delta_DG_CLASS_H2O.pdb

Saved WATERFLAP_Delta_DG_DG_WAT_H2O.pdb


---------------------------

Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
Complex:    DG:  28.882 DH: -10.345 -TDS:  39.228

-------------
Net:        DG: -6.559  DH: -6.555  -TDS: -0.004


---------------------------
DG Displaced:   13.760
DDG Perturbed:  6.605
DG Disp-Pert:   7.155

---------------------------

WARNING: Setting ATOM parms from HETATM table
Atm:   CA  Q: 0.08
WARNING: Setting ATOM parms from HETATM table
Atm:   CA  Q: 0.08
....

The structure of the file is always the same, but the number of lines, the IDs, and the numbers may change.

From this I would like to obtain 4 different outputs (perhaps using some constant "strings" that are always present, such as "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb", "Water Network Score Contributions", "WaterFLAP summary of delta DG between apo and complex", "Water Network Score Contributions:"???)

Output 1: (between "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" and "Water Network Score Contributions")

Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    OH2n: -8.331    OH2s: -7.364    CRY: -0.885 ENTR: -0.321    SOLV:  0.000    DG:  6.453  CLASS: 1
Water:   3 AtId: 3009 ResId: 304    OH2n: -7.424    OH2s: -7.321    CRY:  5.000 ENTR: -0.036    SOLV:  0.577    DG:  5.450  CLASS: 1
Water:   4 AtId: 3064 ResId: 359    OH2n: -9.779    OH2s: -8.778    CRY: -1.187 ENTR: -0.804    SOLV:  0.000    DG:  3.279  CLASS: 1
Water: 103 AtId: 2996 ResId: 291    OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    SOLV:  0.962    DG: -0.849  CLASS: 5
Water: 104 AtId: 3004 ResId: 299    OH2n: -14.237   OH2s: -11.215   CRY: -1.197 ENTR: -0.500    SOLV:  1.635    DG: -1.185  CLASS: 5

Output 2: (between "WaterFLAP summary of delta DG between apo and complex" and the first line starting with WATER_USED or WATER_BOUNDARY)

Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290  DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291  DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3

Output 3: (starting at the first line containing WATER_USED or WATER_BOUNDARY and ending before "Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb")

Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   OH2s: -15.953   CRY: -1.934 ENTR: -0.250    DG: -7.026  CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808   OH2s: -11.344   CRY: -0.291 ENTR: -0.411    DG: -2.014  CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    DG: -0.849  CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971   OH2s: -14.678   CRY: -2.085 ENTR: -0.375    DG: -4.170  CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438   OH2n:  5.000    OH2s: -0.110    CRY: -0.064 ENTR: -4.875    DG:  0.000  CLASS: 5
   

Output 4:

Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
Complex:    DG:  28.882 DH: -10.345 -TDS:  39.228
Net:        DG: -6.559  DH: -6.555  -TDS: -0.004
DG Displaced:   13.760
DDG Perturbed:  6.605
DG Disp-Pert:   7.155

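Each output should be a .txt file, and the separator between columns (a "space" in the input file) should be ",", or simply a "space" as in the input if that is easier.

I have no idea how to do this. Could someone help me with this very difficult challenge?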

Answer 1

$ cat tst.awk
/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb/ { out = "output" 1; next }
/^Water Network Score Contributions/            { out = ""; next }
/^WaterFLAP summary of delta DG/                { out = "output" 2; next }
/^Water_(USED|BOUNDARY)/                        { out = ""; print > ("output" 3) }
/^(Apo|Complex|Net|DD?G)/                       { print > ("output" 4) }
out && NF { print > out }

$ awk -f tst.awk file

$ head out*
==> output1 <==
Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    OH2n: -8.331    OH2s: -7.364    CRY: -0.885 ENTR: -0.321    SOLV:  0.000    DG:  6.453  CLASS: 1
Water:   3 AtId: 3009 ResId: 304    OH2n: -7.424    OH2s: -7.321    CRY:  5.000 ENTR: -0.036    SOLV:  0.577    DG:  5.450  CLASS: 1
Water:   4 AtId: 3064 ResId: 359    OH2n: -9.779    OH2s: -8.778    CRY: -1.187 ENTR: -0.804    SOLV:  0.000    DG:  3.279  CLASS: 1
Water: 103 AtId: 2996 ResId: 291    OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    SOLV:  0.962    DG: -0.849  CLASS: 5
Water: 104 AtId: 3004 ResId: 299    OH2n: -14.237   OH2s: -11.215   CRY: -1.197 ENTR: -0.500    SOLV:  1.635    DG: -1.185  CLASS: 5

==> output2 <==
Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290  DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291  DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3

==> output3 <==
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   OH2s: -15.953   CRY: -1.934 ENTR: -0.250    DG: -7.026  CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808   OH2s: -11.344   CRY: -0.291 ENTR: -0.411    DG: -2.014  CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    DG: -0.849  CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971   OH2s: -14.678   CRY: -2.085 ENTR: -0.375    DG: -4.170  CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438   OH2n:  5.000    OH2s: -0.110    CRY: -0.064 ENTR: -4.875    DG:  0.000  CLASS: 5

==> output4 <==
Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
Complex:    DG:  28.882 DH: -10.345 -TDS:  39.228
Net:        DG: -6.559  DH: -6.555  -TDS: -0.004
DG Displaced:   13.760
DDG Perturbed:  6.605
DG Disp-Pert:   7.155
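
The question also asked for .txt files with "," between columns. The following is a minimal sketch of one way to adapt the script above, not a tested drop-in replacement. It assumes that setting OFS and rebuilding each record is acceptable. The helper name `emit`, the file `sample.log`, and the `outputN.txt` names are hypothetical, and the sample is a shortened excerpt of the log from the question.

```shell
# Hypothetical sample: a shortened excerpt of the log from the question.
cat > sample.log <<'EOF'
New Water Solv 104: solv=  1.635

Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb

Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1

Water Network Score Contributions:

Total       OH2n: -731.606  OH2s: -368.197  CRY: -30.908    ENTR: -94.714   DG:  28.882
WaterFLAP summary of delta DG between apo and complex

Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   OH2s: -15.953   CRY: -1.934 ENTR: -0.250    DG: -7.026  CLASS: 4

Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb

Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
Net:        DG: -6.559  DH: -6.555  -TDS: -0.004
DG Displaced:   13.760
EOF

awk '
# Hypothetical helper: assigning $1 to itself rebuilds the record with
# OFS (",") between fields; the record is then written to the given file.
function emit(file) { $1 = $1; print > file }
BEGIN { OFS = "," }
/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/ { out = "output1.txt"; next }
/^Water Network Score Contributions/             { out = ""; next }
/^WaterFLAP summary of delta DG/                 { out = "output2.txt"; next }
/^Water_(USED|BOUNDARY)/                         { out = ""; emit("output3.txt") }
/^(Apo|Complex|Net|DD?G)/                        { emit("output4.txt") }
out && NF                                        { emit(out) }
' sample.log
```

Because every run of whitespace becomes a single comma, multi-word labels are split as well: "DG Displaced:   13.760" is written as "DG,Displaced:,13.760". If that is not wanted, leaving OFS at its default keeps the space-separated layout of the input.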

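Answer 2

grep "string here" file.log

For example:

grep "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log

If you also want lines above or below the matching line:

-A NUM prints NUM lines after the match, -B NUM prints NUM lines before it, and -C NUM prints NUM lines both before and after.

If you want the result to go to a file, use ">" to redirect stdout to a .txt file.

For example:

grep -A 5 "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log > text.txt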
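
As a small illustration of how the context count behaves, the sketch below runs the same command on a shortened, hypothetical copy of the log from the question (`big.log` is created here only for the demonstration). It adds `-F` so that the dots in the file name are matched literally. That flag is an addition to the command above, not part of the original answer.

```shell
# Hypothetical sample, shortened from the log in the question.
cat > big.log <<'EOF'
New Water Solv 104: solv=  1.635

Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb


Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    OH2n: -8.331    OH2s: -7.364    CRY: -0.885 ENTR: -0.321    SOLV:  0.000    DG:  6.453  CLASS: 1
Water:   3 AtId: 3009 ResId: 304    OH2n: -7.424    OH2s: -7.321    CRY:  5.000 ENTR: -0.036    SOLV:  0.577    DG:  5.450  CLASS: 1

Water Network Score Contributions:
EOF

# -F treats the pattern as a fixed string; -A 5 keeps the matching line
# plus the 5 lines that follow it, blank lines included.
grep -F -A 5 "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log > text.txt

# Count how many of the captured lines are Water records.
grep -c '^Water:' text.txt
```

The two blank lines after the "Saved" line count toward the 5, so only three Water records are captured here. Because the number of Water lines differs from file to file, a fixed NUM has to be chosen for each file, whereas the pattern-delimited awk script in Answer 1 adapts automatically.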
