How to replace the third column of a file (up to a specific character) with values from another file


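I have two files: bor.mol2 and bor.com. In bor.mol2, the X, Y, Z coordinates of the atoms are listed between the keywords @<TRIPOS>ATOM and @<TRIPOS>BOND. In that section, columns 3, 4, and 5 are the atom coordinates. For example, the H1 coordinates are: -0.1660 2.5890 -0.2030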

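The other file is bor.com, where columns 2, 3, and 4 are the X, Y, Z coordinates (on the lines between "0 1" and "1 2 1.0"). I want to replace the bor.mol2 coordinates with the bor.com coordinates and obtain boron.mol2 as shown below.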

How can I do this with awk or grep?

bor.mol2:

@<TRIPOS>MOLECULE
MOL
   19    18     1     0     0
SMALL
resp


@<TRIPOS>ATOM
      1 H1          -0.1660     2.5890    -0.2030 H          1 MOL       0.425234
      2 O1          -0.6950     1.8160    -0.3360 O          1 MOL      -0.740851
      3 B1          -0.0040     0.6800    -0.0410 B          1 MOL       0.916675
      4 O2           1.2760     0.7930     0.3900 O          1 MOL      -0.584834
      5 C1           2.0810    -0.3200     0.7070 C          1 MOL       0.351772
      6 C2           2.8020    -0.8460    -0.5220 C          1 MOL      -0.254733
      7 H2           3.3950    -0.0620    -0.9800 H          1 MOL       0.065923
      8 H3           2.0930    -1.2150    -1.2550 H          1 MOL       0.065923
      9 H4           3.4660    -1.6630    -0.2510 H          1 MOL       0.065923
     10 H5           1.4780    -1.0990     1.1550 H          1 MOL      -0.005252
     11 H6           2.7960     0.0200     1.4450 H          1 MOL      -0.005252
     12 O3          -0.5850    -0.5340    -0.1770 O          1 MOL      -0.584834
     13 C3          -1.9220    -0.7000    -0.5940 C          1 MOL       0.351772
     14 C4          -2.8780    -0.6420     0.5840 C          1 MOL      -0.254733
     15 H7          -3.8950    -0.8350     0.2550 H          1 MOL       0.065923
     16 H8          -2.8530     0.3360     1.0520 H          1 MOL       0.065923
     17 H9          -2.6110    -1.3870     1.3260 H          1 MOL       0.065923
     18 H10         -2.1780     0.0530    -1.3290 H          1 MOL      -0.005252
     19 H11         -1.9740    -1.6700    -1.0740 H          1 MOL      -0.005252
@<TRIPOS>BOND
     1     1     2 1   
     2     2     3 1   
     3     3     4 1   
     4     3    12 1   
     5     4     5 1   
     6     5     6 1   
     7     5    10 1   
     8     5    11 1   
     9     6     7 1   
    10     6     8 1   
    11     6     9 1   
    12    12    13 1   
    13    13    14 1   
    14    13    18 1   
    15    13    19 1   
    16    14    15 1   
    17    14    16 1   
    18    14    17 1   
@<TRIPOS>SUBSTRUCTURE
     1 MOL         1 TEMP              0 ****  ****    0 ROOT

bor.com:

%nprocshared=4
%mem=1GB
# am1 geom=connectivity sp

MOL

0 1
 H                 -0.16720146    2.58919775   -0.19942423
 O                 -0.69500000    1.81600000   -0.33600000
 B                 -0.00400000    0.68000000   -0.04100000
 O                 -0.38867986   -0.48241992   -0.62214658
 C                  0.24973028   -1.71425091   -0.37253088
 C                 -0.34829932   -2.40893346    0.83855738
 H                 -1.41561875   -2.54983268    0.70799890
 H                 -0.18501334   -1.82371627    1.73688335
 H                  0.11053291   -3.38414892    0.98087325
 H                  1.31216868   -1.56088464   -0.23520857
 H                  0.10760188   -2.31766293   -1.25975262
 O                  1.04303104    0.71384972    0.81482310
 C                  1.49768870    1.90335555    1.42093890
 C                  2.50478033    2.62078365    0.54000342
 H                  2.89233808    3.49938587    1.04758216
 H                  2.04460091    2.94105867   -0.38832398
 H                  3.33724896    1.96604913    0.30506364
 H                  0.65864568    2.55057309    1.64429105
 H                  1.95688815    1.61234302    2.35819638

 1 2 1.0
 2 3 1.0
 3 4 1.0 12 1.0
 4 5 1.0
 5 6 1.0 10 1.0 11 1.0
 6 7 1.0 8 1.0 9 1.0
 7
 8
 9
 10
 11
 12 13 1.0
 13 14 1.0 18 1.0 19 1.0
 14 15 1.0 16 1.0 17 1.0
 15
 16
 17
 18
 19

The result, boron.mol2, must look like this:

@<TRIPOS>MOLECULE
MOL
   19    18     1     0     0
SMALL
resp


@<TRIPOS>ATOM
      1 H1           -0.16720146    2.58919775   -0.19942423 H          1 MOL       0.425234
      2 O1           -0.69500000    1.81600000   -0.33600000 O          1 MOL      -0.740851
      3 B1           -0.00400000    0.68000000   -0.04100000 B          1 MOL       0.916675
      4 O2           -0.38867986   -0.48241992   -0.62214658 O          1 MOL      -0.584834
      5 C1            0.24973028   -1.71425091   -0.37253088 C          1 MOL       0.351772
      6 C2           -0.34829932   -2.40893346    0.83855738 C          1 MOL      -0.254733
      7 H2           -1.41561875   -2.54983268    0.70799890 H          1 MOL       0.065923
      8 H3           -0.18501334   -1.82371627    1.73688335 H          1 MOL       0.065923
      9 H4            0.11053291   -3.38414892    0.98087325 H          1 MOL       0.065923
     10 H5            1.31216868   -1.56088464   -0.23520857 H          1 MOL      -0.005252
     11 H6            0.10760188   -2.31766293   -1.25975262 H          1 MOL      -0.005252
     12 O3            1.04303104    0.71384972    0.81482310 O          1 MOL      -0.584834
     13 C3            1.49768870    1.90335555    1.42093890 C          1 MOL       0.351772
     14 C4            2.50478033    2.62078365    0.54000342 C          1 MOL      -0.254733
     15 H7            2.89233808    3.49938587    1.04758216 H          1 MOL       0.065923
     16 H8            2.04460091    2.94105867   -0.38832398 H          1 MOL       0.065923
     17 H9            3.33724896    1.96604913    0.30506364 H          1 MOL       0.065923
     18 H10           0.65864568    2.55057309    1.64429105 H          1 MOL      -0.005252
     19 H11           1.95688815    1.61234302    2.35819638 H          1 MOL      -0.005252
@<TRIPOS>BOND
     1     1     2 1   
     2     2     3 1   
     3     3     4 1   
     4     3    12 1   
     5     4     5 1   
     6     5     6 1   
     7     5    10 1   
     8     5    11 1   
     9     6     7 1   
    10     6     8 1   
    11     6     9 1   
    12    12    13 1   
    13    13    14 1   
    14    13    18 1   
    15    13    19 1   
    16    14    15 1   
    17    14    16 1   
    18    14    17 1   
@<TRIPOS>SUBSTRUCTURE
     1 MOL         1 TEMP              0 ****  ****    0 ROOT

Answer 1

I don't know awk very well, so I used sed.

sed -rn '/^ [A-Z]/{H;x;s/^\n//;x};/^ *[0-9]+ +[A-Z]+[0-9]+/{G;s/^( *[^ ]+ +[^ ]+) +[^ ]+ +[^ ]+ +[^ ]+([^\n]+)\n *[^ ]+( *[^ ]+ +[^ ]+ +[^\n]+).*/\1\3\2/;x;s/^[^\n]+\n//;x};/MOLECULE/,$p' bor.com bor.mol2 > boron.mol2

Output: boron.mol2

@<TRIPOS>MOLECULE
MOL
   19    18     1     0     0
SMALL
resp


@<TRIPOS>ATOM
      1 H1                 -0.16720146    2.58919775   -0.19942423 H          1 MOL       0.425234
      2 O1                 -0.69500000    1.81600000   -0.33600000 O          1 MOL      -0.740851
      3 B1                 -0.00400000    0.68000000   -0.04100000 B          1 MOL       0.916675
      4 O2                 -0.38867986   -0.48241992   -0.62214658 O          1 MOL      -0.584834
      5 C1                  0.24973028   -1.71425091   -0.37253088 C          1 MOL       0.351772
      6 C2                 -0.34829932   -2.40893346    0.83855738 C          1 MOL      -0.254733
      7 H2                 -1.41561875   -2.54983268    0.70799890 H          1 MOL       0.065923
      8 H3                 -0.18501334   -1.82371627    1.73688335 H          1 MOL       0.065923
      9 H4                  0.11053291   -3.38414892    0.98087325 H          1 MOL       0.065923
     10 H5                  1.31216868   -1.56088464   -0.23520857 H          1 MOL      -0.005252
     11 H6                  0.10760188   -2.31766293   -1.25975262 H          1 MOL      -0.005252
     12 O3                  1.04303104    0.71384972    0.81482310 O          1 MOL      -0.584834
     13 C3                  1.49768870    1.90335555    1.42093890 C          1 MOL       0.351772
     14 C4                  2.50478033    2.62078365    0.54000342 C          1 MOL      -0.254733
     15 H7                  2.89233808    3.49938587    1.04758216 H          1 MOL       0.065923
     16 H8                  2.04460091    2.94105867   -0.38832398 H          1 MOL       0.065923
     17 H9                  3.33724896    1.96604913    0.30506364 H          1 MOL       0.065923
     18 H10                  0.65864568    2.55057309    1.64429105 H          1 MOL      -0.005252
     19 H11                  1.95688815    1.61234302    2.35819638 H          1 MOL      -0.005252
@<TRIPOS>BOND
     1     1     2 1   
     2     2     3 1   
     3     3     4 1   
     4     3    12 1   
     5     4     5 1   
     6     5     6 1   
     7     5    10 1   
     8     5    11 1   
     9     6     7 1   
    10     6     8 1   
    11     6     9 1   
    12    12    13 1   
    13    13    14 1   
    14    13    18 1   
    15    13    19 1   
    16    14    15 1   
    17    14    16 1   
    18    14    17 1   
@<TRIPOS>SUBSTRUCTURE
     1 MOL         1 TEMP              0 ****  ****    0 ROOT
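
Because the spacing in this output differs from the requested layout, it can be useful to confirm that nothing except the coordinates changed. The following is a rough sketch of one way to check that, not part of the original answer. The files before.mol2 and after.mol2 are hypothetical two-atom excerpts standing in for bor.mol2 and boron.mol2, and strip_xyz is a helper name made up for this example.

```shell
#!/bin/sh
# Hypothetical two-atom stand-ins for bor.mol2 (before) and boron.mol2 (after).
cat > before.mol2 <<'EOF'
@<TRIPOS>ATOM
      1 H1          -0.1660     2.5890    -0.2030 H          1 MOL       0.425234
      2 O1          -0.6950     1.8160    -0.3360 O          1 MOL      -0.740851
@<TRIPOS>BOND
     1     1     2 1
EOF
cat > after.mol2 <<'EOF'
@<TRIPOS>ATOM
      1 H1                 -0.16720146    2.58919775   -0.19942423 H          1 MOL       0.425234
      2 O1                 -0.69500000    1.81600000   -0.33600000 O          1 MOL      -0.740851
@<TRIPOS>BOND
     1     1     2 1
EOF

# Print each line with whitespace normalised; inside the ATOM section,
# mask fields 3-5 so that coordinate changes are ignored.
strip_xyz() {
  awk '
    /^@<TRIPOS>/ { inatom = ($0 ~ /ATOM/) }
    inatom && NF >= 9 { $3 = "XYZ"; $4 = "XYZ"; $5 = "XYZ" }
    { $1 = $1; print }
  ' "$1"
}

strip_xyz before.mol2 > before.txt
strip_xyz after.mol2  > after.txt

# cmp -s is silent and only sets the exit status.
if cmp -s before.txt after.txt; then
  echo "only coordinates differ"
else
  echo "other fields differ too"
fi
```

For these sample files the masked versions are identical, so the script reports that only the coordinates differ. The same helper could be pointed at the real bor.mol2 and boron.mol2.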

Answer 2

Here is one way:

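$ awk '{
  # Start of the ATOM section: set the atom counter to 1 and print the header
  if(/@<TRIPOS>ATOM/){a=1; print; next}
  # Start of the BOND section: stop replacing fields
  if(/@<TRIPOS>BOND/){a=0}
  if(NR==FNR){
      # First input (coordinate lines of bor.com): store X, Y, Z by line number
      # (val[i][j] is a GNU awk array of arrays)
      val[FNR][2]=$2;
      val[FNR][3]=$3;
      val[FNR][4]=$4;
  }
  else{
    if(a){
      # Inside the ATOM section of bor.mol2: swap in the stored coordinates;
      # assigning a field rebuilds the line with OFS (tab) between fields
      OFS="\t"
      $3=val[a][2];
      $4=val[a][3];
      $5=val[a][4]
      a++;
    }
    print $0
  }
}' <(grep '^ [A-Z]' bor.com) bor.mol2 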

The above returns:

@<TRIPOS>MOLECULE
MOL
   19    18     1     0     0
SMALL
resp


@<TRIPOS>ATOM
1   H1  -0.16720146 2.58919775  -0.19942423 H   1   MOL 0.425234
2   O1  -0.69500000 1.81600000  -0.33600000 O   1   MOL -0.740851
3   B1  -0.00400000 0.68000000  -0.04100000 B   1   MOL 0.916675
4   O2  -0.38867986 -0.48241992 -0.62214658 O   1   MOL -0.584834
5   C1  0.24973028  -1.71425091 -0.37253088 C   1   MOL 0.351772
6   C2  -0.34829932 -2.40893346 0.83855738  C   1   MOL -0.254733
7   H2  -1.41561875 -2.54983268 0.70799890  H   1   MOL 0.065923
8   H3  -0.18501334 -1.82371627 1.73688335  H   1   MOL 0.065923
9   H4  0.11053291  -3.38414892 0.98087325  H   1   MOL 0.065923
10  H5  1.31216868  -1.56088464 -0.23520857 H   1   MOL -0.005252
11  H6  0.10760188  -2.31766293 -1.25975262 H   1   MOL -0.005252
12  O3  1.04303104  0.71384972  0.81482310  O   1   MOL -0.584834
13  C3  1.49768870  1.90335555  1.42093890  C   1   MOL 0.351772
14  C4  2.50478033  2.62078365  0.54000342  C   1   MOL -0.254733
15  H7  2.89233808  3.49938587  1.04758216  H   1   MOL 0.065923
16  H8  2.04460091  2.94105867  -0.38832398 H   1   MOL 0.065923
17  H9  3.33724896  1.96604913  0.30506364  H   1   MOL 0.065923
18  H10 0.65864568  2.55057309  1.64429105  H   1   MOL -0.005252
19  H11 1.95688815  1.61234302  2.35819638  H   1   MOL -0.005252
@<TRIPOS>BOND
     1     1     2 1   
     2     2     3 1   
     3     3     4 1   
     4     3    12 1   
     5     4     5 1   
     6     5     6 1   
     7     5    10 1   
     8     5    11 1   
     9     6     7 1   
    10     6     8 1   
    11     6     9 1   
    12    12    13 1   
    13    13    14 1   
    14    13    18 1   
    15    13    19 1   
    16    14    15 1   
    17    14    16 1   
    18    14    17 1   
@<TRIPOS>SUBSTRUCTURE
     1 MOL         1 TEMP              0 ****  ****    0 ROOT

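The command grep '^ [A-Z]' bor.com prints only those lines of bor.com that start with a space followed by an uppercase letter. These are the only lines we want from bor.com. The output of grep is passed to awk as if it were a file, using bash's <() construct (process substitution). If the file being read is the first of its input files (NR==FNR), the script saves the values in the array val. If it is the second, and we are between the @<TRIPOS>ATOM and @<TRIPOS>BOND strings, it replaces the third, fourth, and fifth fields with the values from the array val.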
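
The awk output above is tab-separated, whereas the requested boron.mol2 keeps aligned columns. Below is a minimal sketch of one way to keep a fixed-width layout. It uses printf and plain one-dimensional arrays, so it needs neither process substitution nor GNU awk arrays of arrays. The file names demo.com, demo.mol2 and demo_new.mol2 are hypothetical three-atom excerpts of the data in the question, and the column widths in the printf format are an assumption chosen to resemble the requested output, not something the question specifies.

```shell
#!/bin/sh
# Tiny stand-ins for bor.com and bor.mol2 (hypothetical demo files, 3 atoms only).
cat > demo.com <<'EOF'
# am1 geom=connectivity sp

MOL

0 1
 H                 -0.16720146    2.58919775   -0.19942423
 O                 -0.69500000    1.81600000   -0.33600000
 B                 -0.00400000    0.68000000   -0.04100000

 1 2 1.0
 2 3 1.0
 3
EOF

cat > demo.mol2 <<'EOF'
@<TRIPOS>MOLECULE
MOL
@<TRIPOS>ATOM
      1 H1          -0.1660     2.5890    -0.2030 H          1 MOL       0.425234
      2 O1          -0.6950     1.8160    -0.3360 O          1 MOL      -0.740851
      3 B1          -0.0040     0.6800    -0.0410 B          1 MOL       0.916675
@<TRIPOS>BOND
     1     1     2 1
     2     2     3 1
EOF

awk '
  # First file: keep X, Y, Z of every line that starts with a space
  # followed by an uppercase letter (same filter as the grep above).
  NR == FNR {
    if ($0 ~ /^ [A-Z]/) { n++; x[n] = $2; y[n] = $3; z[n] = $4 }
    next
  }
  # Second file: track whether we are inside the ATOM section.
  /^@<TRIPOS>ATOM/ { inatom = 1; i = 0; print; next }
  /^@<TRIPOS>/     { inatom = 0 }
  inatom && NF >= 9 {
    i++
    # Rebuild the record with fixed widths (the widths are an assumption).
    printf "%7d %-8s %14s %13s %13s %-2s %8d %-4s %14s\n", $1, $2, x[i], y[i], z[i], $6, $7, $8, $9
    next
  }
  # Everything else is copied through unchanged.
  { print }
' demo.com demo.mol2 > demo_new.mol2

cat demo_new.mol2
```

Passing the coordinates through %s rather than a numeric format keeps all eight decimals exactly as they appear in the .com file. The sketch pairs atoms purely by order, as both answers do, so it relies on the two files listing the atoms in the same sequence.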
