Remove a space before the second column to preserve PDB format

ATOM   9996  CG  GLU   622     -13.525  -7.714 -11.215  0.0136  1.9080  0.1094
ATOM   9997 2HG  GLU   622     -12.773  -7.608 -11.999 -0.0425  1.4870  0.0157
ATOM   9998 3HG  GLU   622     -13.121  -8.370 -10.441 -0.0425  1.4870  0.0157
ATOM   9999  CD  GLU   622     -14.803  -8.348 -11.783  0.8054  1.9080  0.0860
ATOM   10000  OE1 GLU   622     -15.541  -9.019 -11.024 -0.8188  1.6612  0.2100
ATOM   10001  OE2 GLU   622     -15.105  -8.223 -12.988 -0.8188  1.6612  0.2100
ATOM   10002  C   GLU   622     -13.072  -4.215  -9.499  0.5366  1.9080  0.0860
ATOM   10003  O   GLU   622     -13.537  -3.437 -10.330 -0.5819  1.6612  0.2100
ATOM   10004  N   TYR   623     -12.988  -3.858  -8.210 -0.4157  1.8240  0.1700
ATOM   10005  H   TYR   623     -12.684  -4.551  -7.536  0.2719  0.6000  0.0157
ATOM   10006  CA  TYR   623     -13.410  -2.540  -7.700 -0.0014  1.9080  0.1094
ATOM   10007  HA  TYR   623     -13.794  -1.927  -8.513  0.0876  1.3870  0.0157
ATOM   10008  CB  TYR   623     -14.530  -2.720  -6.667 -0.0152  1.9080  0.1094
ATOM   10009 2HB  TYR   623     -14.107  -3.312  -5.863  0.0295  1.4870  0.0157
ATOM   10010 3HB  TYR   623     -14.784  -1.738  -6.265  0.0295  1.4870  0.0157
ATOM   10011  CG  TYR   623     -15.831  -3.390  -7.081 -0.0011  1.9080  0.0860
ATOM   10012  CD1 TYR   623     -16.301  -3.357  -8.410 -0.1906  1.9080  0.0860

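As the excerpt above shows, every line from ATOM 10000 onward is shifted right by one space. How can I remove one space before the second column, but only on the lines from serial number 10000 onward?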

The output should look like this:

ATOM   9995 3HB  GLU   622     -14.203  -5.702 -11.411 -0.0173  1.4870  0.0157
ATOM   9996  CG  GLU   622     -13.525  -7.714 -11.215  0.0136  1.9080  0.1094
ATOM   9997 2HG  GLU   622     -12.773  -7.608 -11.999 -0.0425  1.4870  0.0157
ATOM   9998 3HG  GLU   622     -13.121  -8.370 -10.441 -0.0425  1.4870  0.0157
ATOM   9999  CD  GLU   622     -14.803  -8.348 -11.783  0.8054  1.9080  0.0860
ATOM  10000  OE1 GLU   622     -15.541  -9.019 -11.024 -0.8188  1.6612  0.2100
ATOM  10001  OE2 GLU   622     -15.105  -8.223 -12.988 -0.8188  1.6612  0.2100
ATOM  10002  C   GLU   622     -13.072  -4.215  -9.499  0.5366  1.9080  0.0860
ATOM  10003  O   GLU   622     -13.537  -3.437 -10.330 -0.5819  1.6612  0.2100
ATOM  10004  N   TYR   623     -12.988  -3.858  -8.210 -0.4157  1.8240  0.1700
ATOM  10005  H   TYR   623     -12.684  -4.551  -7.536  0.2719  0.6000  0.0157
ATOM  10006  CA  TYR   623     -13.410  -2.540  -7.700 -0.0014  1.9080  0.1094
ATOM  10007  HA  TYR   623     -13.794  -1.927  -8.513  0.0876  1.3870  0.0157
ATOM  10008  CB  TYR   623     -14.530  -2.720  -6.667 -0.0152  1.9080  0.1094
ATOM  10009 2HB  TYR   623     -14.107  -3.312  -5.863  0.0295  1.4870  0.0157
ATOM  10010 3HB  TYR   623     -14.784  -1.738  -6.265  0.0295  1.4870  0.0157
ATOM  10011  CG  TYR   623     -15.831  -3.390  -7.081 -0.0011  1.9080  0.0860
ATOM  10012  CD1 TYR   623     -16.301  -3.357  -8.410 -0.1906  1.9080  0.0860

Answer 1

Using sed:

sed '/ATOM *10000/,$ s/ATOM \( *\)\([0-9]*\)/ATOM\1\2/'

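It uses the address range from /ATOM *10000/ to $, which stand for the first line matching ATOM + spaces + 10000 and the last line of the input, respectively.

On every line in that range, it replaces ATOM followed by spaces and digits with ATOM, the same spaces minus the first one, and the digits.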
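
To see the address range at work, here is a minimal sketch that wraps the same sed command in a shell function and feeds it three shortened sample lines. The function name fix_shift and the truncated input lines are assumptions made for illustration; they are not part of the original answer.

```shell
# Hypothetical wrapper around the sed command above.
# "$@" passes file names through; with no arguments sed reads stdin.
fix_shift() {
  sed '/ATOM *10000/,$ s/ATOM \( *\)\([0-9]*\)/ATOM\1\2/' "$@"
}

# Shortened sample records (assumed): one before the shift, two after it.
printf '%s\n' \
  'ATOM   9999  CD  GLU   622' \
  'ATOM   10000  OE1 GLU   622' \
  'ATOM   10001  OE2 GLU   622' | fix_shift
```

The 9999 line comes before the range starts and is left alone, while the 10000 and 10001 lines each lose one space after ATOM. Running the command a second time on already corrected output would strip another space, so it is best applied once to the original file.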

Answer 2

There are many ways to do this. You can read the file and remove the first space on every line whose second field is greater than 9999:

$ awk '$2>9999{sub(/ /,"")}1;' file 
ATOM   9996  CG  GLU   622     -13.525  -7.714 -11.215  0.0136  1.9080  0.1094
ATOM   9997 2HG  GLU   622     -12.773  -7.608 -11.999 -0.0425  1.4870  0.0157
ATOM   9998 3HG  GLU   622     -13.121  -8.370 -10.441 -0.0425  1.4870  0.0157
ATOM   9999  CD  GLU   622     -14.803  -8.348 -11.783  0.8054  1.9080  0.0860
ATOM  10000  OE1 GLU   622     -15.541  -9.019 -11.024 -0.8188  1.6612  0.2100
ATOM  10001  OE2 GLU   622     -15.105  -8.223 -12.988 -0.8188  1.6612  0.2100
ATOM  10002  C   GLU   622     -13.072  -4.215  -9.499  0.5366  1.9080  0.0860
ATOM  10003  O   GLU   622     -13.537  -3.437 -10.330 -0.5819  1.6612  0.2100
ATOM  10004  N   TYR   623     -12.988  -3.858  -8.210 -0.4157  1.8240  0.1700
ATOM  10005  H   TYR   623     -12.684  -4.551  -7.536  0.2719  0.6000  0.0157
ATOM  10006  CA  TYR   623     -13.410  -2.540  -7.700 -0.0014  1.9080  0.1094
ATOM  10007  HA  TYR   623     -13.794  -1.927  -8.513  0.0876  1.3870  0.0157
ATOM  10008  CB  TYR   623     -14.530  -2.720  -6.667 -0.0152  1.9080  0.1094
ATOM  10009 2HB  TYR   623     -14.107  -3.312  -5.863  0.0295  1.4870  0.0157
ATOM  10010 3HB  TYR   623     -14.784  -1.738  -6.265  0.0295  1.4870  0.0157
ATOM  10011  CG  TYR   623     -15.831  -3.390  -7.081 -0.0011  1.9080  0.0860
ATOM  10012  CD1 TYR   623     -16.301  -3.357  -8.410 -0.1906  1.9080  0.0860
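
The awk command above tests only the second field. If the file also contains other record types (an assumption; the excerpt shows only ATOM lines), a non-numeric second field is compared as a string, so a line such as "REMARK note" can satisfy $2>9999 and lose a space as well. A slightly stricter sketch, with the hypothetical function name fix_atom_serials, might look like this:

```shell
# Hypothetical helper: only touch ATOM records whose serial exceeds 9999.
# "$2 + 0" forces a numeric comparison; the substitution is anchored to the
# start of the line and only fires when three spaces follow ATOM.
fix_atom_serials() {
  awk '$1 == "ATOM" && $2 + 0 > 9999 { sub(/^ATOM   /, "ATOM  ") } 1' "$@"
}

# Assumed sample input: a non-ATOM record, an unshifted line, a shifted line.
printf '%s\n' \
  'REMARK note here' \
  'ATOM   9999  CD  GLU   622' \
  'ATOM   10000  OE1 GLU   622' | fix_atom_serials
```

Because the pattern requires three spaces after ATOM, a five-digit serial that already sits in the right place (two spaces after ATOM) no longer matches, so running the function twice should not damage a corrected file.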

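Alternatively, you can realign everything. Note that the format string below has ten conversions while each input line has eleven fields, so the last column is dropped from the output; append one more %8s to keep it: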

$ perl -lane 'printf "%-5s%6s %-3s%4s%5d%8s%8s%8s%8s%8s\n",@F' file 
ATOM   9996 CG  GLU  622 -13.525  -7.714 -11.215  0.0136  1.9080
ATOM   9997 2HG GLU  622 -12.773  -7.608 -11.999 -0.0425  1.4870
ATOM   9998 3HG GLU  622 -13.121  -8.370 -10.441 -0.0425  1.4870
ATOM   9999 CD  GLU  622 -14.803  -8.348 -11.783  0.8054  1.9080
ATOM  10000 OE1 GLU  622 -15.541  -9.019 -11.024 -0.8188  1.6612
ATOM  10001 OE2 GLU  622 -15.105  -8.223 -12.988 -0.8188  1.6612
ATOM  10002 C   GLU  622 -13.072  -4.215  -9.499  0.5366  1.9080
ATOM  10003 O   GLU  622 -13.537  -3.437 -10.330 -0.5819  1.6612
ATOM  10004 N   TYR  623 -12.988  -3.858  -8.210 -0.4157  1.8240
ATOM  10005 H   TYR  623 -12.684  -4.551  -7.536  0.2719  0.6000
ATOM  10006 CA  TYR  623 -13.410  -2.540  -7.700 -0.0014  1.9080
ATOM  10007 HA  TYR  623 -13.794  -1.927  -8.513  0.0876  1.3870
ATOM  10008 CB  TYR  623 -14.530  -2.720  -6.667 -0.0152  1.9080
ATOM  10009 2HB TYR  623 -14.107  -3.312  -5.863  0.0295  1.4870
ATOM  10010 3HB TYR  623 -14.784  -1.738  -6.265  0.0295  1.4870
ATOM  10011 CG  TYR  623 -15.831  -3.390  -7.081 -0.0011  1.9080
ATOM  10012 CD1 TYR  623 -16.301  -3.357  -8.410 -0.1906  1.9080
