Grep lines based on the first column

I have files in the xyz file format that look like this:

   55
FINAL HEAT OF FORMATION =     0.000000
 C    -1.602726     0.220926     0.289897
 C    -1.486393     1.490851    -0.581098
 C    -0.269002     2.434576    -0.276060
 C     1.010307     1.687714     0.217781
 C     1.485345     0.603160    -0.764139
 C    -1.564938     1.114872    -2.078306
 O    -2.879135     0.437518    -2.475109
 C    -0.550397     3.726131     0.624425
 C    -1.962009     3.939190     1.255790
 C    -2.367687     2.809316     2.219183
 C     0.020100     4.998947    -0.121715
 C    -0.978418     5.719489    -1.074614
 C    -1.616282     4.795344    -2.118148
 C     2.215398     2.612417     0.464811
 C     0.729644     5.994046     0.844547
 C     2.143005     6.393766     0.406166
 C    -2.045078     5.240181     2.079386
 C    -0.323618     6.897509    -1.813043
 C    -1.401212     2.359346    -2.899572
 H    -2.385338     2.081960    -0.396254
 H     0.010153     2.832497    -1.252999
 H     0.084959     3.605123     1.504509
 H     0.809530     4.617245    -0.774572
 H     0.128704     6.897394     0.976132
 H     0.798102     5.548850     1.839117
 H     2.585871     7.101059     1.112504
 H     2.797551     5.521179     0.355908
 H     2.147260     6.862273    -0.578875
 H    -1.790477     6.132728    -0.470526
 H    -1.045932     7.372355    -2.481810
 H     0.046563     7.666682    -1.135188
 H     0.516095     6.553772    -2.424710
 H    -2.319681     5.356939    -2.738366
 H    -0.857441     4.374340    -2.783635
 H    -2.163844     3.970151    -1.668818
 H    -2.716285     4.004456     0.465662
 H    -3.243256     3.107916     2.800711
 H    -2.619420     1.880831     1.722495
 H    -1.559685     2.604091     2.928503
 H    -3.049833     5.345765     2.495339
 H    -1.345140     5.209610     2.919004
 H    -1.835946     6.139276     1.507938
 H     0.771267     1.206903     1.173131
 H     3.062594     2.025264     0.827489
 H     2.528865     3.099255    -0.462839
 H     2.024108     3.390184     1.201447
 H     2.402236     0.135542    -0.396501
 H     0.758861    -0.189092    -0.920031
 H     1.710054     1.046723    -1.738905
 H    -1.261942     0.377911     1.311479
 H    -2.639613    -0.116300     0.341732
 H    -1.021606    -0.605553    -0.121261
 H    -1.454026     2.110341    -3.938854
 H    -2.181199     3.050993    -2.658440
 H    -0.451621     2.804428    -2.687258

I have the following code to convert an .xyz file into a molecule input format:

CARBONS=$(grep -ow "C" $1 | wc -l)
HYDROGENS=$(grep -ow "H" $1 | wc -l)
OXYGENS=$(grep -ow "O" $1 | wc -l)

ATYPES=0
ARRAY=($CARBONS $HYDROGENS $OXYGENS)

for i in "${ARRAY[@]}"
do
        if [ $i -gt 0 ]; then
                ((ATYPES+=1))
        fi
done

echo "BASIS"
echo "co2"
echo ""
echo ""
echo "Atomtypes="$ATYPES" Generators=0 Integrals=1.00D-15 Angstrom"
echo "Charge=6.0 Atoms="$CARBONS""
grep "C" $1
echo "Charge=1.0 Atoms="$HYDROGENS""
grep "H" $1
if [ $OXYGENS -gt 0 ]; then
    echo "Charge=8.0 Atoms="$OXYGENS""
    grep "O" $1
fi

The output is:

BASIS
co2


Atomtypes=3 Generators=0 Integrals=1.00D-15 Angstrom
Charge=6.0 Atoms=18
 C    -1.602726     0.220926     0.289897
 C    -1.486393     1.490851    -0.581098
 C    -0.269002     2.434576    -0.276060
 C     1.010307     1.687714     0.217781
 C     1.485345     0.603160    -0.764139
 C    -1.564938     1.114872    -2.078306
 C    -0.550397     3.726131     0.624425
 C    -1.962009     3.939190     1.255790
 C    -2.367687     2.809316     2.219183
 C     0.020100     4.998947    -0.121715
 C    -0.978418     5.719489    -1.074614
 C    -1.616282     4.795344    -2.118148
 C     2.215398     2.612417     0.464811
 C     0.729644     5.994046     0.844547
 C     2.143005     6.393766     0.406166
 C    -2.045078     5.240181     2.079386
 C    -0.323618     6.897509    -1.813043
 C    -1.401212     2.359346    -2.899572
Charge=1.0 Atoms=36
FINAL HEAT OF FORMATION =     0.000000
 H    -2.385338     2.081960    -0.396254
 H     0.010153     2.832497    -1.252999
 H     0.084959     3.605123     1.504509
 H     0.809530     4.617245    -0.774572
 H     0.128704     6.897394     0.976132
 H     0.798102     5.548850     1.839117
 H     2.585871     7.101059     1.112504
 H     2.797551     5.521179     0.355908
 H     2.147260     6.862273    -0.578875
 H    -1.790477     6.132728    -0.470526
 H    -1.045932     7.372355    -2.481810
 H     0.046563     7.666682    -1.135188
 H     0.516095     6.553772    -2.424710
 H    -2.319681     5.356939    -2.738366
 H    -0.857441     4.374340    -2.783635
 H    -2.163844     3.970151    -1.668818
 H    -2.716285     4.004456     0.465662
 H    -3.243256     3.107916     2.800711
 H    -2.619420     1.880831     1.722495
 H    -1.559685     2.604091     2.928503
 H    -3.049833     5.345765     2.495339
 H    -1.345140     5.209610     2.919004
 H    -1.835946     6.139276     1.507938
 H     0.771267     1.206903     1.173131
 H     3.062594     2.025264     0.827489
 H     2.528865     3.099255    -0.462839
 H     2.024108     3.390184     1.201447
 H     2.402236     0.135542    -0.396501
 H     0.758861    -0.189092    -0.920031
 H     1.710054     1.046723    -1.738905
 H    -1.261942     0.377911     1.311479
 H    -2.639613    -0.116300     0.341732
 H    -1.021606    -0.605553    -0.121261
 H    -1.454026     2.110341    -3.938854
 H    -2.181199     3.050993    -2.658440
 H    -0.451621     2.804428    -2.687258
Charge=8.0 Atoms=1
FINAL HEAT OF FORMATION =     0.000000
 O    -2.879135     0.437518    -2.475109

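But the lines FINAL HEAT OF FORMATION = 0.000000 should not be there. I guess the grep commands pull them in because some words in that line start with the letters H and O. The correct output is: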

BASIS
co2


Atomtypes=3 Generators=0 Integrals=1.00D-15 Angstrom
Charge=6.0 Atoms=18
 C    -1.602726     0.220926     0.289897
 C    -1.486393     1.490851    -0.581098
 C    -0.269002     2.434576    -0.276060
 C     1.010307     1.687714     0.217781
 C     1.485345     0.603160    -0.764139
 C    -1.564938     1.114872    -2.078306
 C    -0.550397     3.726131     0.624425
 C    -1.962009     3.939190     1.255790
 C    -2.367687     2.809316     2.219183
 C     0.020100     4.998947    -0.121715
 C    -0.978418     5.719489    -1.074614
 C    -1.616282     4.795344    -2.118148
 C     2.215398     2.612417     0.464811
 C     0.729644     5.994046     0.844547
 C     2.143005     6.393766     0.406166
 C    -2.045078     5.240181     2.079386
 C    -0.323618     6.897509    -1.813043
 C    -1.401212     2.359346    -2.899572
Charge=1.0 Atoms=36
 H    -2.385338     2.081960    -0.396254
 H     0.010153     2.832497    -1.252999
 H     0.084959     3.605123     1.504509
 H     0.809530     4.617245    -0.774572
 H     0.128704     6.897394     0.976132
 H     0.798102     5.548850     1.839117
 H     2.585871     7.101059     1.112504
 H     2.797551     5.521179     0.355908
 H     2.147260     6.862273    -0.578875
 H    -1.790477     6.132728    -0.470526
 H    -1.045932     7.372355    -2.481810
 H     0.046563     7.666682    -1.135188
 H     0.516095     6.553772    -2.424710
 H    -2.319681     5.356939    -2.738366
 H    -0.857441     4.374340    -2.783635
 H    -2.163844     3.970151    -1.668818
 H    -2.716285     4.004456     0.465662
 H    -3.243256     3.107916     2.800711
 H    -2.619420     1.880831     1.722495
 H    -1.559685     2.604091     2.928503
 H    -3.049833     5.345765     2.495339
 H    -1.345140     5.209610     2.919004
 H    -1.835946     6.139276     1.507938
 H     0.771267     1.206903     1.173131
 H     3.062594     2.025264     0.827489
 H     2.528865     3.099255    -0.462839
 H     2.024108     3.390184     1.201447
 H     2.402236     0.135542    -0.396501
 H     0.758861    -0.189092    -0.920031
 H     1.710054     1.046723    -1.738905
 H    -1.261942     0.377911     1.311479
 H    -2.639613    -0.116300     0.341732
 H    -1.021606    -0.605553    -0.121261
 H    -1.454026     2.110341    -3.938854
 H    -2.181199     3.050993    -2.658440
 H    -0.451621     2.804428    -2.687258
Charge=8.0 Atoms=1
 O    -2.879135     0.437518    -2.475109

I tried changing the grep commands to grep -w "^C" $1 and grep -x "C" $1, but neither helped. How can I fix this?

Answer 1

^C does not work because the C is not at the start of the input line: it is preceded by a space. grep '^ C' "$1" should do what you want.

(By the way, instead of grep | wc -l you can use grep -c. Oh, and the quoting in your echo lines is a bit odd; you can simply put the variables inside the quotes.)
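The three suggestions above (anchoring the pattern, grep -c, and simpler quoting) can be tried on a small sample. This is only a sketch: the three-atom sample file is made up for illustration, and the trailing space in the patterns is an extra assumption on my part so that a two-letter symbol such as Cl would not be counted as carbon.

```shell
#!/bin/sh
# Build a tiny sample in the same layout as the xyz file from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
    3
FINAL HEAT OF FORMATION =     0.000000
 C    -1.602726     0.220926     0.289897
 H    -2.385338     2.081960    -0.396254
 O    -2.879135     0.437518    -2.475109
EOF

# Unanchored: also counts the header line, which contains the letter O.
loose=$(grep -c 'O' "$sample")

# Anchored: one leading space, the element symbol, then another space.
carbons=$(grep -c '^ C ' "$sample")
hydrogens=$(grep -c '^ H ' "$sample")
oxygens=$(grep -c '^ O ' "$sample")

# The variable simply goes inside the double quotes.
echo "Charge=8.0 Atoms=$oxygens"
grep '^ O ' "$sample"

rm -f "$sample"
```

With the anchored patterns the header line is no longer selected, and grep -c replaces the grep | wc -l pipeline.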

Answer 2

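Try doing it entirely in awk. For example, the following script uses two arrays: atoms, which remembers every input line for a given atom, and count, which keeps a count of each atom. Once it has read the whole input file, it prints the data in the format you want.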

# Atom lines start with a single space followed by the element symbol.
/^ [[:alpha:]]/ {
  if (count[$1] == 0) {
    atoms[$1] = $0;
  } else {
    atoms[$1] = atoms[$1] "\n" $0;
  }
  count[$1]++
}

END {
  atypes = length(count);

  print "BASIS\nco2\n\n"

  print "Atomtypes=" atypes " Generators=0 Integrals=1.00D-15 Angstrom"

  print "Charge=6.0 Atoms=" count["C"]
  print atoms["C"]

  print "Charge=1.0 Atoms=" count["H"]
  print atoms["H"]

  if (count["O"] > 0) {
    print "Charge=8.0 Atoms=" count["O"]
    print atoms["O"]
  }
}
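One way the script above might be made less element-specific is a small lookup table from element symbol to nuclear charge. The following is a rough sketch, not a complete converter: the function name xyz2mol, the arrays charge and order, and the four-entry table are all illustrative assumptions. It prints element types in the order they first appear in the file, which can differ from the fixed C, H, O order used above, and it matches atom lines with [A-Za-z] instead of [[:alpha:]] because some older awk builds lack POSIX character classes.

```shell
#!/bin/sh
# Sketch: xyz -> molecule input, with charges taken from a lookup table.
# The table is hypothetical and deliberately incomplete; a symbol that is
# missing from it would be printed with Charge=0.0.
xyz2mol() {
    awk '
    BEGIN {
        # Element symbol -> nuclear charge (extend as needed).
        charge["H"] = 1; charge["C"] = 6; charge["N"] = 7; charge["O"] = 8
    }
    /^ [A-Za-z]/ {
        if (!($1 in count)) {
            # First atom of this type: remember the order of appearance.
            order[++ntypes] = $1
            atoms[$1] = $0
        } else {
            atoms[$1] = atoms[$1] "\n" $0
        }
        count[$1]++
    }
    END {
        print "BASIS\nco2\n\n"
        print "Atomtypes=" ntypes " Generators=0 Integrals=1.00D-15 Angstrom"
        for (i = 1; i <= ntypes; i++) {
            sym = order[i]
            printf "Charge=%.1f Atoms=%d\n", charge[sym], count[sym]
            print atoms[sym]
        }
    }' "$1"
}
```

It could then be called as xyz2mol input.xyz > output.mol, where both file names are placeholders.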

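This could be improved a lot, and turned into a general-purpose conversion script, if it had a lookup table of the charge associated with each atom... but there is no need. A conversion utility already exists. It is called obabel, from the Open Babel: The Open Source Chemistry Toolbox project, and it can convert between many chemical file formats, for example from xyz to dalmol: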

obabel -i xyz input.xyz -o dalmol -O output.dalmol

Run obabel -L formats | less to get the full list of supported formats.

If you are running Debian or a Debian derivative (e.g. Ubuntu, Mint, etc.), you can install Open Babel with apt-get install openbabel. The package description says:

Package: openbabel
Version: 3.1.1+dfsg-6
Installed-Size: 630
Maintainer: Debichem Team <[email protected]>
Architecture: amd64
Depends: libc6 (>= 2.14), libgcc-s1 (>= 3.0), libopenbabel7 (>= 3.1.1+dfsg), libstdc++6 (>= 5.2)
Description-en: Chemical toolbox utilities (cli)
 Open Babel is a chemical toolbox designed to speak the many languages of
 chemical data. It allows one to search, convert, analyze, or store data from
 molecular modeling, chemistry, solid-state materials, biochemistry, or related
 areas.  Features include:
 .
  * Hydrogen addition and deletion
  * Support for Molecular Mechanics
  * Support for SMARTS molecular matching syntax
  * Automatic feature perception (rings, bonds, hybridization, aromaticity)
  * Flexible atom typer and perception of multiple bonds from atomic coordinates
  * Gasteiger-Marsili partial charge calculation
 .
 File formats Open Babel supports include PDB, XYZ, CIF, CML, SMILES, MDL
 Molfile, ChemDraw, Gaussian, GAMESS, MOPAC and MPQC.
 .
 This package includes the following utilities:
  * obabel: Convert between various chemical file formats
  * obenergy: Calculate the energy for a molecule
  * obminimize: Optimize the geometry, minimize the energy for a molecule
  * obgrep: Molecular search program using SMARTS pattern
  * obgen: Generate 3D coordinates for a molecule
  * obprop: Print standard molecular properties
  * obfit: Superimpose two molecules based on a pattern
  * obrotamer: Generate conformer/rotamer coordinates
  * obconformer: Generate low-energy conformers
  * obchiral: Print molecular chirality information
  * obrotate: Rotate dihedral angle of molecules in batch mode
  * obprobe: Create electrostatic probe grid
