I have files in the xyz file format, like this:
55
FINAL HEAT OF FORMATION = 0.000000
C -1.602726 0.220926 0.289897
C -1.486393 1.490851 -0.581098
C -0.269002 2.434576 -0.276060
C 1.010307 1.687714 0.217781
C 1.485345 0.603160 -0.764139
C -1.564938 1.114872 -2.078306
O -2.879135 0.437518 -2.475109
C -0.550397 3.726131 0.624425
C -1.962009 3.939190 1.255790
C -2.367687 2.809316 2.219183
C 0.020100 4.998947 -0.121715
C -0.978418 5.719489 -1.074614
C -1.616282 4.795344 -2.118148
C 2.215398 2.612417 0.464811
C 0.729644 5.994046 0.844547
C 2.143005 6.393766 0.406166
C -2.045078 5.240181 2.079386
C -0.323618 6.897509 -1.813043
C -1.401212 2.359346 -2.899572
H -2.385338 2.081960 -0.396254
H 0.010153 2.832497 -1.252999
H 0.084959 3.605123 1.504509
H 0.809530 4.617245 -0.774572
H 0.128704 6.897394 0.976132
H 0.798102 5.548850 1.839117
H 2.585871 7.101059 1.112504
H 2.797551 5.521179 0.355908
H 2.147260 6.862273 -0.578875
H -1.790477 6.132728 -0.470526
H -1.045932 7.372355 -2.481810
H 0.046563 7.666682 -1.135188
H 0.516095 6.553772 -2.424710
H -2.319681 5.356939 -2.738366
H -0.857441 4.374340 -2.783635
H -2.163844 3.970151 -1.668818
H -2.716285 4.004456 0.465662
H -3.243256 3.107916 2.800711
H -2.619420 1.880831 1.722495
H -1.559685 2.604091 2.928503
H -3.049833 5.345765 2.495339
H -1.345140 5.209610 2.919004
H -1.835946 6.139276 1.507938
H 0.771267 1.206903 1.173131
H 3.062594 2.025264 0.827489
H 2.528865 3.099255 -0.462839
H 2.024108 3.390184 1.201447
H 2.402236 0.135542 -0.396501
H 0.758861 -0.189092 -0.920031
H 1.710054 1.046723 -1.738905
H -1.261942 0.377911 1.311479
H -2.639613 -0.116300 0.341732
H -1.021606 -0.605553 -0.121261
H -1.454026 2.110341 -3.938854
H -2.181199 3.050993 -2.658440
H -0.451621 2.804428 -2.687258
I have the following code to convert the .xyz file to the molecule input format:
CARBONS=$(grep -ow "C" $1 | wc -l)
HYDROGENS=$(grep -ow "H" $1 | wc -l)
OXYGENS=$(grep -ow "O" $1 | wc -l)
ATYPES=0
ARRAY=($CARBONS $HYDROGENS $OXYGENS)
for i in "${ARRAY[@]}"
do
    if [ $i -gt 0 ]; then
        ((ATYPES+=1))
    fi
done
echo "BASIS"
echo "co2"
echo ""
echo ""
echo "Atomtypes="$ATYPES" Generators=0 Integrals=1.00D-15 Angstrom"
echo "Charge=6.0 Atoms="$CARBONS""
grep "C" $1
echo "Charge=1.0 Atoms="$HYDROGENS""
grep "H" $1
if [ $OXYGENS -gt 0 ]; then
    echo "Charge=8.0 Atoms="$OXYGENS""
    grep "O" $1
fi
The output is:
BASIS
co2
Atomtypes=3 Generators=0 Integrals=1.00D-15 Angstrom
Charge=6.0 Atoms=18
C -1.602726 0.220926 0.289897
C -1.486393 1.490851 -0.581098
C -0.269002 2.434576 -0.276060
C 1.010307 1.687714 0.217781
C 1.485345 0.603160 -0.764139
C -1.564938 1.114872 -2.078306
C -0.550397 3.726131 0.624425
C -1.962009 3.939190 1.255790
C -2.367687 2.809316 2.219183
C 0.020100 4.998947 -0.121715
C -0.978418 5.719489 -1.074614
C -1.616282 4.795344 -2.118148
C 2.215398 2.612417 0.464811
C 0.729644 5.994046 0.844547
C 2.143005 6.393766 0.406166
C -2.045078 5.240181 2.079386
C -0.323618 6.897509 -1.813043
C -1.401212 2.359346 -2.899572
Charge=1.0 Atoms=36
FINAL HEAT OF FORMATION = 0.000000
H -2.385338 2.081960 -0.396254
H 0.010153 2.832497 -1.252999
H 0.084959 3.605123 1.504509
H 0.809530 4.617245 -0.774572
H 0.128704 6.897394 0.976132
H 0.798102 5.548850 1.839117
H 2.585871 7.101059 1.112504
H 2.797551 5.521179 0.355908
H 2.147260 6.862273 -0.578875
H -1.790477 6.132728 -0.470526
H -1.045932 7.372355 -2.481810
H 0.046563 7.666682 -1.135188
H 0.516095 6.553772 -2.424710
H -2.319681 5.356939 -2.738366
H -0.857441 4.374340 -2.783635
H -2.163844 3.970151 -1.668818
H -2.716285 4.004456 0.465662
H -3.243256 3.107916 2.800711
H -2.619420 1.880831 1.722495
H -1.559685 2.604091 2.928503
H -3.049833 5.345765 2.495339
H -1.345140 5.209610 2.919004
H -1.835946 6.139276 1.507938
H 0.771267 1.206903 1.173131
H 3.062594 2.025264 0.827489
H 2.528865 3.099255 -0.462839
H 2.024108 3.390184 1.201447
H 2.402236 0.135542 -0.396501
H 0.758861 -0.189092 -0.920031
H 1.710054 1.046723 -1.738905
H -1.261942 0.377911 1.311479
H -2.639613 -0.116300 0.341732
H -1.021606 -0.605553 -0.121261
H -1.454026 2.110341 -3.938854
H -2.181199 3.050993 -2.658440
H -0.451621 2.804428 -2.687258
Charge=8.0 Atoms=1
FINAL HEAT OF FORMATION = 0.000000
O -2.879135 0.437518 -2.475109
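But the FINAL HEAT OF FORMATION = 0.000000 lines should not be there. I guess the grep commands pick them up because some words in that line start with the letters H and O. The correct output is: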
BASIS
co2
Atomtypes=3 Generators=0 Integrals=1.00D-15 Angstrom
Charge=6.0 Atoms=18
C -1.602726 0.220926 0.289897
C -1.486393 1.490851 -0.581098
C -0.269002 2.434576 -0.276060
C 1.010307 1.687714 0.217781
C 1.485345 0.603160 -0.764139
C -1.564938 1.114872 -2.078306
C -0.550397 3.726131 0.624425
C -1.962009 3.939190 1.255790
C -2.367687 2.809316 2.219183
C 0.020100 4.998947 -0.121715
C -0.978418 5.719489 -1.074614
C -1.616282 4.795344 -2.118148
C 2.215398 2.612417 0.464811
C 0.729644 5.994046 0.844547
C 2.143005 6.393766 0.406166
C -2.045078 5.240181 2.079386
C -0.323618 6.897509 -1.813043
C -1.401212 2.359346 -2.899572
Charge=1.0 Atoms=36
H -2.385338 2.081960 -0.396254
H 0.010153 2.832497 -1.252999
H 0.084959 3.605123 1.504509
H 0.809530 4.617245 -0.774572
H 0.128704 6.897394 0.976132
H 0.798102 5.548850 1.839117
H 2.585871 7.101059 1.112504
H 2.797551 5.521179 0.355908
H 2.147260 6.862273 -0.578875
H -1.790477 6.132728 -0.470526
H -1.045932 7.372355 -2.481810
H 0.046563 7.666682 -1.135188
H 0.516095 6.553772 -2.424710
H -2.319681 5.356939 -2.738366
H -0.857441 4.374340 -2.783635
H -2.163844 3.970151 -1.668818
H -2.716285 4.004456 0.465662
H -3.243256 3.107916 2.800711
H -2.619420 1.880831 1.722495
H -1.559685 2.604091 2.928503
H -3.049833 5.345765 2.495339
H -1.345140 5.209610 2.919004
H -1.835946 6.139276 1.507938
H 0.771267 1.206903 1.173131
H 3.062594 2.025264 0.827489
H 2.528865 3.099255 -0.462839
H 2.024108 3.390184 1.201447
H 2.402236 0.135542 -0.396501
H 0.758861 -0.189092 -0.920031
H 1.710054 1.046723 -1.738905
H -1.261942 0.377911 1.311479
H -2.639613 -0.116300 0.341732
H -1.021606 -0.605553 -0.121261
H -1.454026 2.110341 -3.938854
H -2.181199 3.050993 -2.658440
H -0.451621 2.804428 -2.687258
Charge=8.0 Atoms=1
O -2.879135 0.437518 -2.475109
I tried changing the grep command to grep -w "^C" $1 and grep -x "C" $1, but neither helped. How can I fix this?
Answer 1
^C doesn't work because C is not at the beginning of the input line: there is a space in front of it. grep '^ C' "$1" should do what you want.
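A quick way to see the difference is to run the patterns against a small excerpt. The sketch below uses a hypothetical four-line sample in which, as assumed above, the atom lines start with a space and the comment line does not; the helper count exists only for this illustration. (grep -x "C" fails for a different reason: -x only matches lines that consist of nothing but C.)

```shell
#!/usr/bin/env bash
# Hypothetical excerpt of the .xyz file. Assumption: atom lines begin with
# one space, the "FINAL HEAT OF FORMATION" comment line does not.
sample='FINAL HEAT OF FORMATION = 0.000000
 C -1.602726 0.220926 0.289897
 H -2.385338 2.081960 -0.396254
 O -2.879135 0.437518 -2.475109'

# count PATTERN: print how many lines of $sample match PATTERN.
# grep -c exits 1 when nothing matches; "|| true" keeps the status at 0.
count() { printf '%s\n' "$sample" | grep -c -- "$1" || true; }

count 'H'     # 2: also matches "HEAT" in the comment line
count '^C'    # 0: no line starts with C, there is a space first
count '^ C'   # 1: anchored on the leading space
count '^ H'   # 1: the comment line is no longer matched
count '^ O'   # 1
```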
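(By the way, instead of grep | wc -l you can use grep -c, which counts matching lines; that gives the same number here because each atom is on its own line. Oh, and the quotes in your echo lines are a bit odd: you can simply put the variable inside the quotes, e.g. echo "Charge=6.0 Atoms=$CARBONS".)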
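Putting these suggestions together, the question's script might look like the sketch below: grep -c instead of grep | wc -l, anchored patterns, and variables inside the quotes. One deviation from the answer is an assumption on my part: the pattern ^[[:space:]]*C[[:space:]] accepts any amount of leading whitespace (including none) and requires whitespace after the symbol, so it works whether or not the atom lines are indented. The function names xyz_to_mol, count_atoms and print_atoms are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the question's script with the suggestions above applied.
# Assumption: an atom line is optional leading whitespace, the element
# symbol, then whitespace, so "FINAL HEAT OF FORMATION ..." never matches.

# count_atoms SYMBOL FILE: number of atom lines for SYMBOL (grep -c counts lines).
# grep exits 1 when nothing matches; "|| true" keeps that from aborting callers.
count_atoms() { grep -c -- "^[[:space:]]*$1[[:space:]]" "$2" || true; }

# print_atoms SYMBOL FILE: the coordinate lines for SYMBOL, unchanged.
print_atoms() { grep -- "^[[:space:]]*$1[[:space:]]" "$2" || true; }

# xyz_to_mol FILE: write the molecule input to stdout.
xyz_to_mol() {
  local file=$1 carbons hydrogens oxygens n atypes=0
  carbons=$(count_atoms C "$file")
  hydrogens=$(count_atoms H "$file")
  oxygens=$(count_atoms O "$file")

  # Number of element types actually present in the file.
  for n in "$carbons" "$hydrogens" "$oxygens"; do
    if [ "$n" -gt 0 ]; then
      atypes=$((atypes + 1))
    fi
  done

  echo "BASIS"
  echo "co2"
  echo ""
  echo ""
  echo "Atomtypes=$atypes Generators=0 Integrals=1.00D-15 Angstrom"
  echo "Charge=6.0 Atoms=$carbons"
  print_atoms C "$file"
  echo "Charge=1.0 Atoms=$hydrogens"
  print_atoms H "$file"
  if [ "$oxygens" -gt 0 ]; then
    echo "Charge=8.0 Atoms=$oxygens"
    print_atoms O "$file"
  fi
}
```

Saved as a script, it could be used by appending a line such as xyz_to_mol "$1" and running bash script.sh input.xyz > output.mol.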
Answer 2
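Try doing it entirely in awk. For example, the following script uses two arrays: atoms, which remembers every input line for a given atom, and count, which keeps the count of each atom. Once it has read the whole input file, it prints the data in the format you want.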
/^ [[:alpha:]]/ {
    if (count[$1] == 0) {
        atoms[$1]=$0;
    } else {
        atoms[$1]=atoms[$1] "\n" $0;
    }
    count[$1]++
}
END {
    atypes = length(count);
    print "BASIS\nco2\n\n"
    print "Atomtypes=" atypes " Generators=0 Integrals=1.00D-15 Angstrom"
    print "Charge=6.0 Atoms=" count["C"]
    print atoms["C"]
    print "Charge=1.0 Atoms=" count["H"]
    print atoms["H"]
    if (count["O"] > 0) {
        print "Charge=8.0 Atoms=" count["O"]
        print atoms["O"]
    }
}
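The awk script can be run with awk -f, passing the .xyz file as its argument. As a sketch of the lookup-table generalisation the answer goes on to describe, the variant below wraps a similar awk program in a shell function (xyz_to_mol_table, a hypothetical name) and takes each element's charge from a small table instead of hard-coding C, H and O. Assumptions: atom lines have exactly four fields, the table lists only H, C, N and O, and element types are printed in the order they first appear.

```shell
#!/usr/bin/env bash
# xyz_to_mol_table FILE: table-driven variant of the awk approach (a sketch).
xyz_to_mol_table() {
  awk '
    BEGIN {
      # Hypothetical charge table: element symbol followed by nuclear charge.
      # Only H, C, N and O are listed; extend as needed.
      n = split("H 1.0 C 6.0 N 7.0 O 8.0", t, " ")
      for (i = 1; i < n; i += 2) charge[t[i]] = t[i + 1]
    }
    # Atom lines: four fields, the first one looks like an element symbol.
    NF == 4 && $1 ~ /^[A-Z][a-z]?$/ {
      if (!($1 in charge)) {
        print "no charge known for " $1 ", line skipped" > "/dev/stderr"
        next
      }
      if (!($1 in count)) order[++ntypes] = $1   # remember first-seen order
      count[$1]++
      atoms[$1] = atoms[$1] (count[$1] > 1 ? "\n" : "") $0
    }
    END {
      print "BASIS\nco2\n\n"
      print "Atomtypes=" (ntypes + 0) " Generators=0 Integrals=1.00D-15 Angstrom"
      for (i = 1; i <= ntypes; i++) {
        e = order[i]
        print "Charge=" charge[e] " Atoms=" count[e]
        print atoms[e]
      }
    }
  ' "$1"
}
```

Because the comment line has more than four fields and its first word is not a one- or two-letter symbol, it is never treated as an atom, regardless of leading spaces.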
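This could be greatly improved, and turned into a generic conversion script, if it had a lookup table of the charge associated with each atom... but there's no need. A conversion utility already exists: obabel, from the Open Babel project ("Open Babel: The Open Source Chemistry Toolbox"). It can convert between many chemical file formats, for example from xyz to dalmol (DALTON input):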
obabel -i xyz input.xyz -o dalmol -O output.dalmol
Run obabel -L formats | less to get the complete list of supported formats.
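If there are many files to convert, the same obabel command can be called in a loop. A minimal sketch, assuming all inputs sit in one directory and each output should take the input name with .xyz replaced by .dalmol; convert_all is a hypothetical helper, not part of Open Babel.

```shell
#!/usr/bin/env bash
# convert_all DIR: run obabel on every *.xyz file in DIR, writing
# NAME.dalmol next to NAME.xyz (hypothetical helper, not part of Open Babel).
convert_all() {
  local f
  for f in "$1"/*.xyz; do
    # With no matching files the glob stays literal; skip that case.
    [ -e "$f" ] || continue
    obabel -i xyz "$f" -o dalmol -O "${f%.xyz}.dalmol"
  done
}
```

Usage: convert_all ./molecules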
If you're running Debian or a Debian derivative (e.g. Ubuntu, Mint), you can install it with apt-get install openbabel. The package description says:
Package: openbabel
Version: 3.1.1+dfsg-6
Installed-Size: 630
Maintainer: Debichem Team <[email protected]>
Architecture: amd64
Depends: libc6 (>= 2.14), libgcc-s1 (>= 3.0), libopenbabel7 (>= 3.1.1+dfsg), libstdc++6 (>= 5.2)
Description-en: Chemical toolbox utilities (cli)
Open Babel is a chemical toolbox designed to speak the many languages of
chemical data. It allows one to search, convert, analyze, or store data from
molecular modeling, chemistry, solid-state materials, biochemistry, or related
areas. Features include:
.
* Hydrogen addition and deletion
* Support for Molecular Mechanics
* Support for SMARTS molecular matching syntax
* Automatic feature perception (rings, bonds, hybridization, aromaticity)
* Flexible atom typer and perception of multiple bonds from atomic coordinates
* Gasteiger-Marsili partial charge calculation
.
File formats Open Babel supports include PDB, XYZ, CIF, CML, SMILES, MDL
Molfile, ChemDraw, Gaussian, GAMESS, MOPAC and MPQC.
.
This package includes the following utilities:
* obabel: Convert between various chemical file formats
* obenergy: Calculate the energy for a molecule
* obminimize: Optimize the geometry, minimize the energy for a molecule
* obgrep: Molecular search program using SMARTS pattern
* obgen: Generate 3D coordinates for a molecule
* obprop: Print standard molecular properties
* obfit: Superimpose two molecules based on a pattern
* obrotamer: Generate conformer/rotamer coordinates
* obconformer: Generate low-energy conformers
* obchiral: Print molecular chirality information
* obrotate: Rotate dihedral angle of molecules in batch mode
* obprobe: Create electrostatic probe grid