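I have an ugly-looking script that I use to process evaluation files from my simulations. It looks terrible (no, I'm not a coder), and although it usually works, at the moment it doesn't.
To clarify: the script normally iterates over several input files, and it does work on my Mac and on the cluster where I run my simulations. I'm now trying to run it on a VPS running Ubuntu Server, where it produces some strange output. I have no idea how to fix this.
Here is the full script: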
#!/usr/bin/awk -f
FNR==1 && NR!=1 { endfile(); avgLT=totFrames=avgLTsq=avgFramessq=denom=0 }
FNR==1 { out1="analLT_"FILENAME; out2="sumLT_"FILENAME; out3="reportLT.txt"; print "-> Input file is: "FILENAME >> out3; next
}
FNR==1 { next }
{
    avgLT+=$4; totFrames+=$5; ++denom;
    printf "%10.4f %10.1f\n",$4,$5 > out1
}
END { endfile() }
function endfile()
{
    x="\nNO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
    if (avgLT==0 && denom==0) {
        print x > out1; print x > out2; print x"\n\n----------------------------------------\n" >> out3;
        close(out1); close(out2); close(out3); return
    }
    if (avgLT>0) {
        avgAvgLT=avgLT/denom
        avgFrames=totFrames/denom
        while ((getline<out1)>0) {
            avgLTsq+=(($1-avgAvgLT)^2)
            avgFramessq+=(($2-avgFrames)^2)
        }
        close(out1)
        printf "\n Summary data for hbond lifetime analysis:\n\n" > out2
        printf " Summed Avg Lifetime: %10.4f\n",avgLT > out2
        printf " Average Lifetime: %10.4f\n",avgAvgLT > out2
        printf " Summed Frames: %10.0f\n",totFrames > out2
        printf " Average Frames: %10.4f\n",avgFrames > out2
        printf "\n Summary data for hbond lifetime analysis:\n\n" >> out3
        printf " Summed Avg Lifetime: %10.4f\n",avgLT >> out3
        printf " Average Lifetime: %10.4f\n",avgAvgLT >> out3
        printf " Summed Frames: %10.0f\n",totFrames >> out3
        printf " Average Frames: %10.4f\n",avgFrames >> out3
        if (denom>1) {
            sd_avgLT=sqrt(avgLTsq/(denom-1)); semAvgLT=(sd_avgLT/(sqrt(denom))); sd_totFrames=sqrt(avgFramessq/(denom-1)); semTotFrames=(sd_totFrames/(sqrt(denom)))
            printf "\n SD lifetime: %10.4f\n",sd_avgLT > out2
            printf " SEM lifetime: %10.4f\n",semAvgLT > out2
            printf " SD Frames: %10.4f\n",sd_totFrames > out2
            printf " SEM Frames: %10.4f\n\n",semTotFrames > out2
            printf "\n SD lifetime: %10.4f\n",sd_avgLT >> out3
            printf " SEM lifetime: %10.4f\n",semAvgLT >> out3
            printf " SD Frames: %10.4f\n",sd_totFrames > out3
            printf " SEM Frames: %10.4f\n\n",semTotFrames > out3
        } if (denom>1 && denom!=2) {print "----------------------------------------\n" >> out3 }
        if (denom==1) { print " Single HBOND event, no SD or SEM calculation possible!" > out2;
            print "\n Single HBOND event, no SD or SEM calculation possible!\n\n----------------------------------------\n" >> out3
        }
        if (denom==2) { print "\n 2 Hydrogen bond events found! No proper SD or SEM!" > out2;
            print " 2 Hydrogen bond events found! No proper SD or SEM!\n\n----------------------------------------\n" >> out3
        }
    }
    close(out3)
    close(out2)
}
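It takes a 5-column input file, processes 2 of the columns, and writes those same columns to a separate file (out1) for later processing. That file should then be read back to calculate some statistics, but this does not happen on the VPS: all I get are 0.0000 values.
The problem seems to be in the while loop: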
while ((getline<out1)>0) {
avgLTsq+=(($1-avgAvgLT)^2)
avgFramessq+=(($2-avgFrames)^2)
}
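By the end of the script, when things are printed to the files, I seem to get reasonable values for the calculated sums and averages (avgLT, totFrames, avgAvgLT and avgFrames). When it gets to the statistics part (sd_avgLT, semAvgLT, sd_totFrames and semTotFrames), all of them are printed to both out2 and out3, but every value is 0.0000 rather than what it should be.
The "math" seems to work when I run the commands separately on the out1 file: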
awk ' BEGIN { avgAvgLT=1.4264 } { avgLTsq+=(($1-avgAvgLT)^2) } END { print avgLTsq }' analLT_multiple.out
awk ' BEGIN { avgFrames=4.4831 } { avgFramessq+=(($2-avgFrames)^2) } END { print avgFramessq }' analLT_multiple.out
awk ' BEGIN { avgLTsq=30.3478; denom=89 } { sd_avgLT=sqrt(avgLTsq/(denom-1)) } END { print sd_avgLT }' analLT_multiple.out
awk ' BEGIN { sd_avgLT=0.587249; denom=89 } {semAvgLT=(sd_avgLT/(sqrt(denom))) } END { print semAvgLT }' analLT_multiple.out
awk ' BEGIN { avgFramessq=2040.22; denom=89 } { sd_totFrames=sqrt(avgFramessq/(denom-1)) } END { print sd_totFrames }' analLT_multiple.out
awk ' BEGIN { sd_totFrames=4.81501; denom=89 } { semTotFrames=(sd_totFrames/(sqrt(denom))) } END { print semTotFrames }' analLT_multiple.out
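These give me non-zero values that look reasonable, but the script gives me 0.0000 for all of them. I also tried printing the values of the variables from within the script while running it on multiple files: denom works, but sd_avgLT, semAvgLT, sd_totFrames and semTotFrames all come back as zero or empty.
My "conclusion" (I would call it a guess) is that, as said above, something is wrong with the while loop, though I don't understand what.
I put a sample input file on pastebin at https://pastebin.com/JsuTz0mD in case you want to try running the script yourself.
Any input, feedback or solution that gets this script running on my VPS would be greatly appreciated.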
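Answer 1
Either the awk that you're using (GNU awk or mawk) doesn't flush the data written to the file out1 as you write it, or it can't read anything from a file that you still hold open for writing. Either way, when you read from that file in the END block, no data is read. The BSD awk implementations don't seem to have this issue, and your code works as expected on OpenBSD and macOS, for example.
The solution is simple: call close(out1) unconditionally before reading from the file with getline in the END block. At the moment, you close it after reading from it.
Also, consider being more consistent with your use of > and >>. I believe you can use > throughout this code: > truncates a file only when awk first opens it, and later writes to the same open file append, so this holds as long as the file is not closed and reopened in between.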
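To make the ordering concrete, here is a minimal sketch of the write, close, then read-back pattern described above. The function name count_after_close and the temporary file are made up for this illustration and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical demo: write two lines to a file from awk, close it,
# then read the same file back with getline and count the lines.
count_after_close() {
    tmp=$(mktemp)
    awk -v out="$tmp" 'BEGIN {
        print "1.0 2" > out
        print "3.0 4" > out
        close(out)                            # flush and close the write side first
        while ((getline line < out) > 0) n++  # now the data is on disk
        close(out)                            # close the read side as well
        print n + 0
    }'
    rm -f "$tmp"
}

echo "lines read back: $(count_after_close)"  # prints "lines read back: 2"
```

If the first close(out) is removed, the count may come out as 0 on awk implementations that buffer the output, which matches the all-zero statistics seen in the question.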
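Answer 2
This isn't an answer, since @Kusalananda has already told you what's wrong, but let's tidy up your script a bit to make it readable and reduce the duplicated code: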
FNR == 1 {
    if ( NR != 1 ) {
        endfile()
    }
    # reset every per-file accumulator, including the sums of squares
    avgLT = totFrames = avgLTsq = avgFramessq = denom = 0
    out1 = "analLT_" FILENAME
    out2 = "sumLT_" FILENAME
    out3 = "reportLT.txt"
    print "-> Input file is: " FILENAME > out3
    next
}
{
    avgLT += $4
    totFrames += $5
    ++denom
    printf "%10.4f %10.1f\n", $4, $5 > out1
}
END {
    endfile()
}
function endfile(    x, avgAvgLT, avgFrames, sd_avgLT,
                     semAvgLT, sd_totFrames, semTotFrames )
{
    if (avgLT == 0 && denom == 0 ) {
        x = "\nNO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
        print x > out1
        print x > out2
        print x > out3
    }
    else if (avgLT > 0) {
        avgAvgLT = avgLT / denom
        avgFrames = totFrames / denom
        close(out1)    # flush the written data before reading it back
        while ((getline < out1) > 0) {
            avgLTsq += (($1 - avgAvgLT) ^ 2)
            avgFramessq += (($2 - avgFrames) ^ 2)
        }
        close(out1)
        printf "\n Summary data for hbond lifetime analysis:\n\n" > out2
        printf " Summed Avg Lifetime: %10.4f\n", avgLT > out2
        printf " Average Lifetime: %10.4f\n", avgAvgLT > out2
        printf " Summed Frames: %10.0f\n", totFrames > out2
        printf " Average Frames: %10.4f\n", avgFrames > out2
        printf "\n Summary data for hbond lifetime analysis:\n\n" > out3
        printf " Summed Avg Lifetime: %10.4f\n", avgLT > out3
        printf " Average Lifetime: %10.4f\n", avgAvgLT > out3
        printf " Summed Frames: %10.0f\n", totFrames > out3
        printf " Average Frames: %10.4f\n", avgFrames > out3
        if (denom == 1) {
            x = " Single HBOND event, no SD or SEM calculation possible!"
            print x > out2
            print "" > out3
            print x > out3
        }
        else if (denom > 1) {
            sd_avgLT = sqrt(avgLTsq / (denom - 1))
            semAvgLT = (sd_avgLT / (sqrt(denom)))
            sd_totFrames = sqrt(avgFramessq / (denom - 1))
            semTotFrames = (sd_totFrames / (sqrt(denom)))
            printf "\n SD lifetime: %10.4f\n", sd_avgLT > out2
            printf " SEM lifetime: %10.4f\n", semAvgLT > out2
            printf " SD Frames: %10.4f\n", sd_totFrames > out2
            printf " SEM Frames: %10.4f\n\n", semTotFrames > out2
            printf "\n SD lifetime: %10.4f\n", sd_avgLT > out3
            printf " SEM lifetime: %10.4f\n", semAvgLT > out3
            printf " SD Frames: %10.4f\n", sd_totFrames > out3
            printf " SEM Frames: %10.4f\n\n", semTotFrames > out3
            if (denom == 2) {
                x = " 2 Hydrogen bond events found! No proper SD or SEM!"
                print "" > out2
                print x > out2
                print x > out3
            }
        }
    }
    print "\n\n----------------------------------------\n" > out3
    close(out1)
    close(out2)
    # out3 is deliberately left open: closing it and reopening it with ">"
    # for the next input file would truncate the report
}
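The while getline loop on out1 obviously isn't really necessary, since you could store the data in arrays instead of writing to out1 in the main body of the script, e.g.: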
FNR == 1 {
    if ( NR != 1 ) {
        endfile()
    }
    # reset every per-file accumulator, including the sums of squares
    avgLT = totFrames = avgLTsq = avgFramessq = denom = 0
    out1 = "analLT_" FILENAME
    out2 = "sumLT_" FILENAME
    out3 = "reportLT.txt"
    print "-> Input file is: " FILENAME > out3
    next
}
{
    avgLT += $4
    totFrames += $5
    ++denom
    # keep the raw per-line values, indexed by data-line count
    idx2LT[denom] = $4
    idx2frames[denom] = $5
}
END {
    endfile()
}
function endfile(    i, x, avgAvgLT, avgFrames, sd_avgLT,
                     semAvgLT, sd_totFrames, semTotFrames )
{
    if (avgLT == 0 && denom == 0 ) {
        x = "\nNO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
        print x > out1
        print x > out2
        print x > out3
    }
    else if (avgLT > 0) {
        avgAvgLT = avgLT / denom
        avgFrames = totFrames / denom
        # loop over this file's data lines only; the sums stay untouched
        for (i = 1; i <= denom; i++) {
            printf "%10.4f %10.1f\n", idx2LT[i], idx2frames[i] > out1
            avgLTsq += ((idx2LT[i] - avgAvgLT) ^ 2)
            avgFramessq += ((idx2frames[i] - avgFrames) ^ 2)
        }
        printf "\n Summary data for hbond lifetime analysis:\n\n" > out2
        printf " Summed Avg Lifetime: %10.4f\n", avgLT > out2
        printf " Average Lifetime: %10.4f\n", avgAvgLT > out2
        printf " Summed Frames: %10.0f\n", totFrames > out2
        printf " Average Frames: %10.4f\n", avgFrames > out2
        printf "\n Summary data for hbond lifetime analysis:\n\n" > out3
        printf " Summed Avg Lifetime: %10.4f\n", avgLT > out3
        printf " Average Lifetime: %10.4f\n", avgAvgLT > out3
        printf " Summed Frames: %10.0f\n", totFrames > out3
        printf " Average Frames: %10.4f\n", avgFrames > out3
        if (denom == 1) {
            x = " Single HBOND event, no SD or SEM calculation possible!"
            print x > out2
            print "" > out3
            print x > out3
        }
        else if (denom > 1) {
            sd_avgLT = sqrt(avgLTsq / (denom - 1))
            semAvgLT = (sd_avgLT / (sqrt(denom)))
            sd_totFrames = sqrt(avgFramessq / (denom - 1))
            semTotFrames = (sd_totFrames / (sqrt(denom)))
            printf "\n SD lifetime: %10.4f\n", sd_avgLT > out2
            printf " SEM lifetime: %10.4f\n", semAvgLT > out2
            printf " SD Frames: %10.4f\n", sd_totFrames > out2
            printf " SEM Frames: %10.4f\n\n", semTotFrames > out2
            printf "\n SD lifetime: %10.4f\n", sd_avgLT > out3
            printf " SEM lifetime: %10.4f\n", semAvgLT > out3
            printf " SD Frames: %10.4f\n", sd_totFrames > out3
            printf " SEM Frames: %10.4f\n\n", semTotFrames > out3
            if (denom == 2) {
                x = " 2 Hydrogen bond events found! No proper SD or SEM!"
                print "" > out2
                print x > out2
                print x > out3
            }
        }
    }
    print "\n\n----------------------------------------\n" > out3
    close(out1)
    close(out2)
    # out3 is deliberately left open: closing it and reopening it with ">"
    # for the next input file would truncate the report
}
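Of course, all of the above is untested, since you didn't provide any sample input/output for us to test with, but hopefully any bugs are easy to spot and fix.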
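As a further variation, the SD and SEM can be computed in a single pass with no intermediate file and no arrays, by accumulating the sum and the sum of squares. This is a different technique from the two-pass approach used in the scripts in this thread, shown only as a sketch. The helper name sd_sem and its arguments (file, column number) are invented for this example. Note that the sum-of-squares shortcut can lose precision when the mean is very large compared with the spread of the data.

```shell
#!/bin/sh
# Hypothetical helper: sd_sem FILE COL
# Prints "n mean sd sem" for column COL of FILE, skipping the header line.
sd_sem() {
    awk -v col="$2" '
        FNR == 1 { next }                  # skip the header line
        { n++; sum += $col; sumsq += $col * $col }
        END {
            if (n < 2) { print "need at least 2 data points"; exit 1 }
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
            if (var < 0) var = 0           # guard against rounding error
            sd = sqrt(var)
            printf "%d %.4f %.4f %.4f\n", n, mean, sd, sd / sqrt(n)
        }' "$1"
}
```

For example, `sd_sem multiple.out 4` would report the count, mean, SD and SEM of the lifetime column, assuming multiple.out has one header line followed by data rows.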