Problem with the while loop in an AWK script

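I have an ugly-looking script that I use to process evaluation files from my simulations. It looks awful and, no, I am not a coder. It usually works, but at the moment it does not.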

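To clarify: the script normally iterates over several input files, and it does work on my Mac and on the cluster where I run the simulations. I am now trying to run it on a VPS running Ubuntu Server, where it produces some strange output. I do not know how to fix this.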

Here is the full script:

#!/usr/bin/awk -f
FNR==1 && NR!=1 { endfile(); avgLT=totFrames=avgLTsq=avgFramessq=denom=0 }
FNR==1 { out1="analLT_"FILENAME; out2="sumLT_"FILENAME; out3="reportLT.txt"; print "-> Input file is: "FILENAME >> out3; next
       }
FNR==1 { next }

{
   avgLT+=$4; totFrames+=$5; ++denom;
   printf "%10.4f %10.1f\n",$4,$5 > out1
  }

END { endfile() }
function endfile()
{
  x="\nNO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
  if (avgLT==0 && denom==0) {
    print x > out1; print x > out2; print x"\n\n----------------------------------------\n" >> out3;
    close(out1); close(out2); close(out3); return
  }
  if (avgLT>0) {
    avgAvgLT=avgLT/denom
    avgFrames=totFrames/denom
    while ((getline<out1)>0) {
      avgLTsq+=(($1-avgAvgLT)^2)
      avgFramessq+=(($2-avgFrames)^2)
    }
  close(out1)
    printf "\n   Summary data for hbond lifetime analysis:\n\n" > out2
    printf "   Summed Avg Lifetime:    %10.4f\n",avgLT > out2
    printf "   Average Lifetime:       %10.4f\n",avgAvgLT > out2
    printf "      Summed Frames:  %10.0f\n",totFrames > out2
    printf "      Average Frames:      %10.4f\n",avgFrames > out2
    printf "\n   Summary data for hbond lifetime analysis:\n\n" >> out3
    printf "   Summed Avg Lifetime:    %10.4f\n",avgLT >> out3
    printf "   Average Lifetime:       %10.4f\n",avgAvgLT >> out3
    printf "      Summed Frames:  %10.0f\n",totFrames >> out3
    printf "      Average Frames:      %10.4f\n",avgFrames >> out3

    if (denom>1) {
      sd_avgLT=sqrt(avgLTsq/(denom-1)); semAvgLT=(sd_avgLT/(sqrt(denom))); sd_totFrames=sqrt(avgFramessq/(denom-1)); semTotFrames=(sd_totFrames/(sqrt(denom)))
      printf "\n   SD lifetime:            %10.4f\n",sd_avgLT > out2
      printf "   SEM lifetime:           %10.4f\n",semAvgLT > out2
      printf "      SD Frames:           %10.4f\n",sd_totFrames > out2
      printf "      SEM Frames:          %10.4f\n\n",semTotFrames > out2
      printf "\n   SD lifetime:            %10.4f\n",sd_avgLT >> out3
      printf "   SEM lifetime:           %10.4f\n",semAvgLT >> out3
      printf "      SD Frames:           %10.4f\n",sd_totFrames > out3
      printf "      SEM Frames:          %10.4f\n\n",semTotFrames > out3
    } if (denom>1 && denom!=2) {print "----------------------------------------\n" >> out3 }
      if (denom==1) { print "   Single HBOND event, no SD or SEM calculation possible!" > out2;
             print "\n   Single HBOND event, no SD or SEM calculation possible!\n\n----------------------------------------\n" >> out3
           }
      if (denom==2) { print "\n   2 Hydrogen bond events found! No proper SD or SEM!" > out2;
             print "   2 Hydrogen bond events found! No proper SD or SEM!\n\n----------------------------------------\n" >> out3
           }
}
  close(out3)
  close(out2)
}

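It takes a 5-column input file, processes 2 of the columns, and writes those same columns to a separate file (out1) for later processing. That file should then be read back to compute some statistics, but this does not happen on the VPS: all I get are 0.0000 values.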

The problem seems to be in the while loop:

while ((getline<out1)>0) {
      avgLTsq+=(($1-avgAvgLT)^2)
      avgFramessq+=(($2-avgFrames)^2)
    }

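At the end of the script, when everything is printed to the files, I seem to get reasonable values for the computed sums and averages (avgLT, avgAvgLT, totFrames, and avgFrames). When it gets to the statistics part (sd_avgLT, semAvgLT, sd_totFrames, and semTotFrames), all of these are printed to both out2 and out3, but every value is 0.0000 instead of what it should be.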

The "math" seems to work when I run the commands separately on the out1 file:

awk ' BEGIN { avgAvgLT=1.4264 } { avgLTsq+=(($1-avgAvgLT)^2) } END { print avgLTsq }' analLT_multiple.out
awk ' BEGIN { avgFrames=4.4831 } { avgFramessq+=(($2-avgFrames)^2) } END { print avgFramessq }' analLT_multiple.out
awk ' BEGIN { avgLTsq=30.3478; denom=89 } { sd_avgLT=sqrt(avgLTsq/(denom-1)) } END { print sd_avgLT }' analLT_multiple.out
awk ' BEGIN { sd_avgLT=0.587249; denom=89 } {semAvgLT=(sd_avgLT/(sqrt(denom))) }  END { print semAvgLT }' analLT_multiple.out
awk ' BEGIN { avgFramessq=2040.22; denom=89 } { sd_totFrames=sqrt(avgFramessq/(denom-1)) } END { print sd_totFrames }' analLT_multiple.out
awk ' BEGIN { sd_totFrames=4.81501; denom=89 } { semTotFrames=(sd_totFrames/(sqrt(denom))) } END { print semTotFrames }' analLT_multiple.out

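These give me non-zero values that look reasonable, but the script gives me 0.0000 for everything. I also tried printing the values of the variables from inside the script while running it on several files: denom holds a valid value, but sd_avgLT, semAvgLT, sd_totFrames, and semTotFrames all come back as zero or empty.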

My "conclusion" (I would call it a guess) is that, as said above, something is wrong with the while loop, although I do not understand what.

I have put a sample input file on pastebin at https://pastebin.com/JsuTz0mD in case you want to try running the script yourself.

Any input, feedback, or solution that gets this script working on my VPS would be greatly appreciated.

Answer 1

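Either the awk you are using (GNU awk or mawk) does not flush the data written to a file as you write it, or reading from a file that you still hold open for writing does not return anything. Either way, when you read from the out1 file in the END block, no data is read. BSD awk implementations do not seem to have this issue, and your code works as expected on, for example, OpenBSD and macOS.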

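The solution is simple: call close(out1) unconditionally in the code run from the END block, before you read from the file with getline. At the moment you close it only after reading from it.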
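As a minimal sketch of that fix, the shell snippet below writes a few lines to a scratch file from awk and reads them back with getline after closing the write handle. The function name sum_after_close and the temporary file are made up for illustration; only close() and getline come from the script in the question.

```shell
#!/bin/sh
# Hypothetical demo: write 1, 2, 3 to a scratch file, then sum them
# by reading the file back with getline in the END block.
sum_after_close() {
    tmp=$(mktemp) || return 1
    awk -v out="$tmp" '
        BEGIN {
            for (i = 1; i <= 3; i++) print i > out
        }
        END {
            close(out)      # flush and close the write handle first
            while ((getline line < out) > 0) sum += line
            close(out)      # close the read handle as well
            print sum + 0
        }
    ' /dev/null
    rm -f "$tmp"
}

sum_after_close    # prints 6
```

If the first close(out) is removed, an awk that buffers its output may find the file still empty at that point and print 0 instead.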

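Also, consider being more consistent in your use of > and >>. For this code, I believe you can use > throughout. Keep in mind that > truncates a file each time awk opens it, so a file that is closed and later reopened (as out3 is, once per input file) would still need >> or would have to stay open.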
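The difference between the two operators only shows up when a file is reopened. The sketch below, with the made-up names redirect_demo, f1, f2, and count, writes two lines, closes the file, and writes a third line, once with > and once with >>, then counts the lines left in each file.

```shell
#!/bin/sh
# Hypothetical demo of > versus >> after a close() in awk.
redirect_demo() {
    dir=$(mktemp -d) || return 1
    awk -v f1="$dir/truncated.txt" -v f2="$dir/appended.txt" '
        # Count the lines currently stored in file f.
        function count(f,    n, line) {
            while ((getline line < f) > 0) n++
            close(f)
            return n + 0
        }
        BEGIN {
            print "a" > f1; print "b" > f1   # same open stream: both lines kept
            close(f1)
            print "c" > f1                   # reopened with >: file is truncated
            close(f1)

            print "a" > f2; print "b" > f2
            close(f2)
            print "c" >> f2                  # reopened with >>: earlier lines kept
            close(f2)

            print count(f1), count(f2)
        }
    '
    rm -rf "$dir"
}

redirect_demo    # prints "1 3"
```

Within one open stream, repeated > writes accumulate, which is why > is enough for files that are opened once and closed once.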

Answer 2

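This is not an answer, since @Kusalananda has already told you what is wrong, but let's tidy your script up a bit to make it readable and to reduce the duplicated code: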

FNR == 1 {
    if ( NR != 1 ) {
        endfile()
    }
    avgLT = totFrames = denom = 0
    out1 = "analLT_" FILENAME
    out2 = "sumLT_" FILENAME
    out3 = "reportLT.txt"
    print "-> Input file is: " FILENAME > out3
    next
}

{
    avgLT += $4
    totFrames += $5
    ++denom
    printf "%10.4f %10.1f\n", $4, $5 > out1
}

END {
    endfile()
}

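# Names after the gap are locals. avgLTsq and avgFramessq are included so
# that the sums of squares start from zero for every input file.
function endfile(       x, avgAvgLT, avgFrames, avgLTsq, avgFramessq,
                        sd_avgLT, semAvgLT, sd_totFrames, semTotFrames )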
{
    if (avgLT == 0 && denom == 0 ) {
        x = "\nNO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
        print x         > out1
        print x         > out2
        print x         > out3
    }
    else if (avgLT > 0) {
        avgAvgLT = avgLT / denom
        avgFrames = totFrames / denom

        close(out1)
        while ((getline < out1) > 0) {
            avgLTsq     += (($1 - avgAvgLT) ^ 2)
            avgFramessq += (($2 - avgFrames) ^ 2)
        }
        close(out1)

        printf "\n   Summary data for hbond lifetime analysis:\n\n"             > out2
        printf "   Summed Avg Lifetime:    %10.4f\n", avgLT                     > out2
        printf "   Average Lifetime:       %10.4f\n", avgAvgLT                  > out2
        printf "      Summed Frames:  %10.0f\n", totFrames                      > out2
        printf "      Average Frames:      %10.4f\n", avgFrames                 > out2

        printf "\n   Summary data for hbond lifetime analysis:\n\n"             > out3
        printf "   Summed Avg Lifetime:    %10.4f\n", avgLT                     > out3
        printf "   Average Lifetime:       %10.4f\n", avgAvgLT                  > out3
        printf "      Summed Frames:  %10.0f\n", totFrames                      > out3
        printf "      Average Frames:      %10.4f\n", avgFrames                 > out3

        if (denom == 1) {
            x = "   Single HBOND event, no SD or SEM calculation possible!"
            print x     > out2
            print ""    > out3
            print x     > out3
        }
        else if (denom > 1) {
            sd_avgLT = sqrt(avgLTsq / (denom - 1))
            semAvgLT = (sd_avgLT / (sqrt(denom)))
            sd_totFrames = sqrt(avgFramessq / (denom - 1))
            semTotFrames = (sd_totFrames / (sqrt(denom)))

            printf "\n   SD lifetime:            %10.4f\n", sd_avgLT            > out2
            printf "   SEM lifetime:           %10.4f\n", semAvgLT              > out2
            printf "      SD Frames:           %10.4f\n", sd_totFrames          > out2
            printf "      SEM Frames:          %10.4f\n\n", semTotFrames        > out2

            printf "\n   SD lifetime:            %10.4f\n", sd_avgLT            > out3
            printf "   SEM lifetime:           %10.4f\n", semAvgLT              > out3
            printf "      SD Frames:           %10.4f\n", sd_totFrames          > out3
            printf "      SEM Frames:          %10.4f\n\n", semTotFrames        > out3

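            if (denom == 2) {
                x = "   2 Hydrogen bond events found! No proper SD or SEM!"
                print ""        > out2
                print x         > out2
                print x         > out3
            }
        }
    }

    print "\n\n----------------------------------------\n"                      > out3

    close(out1)
    close(out2)
    # out3 is deliberately left open: closing it would make the next
    # "> out3" truncate the report and drop the earlier summaries.
}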

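The while getline loop on out1 is obviously not really necessary, since instead of writing to out1 in the main body of the script you could store the data in arrays, for example: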

FNR == 1 {
    if ( NR != 1 ) {
        endfile()
    }
    avgLT = totFrames =  denom = 0
    out1 = "analLT_" FILENAME
    out2 = "sumLT_" FILENAME
    out3 = "reportLT.txt"
    print "-> Input file is: " FILENAME > out3
    next
}

{
    avgLT += $4
    totFrames += $5
    ++denom
    # Keep the per-row values (not the running sums), indexed by row count.
    rowLT[denom] = $4
    rowFrames[denom] = $5
}

END {
    endfile()
}

function endfile(       i, x, avgAvgLT, avgFrames, avgLTsq, avgFramessq,
                        sd_avgLT, semAvgLT, sd_totFrames, semTotFrames )
{
    if (avgLT == 0 && denom == 0 ) {
        x = "\nNO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
        print x         > out1
        print x         > out2
        print x         > out3
    }
    else if (avgLT > 0) {
        avgAvgLT = avgLT / denom
        avgFrames = totFrames / denom

        # Loop over denom, not FNR: when endfile() runs at the start of the
        # next file, FNR has already been reset to 1.
        for (i=1; i<=denom; i++) {
            printf "%10.4f %10.1f\n", rowLT[i], rowFrames[i] > out1

            avgLTsq     += ((rowLT[i] - avgAvgLT) ^ 2)
            avgFramessq += ((rowFrames[i] - avgFrames) ^ 2)
        }

        printf "\n   Summary data for hbond lifetime analysis:\n\n"             > out2
        printf "   Summed Avg Lifetime:    %10.4f\n", avgLT                     > out2
        printf "   Average Lifetime:       %10.4f\n", avgAvgLT                  > out2
        printf "      Summed Frames:  %10.0f\n", totFrames                      > out2
        printf "      Average Frames:      %10.4f\n", avgFrames                 > out2

        printf "\n   Summary data for hbond lifetime analysis:\n\n"             > out3
        printf "   Summed Avg Lifetime:    %10.4f\n", avgLT                     > out3
        printf "   Average Lifetime:       %10.4f\n", avgAvgLT                  > out3
        printf "      Summed Frames:  %10.0f\n", totFrames                      > out3
        printf "      Average Frames:      %10.4f\n", avgFrames                 > out3

        if (denom == 1) {
            x = "   Single HBOND event, no SD or SEM calculation possible!"
            print x     > out2
            print ""    > out3
            print x     > out3
        }
        else if (denom > 1) {
            sd_avgLT = sqrt(avgLTsq / (denom - 1))
            semAvgLT = (sd_avgLT / (sqrt(denom)))
            sd_totFrames = sqrt(avgFramessq / (denom - 1))
            semTotFrames = (sd_totFrames / (sqrt(denom)))

            printf "\n   SD lifetime:            %10.4f\n", sd_avgLT            > out2
            printf "   SEM lifetime:           %10.4f\n", semAvgLT              > out2
            printf "      SD Frames:           %10.4f\n", sd_totFrames          > out2
            printf "      SEM Frames:          %10.4f\n\n", semTotFrames        > out2

            printf "\n   SD lifetime:            %10.4f\n", sd_avgLT            > out3
            printf "   SEM lifetime:           %10.4f\n", semAvgLT              > out3
            printf "      SD Frames:           %10.4f\n", sd_totFrames          > out3
            printf "      SEM Frames:          %10.4f\n\n", semTotFrames        > out3

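            if (denom == 2) {
                x = "   2 Hydrogen bond events found! No proper SD or SEM!"
                print ""        > out2
                print x         > out2
                print x         > out3
            }
        }
    }

    print "\n\n----------------------------------------\n"                      > out3

    close(out1)
    close(out2)
    # out3 is deliberately left open: closing it would make the next
    # "> out3" truncate the report and drop the earlier summaries.
}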

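Of course, all of the above is untested, since you did not provide any sample input/output for us to test with, but hopefully any mistakes are easy to spot and correct.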
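To check the array idea in isolation, here is a small sketch that computes the mean and sample standard deviation of one column without any intermediate file. The function name sd_from_column, the col variable, and the three-row sample input are all made up for illustration; the header skip and the (n - 1) denominator follow the scripts above.

```shell
#!/bin/sh
# Hypothetical helper: mean and sample SD of one column, read from stdin.
sd_from_column() {
    # $1: column number to summarise (defaults to 4)
    awk -v col="${1:-4}" '
        FNR == 1 { next }                   # skip the header line
        { val[++n] = $col; sum += $col }    # store each value for the second pass
        END {
            if (n < 2) { print "need at least 2 rows"; exit }
            mean = sum / n
            for (i = 1; i <= n; i++) ss += (val[i] - mean) ^ 2
            printf "%.4f %.4f\n", mean, sqrt(ss / (n - 1))
        }
    '
}

# Made-up sample input: one header line plus three data rows.
printf 'h h h h h\na b c 1 10\na b c 2 20\na b c 3 30\n' | sd_from_column 4
# prints "2.0000 1.0000"
```

Running the same function with 5 as its argument summarises the frames column in the same way.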
