awk: calculating the min/max of multiple columns across a large number of input files

Question

If you want to use awk to calculate the max/min across a series of files, simply supply those files as input to the script on the command line:

awk -F, '
    ($2+0 > POP+0 || POP == "") && $1+0 > 0 { POP = $2 }
    ($3+0 < dG+0 || dG == "") && $1+0 > 0 { dG = $3 }
    END { print POP, dG }
' file1 file2 file3...
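As a quick sanity check, the command can be tried on a couple of throwaway files. The file names, the header row (ID, POP, dG) and the values below are invented for illustration; only the awk program itself is taken from above.

```bash
# Create two small sample CSV files in a temporary directory (hypothetical data)
tmp=$(mktemp -d)
printf '%s\n' 'ID,POP,dG' '1,10,-3.5' '2,25,-7.1' > "$tmp/file1.csv"
printf '%s\n' 'ID,POP,dG' '1,30,-2.0' '2,5,-9.4'  > "$tmp/file2.csv"

# Same program as above: keep the largest column 2 and the smallest column 3
result=$(awk -F, '
    ($2+0 > POP+0 || POP == "") && $1+0 > 0 { POP = $2 }
    ($3+0 < dG+0 || dG == "") && $1+0 > 0 { dG = $3 }
    END { print POP, dG }
' "$tmp/file1.csv" "$tmp/file2.csv")

echo "$result"    # 30 -9.4
rm -rf "$tmp"
```

The maximum (30) and the minimum (-9.4) come from different rows of the second file, which shows that the two columns are tracked independently.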

(This could also be written as a single line simply by joining all the lines together, but it would be less readable.)

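Let's break down one line. The general form is expression { action }, and either part is optional. The expression here looks for a larger value of POP on any line where the ID is a non-zero number: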

($2+0 > POP+0 || POP == "") && $1+0 > 0 { POP = $2 }

$2+0 > POP+0    # Is the numeric value of $2 more than the numeric value of POP
||              # OR
POP == ""       # Is POP the empty string (possibly unset)

If at least one of these is true, then we also require the next condition:

$1+0 > 0        # Is the numeric value of $1 greater than zero ("skip the header")

Then...

{ POP = $2 }    # Assign the value of $2 to POP (stored as text, not converted)

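The cycle is then repeated for every line of every file. At the end of the last file, the END construct is executed, printing the two resulting values.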
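To illustrate that either half of an expression { action } rule can be left out, here is a minimal sketch on invented sample data (the rows below are assumptions, not the real input):

```bash
# Hypothetical three-line input: one header row and two data rows
data=$(printf '%s\n' 'ID,POP,dG' '1,10,-3.5' '2,25,-7.1')

# Expression only: the default action prints every matching line,
# so the header (whose first field is not a positive number) is dropped
rows=$(printf '%s\n' "$data" | awk -F, '$1+0 > 0')

# Action only: with no expression the block runs for every line, header included
count=$(printf '%s\n' "$data" | awk '{ n++ } END { print n }')

echo "$rows"
echo "$count"    # 3
```

The first command prints only the two data rows, while the second counts all three lines.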

Note that awk converts the values to numbers only for the comparisons. At all other times they are just strings, so no precision is lost.
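The difference can be seen with a small sketch. The input row is made up, and the second command deliberately stores $2 + 0 instead of $2, which is not what the program above does:

```bash
# Hypothetical row with more digits than awk's default output format keeps
line='1,0.123456789012,-4.50'

# Storing the field itself keeps the original text exactly
as_string=$(printf '%s\n' "$line" | awk -F, '{ POP = $2 } END { print POP }')

# Forcing a numeric conversion before storing means print reformats the value
as_number=$(printf '%s\n' "$line" | awk -F, '{ POP = $2 + 0 } END { print POP }')

echo "$as_string"    # 0.123456789012
echo "$as_number"    # shortened by the output format (OFMT, "%.6g" by default)
```

Keeping the field as text means the value printed in END is exactly what appeared in the file.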

In bash you can easily assign these outputs to variables, which has the side effect of discarding any unwanted whitespace:

read pop dg < <(awk ...)
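Spelled out in full, this might look like the following sketch. It requires bash (process substitution is not available in plain sh), and the sample file is invented for illustration:

```bash
# Hypothetical sample data; in practice this would be the real input files
tmp=$(mktemp -d)
printf '%s\n' 'ID,POP,dG' '1,10,-3.5' '2,25,-7.1' > "$tmp/file1.csv"

# Process substitution keeps read in the current shell, so pop and dg
# remain set after the command finishes
read pop dg < <(awk -F, '
    ($2+0 > POP+0 || POP == "") && $1+0 > 0 { POP = $2 }
    ($3+0 < dG+0 || dG == "") && $1+0 > 0 { dG = $3 }
    END { print POP, dG }
' "$tmp/file1.csv")

echo "pop=$pop dg=$dg"    # pop=25 dg=-7.1
rm -rf "$tmp"
```

Piping awk into read would typically run read in a subshell in bash, so the variables would be lost afterwards; that is the reason for the < <(...) form.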

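For a very large number of files, for example where glob expansion fails, the standard find approach should suffice, feeding the contents of the files to awk on standard input rather than listing them on the command line: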

find "${storage}" -type f -name 'target_file.csv' -exec cat {} + | awk '...'
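A self-contained sketch of that pipeline, using a temporary directory tree in place of the real storage location (the subdirectory names and values are invented; the awk program is the one from the top of this post):

```bash
# Hypothetical directory tree standing in for the real ${storage}
storage=$(mktemp -d)
mkdir -p "$storage/a" "$storage/b"
printf '%s\n' 'ID,POP,dG' '1,10,-3.5' > "$storage/a/target_file.csv"
printf '%s\n' 'ID,POP,dG' '1,30,-9.4' > "$storage/b/target_file.csv"

# find batches the matches into as few cat invocations as possible;
# awk reads the combined stream from standard input.
# The repeated header lines are skipped by the $1+0 > 0 test.
result=$(find "${storage}" -type f -name 'target_file.csv' -exec cat {} + | awk -F, '
    ($2+0 > POP+0 || POP == "") && $1+0 > 0 { POP = $2 }
    ($3+0 < dG+0 || dG == "") && $1+0 > 0 { dG = $3 }
    END { print POP, dG }
')

echo "$result"    # 30 -9.4
rm -rf "$storage"
```

Because awk only sees one concatenated stream here, the order in which find visits the files does not affect the result.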

Answer 1