if condition in an awk script

Question 1

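The below assumes that when you say "any of the size/date/repo-name/repo-path has no value" you mean that, for example, repo-name= is present with nothing after it, not that some blocks have no repo-name= line at all.

Here's how to really do what you're trying to do using any awk, with column then setting the final column spacing: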

$ cat tst.sh
#!/usr/bin/env bash

awk '
BEGIN { OFS="\t" }
{
    sub(/^@/,"")                  # instead of `| tr -d @`
    ++numTags
    tag = val = $0
    sub(/ *=.*/,"",tag)
    sub(/[^=]+= */,"",val)
    tags[numTags] = tag
    vals[numTags] = val
}
numTags == 4 {
    if ( !doneHdr++ ) {
        for ( i=1; i<=numTags; i++ ) {
            tag = ( tags[i] == "date" ? "creationTime" : tags[i] )  # instead of `| sed s/date/creationTime/`
            printf "%s%s", tag, (i<numTags ? OFS : ORS)
        }
    }
    vals[3] = substr(vals[3],1,10)     # instead of `| awk {$3=substr($3,0,10)}1`
    for ( i=1; i<=numTags; i++ ) {
        val = ( vals[i] == "" ? 0 : vals[i] )
        printf "%s%s", val, (i<numTags ? OFS : ORS)
    }
    numTags = 0
}
' "${@:--}" |
column -s$'\t' -t

$ cat file
size=190000
date=1603278566981
repo-name=testupload
repo-path=
size=140000
date=1603278566981
repo-name=
repo-path=/home/test/testupload2
size=
date=1603278566981
repo-name=testupload3
repo-path=/home/test/testupload3

$ ./tst.sh file
size    creationTime   repo-name   repo-path
190000  1603278566981  testupload  0
140000  1603278566981  0           /home/test/testupload2
0       1603278566981  testupload  /home/test/testupload3
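
If column is not available, the alignment it does here could be approximated in awk itself. The following is only a sketch under stated assumptions: align is a hypothetical helper name, the input is the tab-separated rows that the awk part of the script above emits, and every row is held in memory until the end, which is the cost of needing the widest cell per column before printing anything.

```shell
#!/usr/bin/env bash
# align: hypothetical helper that pads tab-separated columns without column(1).
# Reads tab-separated rows on stdin and writes space-padded rows on stdout.
align() {
    awk -F'\t' '
    {
        # Pass 1: store every cell and track the widest cell per column.
        for ( i=1; i<=NF; i++ ) {
            cell[NR,i] = $i
            if ( length($i) > width[i] ) width[i] = length($i)
        }
        if ( NF > numCols ) numCols = NF
    }
    END {
        # Pass 2: print each stored row, padding all but the last column.
        for ( r=1; r<=NR; r++ ) {
            for ( i=1; i<=numCols; i++ ) {
                if ( i < numCols ) {
                    fmt = "%-" width[i] "s  "   # left-justify, 2 spaces between columns
                    printf fmt, cell[r,i]
                }
                else {
                    printf "%s\n", cell[r,i]
                }
            }
        }
    }'
}

# Made-up sample rows, in the same shape as the output of the awk script above.
printf 'size\tcreationTime\n190000\t1603278566981\n0\t1603278566981\n' | align
```

Building the format string from width[i] rather than using a * width in printf keeps the sketch usable in awks that do not support that feature.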

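Changes to your existing code:

awk no longer has to read the whole file into memory at once. I suspect column has to do that to work out the spacing. Without column, awk would have to read all of the input into memory, since we'd need a 2-pass approach: first work out the maximum length of the fields in each column, then printf the output using those maximum field widths.
It no longer relies on the values of the tags in the data (other than the mapping of date to creationTime that I added for the header line, which you currently do with a pipe to sed); it just relies on there being 4 lines of data at a time. It's easily changed to trigger on hitting a specific tag line if that would be more useful, e.g. just change numTags == 4 to tag == "repo-path".
It no longer pipes to sed to change the column header date to creationTime. In addition to needing an extra pipe and command, that would break if your input contained the string date anywhere, e.g. repo-path=/home/date/uploadX
It no longer uses = as the FS value, as doing that would fail if any of your input contained =, e.g. repo-path=/home/foo=bar/uploadX
If you want to remove all @s from the data then the way to do that is with gsub(/@/,"") rather than piping the output to tr -d @. I think you really only want to do that for the header names (tags), though, as otherwise it would break if any of your data contained @s, e.g. repo-path=/home/foo@bar/uploadX, so I just included sub(/^@/,"") to remove an @ from the start of each tag.
If you want to trim the 3rd field to 10 chars then the way to do that is with substr(vals[3],1,10) before the loop where vals[] is printed, rather than adding a pipe to a 2nd awk script, so I included that. By the way, the 2nd arg of substr() starts at 1, not 0.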
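
To make the point about = concrete, here is a small sketch comparing the two ways of splitting a line. The input line is the hypothetical path from the list above, and the variable names naive and robust exist only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical one-line input whose value itself contains "=".
line='repo-path=/home/foo=bar/uploadX'

# Splitting on every "=" (FS="=") cuts the value short when only $2 is used.
naive=$(printf '%s\n' "$line" | awk -F'=' '{ print $2 }')

# Splitting at the first "=" only, as the script above does with sub().
robust=$(printf '%s\n' "$line" | awk '{
    val = $0
    sub(/[^=]+= */, "", val)   # drop everything up to and including the first "="
    print val
}')

printf 'FS="=" gives: %s\n' "$naive"     # /home/foo
printf 'sub()  gives: %s\n' "$robust"    # /home/foo=bar/uploadX
```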

Question 2

If the last field is empty, you can set it to zero with

if ($NF == "") $NF = 0

so you would get something like

/^@repo-name/ {
  if (++count2 == 1) header = header OFS $1 ","
  if ($NF == "") $NF = 0

  repoNameArr[count] = $NF
  next
}

Or, to avoid repeating that code in every block,

$NF == "" { $NF = 0 }

# ...

/^@repo-name/ {
  if (++count2 == 1) header = header OFS $1 ","
  repoNameArr[count] = $NF
  next
}

(Note that no line in your data matches ^@repo-name.)
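
As a quick check of the standalone pattern, here is a minimal sketch that runs it on two made-up lines. It assumes the fields are split on =, so that an empty value shows up as an empty last field; zero_fill is a hypothetical name used only here.

```shell
#!/usr/bin/env bash
# Assumption: fields are split on "=", so an empty value is an empty last field.
zero_fill() {
    awk -F'=' -v OFS='=' '
    $NF == "" { $NF = 0 }   # runs for every line, before any tag-specific block
    { print }
    '
}

# Made-up sample: one empty value, one non-empty value.
printf 'size=\nrepo-name=testupload\n' | zero_fill
# prints:
#   size=0
#   repo-name=testupload
```

Assigning to $NF makes awk rebuild the line using OFS, which is why OFS is set to = here; lines that are left untouched are printed exactly as they were read.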

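In this case, I would probably take a simpler approach. Assuming every record is always exactly four lines, we can rearrange the data into four tab-separated columns using paste: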

$ cat file
size=
date=1603278566981
repo-name=testupload
repo-path=/home/test/testupload
size=140000
date=
repo-name=testupload2
repo-path=/home/test/testupload2
size=170000
date=1603278566981
repo-name=
repo-path=/home/test/testupload3
size=170000
date=1603278566981
repo-name=testupload3
repo-path=/home/test/testupload3

$ paste - - - - <file
size=   date=1603278566981      repo-name=testupload    repo-path=/home/test/testupload
size=140000     date=   repo-name=testupload2   repo-path=/home/test/testupload2
size=170000     date=1603278566981      repo-name=      repo-path=/home/test/testupload3
size=170000     date=1603278566981      repo-name=testupload3   repo-path=/home/test/testupload3

This can then be converted to CSV using mlr (Miller):

$ paste - - - - <file | mlr --ifs tab --ocsv cat
size,date,repo-name,repo-path
,1603278566981,testupload,/home/test/testupload
140000,,testupload2,/home/test/testupload2
170000,1603278566981,,/home/test/testupload3
170000,1603278566981,testupload3,/home/test/testupload3

We can also have mlr replace any missing values with zero:

$ paste - - - - <file | mlr --ifs tab --ocsv put 'for (k,v in $*) { is_null(v) { $[k] = 0 } }'
size,date,repo-name,repo-path
0,1603278566981,testupload,/home/test/testupload
140000,0,testupload2,/home/test/testupload2
170000,1603278566981,0,/home/test/testupload3
170000,1603278566981,testupload3,/home/test/testupload3
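
Where Miller is not installed, roughly the same zero-filled CSV could be produced by feeding the paste output to awk. This is only a sketch, not a replacement for a real CSV writer: to_csv is a hypothetical helper name, every record is assumed to be exactly four name=value lines, and no CSV quoting is done, so values must not contain commas, quotes, or tabs.

```shell
#!/usr/bin/env bash
# to_csv: hypothetical helper; reads name=value lines, four per record, on stdin.
to_csv() {
    paste - - - - | awk -F'\t' '
    {
        header = row = ""
        for ( i=1; i<=NF; i++ ) {
            name = value = $i
            sub(/=.*/, "", name)        # text before the first "="
            sub(/[^=]*=/, "", value)    # text after the first "="
            if ( value == "" ) value = 0
            sep = ( i > 1 ? "," : "" )
            header = header sep name
            row    = row    sep value
        }
        if ( NR == 1 ) print header     # header comes from the first record only
        print row
    }'
}

# Made-up single record with an empty size value.
printf '%s\n' 'size=' 'date=1603278566981' 'repo-name=testupload' \
    'repo-path=/home/test/testupload' | to_csv
# prints:
#   size,date,repo-name,repo-path
#   0,1603278566981,testupload,/home/test/testupload
```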

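Should you want tab-separated values (TSV) instead of CSV, use --otsv in place of --ocsv. You can get "pretty-printed" tabular output with --opprint, JSON with --ojson, or whatever else you need.

Note that the above assumes that the input data looks like the data in the question. If the data in the question is a processed variant of data in some structured format, such as XML or JSON, then it would be better to work with the original data directly.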
