Can I change FIELDWIDTHS for each line of an awk script?

Answer 1

With any awk:

awk 'NR%2   { fieldwidths="4 6 5 4 9 2 2 13" } # update fieldwidths on odd line numbers
    !(NR%2) { fieldwidths="4 5 4 2 3 9 7 11" } # update fieldwidths on even line numbers
    # condition { fieldwidths="# # #  ..." }   # whatever other condition you want...

{ fields=split(fieldwidths, fldwd); startPos=1;
  for(i=1; i<=fields; i++) {
      printf "%s", (i==1?"": OFS) substr($0, startPos, fldwd[i])
      startPos+=fldwd[i]
  }
  print ""
}' infile
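To see what the substr() loop above produces, here is a minimal sketch on two short records. The function name split_by_parity, the sample input, and the widths are made up for this demo, and OFS is set to "|" only so the field boundaries are visible.

```shell
# Hypothetical demo of per-line widths with plain substr(); works in any POSIX awk.
# split_by_parity, the widths and the sample records are illustrative assumptions.
split_by_parity() {
  awk -v OFS='|' '
    NR%2    { fieldwidths = "2 3 4" }   # odd-numbered lines
    !(NR%2) { fieldwidths = "4 3 2" }   # even-numbered lines
    {
      n = split(fieldwidths, w, " ")    # w[1..n] holds the widths for this line
      pos = 1
      line = ""
      for (i = 1; i <= n; i++) {
        line = line (i == 1 ? "" : OFS) substr($0, pos, w[i])
        pos += w[i]
      }
      print line
    }'
}

printf '%s\n' 'aabbbcccc' 'ddddeeeff' | split_by_parity
# prints:
# aa|bbb|cccc
# dddd|eee|ff
```

Because the widths are re-read on every record, any condition (not just odd/even line numbers) can select them.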

Answer 2

I think something like this (using GNU awk for FIELDWIDTHS) is what you want:

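BEGIN {
    # map each record type to its field widths
    type2fw[10] = "7 3 6 8 9"
    type2fw[12] = "5 5 5 5 5 5 5 5 5 5 5"
    type2fw[53] = "1 1 1 17 29 31"
    # ...
}
{
    FIELDWIDTHS = type2fw[substr($0,31,2)]
    $0 = $0    # re-split the current record using the new FIELDWIDTHS
    # do whatever you like with the fields
}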

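But that's a bit inefficient, since it does field splitting twice per record (once when the record is read, and a second time when $0=$0 is executed). You can make it more efficient by re-splitting only when the type changes: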

BEGIN {
    # map each record type to its field widths
    type2fw[10] = "7 3 6 8 9"
    type2fw[12] = "5 5 5 5 5 5 5 5 5 5 5"
    type2fw[53] = "1 1 1 17 29 31"
    # ...
}
{ type = substr($0,31,2) }
type != prev {
    FIELDWIDTHS = type2fw[type]
    $0 = $0    # re-split only the first record of each run of the same type
    prev = type
}
{
    # do whatever you like with the fields
}

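You could sort the input first by the type field at characters 31-32 (e.g. sort -k1.31,1.32 file | awk '...') so that you only need to change FIELDWIDTHS once per type.

Without seeing concise, testable sample input covering multiple lines/types, plus the expected output, I can't be more specific than that. This may even be the wrong approach, and using match($0,/(foo)(bar)(etc)/,a) or something else might be better.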
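For awks that lack FIELDWIDTHS, the same type-to-widths lookup table can drive a substr() loop instead. The following is only a sketch under assumptions: the helper name split_by_type, the type codes, the widths, and the sample records are invented, and the type is read from characters 3-4 rather than 31-32 to keep the demo input short.

```shell
# Hypothetical portable variant of the lookup-table idea (no FIELDWIDTHS needed).
# The type sits at characters 3-4 here purely to keep the sample records short.
split_by_type() {
  awk -v OFS='|' '
    BEGIN {
      type2fw["10"] = "2 2 3"
      type2fw["12"] = "2 2 1 1 1"
    }
    {
      type = substr($0, 3, 2)
      if (!(type in type2fw)) { print; next }   # unknown type: pass through unchanged
      n = split(type2fw[type], w, " ")
      pos = 1
      line = ""
      for (i = 1; i <= n; i++) {
        line = line (i == 1 ? "" : OFS) substr($0, pos, w[i])
        pos += w[i]
      }
      print line
    }'
}

printf '%s\n' 'ab10xyz' 'cd12pqr' 'ef99zzz' | split_by_type
# prints:
# ab|10|xyz
# cd|12|p|q|r
# ef99zzz
```

The explicit "type in type2fw" check is a design choice of this sketch: it avoids silently splitting a record with an empty width list when an unexpected type shows up.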

Answer 3

With GNU awk, you can re-parse the current line by assigning $0 = $0. For example,

echo '1 abcdefghij
2   abcdefghij' |
awk '
/^1/{ FIELDWIDTHS = "1 1 5 5"; $0 = $0; print $3; next }
/^2/{ FIELDWIDTHS = "1 3 3 3"; $0 = $0; print $3; next }
'

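Alternatively, you might consider a more Unix-like solution: pipe the data through one awk that handles a single field format, marking the lines it has processed with a prefix character such as #, then pipe the result into a second awk. For example,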

awk -v FIELDWIDTHS="1 1 5 5" '
/^1/{ print "#" $3; next }
    { print }
' |
awk -v FIELDWIDTHS="1 3 3 3" '
/^2/{ print $3; next }
/^#/{ print substr($0,2); next }
    { print }
'

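Answer 4

Another approach, using GNU sed, is shown here. It relies on the /e modifier of the s/// command.

The general idea is to store the space-separated list of field widths in a file whose name is characters 31-32 of the current input record. The _unpk_ function is given the name of the file holding the field widths that apply to the current record. It then generates sed code that slices the current record according to those widths.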

#--- edit this function to add the fieldwidths corresponding to
#--- the 2 characters in the 31st/32nd
#--- positions of the input record
_init_() {
  [ -s "$1" ] && return
  case $1 in
    */12) echo '4 6 5 4 9 2 2 13' ;;
    */96) echo '5 5 5 6 7 2 2 13' ;;
  esac > "$1"
}

_unpk_() {
  _init_ "$1"
< "$1" tr -s ' \t' '[\n*]' |
sed -Ee '
  1i\
$!d;H;z;x
  s|.*|s/\\n.{&}/\&\\n/|
  s|$|;s/\\n/ /|
  $a\
s/^.|.$//g
'
}

export -f _init_ _unpk_
tmpdir=$(mktemp -d)

sed -Ee "w $tmpdir/h
  s:.{30}(..).*:_unpk_ '$tmpdir/\\1' | sed -Ef - '$tmpdir/h':e
" file
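To make the generated sed code less abstract: for a width list of "4 6", _unpk_ should emit a script roughly like the one inside the function below. This is a hand-written sketch, not captured output. The name slice_4_6 and the sample record are made up, GNU sed is assumed (for the z command), and the widths are assumed to add up to the full record length.

```shell
# Hand-written approximation of the sed code generated for the widths "4 6".
# slice_4_6 and the sample record are illustrative assumptions; requires GNU sed.
slice_4_6() {
  sed -E '
$!d;H;z;x
s/\n.{4}/&\n/;s/\n/ /
s/\n.{6}/&\n/;s/\n/ /
s/^.|.$//g'
}
# $!d;H;z;x    keep only the last line and put a newline marker in front of it
# s/\n.{N}/..  move the marker N characters to the right, leaving a space behind
# s/^.|.$//g   drop the leading space and the trailing marker

printf '%s\n' 'aaaabbbbbb' | slice_4_6
# prints: aaaa bbbbbb
```

The newline works as a cursor because it cannot otherwise occur inside a single input record.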
