AWK: How do I correctly display a column that contains multiple words enclosed in quotes?

Question 1

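You can use awk's gsub() function to replace every double quote followed by a space, and every space followed by a double quote (the regex /" | "/), with some arbitrary delimiter, then set FS to that delimiter and extract what you want. Note that if you change FS, the numbering of the fields changes too. You also need to reset FS to its original value so that the next input line is processed correctly.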

In your case, you also want to extract some data (the date and time) from the fields before FS is changed.

For example, if ./file contains 5 lines, each an exact copy of the sample line you provided:

$ grep -i 'logged in' ./file | tail | awk '
{ d=$1;
  t=$2; sub(/\..*/,"",t);

  FS="XXX";
  gsub(/" | "/,"XXX",$0);
  print $2,"logged in at", t, d;
  FS="[[:space:]]+"
}'
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21

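I used XXX as the field separator because it doesn't appear anywhere in the input. A tab would work just as well for this example, but it wouldn't demonstrate that the field separator doesn't have to be a single character. That matters if you can't (or can't easily) identify a single character that is not used anywhere in the input.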
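The sample line from the question is not reproduced here, so the following is only a minimal sketch of the same technique on two made-up log lines. The variable names `sample` and `result` and the log format itself are assumptions for illustration. It also resets FS to a single space, which is awk's default, rather than to the [[:space:]]+ pattern used above.

```shell
# Hypothetical log lines; the real sample from the question is not shown here.
sample='2017-12-21 21:54:01.845 session "sarah the princes" logged in
2017-12-22 08:03:10.002 session "bob" logged in'

result=$(printf '%s\n' "$sample" | awk '
{
  d = $1                       # date, taken while FS is still whitespace
  t = $2; sub(/\..*/, "", t)   # time, minus the fractional seconds

  FS = "XXX"                   # multi-character separator
  gsub(/" | "/, "XXX", $0)     # changing $0 re-splits it with the new FS
  print $2, "logged in at", t, d

  FS = " "                     # back to the default for the next line
}')

printf '%s\n' "$result"
```

With these two lines the sketch should print "sarah the princes logged in at 21:54:01 2017-12-21" and "bob logged in at 08:03:10 2017-12-22". The second line only parses correctly because FS was reset at the end of the first.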

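Things get more complicated if you need to extract data from fields after the double-quoted field (e.g. the IP address or udp port fields). You can't extract them before the gsub because you can't know for sure what their field numbers will be. I'd be inclined to use perl at this point (or even sed, as in @Wildcard's answer), but one way to do it in awk is to extend the regular expression in the gsub call to suit. For example, replace the awk script with the following: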

$ grep -i 'logged in' ./file | tail | awk '
{   d=$1;
    t=$2;
    sub(/\..*/,"",t);

    FS="XXX";
    gsub(/" | "|address: |, /,"XXX",$0);
    sub(/ .*/,"",$8);      # get rid of trailing junk after udp port

    print $2,"logged in at", t, d, "as" ,$4, "from", $6":"$8;

    FS="[[:space:]]+"
}'

which would produce output like this:

sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
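To see why the fields after the quoted name can't be picked out by number beforehand, here is a small sketch with two hypothetical lines that differ only in how many words the quoted name contains. The log format and the variable name `positions` are made up for illustration. Under plain whitespace splitting, the position of the same word shifts from line to line.

```shell
# Two hypothetical lines that differ only in the number of words in the quoted name.
positions=$(printf '%s\n' \
  '2017-12-21 21:54:01.845 user "sarah the princes" address: 111111111, port 4000' \
  '2017-12-21 21:55:07.120 user "bob" address: 222222222, port 4001' |
awk '{
  # report which whitespace-delimited field holds the literal word "address:"
  for (i = 1; i <= NF; i++)
    if ($i == "address:") print "address: is field", i, "of", NF
}')

printf '%s\n' "$positions"
```

This should report field 7 of 10 for the first line and field 5 of 8 for the second, which is why the extended gsub turns the surrounding text into separators instead of relying on fixed field numbers.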

For completeness, here is one way to do it in perl, using the core module Text::ParseWords:

#!/usr/bin/perl

use strict;
use Text::ParseWords;

my $keep=1;  # keep " chars in output.  set to 0 to strip them.

while(<>) {
  my @F = quotewords('\s+', $keep, $_);

  $F[1] =~ s/\..*//;  # strip decimal fraction from time field
  $F[10] =~ s/,//;    # strip trailing comma from IP address field

  # remember: perl array indices start at zero, not one.
  printf "%s logged in at %s %s as %s from %s:%s\n", @F[5,1,0,7,10,13];
}

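It uses the quotewords() function from Text::ParseWords to split each input line into fields (stored in an array named @F), does some minor cleanup on a few fields, and then prints the wanted fields with printf.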

As a one-liner, it would be written as:

grep -i 'logged in' ./file | tail | perl -MText::ParseWords -n -e '
  @F = quotewords(q/\s+/, 1, $_);
  $F[1] =~ s/\..*//;
  $F[10] =~ s/,//;
  printf "%s logged in at %s %s as %s from %s:%s\n", @F[5,1,0,7,10,13]'

Note how I changed '\s+' to q/\s+/. perl has some great quoting operators that can be used to avoid the problem of single quotes inside single quotes.

Answer