How do I split fields separated by irregular whitespace using awk or a similar tool?

I want to split the data like this:

$1       $2           $3               $4  $5  $6 $7       $8   .........
---------------------------------------------------------------------------------------------------
root     tty5                          Wed Dec 18 13:42:28 2019   still logged in             
                      ~~~~~~~~~~~~~    ~~~
                      ^                ^
root     tty5                          Wed Dec 18 11:23:20 2019 - Wed Dec 18 11:24:47 2019  (00:01)    
john     pts/2        xx.xxx.xx.xxx    Tue Sep  3 10:11:31 2019 - Tue Sep  3 10:21:18 2019  (00:09)    
john     pts/3        xx.xxx.xx.xxx    Mon Sep  2 14:42:29 2019 - Mon Sep  2 14:57:33 2019  (00:15)    
john     pts/2        xx.xxx.xx.xxx    Mon Sep  2 14:40:03 2019 - Mon Sep  2 14:45:27 2019  (00:05)    
john     pts/2        xx.xxx.xx.xxx    Mon Sep  2 13:52:09 2019 - Mon Sep  2 14:34:12 2019  (00:42)    
john     pts/3        xx.xxx.xx.xxx    Mon Sep  2 13:14:39 2019 - Mon Sep  2 14:03:24 2019  (00:48)    
john     pts/2        xx.xxx.xx.xxx    Mon Sep  2 13:08:11 2019 - Mon Sep  2 13:23:16 2019  (00:15)    
john     pts/2        xx.xxx.xx.xxx    Mon Sep  2 10:22:27 2019 - Mon Sep  2 11:10:48 2019  (00:48)    
john     pts/2        xx.xxx.xx.xxx    Fri Aug 30 17:25:19 2019 - Fri Aug 30 17:33:34 2019  (00:08)    
john     pts/2        xx.xxx.xx.xxx    Wed Aug 28 10:43:56 2019 - Wed Aug 28 10:52:48 2019  (00:08)    
john     pts/2        xx.xxx.xx.xxx    Tue Aug 27 16:59:30 2019 - Tue Aug 27 17:52:50 2019  (00:53)    
john     pts/2        xx.xxx.xx.xxx    Tue Aug  6 11:06:46 2019 - Tue Aug  6 11:12:05 2019  (00:05)    
john     pts/2        xx.xxx.xx.xxx    Tue Aug  6 10:48:39 2019 - Tue Aug  6 11:01:46 2019  (00:13)    
john     pts/2        xx.xxx.xx.xxx    Tue Aug  6 10:38:18 2019 - Tue Aug  6 10:43:18 2019  (00:05)    
john     pts/2        xx.xxx.xx.xxx    Tue Aug  6 10:28:02 2019 - Tue Aug  6 10:36:04 2019  (00:08)    
john     pts/2        xx.xxx.xx.xxx    Fri Aug  2 14:24:00 2019 - Fri Aug  2 14:24:16 2019  (00:00)    
root     tty5                          Fri Aug  2 14:21:30 2019 - Fri Nov 22 11:03:20 2019 (111+20:41) 
root     tty5                          Fri Jul 26 11:02:17 2019 - Fri Jul 26 11:03:58 2019  (00:01)    
john     pts/3        xx.xxx.xx.xxx    Thu Jul 25 16:24:44 2019 - Thu Jul 25 16:33:36 2019  (00:08)    
john     pts/2        xx.xxx.xx.xxx    Thu Jul 25 16:08:41 2019 - Thu Jul 25 16:33:53 2019  (00:25)   

However, when $3 is empty, I can't get the correct values for $3 and the fields after it. For example:

$ last -F | grep -E 'tty|pty|pts' | awk '{print $3}'
Wed                 <- not correct
Wed                 <- not correct
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
xx.xxx.xx.xxx
Fri                 <- not correct
Fri                 <- not correct
xx.xxx.xx.xxx
xx.xxx.xx.xxx
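Why this happens: with the default field separator, awk treats every run of blanks as a single separator, so an empty column leaves no empty field behind. The remaining fields simply shift left by one. Counting fields on two lines copied from the output above makes this visible (the sample data is taken from the question; nothing here calls last itself):

```shell
#!/bin/sh
# Two lines from the last -F output above: the first has an empty host
# column, the second does not.
sample='root     tty5                          Wed Dec 18 11:23:20 2019 - Wed Dec 18 11:24:47 2019  (00:01)
john     pts/2        xx.xxx.xx.xxx    Tue Sep  3 10:11:31 2019 - Tue Sep  3 10:21:18 2019  (00:09)'

# Default FS: runs of blanks count as one separator, so the missing host
# does not produce an empty field. NF drops by one and $3 is the weekday.
printf '%s\n' "$sample" | awk '{ print NF, $3 }'
# prints:
# 14 Wed
# 15 xx.xxx.xx.xxx
```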

How can I parse this correctly with awk or a similar command-line tool?

Answer 1

In this particular case, let's use this filter:

awk '{if ($4 !~ /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)$/) $3="- "$3; print}'
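A minimal, self-contained way to try the filter without a real login history is to wrap it in a shell function and feed it lines copied from the question. The function name normalize_last is hypothetical, chosen only for this sketch; the awk program is the filter above.

```shell
#!/bin/sh
# normalize_last: hypothetical wrapper around the filter above.
# If $4 is not a weekday, the host column was empty, so "- " is prepended
# to $3. Assigning to $3 makes awk rebuild that line with single spaces.
normalize_last() {
    awk '{ if ($4 !~ /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)$/) $3 = "- " $3; print }'
}

# Sample lines from the question. On a real system the input would be:
#   last -F | grep -E 'tty|pty|pts' | normalize_last
printf '%s\n' \
  'root     tty5                          Wed Dec 18 13:42:28 2019   still logged in' \
  'john     pts/2        xx.xxx.xx.xxx    Tue Sep  3 10:11:31 2019 - Tue Sep  3 10:21:18 2019  (00:09)' |
  normalize_last
```

The first line comes out as root tty5 - Wed Dec 18 13:42:28 2019 still logged in; the second is printed unchanged because its $4 is already a weekday.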

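If everything is as expected, $4 is Mon, Tue, Wed or …. If the third column is empty, $3 contains what should be in $4 (the weekday), and $4 contains a month name: Jan, Feb, Mar or ….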

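The filter detects exactly this: if $4 is not what we expect, it injects an extra field (-) in front of $3, which shifts the remaining fields one position to the right. Because awk rebuilds a modified line with single spaces, the output is no longer aligned in columns; it looks like this (fragment):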

root tty5 - Wed Dec 18 13:42:28 2019 still logged in
root tty5 - Wed Dec 18 11:23:20 2019 - Wed Dec 18 11:24:47 2019 (00:01)
john     pts/2        xx.xxx.xx.xxx    Tue Sep  3 10:11:31 2019 - Tue Sep  3 10:21:18 2019  (00:09)
john     pts/3        xx.xxx.xx.xxx    Mon Sep  2 14:42:29 2019 - Mon Sep  2 14:57:33 2019  (00:15)

To check that it worked, we can pipe the result to column -t:

root  tty5   -              Wed  Dec  18  13:42:28  2019  still  logged  in
root  tty5   -              Wed  Dec  18  11:23:20  2019  -      Wed     Dec  18  11:24:47  2019  (00:01)
john  pts/2  xx.xxx.xx.xxx  Tue  Sep  3   10:11:31  2019  -      Tue     Sep  3   10:21:18  2019  (00:09)
john  pts/3  xx.xxx.xx.xxx  Mon  Sep  2   14:42:29  2019  -      Mon     Sep  2   14:57:33  2019  (00:15)

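You don't need column for further processing, though: after the filter, every line has the same field layout, so awk alone can parse it reliably.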
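As an illustration of parsing the normalized lines directly, the sketch below prints the user, the host (- when there is none) and the login time as tab-separated columns. After normalization, $4 through $8 always hold the login weekday, month, day, time and year. The function names normalize_last and summarize_logins are hypothetical, and the filter is repeated so the block runs on its own.

```shell
#!/bin/sh
# Same filter as in the answer, repeated so this block is self-contained.
normalize_last() {
    awk '{ if ($4 !~ /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)$/) $3 = "- " $3; print }'
}

# After normalization: $1 user, $2 terminal, $3 host ("-" if none),
# $4..$8 login time. Output: user<TAB>host<TAB>login time.
summarize_logins() {
    normalize_last | awk 'BEGIN { OFS = "\t" }
        { print $1, $3, $4 " " $5 " " $6 " " $7 " " $8 }'
}

# Sample lines from the question.
printf '%s\n' \
  'root     tty5                          Wed Dec 18 13:42:28 2019   still logged in' \
  'john     pts/2        xx.xxx.xx.xxx    Tue Sep  3 10:11:31 2019 - Tue Sep  3 10:21:18 2019  (00:09)' |
  summarize_logins
```

Tabs are used as the output separator here only because the login time itself contains spaces; any separator that cannot occur inside a field would do.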

Notes:

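  • last uses English abbreviations; mine does too, even though most of my Kubuntu system is localized. I don't know whether a localized version of last exists, but if it does, you can force English output by selecting the C locale: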

    LC_ALL=C last -F | …
    
  • After normalization, one column can contain either still or -. This makes it easy to detect sessions that are still logged in.
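Combining the two notes, a sketch that lists only open sessions might look like this. In a normalized line that column is $9. The function names are hypothetical, the filter is repeated so the block runs on its own, and the sketch only tests for the word still, so it does not depend on what else last may print in that column. On a real system the input would come from LC_ALL=C last -F | grep -E 'tty|pty|pts'.

```shell
#!/bin/sh
# Same filter as in the answer, repeated so this block is self-contained.
normalize_last() {
    awk '{ if ($4 !~ /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)$/) $3 = "- " $3; print }'
}

# Print user, terminal and host of sessions whose ninth field is "still"
# (i.e. "still logged in"). Assumes English (C locale) output from last.
still_logged_in() {
    normalize_last | awk '$9 == "still" { print $1, $2, $3 }'
}

# Sample lines from the question: one open session, one finished session.
printf '%s\n' \
  'root     tty5                          Wed Dec 18 13:42:28 2019   still logged in' \
  'john     pts/2        xx.xxx.xx.xxx    Tue Sep  3 10:11:31 2019 - Tue Sep  3 10:21:18 2019  (00:09)' |
  still_logged_in
```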
