如何使用 awk 对时间戳和值进行分组

如何使用 awk 对时间戳和值进行分组

我有一些复杂的日志,需要按分钟和值分组。下面是一些示例日志:

2019-08-09T19:01:53:594+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test1","logTime":"2019-08-09T12:01:53.594Z","responseTime":4}
2019-08-09T19:01:53:673+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test2","logTime":"2019-08-09T12:01:53.673Z","responseTime":4}
2019-08-09T19:14:03:773+07:00 - info: {"tag":"request /validate","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"error internal"},"metadata":"test3","logTime":"2019-08-09T12:14:03.773Z","responseTime":7}
2019-08-09T19:19:32:925+07:00 - info: {"tag":"request /validate","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"error internal"},"metadata":"test4","logTime":"2019-08-09T12:19:32.925Z","responseTime":8}

我的期望如下:

19:01  errMessage : connect ECONNREFUSED 127.0.0.1:7000  10
19:02  errMessage : error internal  20
19:03  errMessage : error internal  10

著名的 :

19:01 = hour minutes
errMessage : error internal = value
20 = count message err

我已经尝试过下面的 awk 但仍然没有完成分组结果

cat file.log | strings | grep "errMessage" | awk -F'[{,]' '{print $1,$3,$4,$5,$8}' | awk -F'[-,"]' '{print $3,$11,$12,$13,$15,$16,$17}' 

你能帮我找到结果如何按时间戳和计数值对结果进行分组吗?

谢谢

答案1

由于问题中给出的数据有点太稀疏,所以我对其进行了扩展。这使我们能够更好地按分钟和错误消息以及计数来演示/验证分组逻辑。

2019-08-09T19:02:00:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test1","logTime":"2019-08-09T12:02:00.000Z","responseTime":4}
2019-08-09T19:02:03:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test1","logTime":"2019-08-09T12:02:00.000Z","responseTime":4}
2019-08-09T19:02:10:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test2","logTime":"2019-08-09T12:02:10.000Z","responseTime":4}
2019-08-09T19:02:15:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Internal Server Error","data":{},"errMessage":"TypeError: Cannot read property 'name' of undefined"},"metadata":"test2","logTime":"2019-08-09T12:02:10.000Z","responseTime":10}
2019-08-09T19:02:20:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test3","logTime":"2019-08-09T12:02:20.000Z","responseTime":4}
2019-08-09T19:02:25:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test3","logTime":"2019-08-09T12:02:20.000Z","responseTime":4}
2019-08-09T19:02:30:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test4","logTime":"2019-08-09T12:02:30.000Z","responseTime":4}
2019-08-09T19:02:35:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Internal Server Error","data":{},"errMessage":"ReferenceError: foo is not defined"},"metadata":"test4","logTime":"2019-08-09T12:02:30.000Z","responseTime":20}
2019-08-09T19:02:40:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test5","logTime":"2019-08-09T12:02:40.000Z","responseTime":4}
2019-08-09T19:02:45:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test5","logTime":"2019-08-09T12:02:40.000Z","responseTime":4}
2019-08-09T19:02:50:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test6","logTime":"2019-08-09T12:02:50.000Z","responseTime":4}
2019-08-09T19:02:55:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test6","logTime":"2019-08-09T12:02:50.000Z","responseTime":4}
2019-08-09T19:03:00:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test7","logTime":"2019-08-09T12:03:00.000Z","responseTime":4}
2019-08-09T19:03:05:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test7","logTime":"2019-08-09T12:03:00.000Z","responseTime":4}
2019-08-09T19:03:10:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test8","logTime":"2019-08-09T12:03:10.000Z","responseTime":4}
2019-08-09T19:03:15:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Internal Server Error","data":{},"errMessage":"SyntaxError: Unexpected token ':'"},"metadata":"test8","logTime":"2019-08-09T12:03:10.000Z","responseTime":15}
2019-08-09T19:03:20:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test9","logTime":"2019-08-09T12:03:20.000Z","responseTime":4}
2019-08-09T19:03:25:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test9","logTime":"2019-08-09T12:03:20.000Z","responseTime":4}
2019-08-09T19:03:30:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test10","logTime":"2019-08-09T12:03:30.000Z","responseTime":4}
2019-08-09T19:03:35:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Internal Server Error","data":{},"errMessage":"TypeError: Cannot read property 'length' of undefined"},"metadata":"test10","logTime":"2019-08-09T12:03:30.000Z","responseTime":25}
2019-08-09T19:03:45:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test11","logTime":"2019-08-09T12:03:40.000Z","responseTime":4}

至此,我的TXR 口齿不清程序产生这样的结果:

$ txr group.tl  < xlog
19:02  errMessage : connect ECONNREFUSED 127.0.0.1:7000  10
19:02  errMessage : ReferenceError: foo is not defined  1
19:02  errMessage : TypeError: Cannot read property 'name' of undefined  1
19:03  errMessage : connect ECONNREFUSED 127.0.0.1:7000  7
19:03  errMessage : SyntaxError: Unexpected token ':'  1
19:03  errMessage : TypeError: Cannot read property 'length' of undefined  1

中的程序group.tl是:

(defstruct log ()
  hour minute
  err-message)

(build
  (let (minute-batch cur-hh cur-mm)
    (flet ((flush ()
             (let ((by-message-hash (group-by .err-message (get))))
               (oust)
               (dohash (msg group by-message-hash)
                 (let ((lead (first group)))
                   (put-line `@{lead.hour}:@{lead.minute}  errMessage : @msg  @(len group)`))))))
      (whilet ((line (get-line)))
        (when-match `@{nil}T@hh:@mm:@nil - info: @json` line
          (let ((jobj (get-json json)))
            (if (or (nequal hh cur-hh)
                    (nequal mm cur-mm))
              (flush))
            (set cur-hh hh cur-mm mm)
            (add (new log
                      hour hh minute mm
                      err-message [[jobj "tagName"] "errMessage"])))))
      (flush))))

该计划由两个顶级形式组成:

  • adefstruct定义了log保存有关时间和错误消息的信息的结构。
  • build包含所有逻辑的表达式。

build宏包含一个用于列表的程序构造和操作的环境。它提供了许多本地运营商。在这里,我们使用其中的三个:add将一个项目添加到隐式列表中。get检索列表。oust将列表替换为另一个列表(如果未给出参数,则默认为空)。

该策略是扫描数据并将每一行转换为一个log对象。每当小时或分钟发生变化时,我们都会调用本地flush函数来处理累积的组。flush用于(get)检索累积对象的列表log,并(oust)清除该列表。flush当我们用完输入时,我们还会再次调用以取出最后一批。

flush函数按错误消息对对象进行分组以形成哈希表,然后转储信息:对于每条错误消息,它都会打印时间、错误消息以及共享该错误消息的组的大小。

解析条目是使用模式匹配完成的。此外,我们将 JSON 部分提取为一个单元,并使用 TXR Lisp 中的 JSON 解析器。生成 JSON 对象作为哈希表;我们走过两层桌子到达errMessage

相关内容