I have some complex logs that I need to group by minute and by value. Here are some sample log lines:
2019-08-09T19:01:53:594+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test1","logTime":"2019-08-09T12:01:53.594Z","responseTime":4}
2019-08-09T19:01:53:673+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test2","logTime":"2019-08-09T12:01:53.673Z","responseTime":4}
2019-08-09T19:14:03:773+07:00 - info: {"tag":"request /validate","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"error internal"},"metadata":"test3","logTime":"2019-08-09T12:14:03.773Z","responseTime":7}
2019-08-09T19:19:32:925+07:00 - info: {"tag":"request /validate","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"error internal"},"metadata":"test4","logTime":"2019-08-09T12:19:32.925Z","responseTime":8}
My expected output is:
19:01 errMessage : connect ECONNREFUSED 127.0.0.1:7000 10
19:02 errMessage : error internal 20
19:03 errMessage : error internal 10
Note:
19:01 = hour and minute
errMessage : error internal = the value
20 = the count of that error message
I have tried the awk pipeline below, but it still does not produce grouped results:
cat file.log | strings | grep "errMessage" | awk -F'[{,]' '{print $1,$3,$4,$5,$8}' | awk -F'[-,"]' '{print $3,$11,$12,$13,$15,$16,$17}'
Can you help me figure out how to group the results by timestamp and count the values?
Thanks
Answer 1
Since the data given in the question is a bit too sparse, I have extended it. This lets us better demonstrate and validate the logic for grouping by minute and error message, with counts.
2019-08-09T19:02:00:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test1","logTime":"2019-08-09T12:02:00.000Z","responseTime":4}
2019-08-09T19:02:03:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test1","logTime":"2019-08-09T12:02:00.000Z","responseTime":4}
2019-08-09T19:02:10:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test2","logTime":"2019-08-09T12:02:10.000Z","responseTime":4}
2019-08-09T19:02:15:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Internal Server Error","data":{},"errMessage":"TypeError: Cannot read property 'name' of undefined"},"metadata":"test2","logTime":"2019-08-09T12:02:10.000Z","responseTime":10}
2019-08-09T19:02:20:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test3","logTime":"2019-08-09T12:02:20.000Z","responseTime":4}
2019-08-09T19:02:25:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test3","logTime":"2019-08-09T12:02:20.000Z","responseTime":4}
2019-08-09T19:02:30:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test4","logTime":"2019-08-09T12:02:30.000Z","responseTime":4}
2019-08-09T19:02:35:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Internal Server Error","data":{},"errMessage":"ReferenceError: foo is not defined"},"metadata":"test4","logTime":"2019-08-09T12:02:30.000Z","responseTime":20}
2019-08-09T19:02:40:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test5","logTime":"2019-08-09T12:02:40.000Z","responseTime":4}
2019-08-09T19:02:45:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test5","logTime":"2019-08-09T12:02:40.000Z","responseTime":4}
2019-08-09T19:02:50:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test6","logTime":"2019-08-09T12:02:50.000Z","responseTime":4}
2019-08-09T19:02:55:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test6","logTime":"2019-08-09T12:02:50.000Z","responseTime":4}
2019-08-09T19:03:00:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test7","logTime":"2019-08-09T12:03:00.000Z","responseTime":4}
2019-08-09T19:03:05:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test7","logTime":"2019-08-09T12:03:00.000Z","responseTime":4}
2019-08-09T19:03:10:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test8","logTime":"2019-08-09T12:03:10.000Z","responseTime":4}
2019-08-09T19:03:15:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Internal Server Error","data":{},"errMessage":"SyntaxError: Unexpected token ':'"},"metadata":"test8","logTime":"2019-08-09T12:03:10.000Z","responseTime":15}
2019-08-09T19:03:20:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test9","logTime":"2019-08-09T12:03:20.000Z","responseTime":4}
2019-08-09T19:03:25:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test9","logTime":"2019-08-09T12:03:20.000Z","responseTime":4}
2019-08-09T19:03:30:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test10","logTime":"2019-08-09T12:03:30.000Z","responseTime":4}
2019-08-09T19:03:35:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Internal Server Error","data":{},"errMessage":"TypeError: Cannot read property 'length' of undefined"},"metadata":"test10","logTime":"2019-08-09T12:03:30.000Z","responseTime":25}
2019-08-09T19:03:45:000+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test11","logTime":"2019-08-09T12:03:40.000Z","responseTime":4}
On this data, my TXR Lisp program produces this result:
$ txr group.tl < xlog
19:02 errMessage : connect ECONNREFUSED 127.0.0.1:7000 10
19:02 errMessage : ReferenceError: foo is not defined 1
19:02 errMessage : TypeError: Cannot read property 'name' of undefined 1
19:03 errMessage : connect ECONNREFUSED 127.0.0.1:7000 7
19:03 errMessage : SyntaxError: Unexpected token ':' 1
19:03 errMessage : TypeError: Cannot read property 'length' of undefined 1
The program in group.tl is:
(defstruct log ()
  hour minute
  err-message)

(build
  (let (minute-batch cur-hh cur-mm)
    (flet ((flush ()
             (let ((by-message-hash (group-by .err-message (get))))
               (oust)
               (dohash (msg group by-message-hash)
                 (let ((lead (first group)))
                   (put-line `@{lead.hour}:@{lead.minute} errMessage : @msg @(len group)`))))))
      (whilet ((line (get-line)))
        (when-match `@{nil}T@hh:@mm:@nil - info: @json` line
          (let ((jobj (get-json json)))
            (if (or (nequal hh cur-hh)
                    (nequal mm cur-mm))
              (flush))
            (set cur-hh hh cur-mm mm)
            (add (new log
                   hour hh minute mm
                   err-message [[jobj "tagName"] "errMessage"])))))
      (flush))))
The program consists of two top-level forms:
- a defstruct that defines a log structure holding the time and error-message information.
- a build expression that contains all the logic.
The build macro provides an environment for the procedural construction and manipulation of a list. It offers a number of local operators; here we use three of them: add adds an item to the implicit list, get retrieves the list, and oust replaces the list with another one (defaulting to empty if no argument is given).
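For readers unfamiliar with TXR, a rough Python analogy (not TXR itself, and the function names are only illustrative) of the implicit list maintained by build might look like this:

```python
# Hypothetical Python analogy of the implicit list that `build` maintains.
items = []            # the implicit list


def add(x):
    """Like TXR's `add`: append an item to the implicit list."""
    items.append(x)


def get():
    """Like TXR's `get`: retrieve the current list."""
    return items


def oust(new=None):
    """Like TXR's `oust`: replace the list (empty when no argument is given)."""
    global items
    items = new if new is not None else []


add(1)
add(2)
batch = get()   # grab the accumulated batch: [1, 2]
oust()          # reset the implicit list; `batch` still holds the old items
```

Because oust rebinds rather than mutating, the batch retrieved with get survives the reset, which mirrors the (get) then (oust) sequence used in flush below.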
The strategy is to scan the data and turn each line into a log object. Whenever the hour or minute changes, we call the local flush function to process the accumulated group. flush uses (get) to retrieve the list of accumulated log objects and (oust) to clear that list. When we run out of input, we call flush once more to process the last batch.
The flush function groups the objects by error message into a hash table, then dumps the information: for each error message, it prints the time, the error message, and the size of the group sharing that message.
Parsing the entries is done with pattern matching. Furthermore, we extract the JSON part as a unit and use the JSON parser in TXR Lisp, which produces the JSON object as hash tables; we walk through two levels of table to get to errMessage.
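For readers without TXR installed, here is a minimal Python sketch of the same grouping logic. It is written against the sample lines shown above (the regex and field names are assumptions based on that format and may need adjusting for real logs):

```python
import json
import re
from collections import Counter

# The leading timestamp uses colons before the milliseconds (hh:mm:ss:mmm),
# so we capture only hh and mm and skip the rest up to " - info: ".
LINE_RE = re.compile(r'^\d{4}-\d{2}-\d{2}T(\d{2}):(\d{2}):\S+ - info: (\{.*\})$')


def group_errors(lines):
    """Return a Counter keyed by (hh:mm, errMessage)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not look like log entries
        hh, mm, payload = m.groups()
        err = json.loads(payload)["tagName"]["errMessage"]
        counts[(hh + ":" + mm, err)] += 1
    return counts


sample = [
    '2019-08-09T19:01:53:594+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test1","logTime":"2019-08-09T12:01:53.594Z","responseTime":4}',
    '2019-08-09T19:01:53:673+07:00 - info: {"tag":"request /browse","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"connect ECONNREFUSED 127.0.0.1:7000"},"metadata":"test2","logTime":"2019-08-09T12:01:53.673Z","responseTime":4}',
    '2019-08-09T19:14:03:773+07:00 - info: {"tag":"request /validate","tagName":{"status":"05","message":"Unrecognise Request","data":{},"errMessage":"error internal"},"metadata":"test3","logTime":"2019-08-09T12:14:03.773Z","responseTime":7}',
]

counts = group_errors(sample)
for (ts, err), n in sorted(counts.items()):
    print(ts, "errMessage :", err, n)
# → 19:01 errMessage : connect ECONNREFUSED 127.0.0.1:7000 2
# → 19:14 errMessage : error internal 1
```

Unlike the TXR program, this sketch accumulates everything into one Counter rather than flushing per minute, so it holds all distinct (minute, message) pairs in memory; for very large logs, the streaming flush-on-minute-change approach above would be preferable.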