Parsing a log file with sed -e: counting unique class names

2024-5-22

text-processing sed regular-expression

I have a file, let's call it filename.log, with contents like this:

(2014-11-18 14:09:21,766), , xxxxxx.local, EventSystem, DEBUG FtpsFile delay secs is 5 [pool-3-thread-7] 
(2014-11-18 14:09:21,781), , xxxxxx.local, EventSystem, DEBUG FtpsFile disconnected from ftp server [pool-3-thread-7] 
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File  Process@serverStatus on exit  - 113 [pool-3-thread-7] 
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File  Process@serverStatus on exit  - 114 [pool-3-thread-7] 
(2014-11-18 14:09:21,799), , xxxxxx.local, EventSystem, DEBUG JobQueue $_Runnable Finally of consume() :: [pool-3-thread-7]

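I am trying to find the classes that produce the most frequent debug messages.

In this example you can see that FtpsFile and JobQueue are the two classes producing messages.

I have this: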

cat filename.log | sed -n -e 's/^.*\(DEBUG \)/\1/p' | sort | uniq -c | sort -rn | head -10

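This produces the class names and shows the most frequent ones (top 10).

The problem is that it does not give me a count of 4 for the class FtpsFile. Because the rest of the message is still part of each line, every FtpsFile log line is treated as a distinct entry.

How can I change the command above so that it grabs only the first word after DEBUG and ignores the rest when counting?

Ideally I should get:

      4 FtpsFile
      1 JobQueue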

Answer 1

With GNU sed:

sed 's/.*DEBUG \(\w*\).*/\1/' filename.log | uniq -c
      4 FtpsFile
      1 JobQueue
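
The `\w` class is a GNU extension, so other sed implementations may not accept it. A minimal portable sketch, assuming class names contain no spaces, could use `[^ ]*` instead. The function name `debug_classes` and the INFO line in the sample are made up for illustration; with `-n` and the `p` flag, lines without `DEBUG` are dropped rather than echoed unchanged.

```shell
# Hypothetical sample: two DEBUG lines from the question plus one made-up INFO line.
sample_log() {
    cat <<'EOF'
(2014-11-18 14:09:21,766), , xxxxxx.local, EventSystem, DEBUG FtpsFile delay secs is 5 [pool-3-thread-7]
(2014-11-18 14:09:21,770), , xxxxxx.local, EventSystem, INFO FtpsFile made-up informational line [pool-3-thread-7]
(2014-11-18 14:09:21,799), , xxxxxx.local, EventSystem, DEBUG JobQueue $_Runnable Finally of consume() :: [pool-3-thread-7]
EOF
}

# debug_classes (hypothetical name) reads log lines on stdin and prints
# the first word after "DEBUG ". [^ ]* stands in for GNU's \w*;
# -n plus the p flag prints only lines where the substitution matched.
debug_classes() {
    sed -n 's/.*DEBUG \([^ ]*\).*/\1/p'
}

sample_log | debug_classes | sort | uniq -c
```

For a real file this would be run as `debug_classes < filename.log | sort | uniq -c`.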

With grep:

grep -Po 'DEBUG \K\w+' filename.log | uniq -c
      4 FtpsFile
      1 JobQueue

With awk:

awk '$6=="DEBUG"{print $7}' filename.log | uniq -c
      4 FtpsFile
      1 JobQueue

The last one could be done in pure awk, but for consistency with the other examples I piped it to uniq.
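
A minimal sketch of what a pure-awk count might look like, assuming the same whitespace-separated layout as the sample (field 6 is `DEBUG`, field 7 is the class). The names `sample_log` and `count_debug_classes` are hypothetical. The trailing `sort -rn` is only there because awk's `for (c in count)` iterates in an unspecified order.

```shell
# Hypothetical helper that reproduces the five log lines from the question.
sample_log() {
    cat <<'EOF'
(2014-11-18 14:09:21,766), , xxxxxx.local, EventSystem, DEBUG FtpsFile delay secs is 5 [pool-3-thread-7]
(2014-11-18 14:09:21,781), , xxxxxx.local, EventSystem, DEBUG FtpsFile disconnected from ftp server [pool-3-thread-7]
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File  Process@serverStatus on exit  - 113 [pool-3-thread-7]
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File  Process@serverStatus on exit  - 114 [pool-3-thread-7]
(2014-11-18 14:09:21,799), , xxxxxx.local, EventSystem, DEBUG JobQueue $_Runnable Finally of consume() :: [pool-3-thread-7]
EOF
}

# count_debug_classes (hypothetical name) reads log lines on stdin,
# tallies field 7 for every line whose field 6 is DEBUG, and prints
# "count class" pairs with the highest count first.
count_debug_classes() {
    awk '$6 == "DEBUG" { count[$7]++ }
         END { for (c in count) print count[c], c }' | sort -rn
}

sample_log | count_debug_classes
```

Because the counting happens in an associative array, this variant does not depend on lines of the same class being adjacent. For a real file it would be run as `count_debug_classes < filename.log`.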

Answer 2

Quick fix: I added the following cut command to pick out that field:

[host:~]$ cat logfile | cut -d" " -f7 | sort | uniq -c | sort -rn | head -10
      4 FtpsFile
      1 JobQueue

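Because I wanted to keep it simple (KISS), this does not work for classes with spaces in their names.

Answer 3

You can use awk (instead of sed) to skip the fields before the one you are interested in, then cut out the part you want to look at: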

[hunter@apollo: ~]$ cat filename.log | awk -F, '{ print $6 }' | cut -c 1-15 | uniq -c | sort -rn | head -10
      4  DEBUG FtpsFile
      1  DEBUG JobQueue

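(Note: you are also sorting twice, which seems unnecessary.)

Edit: if you don't know how long the class names will be, you can add an extra awk command (instead of cut):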

[hunter@apollo: ~]$ cat filename.log | awk -F, '{ print $6 }' | awk '{ print $1, $2 }' | uniq -c | sort -rn | head -10
      4 DEBUG FtpsFile
      1 DEBUG JobQueue
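
One caveat about the note on sorting twice: `uniq -c` only merges adjacent identical lines, so the first `sort` can be dropped only when each class's lines are contiguous, as they happen to be in the sample. A small sketch with made-up interleaved names (the `interleaved` helper is hypothetical) shows the difference:

```shell
# Hypothetical input: class names, one per line, with FtpsFile split
# around a JobQueue line.
interleaved() {
    printf '%s\n' FtpsFile JobQueue FtpsFile
}

# Without a prior sort the two FtpsFile lines are not adjacent,
# so uniq -c reports them as separate groups (three output lines).
interleaved | uniq -c

# Sorting first puts identical names next to each other,
# so uniq -c reports one group per class (two output lines).
interleaved | sort | uniq -c
```

The second `sort -rn` in the question's pipeline does a different job: it orders the groups by count so that `head -10` picks the most frequent classes.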
