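I have a huge log file that I need to filter. I want to show every log line that contains the string dns, and I want to see each one only once.
That is, I want to go from this: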
Dec 9 07:24:02 94.15.218.140 syslog: ssk:548.049:is_dns_hijack:1451:isDnsHijack=0
Dec 9 07:24:10 90.192.172.112 syslog: ssk:363.217:cmsLck_acquireLockWithTimeoutTraced:98:acquired lock. callerFuncName is_dns_hijack; timeout 12000 milliseconds
Dec 9 07:24:10 90.192.172.112 syslog: ssk:363.218:cmsLck_releaseLockTraced:144:lock hold time=0ms, acquiring lock callerFuncName is_dns_hijack; releasing lock callerFuncName is_dns_hijack;
Dec 9 07:24:10 90.192.172.112 syslog: ssk:363.225:is_dns_hijack:1425:isDnsHijack=0
Dec 9 07:24:17 94.15.218.140 syslog: ssk:563.048:cmsLck_acquireLockWithTimeoutTraced:95:acquired lock. callerFuncName is_dns_hijack; timeout 12000 milliseconds
Dec 9 07:24:17 94.15.218.140 syslog: ssk:563.048:cmsLck_releaseLockTraced:141:lock hold time=0ms, acquiring lock callerFuncName is_dns_hijack; releasing lock callerFuncName is_dns_hijack;
Dec 9 07:24:17 94.15.218.140 syslog: ssk:563.049:is_dns_hijack:1451:isDnsHijack=0
to this:
Dec 9 07:24:02 94.15.218.140 syslog: ssk:548.049:is_dns_hijack:1451:isDnsHijack=0
Dec 9 07:24:10 90.192.172.112 syslog: ssk:363.217:cmsLck_acquireLockWithTimeoutTraced:98:acquired lock. callerFuncName is_dns_hijack; timeout 12000 milliseconds
Dec 9 07:24:10 90.192.172.112 syslog: ssk:363.218:cmsLck_releaseLockTraced:144:lock hold time=0ms, acquiring lock callerFuncName is_dns_hijack; releasing lock callerFuncName is_dns_hijack;
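Basically, the same log entry is repeated many times, and only the timestamps differ.
I tried to use uniq, but for that I need to remove the timestamp in the third column, which can be done with awk '{ $3=""; print }'. Even then, as you can see from the log, the first 11 characters of the message still differ (for example ssk:563.048 versus ssk:563.049). I am thinking of running grep for the word dns and somehow ignoring those first 11 characters.
How can I do this? Is there a better way?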
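Answer 1
Use awk with : as the field separator. You can then store each unique message identifier in an array (this is the 6th field, which in this log holds the name of the logging function, such as is_dns_hijack) and print only the first line in which it occurs: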
$ awk -F: '!a[$6]++' file
Dec 9 07:24:02 94.15.218.140 syslog: ssk:548.049:is_dns_hijack:1451:isDnsHijack=0
Dec 9 07:24:10 90.192.172.112 syslog: ssk:363.217:cmsLck_acquireLockWithTimeoutTraced:98:acquired lock. callerFuncName is_dns_hijack; timeout 12000 milliseconds
Dec 9 07:24:10 90.192.172.112 syslog: ssk:363.218:cmsLck_releaseLockTraced:144:lock hold time=0ms, acquiring lock callerFuncName is_dns_hijack; releasing lock callerFuncName is_dns_hijack;
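Here, a is an associative array indexed by the 6th field. The expression !a[$6]++ is true only the first time a given value of the 6th field is seen: at that point a[$6] is still unset, so its negation is true, and the ++ then records the value so that later lines with the same field evaluate to false. Because awk's default action is to print the line whenever the pattern evaluates to true, only the first line for each distinct 6th field is printed.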
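To make the !a[$6]++ idiom easier to follow, here is a minimal sketch that spells it out as an explicit test followed by an increment. The temporary file and the array name seen are illustrative choices, not part of the original answer; the log lines are copied from the question.

```shell
# Put a few of the question's log lines into a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
Dec 9 07:24:02 94.15.218.140 syslog: ssk:548.049:is_dns_hijack:1451:isDnsHijack=0
Dec 9 07:24:10 90.192.172.112 syslog: ssk:363.217:cmsLck_acquireLockWithTimeoutTraced:98:acquired lock. callerFuncName is_dns_hijack; timeout 12000 milliseconds
Dec 9 07:24:10 90.192.172.112 syslog: ssk:363.225:is_dns_hijack:1425:isDnsHijack=0
Dec 9 07:24:17 94.15.218.140 syslog: ssk:563.048:cmsLck_acquireLockWithTimeoutTraced:95:acquired lock. callerFuncName is_dns_hijack; timeout 12000 milliseconds
EOF

# Long form of: awk -F: '!a[$6]++'
result=$(awk -F: '{
    if (!seen[$6]) print   # nothing recorded yet for this 6th field: print the line
    seen[$6]++             # record the field so later lines with it are skipped
}' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

Only the first is_dns_hijack line and the first cmsLck_acquireLockWithTimeoutTraced line are kept; the later lines with the same 6th field are dropped.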
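If you only want to do this for lines that match dns, test for the match first, so that lines without dns are never recorded in the array:
awk -F: '/dns/ && !a[$6]++' file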
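The order of the two tests can change the result. The sketch below assumes a hypothetical log line (the one mentioning other_func, which is made up for illustration and does not appear in the original log) that shares its 6th field with a later dns line but does not itself contain dns. Because && short-circuits, putting /dns/ first means a[$6]++ is evaluated only for matching lines.

```shell
# Two lines with the same 6th field; only the second contains "dns".
# The first line is invented for this illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
Dec 9 07:24:01 94.15.218.140 syslog: ssk:547.000:cmsLck_acquireLockWithTimeoutTraced:95:acquired lock. callerFuncName other_func; timeout 12000 milliseconds
Dec 9 07:24:17 94.15.218.140 syslog: ssk:563.048:cmsLck_acquireLockWithTimeoutTraced:95:acquired lock. callerFuncName is_dns_hijack; timeout 12000 milliseconds
EOF

# Filter first, then deduplicate: the dns line is printed.
filter_first=$(awk -F: '/dns/ && !a[$6]++' "$log")

# Deduplicate first, then filter: the first line marks the field as seen,
# so the dns line that follows is suppressed and nothing is printed.
dedupe_first=$(awk -F: '!a[$6]++ && /dns/' "$log")

printf 'filter first: %s\n' "$filter_first"
printf 'dedupe first: %s\n' "$dedupe_first"
rm -f "$log"
```

If every line carrying a given 6th field also contains dns, both orderings print the same lines.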
As for ignoring the first 11 lines, you can do:
grep dns file | tail -n +12
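With tail -n +K, output starts at line K, which is why +12 skips the first 11 lines. A quick way to check this, using seq to generate numbered lines as a stand-in for the real log (an illustrative substitute, not part of the original answer):

```shell
# seq prints the numbers 1 to 15, one per line; tail -n +12 keeps line 12 onward.
kept=$(seq 1 15 | tail -n +12)
printf '%s\n' "$kept"
```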