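I'm getting Nginx log entries from http://researchscan1.eecs.berkeley.edu/ (among other sources) whose requests contain a lot of special characters, and I'm trying to filter them out. For example: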
2016/07/19 09:54:49 [error] 2006#2006: *5878 testing "//http" existence failed (2: No such file or directory) while logging request, client: 169.229.3.91, server: common.example.co.uk, request: "J/¤nkb=© 2]rµÐ[‘lç¢î/€@I"-
2016/07/19 11:29:05 [error] 2007#2007: *5945 testing "//http" existence failed (2: No such file or directory) while logging request, client: 169.229.3.91, server: common.example.co.uk, request: "i•jœ»@d‹˜þˆ¿–j•c|B‹¤¯Dñ½°|ôáV*Õ8ÓãÎð€í)ÑYCæôì £¶›¬Dxîoÿv.N"
For this kind of request I normally use the following Logcheck regular expression:
^[[:digit:]]{4}/[[:digit:]]{2}/[[:digit:]]{2} [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2} \[error\] [#[:digit:]]+: \*[[:digit:]]+ testing .+ existence failed \(2: No such file or directory\) while logging request, .+$
It does not catch these entries. I also tried:
^[[:digit:]]{4}/[[:digit:]]{2}/[[:digit:]]{2} [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2} \[error\] [#[:digit:]]+: \*[[:digit:]]+ testing .+ existence failed \(2: No such file or directory\) while logging request, (.|[[:cntrl:]])+$
but had no luck. Both variants match the log entries in RegexBuddy when it is set to POSIX ERE. Can any Logcheck/regex expert help me?
Answer 1
You need to escape the slashes, specifically the ones that separate the parts of the date:
^[[:digit:]]{4}\/[[:digit:]]{2}\/[[:digit:]]{2} [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2} \[error\] [#[:digit:]]+: \*[[:digit:]]+ testing .+ existence failed \(2: No such file or directory\) while logging request, .+$
With that change, your usual expression works fine for me, even with the special characters at the end of the line.
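
A candidate rule can also be checked outside Logcheck before deploying it. The following is only a minimal sketch: it assumes Logcheck applies its rules as POSIX extended regular expressions through grep -E, and it uses hypothetical scratch files plus an ASCII-only stand-in for the request string, not a real log line. The rule file holds the first pattern from the question; swap in the escaped variant to compare.

```shell
#!/bin/sh
# Hypothetical scratch files; Logcheck itself is not invoked here.
tmpdir=$(mktemp -d)
log="$tmpdir/sample.log"
rule="$tmpdir/rule"

# One sample entry. The request string is an ASCII-only stand-in (assumption);
# the real entries contain raw non-ASCII bytes.
cat > "$log" <<'EOF'
2016/07/19 09:54:49 [error] 2006#2006: *5878 testing "//http" existence failed (2: No such file or directory) while logging request, client: 169.229.3.91, server: common.example.co.uk, request: "J/nkb= 2]r[l/@I"
EOF

# The candidate rule, one pattern per line, exactly as it would be written in a rule file.
cat > "$rule" <<'EOF'
^[[:digit:]]{4}/[[:digit:]]{2}/[[:digit:]]{2} [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2} \[error\] [#[:digit:]]+: \*[[:digit:]]+ testing .+ existence failed \(2: No such file or directory\) while logging request, .+$
EOF

# Count the log lines that the rule matches.
matched=$(grep -E -c -f "$rule" "$log")
echo "matched lines: $matched"

rm -rf "$tmpdir"
```

Because the real requests contain raw bytes, it is worth repeating the test with a line copied directly from the actual log file: how grep treats such bytes can depend on the locale it runs under, so a pattern that matches the stand-in may still behave differently on the original entry.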