I used this command to extract the following information from the raw log below:
echo -e "Timestamp\t\tEmailTo:\t\tEmailFrom:\t\t\t\t\tIPAddress:\tErrorCodes:" && sed -n -e 's/.*\([0-9][0-9][0-9][0-9]\-[0-9][0-9]\-[0-9]*\) .*\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]*\).*/\1 \2 /p' logs
Output:
Timestamp EmailTo: EmailFrom: IPAddress: ErrorCodes:
2017-01-02 12:50:00
2017-01-02 13:10:25
Raw log:
2017-01-02 12:50:00 1cNxNS-001NKu-9B == [email protected] R=dkim_lookuphost T=dkim_remote_smtp defer (-45) H=mta6.am0.yahoodns.net [98.138.112.38]: SMTP error from remote mail server after MAIL FROM:<[email protected]> SIZE=1772: 421 4.7.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html
2017-01-02 13:10:25 1cNxhD-001VZ3-0f == [email protected] ([email protected]) <[email protected]> R=lookuphost T=remote_smtp defer (-45) H=mta7.am0.yahoodns.net [98.138.112.34]: SMTP error from remote mail server after MAIL FROM:<[email protected]> SIZE=87839: 500 5.9.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html
But I can't extract the other information I need; it should look like this:
Timestamp EmailTo: EmailFrom: IPAddress: ErrorCodes:
2017-01-02 12:50:00 [email protected] [email protected] 192.168.1.269 421 4.7.0
2017-01-02 13:10:25 [email protected] [email protected] 192.168.1.269 500 5.9.0
How can I extract all of this information using sed?
Answer 1
You can try this sed expression:
sed -e 's/^\(.* .* \).* .*== \([^ ]* \).*MAIL FROM:<\([^ ]*\)> [^ ]* \([0-9 .]*\)\[.*Messages from \([^ ]*\).*$/\1\t\2\t\3\t\5\t\4/'
It works for me on your example.
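To get output shaped like the table in the question, the expression can be combined with a header line and with sed's -n option plus the p flag, so that lines the pattern does not match are dropped instead of being echoed unchanged. The following is only a sketch: the sample line uses placeholder addresses (rcpt@example.com and sender@example.org are assumptions, since the real ones are masked in the post), the function name extract is hypothetical, and \t in the replacement assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical sample line: both addresses are placeholders, not real data.
line='2017-01-02 12:50:00 1cNxNS-001NKu-9B == rcpt@example.com R=dkim_lookuphost T=dkim_remote_smtp defer (-45) H=mta6.am0.yahoodns.net [98.138.112.38]: SMTP error from remote mail server after MAIL FROM:<sender@example.org> SIZE=1772: 421 4.7.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html'

extract() {
    # -n plus the trailing p prints only lines where the substitution succeeded.
    # Assumes GNU sed: \t in the replacement is a GNU extension.
    sed -n -e 's/^\(.* .* \).* .*== \([^ ]* \).*MAIL FROM:<\([^ ]*\)> [^ ]* \([0-9 .]*\)\[.*Messages from \([^ ]*\).*$/\1\t\2\t\3\t\5\t\4/p'
}

# Header first, then one tab-separated row per matching log line.
printf 'Timestamp\tEmailTo\tEmailFrom\tIPAddress\tErrorCodes\n'
printf '%s\n' "$line" | extract
```

Against a real file, the last line would become something like extract < logs.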
Explanation
The sed expression contains only one command: s/.../.../.
The first part of s///:
'^\(.* .* \)' -- Timestamp, two first space-separated blocks of text, \1.
'.* .*== ' -- Uninteresting text after timestamp.
'\([^ ]* \)' -- Block of text between spaces, first email address, \2.
'.*MAIL FROM:<' -- Position before second email.
'\([^ ]*\)>' -- Second email addr, non-space characters, ended by '>', \3.
' [^ ]* ' -- SIZE=...:
'\([0-9 .]*\)\[' -- Error codes: digits, spaces and dots ended by '[', \4.
'.*Messages from ' -- Position before IP.
'\([^ ]*\)' -- Non-space characters, ended by space, IP. \5.
'.*$' -- Text before end of string, not interesting.
As you can see, this is just a direct description of the raw log, nothing interesting.
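One side effect of the pieces listed above is that groups \1, \2 and \4 each capture a trailing space, which ends up in the output. If that matters, the spaces can be moved outside the parentheses and the loose .* pieces replaced with [^ ]* or [^>]*. This is a possible variant rather than the original expression; the sample line, its placeholder addresses and the function name extract_tight are all assumptions, and \t still assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical sample line modelled on the second log entry; addresses are placeholders.
line='2017-01-02 13:10:25 1cNxhD-001VZ3-0f == rcpt@example.com (alias@example.com) <rcpt@example.com> R=lookuphost T=remote_smtp defer (-45) H=mta7.am0.yahoodns.net [98.138.112.34]: SMTP error from remote mail server after MAIL FROM:<sender@example.org> SIZE=87839: 500 5.9.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html'

extract_tight() {
    # Same five groups as before, but the separating spaces sit outside the
    # parentheses, so no captured field carries a trailing space.
    sed -n -e 's/^\([^ ]* [^ ]*\) .*== \([^ ]*\) .*MAIL FROM:<\([^>]*\)> [^ ]* \([0-9][0-9]* [0-9.]*\) \[.*Messages from \([^ ]*\).*$/\1\t\2\t\3\t\5\t\4/p'
}

printf '%s\n' "$line" | extract_tight
```

The five fields come out tab-separated in the order timestamp, recipient, sender, IP, error codes.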
The second part of s/// simply places the \N back-references in the right order, with \t (tab) as the separator.
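The reordering is easier to see on a tiny input. In this sketch (the function name reorder and the three-word input are made up for illustration), three space-separated words are captured as \1, \2 and \3 and written back in a different order, the same way \5 is emitted before \4 in the expression above. As before, \t assumes GNU sed.

```shell
#!/bin/sh
reorder() {
    # Capture three space-separated words, then emit them as 3rd, 1st, 2nd,
    # separated by tabs.
    sed -e 's/^\([^ ]*\) \([^ ]*\) \([^ ]*\)$/\3\t\1\t\2/'
}

printf 'first second third\n' | reorder
```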
Answer 2
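I don't have much experience with awk, but I wanted to give it a try. I suspect this is quite fragile, since I don't know how many log lines you need it to handle.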
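Anyway, this uses a BEGIN block to set up the variables to be picked out, along with a format string, before printing the header. The timestamp and EmailTo are predictable, so numbered fields ($1, $2 and $5) can be used for them; the remaining values come from three sets of regular expressions, which are only very rough. Any suggestions for improvement would be appreciated!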
awk 'BEGIN {
    # Variables to be picked out of each line, plus the shared column layout.
    from = ""; ip = ""; error = "";
    fstr = "%-24s%-24s%-40s%-16s%s\n";
    printf(fstr, "Timestamp:", "EmailTo:", "EmailFrom:", "IPAddress:", "ErrorCodes:");
}
{
    # Reset for each record so a value from an earlier line cannot leak through.
    from = ""; ip = ""; error = "";
    for (i = 6; i < NF; i++) {
        # From address: the field has the form FROM:<address>.
        if ($i ~ /FROM:<[^ ]*>/)
            from = substr($i, 7, length($i) - 7);
        # Error codes are found in two adjacent fields.
        if ($(i-1) ~ /[[:digit:]]{3}/ && $i ~ /[[:digit:]]\.[[:digit:]]\.[[:digit:]]/)
            error = $(i-1) " " $i;
        # IP address after the predictable string "Messages from".
        if ($(i-2) " " $(i-1) == "Messages from" && $i ~ /[[:digit:].]{7,15}/)
            ip = $i;
    }
    printf(fstr, $1 " " $2, $5, from, ip, error);
}' logs
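As one possible improvement, the per-field loop could be replaced with awk's match() function, which sets RSTART and RLENGTH so the value can be cut out of the whole record with substr(). The sketch below avoids interval expressions such as {3}, which some older awk implementations do not support. The function name parse_log, the tab-separated output and the placeholder addresses in the sample line are assumptions for illustration, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical sample line: both addresses are placeholders, not real data.
line='2017-01-02 12:50:00 1cNxNS-001NKu-9B == rcpt@example.com R=dkim_lookuphost T=dkim_remote_smtp defer (-45) H=mta6.am0.yahoodns.net [98.138.112.38]: SMTP error from remote mail server after MAIL FROM:<sender@example.org> SIZE=1772: 421 4.7.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html'

parse_log() {
    awk '{
        from = ""; ip = ""; error = "";
        # "MAIL FROM:<" is 11 characters long; the closing ">" is dropped too.
        if (match($0, /MAIL FROM:<[^>]*>/))
            from = substr($0, RSTART + 11, RLENGTH - 12);
        # "Messages from " is 14 characters long.
        if (match($0, /Messages from [0-9.]+/))
            ip = substr($0, RSTART + 14, RLENGTH - 14);
        # Skip the leading ": " and the trailing space around the two codes.
        if (match($0, /: [0-9][0-9][0-9] [0-9]\.[0-9]\.[0-9] /))
            error = substr($0, RSTART + 2, RLENGTH - 3);
        printf("%s %s\t%s\t%s\t%s\t%s\n", $1, $2, $5, from, ip, error);
    }' "$@"
}

printf '%s\n' "$line" | parse_log
```

With a file name as argument (parse_log logs) it reads the file instead of standard input.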