Extracting specific information from logs

Answer 1

You can try this sed expression:

sed -e 's/^\(.* .* \).* .*== \([^ ]* \).*MAIL FROM:<\([^ ]*\)> [^ ]* \([0-9 .]*\)\[.*Messages from \([^ ]*\).*$/\1\t\2\t\3\t\5\t\4/'

It works for me on your example.

Explanation

The sed expression consists of a single command, s/.../.../.

The first part of s/// (the pattern):

'^\(.* .* \)'      -- Timestamp, the first two space-separated blocks of text, \1.
'.* .*== '         -- Uninteresting text after the timestamp.
'\([^ ]* \)'       -- Block of text between spaces, first email address, \2.
'.*MAIL FROM:<'    -- Position before the second email address.
'\([^ ]*\)>'       -- Second email address, non-space characters ended by '>', \3.
' [^ ]* '          -- SIZE=...:
'\([0-9 .]*\)\['   -- Error codes: digits, spaces and dots ended by '[', \4.
'.*Messages from ' -- Position before the IP.
'\([^ ]*\)'        -- IP, non-space characters ended by a space, \5.
'.*$'              -- Text up to the end of the line, not interesting.

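As you can see, this is just a direct description of the original log line, nothing fancy.

The second part of s/// simply emits the \N back-references in the desired order, with \t (tab) as the separator. Note that the IP (\5) is placed before the error codes (\4). The \t escape in the replacement is understood by GNU sed; other sed implementations may need a literal tab instead.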
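To see the pieces fit together, here is a small sketch that runs the expression on one log line. The log line is made up for illustration (the queue ID, addresses, and IP are placeholders, and the real log format may differ), and the wrapper function name extract_fields is hypothetical. It assumes GNU sed, which understands \t in the replacement.

```shell
# Hypothetical log line, invented for this example; real logs may look different.
line='2013-03-12 10:00:00 1UFabc-0001-AB == rcpt@example.com R=dnslookup T=remote_smtp defer (-45): SMTP error from remote mail server after MAIL FROM:<sender@example.com> SIZE=1234: 421 4.7.0 [TS01] Messages from 192.0.2.10 temporarily deferred'

# Hypothetical wrapper around the sed expression from the answer (GNU sed assumed).
extract_fields() {
    sed -e 's/^\(.* .* \).* .*== \([^ ]* \).*MAIL FROM:<\([^ ]*\)> [^ ]* \([0-9 .]*\)\[.*Messages from \([^ ]*\).*$/\1\t\2\t\3\t\5\t\4/'
}

# Output order: timestamp, recipient, sender, IP, error codes (tab-separated).
result=$(printf '%s\n' "$line" | extract_fields)
printf '%s\n' "$result"
```

With this input, the third tab-separated field is the sender address and the fourth is the IP. Because groups \1, \2 and \4 include the space that follows them, those fields carry a trailing space.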

Answer 2

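I don't have much experience with awk, but I wanted to give it a try. I suspect this is quite brittle, since I don't know how many of the log lines you want to capture with it.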

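Anyway, this uses a BEGIN block to set up the variables to be picked out, plus a format string that is used first to print the header. The timestamp and EmailTo are predictable, so numbered fields ($1, $2 and $5) can be used for them; the other three values are found with three regular-expression checks, which are admittedly very rough. Any suggestions for improvement would be much appreciated!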

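awk 'BEGIN {
        from=""; ip=""; error=""; fstr="%-24s%-24s%-40s%-16s%s\n";
        printf(fstr, "Timestamp:", "EmailTo:", "EmailFrom:", "IPAddress:", "ErrorCodes:");
    }
{   # Reset per line so values from an earlier line do not leak into this one.
    from=""; ip=""; error="";
    # Scan every field after EmailTo ($5), including the last one.
    for (i=6; i<=NF; i++)
    {
    # From address: strip the leading "FROM:<" (6 characters) and the trailing ">".
    if ($i ~ /FROM:<[^ ]*>/)
        from=substr($i, 7, length($i)-7);
    # Error codes found in two adjacent fields (3-digit code, then x.y.z code).
    if ($(i-1) ~ /[[:digit:]]{3}/ && $i ~ /[[:digit:]]\.[[:digit:]]\.[[:digit:]]/)
        error=$(i-1) " " $i;
    # IP address after the predictable string "Messages from".
    if ($(i-2) " " $(i-1) == "Messages from" && $i ~ /[[:digit:].]{7,15}/)
        ip=$i;
    }
    printf(fstr, $1" "$2, $5, from, ip, error);
}' logs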
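The substr() arithmetic in the From-address branch can be checked in isolation. This is a minimal sketch with a made-up field value: "FROM:<" is 6 characters, so the address starts at position 7, and subtracting 7 from the length drops both that prefix and the closing ">".

```shell
# Hypothetical field value, as awk would see it after splitting "MAIL FROM:<...>" on spaces.
field='FROM:<sender@example.com>'

# substr(s, 7, length(s) - 7): skip the 6-character "FROM:<" prefix and the final ">".
addr=$(printf '%s\n' "$field" | awk '{ print substr($0, 7, length($0) - 7) }')
printf '%s\n' "$addr"
```

This only holds when the field is exactly "FROM:<address>"; any extra character glued to the field (trailing punctuation, for example) would shift the result, which is one reason the approach is brittle.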
