多字段提取

多字段提取

尝试从包含多行的文件中提取字段,例如:

alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Client Login Packet"; flowbits:isset,ET.gadu.welcome; flow:established,to_server; dsize:<50; content:"|15 00 00 00|"; depth:4; flowbits:set,ET.gadu.loginsent; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008298; classtype:policy-violation; sid:2008298; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert tcp any [21,25,110,143,443,465,587,636,989:995,5061,5222] -> $HOME_NET any (msg:"ET EXPLOIT FREAK Weak Export Suite From Server (CVE-2015-0204)"; flow:established,from_server; content:"|16 03|"; depth:2; byte_test:1,<,4,0,relative; content:"|02|"; distance:3; within:1; byte_jump:1,37,relative; content:"|00 19|"; within:2; fast_pattern; threshold:type limit,track by_dst,count 1,seconds 1200; reference:url,blog.cryptographyengineering.com/2015/03/attack-of-week-freak-or-factoring-nsa.html; reference:cve,2015-0204; reference:cve,2015-1637; classtype:bad-unknown; sid:2020661; rev:3; metadata:created_at 2015_03_10, updated_at 2015_03_10;)

alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Send Message"; flowbits:isset,ET.gadu.loggedin; flow:established,to_server; content:"|0b 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008302; classtype:policy-violation; sid:2008302; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert tcp $EXTERNAL_NET 8074 -> $HOME_NET any (msg:"ET CHAT GaduGadu Chat Receive Message"; flowbits:isset,ET.gadu.loggedin; flow:established,from_server; content:"|0a 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008303; classtype:policy-violation; sid:2008303; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Keepalive PING"; flowbits:isset,ET.gadu.loggedin; flow:established,to_server; content:"|08 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008304; classtype:policy-violation; sid:2008304; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"ET EXPLOIT CVE-2016-0189 Common Construct M2"; flow:established,from_server; file_data; content:"triggerBug"; nocase; content:"Dim "; nocase; distance:0; content:".resize"; nocase; pcre:"/^\s*\x28/Rs";  content:"Mid"; pcre:"/^\s*?\(x\s*,\s*1,\s*24000\s*\x29/Rs"; reference:url,theori.io/research/cve-2016-0189; reference:cve,2016-0189; classtype:attempted-user; sid:2022972; rev:2; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, deployment Perimeter, signature_severity Major, created_at 2016_07_15, performance_impact Low, updated_at 2016_07_15;)

尽管我能够提取单个字段,但我不知道如何提取,例如,、 、 和sidmsg内容classtype,将其列在以逗号分隔的行中,并对文件中的其他行执行相同的操作。metadata:created_atupdated_at

基于第一个条目的预期输出:

2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30

created_at并且updated_at将始终出现在 后面metadata,但可能处于不同的位置/顺序。

在 GNU/Linux 中的 Bash 上运行。

答案1

实现所需输出的简单脚本:

#!/usr/bin/env bash

# Assumptions: the file name is always passed, and points to a valid file,
# hence no error handling has been implemented. (for script simplicity)

# let the first argument to the script be the file name.
filename="$1"

# read one line at a time, extracting the required fields
while read -r line
do
    # skip blank lines
    if [[ ${#line} -gt 0 ]]; then
        sid=$(echo "$line"|grep -o 'sid[^;]*'| awk -F ':' '{print $2}')
        msg=$(echo "$line"|grep -o 'msg:[^;]*'| awk -F '"' '{print $2}')
        classType=$(echo "$line"|grep -o 'classtype:[^;]*'| awk -F ':' '{print $2}')
        cDate=$(echo "$line"|grep -o "created_at[^,]*"|awk '{print $2}')
        uDate=$(echo "$line"|grep -o "updated_at[^';']*"|awk '{print $2}')

        echo "$sid,$msg,$classType,$cDate,$uDate"
    fi
done < "$filename"

运行脚本:

./scriptName fileName

输出:

2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30
2020661,ET EXPLOIT FREAK Weak Export Suite From Server (CVE-2015-0204),bad-unknown,2015_03_10,2015_03_10
2008302,ET CHAT GaduGadu Chat Send Message,policy-violation,2010_07_30,2010_07_30
2008303,ET CHAT GaduGadu Chat Receive Message,policy-violation,2010_07_30,2010_07_30
2008304,ET CHAT GaduGadu Chat Keepalive PING,policy-violation,2010_07_30,2010_07_30
2022972,ET EXPLOIT CVE-2016-0189 Common Construct M2,attempted-user,2016_07_15,2016_07_15

答案2

以下是使用 GNU awk 执行 FPAT 操作的一般方法:

$ cat tst.awk
BEGIN {
    FPAT="[[:alnum:]_]+:(\"[^\"]+\"|[^;]+)"
    OFS = ","
}
{
    delete f
    for (i=1; i<=NF; i++) {
        tag = val = $i
        sub(/:.*/,"",tag)
        sub(/[^:]+:/,"",val)
        gsub(/"/,"",val)
        f[tag] = val
        if ( tag == "metadata" ) {
            numSubFlds = split(val,md,/, */)
            for (j=1; j<=numSubFlds; j++) {
                subTag = subVal = md[j]
                sub(/ .*/,"",subTag)
                sub(/[^ ]+ /,"",subVal)
                f[tag":"subTag] = subVal
            }
        }
    }

    # uncomment this to see all tags and values
    # for (idx in f) { print idx "=" f[idx] }

    # print
    print f["sid"], f["msg"], f["classtype"], f["metadata:created_at"], f["metadata:updated_at"]
}

$ gawk -f tst.awk file
2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30
2020661,,bad-unknown,2015_03_10,2015_03_10
2008302,ET CHAT GaduGadu Chat Send Message,policy-violation,2010_07_30,2010_07_30
2008303,ET CHAT GaduGadu Chat Receive Message,policy-violation,2010_07_30,2010_07_30
2008304,ET CHAT GaduGadu Chat Keepalive PING,policy-violation,2010_07_30,2010_07_30
2022972,ET EXPLOIT CVE-2016-0189 Common Construct M2,attempted-user,2016_07_15,2016_07_15

看起来您的第二个输入行与其他输入行的格式不同,因此输出不同。

相关内容