I am trying to extract fields from a file containing multiple lines, for example:
alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Client Login Packet"; flowbits:isset,ET.gadu.welcome; flow:established,to_server; dsize:<50; content:"|15 00 00 00|"; depth:4; flowbits:set,ET.gadu.loginsent; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008298; classtype:policy-violation; sid:2008298; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)
alert tcp any [21,25,110,143,443,465,587,636,989:995,5061,5222] -> $HOME_NET any (msg:"ET EXPLOIT FREAK Weak Export Suite From Server (CVE-2015-0204)"; flow:established,from_server; content:"|16 03|"; depth:2; byte_test:1,<,4,0,relative; content:"|02|"; distance:3; within:1; byte_jump:1,37,relative; content:"|00 19|"; within:2; fast_pattern; threshold:type limit,track by_dst,count 1,seconds 1200; reference:url,blog.cryptographyengineering.com/2015/03/attack-of-week-freak-or-factoring-nsa.html; reference:cve,2015-0204; reference:cve,2015-1637; classtype:bad-unknown; sid:2020661; rev:3; metadata:created_at 2015_03_10, updated_at 2015_03_10;)
alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Send Message"; flowbits:isset,ET.gadu.loggedin; flow:established,to_server; content:"|0b 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008302; classtype:policy-violation; sid:2008302; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)
alert tcp $EXTERNAL_NET 8074 -> $HOME_NET any (msg:"ET CHAT GaduGadu Chat Receive Message"; flowbits:isset,ET.gadu.loggedin; flow:established,from_server; content:"|0a 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008303; classtype:policy-violation; sid:2008303; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)
alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Keepalive PING"; flowbits:isset,ET.gadu.loggedin; flow:established,to_server; content:"|08 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008304; classtype:policy-violation; sid:2008304; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)
alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"ET EXPLOIT CVE-2016-0189 Common Construct M2"; flow:established,from_server; file_data; content:"triggerBug"; nocase; content:"Dim "; nocase; distance:0; content:".resize"; nocase; pcre:"/^\s*\x28/Rs"; content:"Mid"; pcre:"/^\s*?\(x\s*,\s*1,\s*24000\s*\x29/Rs"; reference:url,theori.io/research/cve-2016-0189; reference:cve,2016-0189; classtype:attempted-user; sid:2022972; rev:2; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, deployment Perimeter, signature_severity Major, created_at 2016_07_15, performance_impact Low, updated_at 2016_07_15;)
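Although I am able to extract individual fields, I do not know how to extract, for example, sid, the contents of msg, classtype, and the created_at and updated_at values from metadata, list them on one comma-separated line, and do the same for every other line in the file.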
Expected output based on the first entry:
2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30
created_at and updated_at will always appear after metadata, but may be in a different position or order.
Running in Bash on GNU/Linux.
Answer 1
A simple script that produces the desired output:
#!/usr/bin/env bash
# Assumption: the first argument is always the name of a valid, readable file,
# so no error handling is implemented (to keep the script simple).
filename="$1"

# Read one line at a time, extracting the required fields.
# The "|| [[ -n $line ]]" part also handles a last line with no trailing newline.
while read -r line || [[ -n $line ]]
do
    # Skip blank lines.
    if [[ -n $line ]]; then
        sid=$(grep -o 'sid:[^;]*' <<< "$line" | awk -F ':' '{print $2}')
        msg=$(grep -o 'msg:"[^"]*"' <<< "$line" | awk -F '"' '{print $2}')
        classType=$(grep -o 'classtype:[^;]*' <<< "$line" | awk -F ':' '{print $2}')
        # Stop at "," or ";" so the dates come out clean wherever they sit in metadata.
        cDate=$(grep -o 'created_at [^,;]*' <<< "$line" | awk '{print $2}')
        uDate=$(grep -o 'updated_at [^,;]*' <<< "$line" | awk '{print $2}')
        echo "$sid,$msg,$classType,$cDate,$uDate"
    fi
done < "$filename"
Running the script:
./scriptName fileName
Output:
2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30
2020661,ET EXPLOIT FREAK Weak Export Suite From Server (CVE-2015-0204),bad-unknown,2015_03_10,2015_03_10
2008302,ET CHAT GaduGadu Chat Send Message,policy-violation,2010_07_30,2010_07_30
2008303,ET CHAT GaduGadu Chat Receive Message,policy-violation,2010_07_30,2010_07_30
2008304,ET CHAT GaduGadu Chat Keepalive PING,policy-violation,2010_07_30,2010_07_30
2022972,ET EXPLOIT CVE-2016-0189 Common Construct M2,attempted-user,2016_07_15,2016_07_15
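The script above starts several grep and awk processes for every input line. As a possible alternative, the same fields can be cut out with shell parameter expansion alone. The following is only a minimal sketch: the function name rule_to_csv is hypothetical, and it assumes that each tag (sid:, msg:", classtype:, created_at, updated_at) occurs exactly once per rule and that msg contains no embedded double quote.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original answer): print
# sid,msg,classtype,created_at,updated_at for a single rule line.
# Assumes every tag is present exactly once in the line.
rule_to_csv() {
    local line="$1" sid msg classtype created updated

    # "#*tag" drops everything up to and including the tag;
    # "%%;*" (or "%%,*") then drops everything from the first delimiter on.
    sid=${line#*sid:};              sid=${sid%%;*}
    msg=${line#*msg:\"};            msg=${msg%%\"*}
    classtype=${line#*classtype:};  classtype=${classtype%%;*}
    # A metadata value ends at "," (more entries follow) or ";" (last entry).
    created=${line#*created_at };   created=${created%%;*};  created=${created%%,*}
    updated=${line#*updated_at };   updated=${updated%%;*};  updated=${updated%%,*}

    printf '%s,%s,%s,%s,%s\n' "$sid" "$msg" "$classtype" "$created" "$updated"
}

# When a file name is given, convert every non-blank line of that file.
if [ "$#" -gt 0 ]; then
    while read -r line || [ -n "$line" ]; do
        [ -n "$line" ] && rule_to_csv "$line"
    done < "$1"
fi
```

Because created_at and updated_at are each located by their own tag, their order inside metadata does not matter. If a tag is missing from a line, the corresponding column will contain leftover rule text, so this sketch is only suitable for input shaped like the samples in the question.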
Answer 2
Here is a general approach using GNU awk and FPAT:
$ cat tst.awk
BEGIN {
    FPAT="[[:alnum:]_]+:(\"[^\"]+\"|[^;]+)"
    OFS = ","
}
{
    delete f
    for (i=1; i<=NF; i++) {
        tag = val = $i
        sub(/:.*/,"",tag)
        sub(/[^:]+:/,"",val)
        gsub(/"/,"",val)
        f[tag] = val
        if ( tag == "metadata" ) {
            numSubFlds = split(val,md,/, */)
            for (j=1; j<=numSubFlds; j++) {
                subTag = subVal = md[j]
                sub(/ .*/,"",subTag)
                sub(/[^ ]+ /,"",subVal)
                f[tag":"subTag] = subVal
            }
        }
    }
    # uncomment this to see all tags and values
    # for (idx in f) { print idx "=" f[idx] }
    # print
    print f["sid"], f["msg"], f["classtype"], f["metadata:created_at"], f["metadata:updated_at"]
}
$ gawk -f tst.awk file
2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30
2020661,,bad-unknown,2015_03_10,2015_03_10
2008302,ET CHAT GaduGadu Chat Send Message,policy-violation,2010_07_30,2010_07_30
2008303,ET CHAT GaduGadu Chat Receive Message,policy-violation,2010_07_30,2010_07_30
2008304,ET CHAT GaduGadu Chat Keepalive PING,policy-violation,2010_07_30,2010_07_30
2022972,ET EXPLOIT CVE-2016-0189 Common Construct M2,attempted-user,2016_07_15,2016_07_15
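It looks like your second input line is formatted differently from the others, hence the different output. Its port list contains the range 989:995, so the FPAT pattern starts a field at 989: and, because [^;]+ runs up to the next semicolon, that field swallows msg:"..." as well, leaving f["msg"] empty.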
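One possible way to avoid the empty msg on the second line is to discard the rule header (everything up to the first opening parenthesis) before looking for tag:value pairs. The sketch below does this with plain POSIX awk and split() rather than GNU awk's FPAT, so it is a different technique from the script above, not a patch to it. The function name rules_to_csv is hypothetical, and the sketch assumes that no option value contains an unescaped semicolon.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not from the original answer): reads rules from the
# named files, or from standard input, and prints one CSV line per rule.
rules_to_csv() {
    awk '
    BEGIN { OFS = "," }
    NF {                                    # skip blank lines
        split("", f)                        # portable way to empty the array
        body = $0
        sub(/^[^(]*\(/, "", body)           # drop the header up to the first "("
        n = split(body, opt, /; */)         # one element per rule option
        for (i = 1; i <= n; i++) {
            p = index(opt[i], ":")
            if (p == 0) continue            # flag-only options such as nocase
            tag = substr(opt[i], 1, p - 1)
            val = substr(opt[i], p + 1)
            gsub(/"/, "", val)
            f[tag] = val
            if (tag == "metadata") {
                m = split(val, md, /, */)   # entries look like "name value"
                for (j = 1; j <= m; j++) {
                    s = index(md[j], " ")
                    if (s > 0)
                        f["metadata:" substr(md[j], 1, s - 1)] = substr(md[j], s + 1)
                }
            }
        }
        print f["sid"], f["msg"], f["classtype"], f["metadata:created_at"], f["metadata:updated_at"]
    }' "$@"
}
```

It could be called as rules_to_csv file, or used at the end of a pipeline. Since the header is removed first, a colon inside a port range such as 989:995 is never mistaken for a tag separator.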