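I have a log file that I must parse using unix commands.
I need to compute the time difference between lines, and in the end I need to show the MIN, MAX and AVG time between transactions, as well as the ID number of the MIN.
My script does everything I wrote it to do, except for the ID number of the MIN, and I don't understand why.
- Log file sample: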
03/22 08:51:01.050 INFO :1000 :.main: *************** RSVP Agent started ***************
03/22 08:51:01.532 INFO :1001 :...locate_configFile: Specified configuration file: /u/user10/rsvpd1.conf WARNING
03/22 08:51:01.405 INFO :1002 :.main: Using log level 511
03/22 08:51:01.970 INFO :1003 :..settcpimage: Get TCP images rc - EDC8112I Operation not supported on socket.
03/22 08:51:01.837 INFO :1004 :..settcpimage: Associate with TCP/IP image name = TCPCS
03/22 08:51:02.100 INFO :1005 :..reg_process: registering WARNING process with the system
03/22 08:51:02.524 INFO :1006 :..reg_process: attempt OS/390 registration
03/22 08:51:02.748 INFO :1007 :..reg_process: return from registration rc=0
03/22 08:51:06.624 TRACE :1008 :.....starting_transaction: calling API: status: START
03/22 08:51:06.123 INFO :1009 :...read_physical_netif: index #0, interface VLINK1 has address 129.1.1.1, ifidx 0
03/22 08:51:06.524 INFO :1010 :...read_physical_netif: index #1, interface TR1 has address 9.37.65.139, ifidx 1
03/22 08:51:06.367 INFO :1011 :...read_physical_netif: index #2, interface LINK11 has address 9.67.100.1, ifidx 2
03/22 08:51:06.748 INFO :1012 :...read_physical_netif: index #3, interface LINK12 has address 9.67.101.1, ifidx 3
03/22 08:51:06.965 INFO :1013 :...read_physical_netif: index #4, interface CTCD0 has address 9.67.116.98, ifidx 4
03/22 08:51:06.010 INFO :1014 :...read_physical_netif: index #5, interface CTCD2 has address 9.67.117.98, ifidx 5
03/22 08:51:06.050 INFO :1015 :...read_physical_netif: index #6, interface LOOPBACK has address 127.0.0.1, ifidx 0
03/22 08:51:06.100 INFO :1016 :....mailslot_create: creating mailslot for timer
03/22 08:51:06.724 INFO :1017 :.....ending_transaction: calling API: status: END
03/22 08:51:06.970 INFO :1018 :.....mailslot_create: creating mailslot for RSVP
03/22 08:51:06.160 INFO :1019 :....mailbox_register: mailbox allocated for rsvp
- My script:
for i in log-file.txt
do
cat log-file.txt | grep -E "starting_transaction|ending_transaction" >> transactions.txt | awk '{print $2}' <transactions.txt >global-time.txt
awk -F: '{ print ($1 * 3600) + ($2 * 60) + $3 }' <global-time.txt >seconds-time.txt
awk 'NR > 1 { print $0 - prev } { prev = $0 }' <seconds-time.txt >difference-time.txt
awk '{print $4}' <transactions.txt >trans-id.txt | paste difference-time.txt trans-id.txt > diff-transid.txt
awk '{if(min==""){min=max=$1 $2}; if($1>max) {max=$1 $2}; if($1<min) {min=$1 $2}; total+=$1; count+=1} END {print "avg " total/count," | max " max," | min " min " | minID " $2}' <diff-transid.txt >final-answer.txt
done
- The result I get:
avg 11.1467 | max 99.1 | min 0.1 | minID
- The result I need:
avg 11.1467 | max 99.1 | min 0.1 | minID 1017
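Answer 1
What you want to achieve can be done entirely in a single awk script, which is much more efficient than using a shell loop for text processing. I would recommend the following program (let's call it analyze_timing.awk):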
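#!/usr/bin/awk -f

# Seconds elapsed between two HH:MM:SS.mmm timestamps.
# stfld, endfld and diff are extra parameters used as local variables.
function timediff(start, end,    stfld, endfld, diff) {
    split(start, stfld, /:/)
    split(end, endfld, /:/)
    if (endfld[1] < stfld[1]) {
        # End hour is smaller than start hour: assume the clock passed midnight.
        diff = (3600*(endfld[1]+24) + 60*endfld[2] + endfld[3])
    }
    else {
        diff = (3600*endfld[1] + 60*endfld[2] + endfld[3])
    }
    diff -= (3600*stfld[1] + 60*stfld[2] + stfld[3])
    return diff
}

# Remember the timestamp of the most recent transaction start.
$5 ~ /^:\.+starting_transaction/ { laststart = $2; next }

# On a transaction end, update the running statistics.
$5 ~ /^:\.+ending_transaction/ {
    n_transact++
    duration = timediff(laststart, $2)
    avg += duration    # running total; divided by n_transact in END
    if (n_transact == 1) {
        shortest = duration
        longest = duration
        min_id = substr($4, 2)    # strip the leading ":" from the ID field
    }
    else {
        if (duration < shortest) {
            shortest = duration
            min_id = substr($4, 2)
        } else if (duration > longest) {
            longest = duration
        }
    }
}

END {
    # Print nothing if no transaction was found (avoids a division by zero).
    if (n_transact > 0) {
        printf("avg: %f | max: %f | min: %f | minID: %d\n", avg/n_transact, longest, shortest, min_id)
    }
}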
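This first defines a function timediff() that computes the time elapsed between two timestamps formatted as in your example. For simplicity, it assumes that a transaction takes less than 24 hours.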
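To see the midnight handling in isolation, here is a minimal sketch of the same arithmetic wrapped in a shell function. The name elapsed and the three-decimal output format are my own choices for illustration and are not part of the program above.

```shell
# Hypothetical wrapper (not part of analyze_timing.awk): elapsed seconds
# between two HH:MM:SS.mmm timestamps, assuming an interval shorter than 24 hours.
elapsed() {
    LC_ALL=C awk -v start="$1" -v end="$2" '
    BEGIN {
        split(start, s, ":")
        split(end, e, ":")
        # An end hour smaller than the start hour is taken to mean "next day".
        if (e[1] + 0 < s[1] + 0) e[1] += 24
        printf "%.3f\n", (3600*e[1] + 60*e[2] + e[3]) - (3600*s[1] + 60*s[2] + s[3])
    }'
}

elapsed 08:51:06.624 08:51:06.724   # same hour: prints 0.100
elapsed 23:59:59.500 00:00:00.250   # crosses midnight: prints 0.750
```

The first call uses the start and end timestamps of the transaction in the sample log.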
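It then checks whether the fifth field of a line starts with a : followed by one or more . and then starting_transaction, and if so records the time in the variable laststart. If the fifth field instead starts in the same way with ending_transaction, it computes the difference to laststart and updates the variables used for the min/max/avg calculation. If this is the shortest transaction so far, its ID is recorded in min_id.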
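If the field numbers are unclear, the following sketch prints the pieces of one log line that the program relies on. The helper name show_fields and the labels start/end/other are made up for this illustration.

```shell
# Hypothetical helper: show the timestamp, the ID and the line type that
# the awk program would see for a single log line.
show_fields() {
    printf '%s\n' "$1" | awk '{
        # $2 = timestamp, $4 = ":<id>", $5 = ":<dots><function name>:"
        kind = "other"
        if ($5 ~ /^:\.+starting_transaction/) {
            kind = "start"
        } else if ($5 ~ /^:\.+ending_transaction/) {
            kind = "end"
        }
        print $2, substr($4, 2), kind
    }'
}

show_fields '03/22 08:51:06.724 INFO :1017 :.....ending_transaction: calling API: status: END'
# prints: 08:51:06.724 1017 end
```

With the default field separator, awk splits on runs of whitespace, so the date is $1, the time is $2, the log level is $3, and the ID with its leading colon is $4.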
Finally, the program prints the summary in the requested form.
You would call it as
awk -f analyze_timing.awk log-file.txt