I have a file fb.csv that looks like this:
"Source","Time"
"192.168.137.174","12:26:25"
"10.0.138.163","12:26:25"
"157.240.10.13","12:26:36"
"157.240.10.13","12:26:36"
"157.240.10.23","12:26:41"
"157.240.10.23","12:26:41"
"10.0.138.163","12:26:52"
"192.168.137.174","12:26:52"
"157.240.10.18","12:26:52"
"157.240.10.18","12:26:52"
"157.240.10.23","12:26:53"
"157.240.10.23","12:26:53"
"192.168.137.174","12:27:02"
"10.0.138.163","12:27:02"
"192.168.137.174","12:27:07"
I want to find the difference between the latest and the earliest time for each "Source".
Expected output:
"Source","Duration Time"
"192.168.137.174","00:01:22"
"10.0.138.163","00:01:17"
"157.240.10.13","00:00:00"
"157.240.10.23","00:00:00"
"157.240.10.18","00:00:00"
Is there a way to do this? Thanks.
Answer 1
It's me again, the guy with the very long awk one-liners... and this one is even longer:
awk -F, 'BEGIN{print"\"Source\",\"Duration Time\""}NR>1{gsub(/"/,"",$2);split($2,hms,":");s=hms[1]*3600+hms[2]*60+hms[3];if(!(($1,"MAX")in a)||a[$1,"MAX"]<s)a[$1,"MAX"]=s;if(!(($1,"MIN")in a)||a[$1,"MIN"]>s)a[$1,"MIN"]=s}END{for(idx in a){split(idx,ipm,SUBSEP);if(ipm[2]=="MAX"){d=a[idx]-a[ipm[1],"MIN"];h=int(d/3600);m=int((d-h*3600)/60);s=d%60;printf("%s,\"%02d:%02d:%02d\"\n",ipm[1],h,m,s)}}}' fb.csv
With the input file fb.csv given in the question, the output looks like this:
"Source","Duration Time"
"157.240.10.23","00:00:12"
"157.240.10.18","00:00:00"
"157.240.10.13","00:00:00"
"10.0.138.163","00:00:37"
"192.168.137.174","00:00:42"
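Explanation of the command:
Here we run awk like this, using -F, to set the field separator that splits the columns to a comma, and using the file fb.csv as input: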
awk -F, '<COMMAND>' fb.csv
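To see what -F, does to a single line, here is a minimal sketch. The helper name second_field is made up for illustration and is not part of the original answer. It also shows that the double quotes stay part of each field, and that plain comma splitting does not understand CSV quoting; it works here only because no field in fb.csv contains a comma.

```shell
# second_field is a hypothetical helper name used only for this illustration.
# It prints the second comma-separated field of every line read from stdin.
second_field() {
    awk -F, '{ print $2 }'
}

# The surrounding double quotes are still part of the field:
printf '%s\n' '"10.0.138.163","12:26:25"' | second_field    # prints "12:26:25"
```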
Properly formatted, the awk program (the <COMMAND> placeholder above) looks like this:
BEGIN {
    print "\"Source\",\"Duration Time\""
}
NR>1 {
    gsub(/"/, "", $2)
    split($2, hms, ":")
    s = hms[1]*3600 + hms[2]*60 + hms[3]
    if ( !(($1,"MAX") in a) || a[$1,"MAX"] < s )
        a[$1,"MAX"] = s
    if ( !(($1,"MIN") in a) || a[$1,"MIN"] > s )
        a[$1,"MIN"] = s
}
END {
    for (idx in a) {
        split(idx, ipm, SUBSEP)
        if (ipm[2]=="MAX") {
            d = a[idx] - a[ipm[1],"MIN"]
            h = int(d / 3600)
            m = int((d - h * 3600) / 60)
            s = d%60
            printf("%s,\"%02d:%02d:%02d\"\n", ipm[1], h, m, s)
        }
    }
}
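The BEGIN block simply prints the new CSV header. The NR>1 block runs once for every line of the input file except the first, which contains the header. Each line is split into an IP column ($1) and a time column ($2). We process the time column by removing its double quotes with gsub and splitting it at the colons into the array hms, which then holds the hours, minutes and seconds. This is used to convert the timestamp into seconds since midnight, stored in s within this block. Next, we check whether the associative array a has no entry yet for the current line's IP, or whether the stored MAX is smaller than s or the stored MIN is larger than s; in those cases the entry is updated accordingly. Note that $1 keeps its surrounding quotes, which is why the IPs appear quoted in the output.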
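The timestamp conversion described above can be tried in isolation. This is a minimal sketch of the same gsub, split and arithmetic steps; the wrapper name to_seconds is an assumption made for illustration and does not appear in the original answer.

```shell
# to_seconds is a hypothetical helper name, not part of the original answer.
# It converts one (optionally quoted) HH:MM:SS timestamp into seconds since midnight.
to_seconds() {
    printf '%s\n' "$1" | awk '{
        gsub(/"/, "", $0)       # drop the surrounding double quotes
        split($0, hms, ":")     # hms[1]=hours, hms[2]=minutes, hms[3]=seconds
        print hms[1]*3600 + hms[2]*60 + hms[3]
    }'
}

to_seconds '"12:26:25"'    # prints 44785
to_seconds '"12:27:07"'    # prints 44827
```

These are the earliest and latest timestamps of 192.168.137.174 in fb.csv, and 44827 - 44785 = 42 matches the 00:00:42 shown in the output.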
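Finally, the END block evaluates the array built up this way: for each IP in it, the difference between the MAX and MIN timestamps is computed and stored in d. This is then converted back into hours, minutes and seconds and printed in the right format. Because for (idx in a) visits the array keys in no particular order, the rows may come out in a different order than in the input.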
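The conversion from a number of seconds back to hours, minutes and seconds can likewise be checked on its own. This is a minimal sketch using the same arithmetic as the END block; the wrapper name format_duration is hypothetical, and it assumes a non-negative whole number of seconds.

```shell
# format_duration is a hypothetical helper name, not part of the original answer.
# It formats a non-negative number of seconds as HH:MM:SS.
format_duration() {
    awk -v d="$1" 'BEGIN {
        h = int(d / 3600)               # whole hours
        m = int((d - h * 3600) / 60)    # whole minutes left after the hours
        s = d % 60                      # remaining seconds
        printf("%02d:%02d:%02d\n", h, m, s)
    }'
}

format_duration 42      # prints 00:00:42
format_duration 3725    # prints 01:02:05
```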