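Using a bash shell script on Ubuntu on a Raspberry Pi, I am trying to remove lines from a (comma-separated) CSV list where {field 1 matches and field 3 is less than 5 minutes (300 seconds) from the first match on field 1}.
Here is a sample input file. I have annotated the desired output with # to explain why each line is kept or deleted. I do not want the comments in the result; I only want the lines marked "Delete" removed. The actual input and filtered output files will look like this: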
A11EEA,@N171WT,2021/03/06 12:37:25,700,0.1
A0FC0A,@N1624K,2021/03/06 13:37:33,1975,2.0
...et cetera
Input file annotated with the desired output:
A11EEA,@N171WT,2021/03/06 12:37:25,700,0.1 # Keep - 1st occurrence of Field-1
A0FC0A,@N1624K,2021/03/06 13:37:33,1975,2.2 # Keep - 1st occurrence of Field-1
AB8C37,@AAL2386,2021/03/06 13:45:43,4500,1.3 # Keep - 1st occurrence of Field-1
A55325,@N442MG,2021/03/06 15:28:06,600,0.4 # Keep - 1st occurrence of Field-1
AB8C37,@AAL2386,2021/03/06 13:50:46,4500,1.5 # Keep - more than 5 mins from line 3
AB0ED6,@UAL1470,2021/03/06 13:51:23,4925,1.6 # Keep - 1st occurrence of Field-1
AB8C37,@AAL2386,2021/03/06 13:52:48,4500,1.7 # Delete - less than 5 mins from line 5
AB0ED6,@UAL1470,2021/03/06 13:56:30,4925,1.8 # Keep - more than 5 mins from line 6
AB0ED6,@UAL1470,2021/03/06 13:56:40,4925,1.9 # Delete - less than 5 mins from line 8
AB8C37,@AAL2386,2021/03/06 13:56:49,4500,1.0 # Delete - less than 5 mins from line 5**
** Line 7 of the original record is not considered because it is slated for deletion
Ideally I would like a solution that uses awk/sed/sort/uniq rather than looping through the file line by line like this:
while IFS= read -r line
do
    IFS=, read -ra record <<< "$line"
    # ... do a bunch of stuff
done < "inputfile.csv"
I tried this with awk, but I quickly got bogged down by the complexity and potential recursiveness of the task.
Help? Pretty please?
Answer 1
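You can write an awk function that returns the difference in seconds between two dates, then store the last "valid" (kept) date in an awk array indexed by the first field so that you can use it in the comparison, for example: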
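awk '
# Seconds between two "YYYY/MM/DD HH:MM:SS" timestamps.
# mktime() is a gawk extension and expects "YYYY MM DD HH MM SS".
function getDateDifference(a, b,    startDate, endDate) {
    gsub(/[:\/]/, " ", a)
    gsub(/[:\/]/, " ", b)
    startDate = mktime(a)
    endDate = mktime(b)
    return int(endDate - startDate)
}
BEGIN { FS = OFS = "," }
# Print the first occurrence of field 1, or a row more than 300 s after the last printed one.
!($1 in dates) || getDateDifference(dates[$1], $3) > 300 {
    print $0
    dates[$1] = $3   # remember the last kept timestamp for this key
}' input.txt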
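Note that before comparing dates you have to check whether the array already holds a value for that particular first field, so that the first occurrence is always printed.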
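mktime() is not part of POSIX awk (it is a gawk extension), so it may help to see the same idea without it. The following is only a sketch under stated assumptions: every timestamp has the form YYYY/MM/DD HH:MM:SS, time zones and daylight-saving shifts can be ignored, and the names filter_recent, to_seconds, last and window are hypothetical ones introduced here, not part of the original script. It replaces mktime() with plain day-count arithmetic and keeps the same rule: a row is printed when its first field is new or when it is more than 300 seconds after the last printed row with that first field.

```shell
#!/usr/bin/env bash
# Sketch only: reads CSV on stdin, writes the filtered rows to stdout.
# filter_recent is a hypothetical helper name.
filter_recent() {
    awk -F, -v window=300 '
    # Convert "YYYY/MM/DD HH:MM:SS" to a count of seconds on a naive,
    # timezone-free scale. Only differences between two results are meaningful.
    function to_seconds(ts,    p, y, m, days) {
        split(ts, p, "[/ :]")
        y = p[1] + 0
        m = p[2] + 0
        # Treat January and February as months 13 and 14 of the previous
        # year so that the leap day falls at the end of the counted year.
        if (m <= 2) { y--; m += 12 }
        days = 365 * y + int(y / 4) - int(y / 100) + int(y / 400)
        days += int((153 * (m - 3) + 2) / 5) + p[3]
        return days * 86400 + p[4] * 3600 + p[5] * 60 + p[6]
    }
    {
        now = to_seconds($3)
        # Existence check first, so the first occurrence is always printed.
        if (!($1 in last) || now - last[$1] > window) {
            print
            last[$1] = now   # only printed rows move the reference time
        }
    }'
}
```

It could be called as: filter_recent < inputfile.csv > filtered.csv. Because only printed rows update the stored time, a row that was dropped is never used as the reference for later rows, which is the behaviour the footnote in the question asks for.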