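I have a CSV file whose columns include a start time and an end time. I need to compute the difference between the two times and add it as a new column.
I can compute the difference, but I can't get it added correctly to the new column of each row.
Here is my sample CSV.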
4,ganesh-28,2019-09-26T16:56:40Z,closed,harshavardhanc,2019-09-26T16:57:02Z,1,1
3,ganesh-28,2019-09-26T16:54:25Z,closed,harshavardhanc,2019-09-26T16:54:55Z,1,1
2,ganesh-28,2019-09-26T16:52:59Z,closed,harshavardhanc,2019-09-26T16:55:19Z,1,1
1,ganesh-28,2019-09-26T16:46:52Z,closed,harshavardhanc,2019-09-26T16:47:25Z,1,1
Here is the script.
#!/bin/bash
cat a.csv | while read line
do
    created_at=$(date -d $(echo $line | awk -F "," '{print $3}') +%s)
    merged_at=$(date -d $(echo $line | awk -F "," '{print $6}') +%s)
    echo $created_at $merged_at
    diff=$(( $merged_at - $created_at ))
    h=`expr $diff / 3600`
    m=`expr $diff % 3600 / 60`
    s=`expr $diff % 60`
    diff=$(printf "%02d:%02d:%02d\n" $h $m $s)
    echo $diff
    awk -v v1="$diff" -F"," 'BEGIN { OFS = "," } {$9=v1; print}' a.csv >> b.csv
done
I get output like this.
4,ganesh-28,2019-09-26T16:56:40Z,closed,harshavardhanc,2019-09-26T16:57:02Z,1,1,00:00:22
3,ganesh-28,2019-09-26T16:54:25Z,closed,harshavardhanc,2019-09-26T16:54:55Z,1,1,00:00:22
2,ganesh-28,2019-09-26T16:52:59Z,closed,harshavardhanc,2019-09-26T16:55:19Z,1,1,00:00:22
1,ganesh-28,2019-09-26T16:46:52Z,closed,harshavardhanc,2019-09-26T16:47:25Z,1,1,00:00:22
4,ganesh-28,2019-09-26T16:56:40Z,closed,harshavardhanc,2019-09-26T16:57:02Z,1,1,00:00:30
3,ganesh-28,2019-09-26T16:54:25Z,closed,harshavardhanc,2019-09-26T16:54:55Z,1,1,00:00:30
2,ganesh-28,2019-09-26T16:52:59Z,closed,harshavardhanc,2019-09-26T16:55:19Z,1,1,00:00:30
1,ganesh-28,2019-09-26T16:46:52Z,closed,harshavardhanc,2019-09-26T16:47:25Z,1,1,00:00:30
4,ganesh-28,2019-09-26T16:56:40Z,closed,harshavardhanc,2019-09-26T16:57:02Z,1,1,00:02:20
3,ganesh-28,2019-09-26T16:54:25Z,closed,harshavardhanc,2019-09-26T16:54:55Z,1,1,00:02:20
2,ganesh-28,2019-09-26T16:52:59Z,closed,harshavardhanc,2019-09-26T16:55:19Z,1,1,00:02:20
1,ganesh-28,2019-09-26T16:46:52Z,closed,harshavardhanc,2019-09-26T16:47:25Z,1,1,00:02:20
4,ganesh-28,2019-09-26T16:56:40Z,closed,harshavardhanc,2019-09-26T16:57:02Z,1,1,00:00:33
3,ganesh-28,2019-09-26T16:54:25Z,closed,harshavardhanc,2019-09-26T16:54:55Z,1,1,00:00:33
2,ganesh-28,2019-09-26T16:52:59Z,closed,harshavardhanc,2019-09-26T16:55:19Z,1,1,00:00:33
1,ganesh-28,2019-09-26T16:46:52Z,closed,harshavardhanc,2019-09-26T16:47:25Z,1,1,00:00:33
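That is, each difference is appended to every row.
What I need is only the time difference for that particular row. The output should look like this.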
4,ganesh-28,2019-09-26T16:56:40Z,closed,harshavardhanc,2019-09-26T16:57:02Z,1,1,00:00:22
3,ganesh-28,2019-09-26T16:54:25Z,closed,harshavardhanc,2019-09-26T16:54:55Z,1,1,00:00:30
2,ganesh-28,2019-09-26T16:52:59Z,closed,harshavardhanc,2019-09-26T16:55:19Z,1,1,00:02:20
1,ganesh-28,2019-09-26T16:46:52Z,closed,harshavardhanc,2019-09-26T16:47:25Z,1,1,00:00:33
Could someone please help me achieve this?
Answer 1
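The last awk command inside the loop
awk -v v1="$diff" -F"," 'BEGIN { OFS = "," } {$9=v1; print}' a.csv >> b.csv
processes the whole of a.csv on every loop iteration and appends the entire result to b.csv each time. That is why every row shows up once per iteration, each time carrying the difference computed for the line currently being read.
Assuming your intent is to apply the command only to the current contents of the $line variable, in bash you can use a here-string:
awk -v v1="$diff" -F"," 'BEGIN { OFS = "," } {$9=v1; print}' <<<"$line" >> b.csv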
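For comparison, the same per-line idea can be sketched without calling awk at all, by letting read split the fields itself. This is only a minimal sketch, not the asker's script: the function name add_diff_column and the variable names are made up for illustration, and it assumes GNU date (for -d), bash, and fields that contain no quoted commas.

```shell
#!/bin/bash
# Hypothetical helper: reads CSV rows on stdin and writes each row with an
# extra HH:MM:SS field holding (field 6 - field 3).
# Assumes GNU date and no embedded commas inside fields.
add_diff_column() {
    local id user created state merger merged rest start end diff
    # "rest" collects every field after the sixth (here "1,1").
    while IFS=, read -r id user created state merger merged rest || [[ -n $id ]]; do
        # Convert both timestamps to epoch seconds; skip rows that fail to parse.
        start=$(date -u -d "$created" +%s) || continue
        end=$(date -u -d "$merged" +%s) || continue
        diff=$(( end - start ))
        printf '%s,%s,%s,%s,%s,%s,%s,%02d:%02d:%02d\n' \
            "$id" "$user" "$created" "$state" "$merger" "$merged" "$rest" \
            $(( diff / 3600 )) $(( diff % 3600 / 60 )) $(( diff % 60 ))
    done
}
```

It would be called as add_diff_column < a.csv > b.csv. Redirecting once outside the loop, instead of appending with >> on every iteration, also means that re-running the script does not pile duplicate rows onto an existing b.csv.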
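However, processing a CSV file line by line in a shell loop is generally not recommended. You may want to consider a utility that provides date-time handling natively (Perl, Python, GNU Awk), or Miller, for example: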
mlr --csvlite --implicit-csv-header --headerless-csv-output put -S '
$9 = strftime(strptime($6,"%Y-%m-%dT%H:%M:%SZ") - strptime($3,"%Y-%m-%dT%H:%M:%SZ"),"%T")
' a.csv > b.csv
(Remove --implicit-csv-header --headerless-csv-output if your CSV file does have a header.)
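GNU Awk, mentioned above as an alternative, can do the same job in a single pass over the file. The following is a rough sketch rather than a tested recipe for every input: the wrapper name add_diff_column_gawk and the helper to_epoch are made up for illustration, and it assumes gawk is installed (mktime is a gawk extension), that timestamps always have the form YYYY-MM-DDTHH:MM:SSZ, and that no field contains a quoted comma.

```shell
#!/bin/bash
# Hypothetical wrapper: reads CSV on stdin, appends HH:MM:SS (field 6 - field 3).
# TZ=UTC makes mktime interpret the timestamps as UTC, matching the trailing Z.
add_diff_column_gawk() {
    TZ=UTC gawk -F, -v OFS=, '
        # Turn "2019-09-26T16:56:40Z" into "2019 09 26 16 56 40 " for mktime.
        function to_epoch(ts) {
            gsub(/[-:TZ]/, " ", ts)
            return mktime(ts)
        }
        {
            diff = to_epoch($6) - to_epoch($3)
            # %02d truncates the fractional part of each division.
            $(NF + 1) = sprintf("%02d:%02d:%02d", diff / 3600, diff % 3600 / 60, diff % 60)
            print
        }
    '
}
```

It would be used as add_diff_column_gawk < a.csv > b.csv. Because the whole file is handled by one gawk process, no external command is started per row, which is the main cost of the shell loop.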