Split a CSV file based on a date field

How can I split a file by year? My file contains data for 2019 and 2020; a few lines of it are shown below.

hash,block_timestamp,addresses
1b8fb81b9c4db4cf3659d2553e7c1d5a4dac21400e331ea3deecdfa45e2eb7d7,2020-05-08 13:43:38 UTC,32UNEwo4UtXrD8xjAVDGapBcWQ9B7HBQNb
daeac50f989f0d31bcc412ca47e2e082f7d0599d8e577a9a310f7ab4e9d474d2,2020-05-08 13:21:33 UTC,3BMEXPLQqB9rkR5SMhJdA4Xm98ntT5xuw8
56777decb012d60f36f9cd4b9acfe13215f670bbe192f261db21e64f98e212be,2019-05-08 13:39:39 UTC,1AMtkH4riMpxSe7YMbs6h2aaDXVdxnmMFy
f5a1d52f013f1ee49a6cad971a5782c1c9905030d35ac28e23a2113fd1941421,2019-04-10 18:36:01 UTC,1LBBNap7kLswvgYbzmfLeskAfEMToiinkB

I tried:

awk -F',' '{print >((substr($2,1,4)<=2020)?"2019":"2020")}' combined-out.csv

The result is two empty files. How can I fix this?

Answer 1

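It looks like you are using <= where you need <. With <=, rows from 2020 also satisfy the condition and are written to the 2019 file.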

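BEGIN {
   FS = ","                  # fields are comma-separated
}
{
   s1 = substr($2, 1, 4)     # first four characters of the timestamp, i.e. the year
   if (s1 < 2020) {
      print > "2019"         # rows before 2020 go to the file named 2019
   } else {
      print > "2020"         # everything else goes to the file named 2020
   }
}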
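
To see the two operators side by side, here is a minimal sketch. It assumes a throwaway file named sample.csv (a hypothetical name) holding shortened placeholder rows rather than the real data. For each data row it prints the year, then the file that <= would pick, then the file that < would pick:

```shell
# sample.csv is a hypothetical scratch file with shortened placeholder rows.
printf '%s\n' \
  'hash,block_timestamp,addresses' \
  'aaa,2020-05-08 13:43:38 UTC,addr1' \
  'bbb,2019-05-08 13:39:39 UTC,addr2' > sample.csv

# Skip the header (NR > 1), then show the target chosen by each comparison.
awk -F, 'NR > 1 {
  year = substr($2, 1, 4)
  print year, (year <= 2020 ? "2019" : "2020"), (year < 2020 ? "2019" : "2020")
}' sample.csv > compare.txt

cat compare.txt
```

This prints "2020 2019 2020" and "2019 2019 2019": with <= both rows map to 2019, while with < the 2020 row maps to 2020.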

Answer 2

To leave the header line out of both output files:

$ awk -F, 'NR>1{print > ($2+0)}' file

$ cat 2019
56777decb012d60f36f9cd4b9acfe13215f670bbe192f261db21e64f98e212be,2019-05-08 13:39:39 UTC,1AMtkH4riMpxSe7YMbs6h2aaDXVdxnmMFy
f5a1d52f013f1ee49a6cad971a5782c1c9905030d35ac28e23a2113fd1941421,2019-04-10 18:36:01 UTC,1LBBNap7kLswvgYbzmfLeskAfEMToiinkB

$ cat 2020
1b8fb81b9c4db4cf3659d2553e7c1d5a4dac21400e331ea3deecdfa45e2eb7d7,2020-05-08 13:43:38 UTC,32UNEwo4UtXrD8xjAVDGapBcWQ9B7HBQNb
daeac50f989f0d31bcc412ca47e2e082f7d0599d8e577a9a310f7ab4e9d474d2,2020-05-08 13:21:33 UTC,3BMEXPLQqB9rkR5SMhJdA4Xm98ntT5xuw8

Or, to print it in both output files:

$ awk -F, 'NR==1{hdr=$0; next} {out=($2+0); if (!seen[out]++) print hdr > out; print > out}' file

$ cat 2019
hash,block_timestamp,addresses
56777decb012d60f36f9cd4b9acfe13215f670bbe192f261db21e64f98e212be,2019-05-08 13:39:39 UTC,1AMtkH4riMpxSe7YMbs6h2aaDXVdxnmMFy
f5a1d52f013f1ee49a6cad971a5782c1c9905030d35ac28e23a2113fd1941421,2019-04-10 18:36:01 UTC,1LBBNap7kLswvgYbzmfLeskAfEMToiinkB

$ cat 2020
hash,block_timestamp,addresses
1b8fb81b9c4db4cf3659d2553e7c1d5a4dac21400e331ea3deecdfa45e2eb7d7,2020-05-08 13:43:38 UTC,32UNEwo4UtXrD8xjAVDGapBcWQ9B7HBQNb
daeac50f989f0d31bcc412ca47e2e082f7d0599d8e577a9a310f7ab4e9d474d2,2020-05-08 13:21:33 UTC,3BMEXPLQqB9rkR5SMhJdA4Xm98ntT5xuw8
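
Both commands name the output file with $2+0. Adding 0 makes awk convert the timestamp to a number, and that conversion reads only the leading numeric part, so 2020-05-08 13:43:38 UTC becomes 2020. A minimal sketch, using one row from the question and printing the result next to the substr approach from Answer 1 (the shell variable names row and years are only for illustration):

```shell
# One data row taken from the question.
row='1b8fb81b9c4db4cf3659d2553e7c1d5a4dac21400e331ea3deecdfa45e2eb7d7,2020-05-08 13:43:38 UTC,32UNEwo4UtXrD8xjAVDGapBcWQ9B7HBQNb'

# $2 + 0 keeps only the leading number of the timestamp;
# substr($2, 1, 4) takes its first four characters.
years=$(printf '%s\n' "$row" | awk -F, '{ print $2 + 0, substr($2, 1, 4) }')

echo "$years"
```

Both expressions give 2020 here. The substr form is more explicit, while $2+0 is shorter and depends on the field starting with the year.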
