How do I extract data from a very large csv file in the Linux terminal?

I have a .csv file that is about 30 GB in size. I want to grep the lines that satisfy several string-matching conditions at once. What is the right way to do this with grep, awk, or sed? I tried the following command, which does return results, but it also shows data from earlier dates.

grep -w "for-outbound-sports\|2019-05-16" Master.csv

Is there another way to do this faster, with awk, sed, or something else?

Update

To be more specific, a sample input:

"","22288","1990353330","for-outbound-STARZONE","22288","Local/1990353330@for-outbound-STARZONE-00042f49;2","DAHDI/i15/01990353330-c237","Dial","DAHDI/G0/01990353330,30","2019-01-17 13:45:05","2019-01-17 13:45:17","2019-01-17 13:45:32",27,15,"ANSWERED","DOCUMENTATION","1547732705.828852",""
"","22020","1990353330","for-outbound-sports","22020","Local/1990353330@for-outbound-sports-001b223f;2","DAHDI/i14/01990353330-553f8","Dial","DAHDI/G0/01990353330,30","2019-05-15 03:57:02","2019-05-15 03:57:10","2019-05-15 03:57:44",42,34,"ANSWERED","DOCUMENTATION","1557979022.5390225",""
"","22020","1990353330","for-outbound-sports","22020","Local/1990353330@for-outbound-sports-001b223f;2","DAHDI/i14/01990353330-553f8","Dial","DAHDI/G0/01990353330,30","2019-05-16 03:57:02","2019-05-16 03:57:10","2019-05-16 03:57:44",42,34,"ANSWERED","DOCUMENTATION","1557979022.5390225",""

Sample output:

"","22020","1990353330","for-outbound-sports","22020","Local/1990353330@for-outbound-sports-001b223f;2","DAHDI/i14/01990353330-553f8","Dial","DAHDI/G0/01990353330,30","2019-05-16 03:57:02","2019-05-16 03:57:10","2019-05-16 03:57:44",42,34,"ANSWERED","DOCUMENTATION","1557979022.5390225",""

Answer 1

grep is already a very fast way to scan a big file and look for words or characters in its lines; the -w whole-word option may make it a little slower. Usually it is not grep itself that is slow, but printing the output to the terminal. You can easily test this by redirecting the output to a file:

grep -w "for-outbound-sports\|2019-05-16" Master.csv > greped_master.csv
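
To see how much of the time goes into printing to the terminal rather than into the search itself, one possible comparison (a rough sketch; /dev/null simply discards the output):

# output printed to the terminal
time grep -w "for-outbound-sports\|2019-05-16" Master.csv

# output discarded, so only the search itself is timed
time grep -w "for-outbound-sports\|2019-05-16" Master.csv > /dev/null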

You can always use the parallel program to split the big file and take advantage of multiple threads, e.g. parallel --pipe --block 2M grep foo < bigfile, as you can see here.
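
Applied to this case, a rough sketch might look like the following (the 10M block size is an arbitrary choice; GNU parallel's --pipe mode splits the input on line boundaries, so a line-oriented grep stays correct):

# run grep on chunks of the file in parallel; add --keep-order (-k) if the
# original line order of the matches must be preserved
parallel --pipe --block 10M 'grep -w "for-outbound-sports\|2019-05-16"' < Master.csv > greped_master.csv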
