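I have a .csv file that is about 30GB in size. I want to grep the lines that satisfy several string-matching conditions at once. What is the proper way to do this with grep, awk, or sed? I tried the following command; it returns results, but it also shows data from earlier dates.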
grep -w "for-outbound-sports\|2019-05-16" Master.csv
Is there another way to do this faster, using awk, sed, or something else?
Update
To be more specific, sample input:
"","22288","1990353330","for-outbound-STARZONE","22288","Local/1990353330@for-outbound-STARZONE-00042f49;2","DAHDI/i15/01990353330-c237","Dial","DAHDI/G0/01990353330,30","2019-01-17 13:45:05","2019-01-17 13:45:17","2019-01-17 13:45:32",27,15,"ANSWERED","DOCUMENTATION","1547732705.828852",""
"","22020","1990353330","for-outbound-sports","22020","Local/1990353330@for-outbound-sports-001b223f;2","DAHDI/i14/01990353330-553f8","Dial","DAHDI/G0/01990353330,30","2019-05-15 03:57:02","2019-05-15 03:57:10","2019-05-15 03:57:44",42,34,"ANSWERED","DOCUMENTATION","1557979022.5390225",""
"","22020","1990353330","for-outbound-sports","22020","Local/1990353330@for-outbound-sports-001b223f;2","DAHDI/i14/01990353330-553f8","Dial","DAHDI/G0/01990353330,30","2019-05-16 03:57:02","2019-05-16 03:57:10","2019-05-16 03:57:44",42,34,"ANSWERED","DOCUMENTATION","1557979022.5390225",""
Sample output:
"","22020","1990353330","for-outbound-sports","22020","Local/1990353330@for-outbound-sports-001b223f;2","DAHDI/i14/01990353330-553f8","Dial","DAHDI/G0/01990353330,30","2019-05-16 03:57:02","2019-05-16 03:57:10","2019-05-16 03:57:44",42,34,"ANSWERED","DOCUMENTATION","1557979022.5390225",""
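Answer 1
grep is already a very fast way to go through a large file and find words or characters in its lines; the word-matching option -w may make it a little slower. Often the slow part is not grep itself but printing the output to the terminal. You can test this simply by redirecting the output to a file: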
grep -w "for-outbound-sports\|2019-05-16" Master.csv > greped_master.csv
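Note that \| in the pattern above is an alternation, so grep prints every line containing either string, which is why rows from earlier dates show up. A minimal sketch of requiring both strings, by chaining two greps, follows. Here sample.csv is a hypothetical three-row stand-in for Master.csv with shortened rows, and -F (treat the pattern as a literal string) is an addition that is not part of the original command.

```shell
# sample.csv is a hypothetical stand-in for Master.csv (rows shortened).
cat > sample.csv <<'EOF'
"","22288","1990353330","for-outbound-STARZONE","22288","Dial","2019-01-17 13:45:05","ANSWERED"
"","22020","1990353330","for-outbound-sports","22020","Dial","2019-05-15 03:57:02","ANSWERED"
"","22020","1990353330","for-outbound-sports","22020","Dial","2019-05-16 03:57:02","ANSWERED"
EOF

# OR: the \| alternation keeps any line containing either string.
or_count=$(grep -c -w "for-outbound-sports\|2019-05-16" sample.csv)

# AND: the second grep only sees lines that already passed the first.
and_rows=$(grep -F 'for-outbound-sports' sample.csv | grep -F '2019-05-16')

echo "OR matched $or_count rows"
printf '%s\n' "$and_rows"
```

On the real file the same chain would read from Master.csv, with the final output redirected to a file as above. Putting the rarer string in the first grep should leave less work for the second, though that is worth measuring on your data.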
You can always use the parallel program to split the large file and take advantage of multiple CPU cores, for example:
parallel --pipe --block 2M grep foo < bigfile
as shown here.
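Since the question also asks about awk: a single awk pass can test both conditions on specific columns. This is only a sketch, not a benchmark, and whether it beats grep on a 30GB file is something to measure. It assumes that splitting on the three-character sequence "," is safe (no field contains that sequence), that the context is the 4th field and the start timestamp the 10th, as in the sample rows, and that sample.csv and matched.csv are hypothetical file names.

```shell
# sample.csv is a hypothetical stand-in for Master.csv, using the sample rows.
cat > sample.csv <<'EOF'
"","22288","1990353330","for-outbound-STARZONE","22288","Local/1990353330@for-outbound-STARZONE-00042f49;2","DAHDI/i15/01990353330-c237","Dial","DAHDI/G0/01990353330,30","2019-01-17 13:45:05","2019-01-17 13:45:17","2019-01-17 13:45:32",27,15,"ANSWERED","DOCUMENTATION","1547732705.828852",""
"","22020","1990353330","for-outbound-sports","22020","Local/1990353330@for-outbound-sports-001b223f;2","DAHDI/i14/01990353330-553f8","Dial","DAHDI/G0/01990353330,30","2019-05-15 03:57:02","2019-05-15 03:57:10","2019-05-15 03:57:44",42,34,"ANSWERED","DOCUMENTATION","1557979022.5390225",""
"","22020","1990353330","for-outbound-sports","22020","Local/1990353330@for-outbound-sports-001b223f;2","DAHDI/i14/01990353330-553f8","Dial","DAHDI/G0/01990353330,30","2019-05-16 03:57:02","2019-05-16 03:57:10","2019-05-16 03:57:44",42,34,"ANSWERED","DOCUMENTATION","1557979022.5390225",""
EOF

# Split on the sequence "," so quotes between fields are consumed by the separator.
# Keep a row only if field 4 is exactly the context AND field 10 starts with the date.
awk -F'","' '$4 == "for-outbound-sports" && index($10, "2019-05-16") == 1' sample.csv > matched.csv

cat matched.csv
```

Matching on columns avoids false positives where the date or the context name appears in some other field, at the cost of depending on the column layout.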