CSV - keep only certain entries

I have a bunch of data in a file, with the fields delimited by "" (two sample lines below):

"stampthisandthat.com","GANDI SAS","[email protected]","whois.gandi.net","A.DNS.GANDI.NET|B.DNS.GANDI.NET|C.DNS.GANDI.NET|","16-feb-2012","28-feb-2013","16-feb-2014","2012-02-16 00:00:00 UTC","2013-02-28 00:00:00 UTC","2014-02-16 00:00:00 UTC","clientTransferProhibited","2013-11-12 08:00:00 UTC","[email protected]","Laura VOGT","","Gandi, 63-65 boulevard Massena","","","","(Gandi) Paris","","(Gandi) 75013","(Gandi) FR","33143730576","","33170377666","","[email protected]","Laura VOGT","","Gandi, 63-65 boulevard Massena","","","","(Gandi) Paris","","(Gandi) 75013","(Gandi) FR","33143730576","","33170377666",""|
"salochinbd.com","FASTDOMAIN, INC.","[email protected]","whois.fastdomain.com","NS1.IPAGE.COM|NS2.IPAGE.COM|","17-feb-2012","03-feb-2013","17-feb-2014","2012-02-17 00:00:00 UTC","2013-02-03 00:00:00 UTC","2014-02-17 00:00:00 UTC","ok","2013-11-12 08:00:00 UTC","[email protected]","","","","","","","Cedar Rapids","Iowa","52402","UNITED STATES","","","13192100679","","[email protected]","","","","","","","Cedar Rapids","Iowa","52402","UNITED STATES","","","13192100679",""|

How would I keep only certain data? For example, how do I keep only the data in the first, second, and fifth "" fields?

Answer 1

cut -d\" -f2,4,10 <in | tr \" , >out

...grabs only the quoted bits of fields 1, 2, and 5, then makes sure they are separated by commas.
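To see why CSV fields 1, 2, and 5 correspond to cut fields 2, 4, and 10, it can help to list the pieces that a `"` delimiter produces. A minimal sketch, assuming a single input line; `show_quote_fields` is a hypothetical helper name, not part of the answer above:

```shell
# show_quote_fields: hypothetical helper that prints every piece of one
# line as cut -d'"' would number it. Even numbers are the quoted values,
# odd numbers are the commas (or empty strings) between them.
show_quote_fields() {
    tr '"' '\n' | awk '{ printf "%d:[%s]\n", NR, $0 }'
}

printf '%s\n' '"a","b, c","d"' | show_quote_fields
```

The value of CSV field n lands in cut field 2n, which is where `-f2,4,10` comes from.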

Or, because some commas might occur within the quotes...

cut -d\" -f-5,10-11 <in | sed s/,$// >out

...or even...

cut -d\" -f-5,10 <in | paste -d\" - /dev/null >out

...might set that right.

The first prints:

stampthisandthat.com,GANDI SAS,A.DNS.GANDI.NET|B.DNS.GANDI.NET|C.DNS.GANDI.NET|
salochinbd.com,FASTDOMAIN, INC.,NS1.IPAGE.COM|NS2.IPAGE.COM|

...and the second and third print:

"stampthisandthat.com","GANDI SAS","A.DNS.GANDI.NET|B.DNS.GANDI.NET|C.DNS.GANDI.NET|"
"salochinbd.com","FASTDOMAIN, INC.","NS1.IPAGE.COM|NS2.IPAGE.COM|"

The following example demonstrates how to do something similar for fields 1,3,17,21,22,23,24:

printf '"%s"\n' "$(seq -s\",\" 35)" |
cut -d\" -f-3,6-7,34-35,42-48       |
paste -d\" - /dev/null

"1","3","17","21","22","23","24"

...which pulls only those fields out of the seq output, which looks like this:

"1","2","3",..."35"
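Working out the cut ranges by hand gets tedious for long field lists. A rough sketch of how the list could be derived, under these assumptions: `keep_columns` is a hypothetical helper name, the CSV column numbers are given in ascending order, and every field in the input is quoted as in the sample data:

```shell
# keep_columns: hypothetical helper. For CSV column n it selects cut
# fields 2n (the value) and 2n+1 (the comma after it), drops the comma
# after the last column, and always keeps field 1 so that the line
# starts with a quote. paste then appends the closing quote.
keep_columns() {
    list=$(printf '%s\n' "$1" | tr ',' '\n' |
        awk '{ n = $1 * 2; out = out "," n "," (n + 1) }
             END { sub(/,[0-9]+$/, "", out); print "1" out }')
    cut -d'"' -f"$list" | paste -d'"' - /dev/null
}

printf '%s\n' '"a","b, c","d","e"' | keep_columns 2,4
```

For columns 2 and 4 the derived list is `1,4,5,8`, which follows the same pattern as the ranges used in the examples above.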

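Answer 2

There is a little-known program called csvquote that makes it possible to use standard tools such as cut, sed, and awk to process CSV files. It works by mapping the special characters inside quotes to some non-printing characters and then mapping them back afterwards. With this program, it is as simple as: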

csvquote file.csv | cut -d , -f 1,2,5 | csvquote -u

Output:

"stampthisandthat.com","GANDI SAS","A.DNS.GANDI.NET|B.DNS.GANDI.NET|C.DNS.GANDI.NET|"
"salochinbd.com","FASTDOMAIN, INC.","NS1.IPAGE.COM|NS2.IPAGE.COM|"
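If csvquote is not available, the masking idea can be imitated with standard tools. This is only a rough sketch of the principle, not csvquote itself: `mask_commas` and `unmask_commas` are hypothetical helper names, the ASCII unit separator (octal 037) is an assumed stand-in character, and line breaks inside quoted fields are not handled:

```shell
# mask_commas: hypothetical helper. Splitting on the double quote puts
# the quoted values in the even-numbered awk fields, so only commas in
# those fields are replaced with the stand-in character.
mask_commas() {
    awk 'BEGIN { FS = OFS = "\"" }
         { for (i = 2; i <= NF; i += 2) gsub(/,/, "\037", $i); print }'
}

# unmask_commas: hypothetical helper that restores the masked commas.
unmask_commas() {
    tr '\037' ','
}

printf '%s\n' '"x.com","ACME, INC.","c","d","NS1|NS2|"' |
    mask_commas | cut -d, -f1,2,5 | unmask_commas
```

Between the two helpers, every remaining comma is a real field separator, so `cut -d,` can be used safely.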

Answer 3

awk -F',' '{print $1 $2 $5}' - is this what you are looking for?

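Answer 4

To work around the problem of commas inside fields, change the field separator to quote+comma; this assumes that any commas inside your fields are not at the beginning and/or end of the field.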

$ awk -F'(\",)' '{print $1 $2 $17}' test.txt

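Just make sure you escape the quote and enclose the field separator in single quotes to protect it from your shell.

Note: I believe this was using gawk, on Fedora 20.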
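A variation on the same idea, sketched under a few assumptions: the separator is taken to be the full three-character sequence `","`, every field is quoted, and none of the selected fields is the last one on the line. `pick_1_2_5` is a hypothetical helper name:

```shell
# pick_1_2_5: hypothetical helper. Splitting on "," leaves a stray quote
# only at the start of the first field; it is stripped and the outer
# quotes are added back when printing, with "," as the output separator.
pick_1_2_5() {
    awk -F'","' -v OFS='","' '{
        sub(/^"/, "", $1)
        print "\"" $1, $2, $5 "\""
    }'
}

printf '%s\n' '"x.com","ACME, INC.","c","d","NS1|NS2|","f"' | pick_1_2_5
```

Unlike printing `$1 $2 $5` back to back, the commas in the print statement insert the output separator between the selected fields.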
