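I have a file with a bunch of data in it. Each field is wrapped in double quotes ("") and the fields are separated by commas, as in the two example lines below: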
"stampthisandthat.com","GANDI SAS","[email protected]","whois.gandi.net","A.DNS.GANDI.NET|B.DNS.GANDI.NET|C.DNS.GANDI.NET|","16-feb-2012","28-feb-2013","16-feb-2014","2012-02-16 00:00:00 UTC","2013-02-28 00:00:00 UTC","2014-02-16 00:00:00 UTC","clientTransferProhibited","2013-11-12 08:00:00 UTC","[email protected]","Laura VOGT","","Gandi, 63-65 boulevard Massena","","","","(Gandi) Paris","","(Gandi) 75013","(Gandi) FR","33143730576","","33170377666","","[email protected]","Laura VOGT","","Gandi, 63-65 boulevard Massena","","","","(Gandi) Paris","","(Gandi) 75013","(Gandi) FR","33143730576","","33170377666",""|
"salochinbd.com","FASTDOMAIN, INC.","[email protected]","whois.fastdomain.com","NS1.IPAGE.COM|NS2.IPAGE.COM|","17-feb-2012","03-feb-2013","17-feb-2014","2012-02-17 00:00:00 UTC","2013-02-03 00:00:00 UTC","2014-02-17 00:00:00 UTC","ok","2013-11-12 08:00:00 UTC","[email protected]","","","","","","","Cedar Rapids","Iowa","52402","UNITED STATES","","","13192100679","","[email protected]","","","","","","","Cedar Rapids","Iowa","52402","UNITED STATES","","","13192100679",""|
How would I keep only certain fields? For example, how would I keep only the data in the first, second and fifth quoted ("") fields?
Answer 1
cut -d\" -f2,4,10 <in | tr \" , >out
...would grab only the quoted parts of fields 1, 2 and 5, and then make sure they are separated by commas.
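Why 2, 4 and 10 rather than 1, 2 and 5? Splitting on `"` produces an empty field before the opening quote, and each `","` separator leaves a `,` field of its own, so CSV field n ends up as cut field 2n. The small helper below (the name `show_cut_fields` is just for illustration) numbers the `"`-delimited fields of the first input line; it uses awk with a one-character field separator, which splits the same way `cut -d\"` does.

```sh
#!/bin/sh
# Illustrative helper (hypothetical name): number the fields that
# `cut -d'"'` would see on the first line of standard input.
show_cut_fields() {
  # awk with a single-character FS splits on every occurrence and keeps
  # the empty field before a leading quote, just like cut does.
  head -n 1 | awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
}

# Example: show_cut_fields < in
```

For the line `"a","b"` it prints `1: []`, `2: [a]`, `3: [,]`, `4: [b]` and `5: []`, one per line.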
Or, because some commas may occur inside the quotes...
cut -d\" -f-5,10-11 <in | sed 's/,$//' >out
...or even...
cut -d\" -f-5,10 <in | paste -d\" - /dev/null >out
...would probably get it right.
The first one prints:
stampthisandthat.com,GANDI SAS,A.DNS.GANDI.NET|B.DNS.GANDI.NET|C.DNS.GANDI.NET|
salochinbd.com,FASTDOMAIN, INC.,NS1.IPAGE.COM|NS2.IPAGE.COM|
...and the second and third both print...
"stampthisandthat.com","GANDI SAS","A.DNS.GANDI.NET|B.DNS.GANDI.NET|C.DNS.GANDI.NET|"
"salochinbd.com","FASTDOMAIN, INC.","NS1.IPAGE.COM|NS2.IPAGE.COM|"
The following example shows how to do the same thing for fields 1, 3, 17, 21, 22, 23 and 24:
printf '"%s"\n' "$(seq -s\",\" 35)" |
cut -d\" -f-3,6-7,34-35,42-48 |
paste -d\" - /dev/null
"1","3","17","21","22","23","24"
...which pulls only those fields out of the output of seq, which looks like this:
"1","2","3",..."35"
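The pattern behind these field lists can be generalized: CSV field n is cut field 2n and the separator after it is field 2n+1. Keeping field 1 (the empty field before the first quote) plus 2n and 2n+1 for each wanted field, then trimming a trailing comma, reproduces the `-f-5,10-11 | sed` variant for any set of fields. This is a minimal sketch that assumes no field contains an escaped quote (`""`), which would throw the count off; `csv_cut_list` and `pick_fields` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: turn CSV field numbers into a `cut -d'"'` field list.
# CSV field n is cut field 2n; the "," separator after it is field 2n+1.
csv_cut_list() {
  list=1                      # the empty field before the opening quote
  for n in "$@"; do
    list="$list,$((2 * n)),$((2 * n + 1))"
  done
  printf '%s\n' "$list"
}

# Keep the given CSV fields, quotes included, then drop a trailing comma.
# Assumes no field contains an escaped quote ("").
pick_fields() {
  cut -d'"' -f"$(csv_cut_list "$@")" | sed 's/,$//'
}

# Example: pick_fields 1 2 5 < in > out
```

`csv_cut_list 1 2 5` yields `1,2,3,4,5,10,11`, the same fields as `-f-5,10-11` above. Note that cut always emits fields in their input order, so the fields cannot be reordered this way.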
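Answer 2
There is a little-known program called csvquote that lets you use standard tools such as cut, sed and awk to process CSV files. It works by mapping special characters inside quotes to nonprintable characters, and then mapping them back afterwards. With this program it is as simple as: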
csvquote file.csv | cut -d , -f 1,2,5 | csvquote -u
Output:
"stampthisandthat.com","GANDI SAS","A.DNS.GANDI.NET|B.DNS.GANDI.NET|C.DNS.GANDI.NET|"
"salochinbd.com","FASTDOMAIN, INC.","NS1.IPAGE.COM|NS2.IPAGE.COM|"
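The mapping idea can be illustrated with a few lines of awk. This is not csvquote itself, only a sketch under simplifying assumptions: it protects commas only (csvquote also handles other special characters, such as newlines inside quoted fields), it treats each line as one record, and it assumes the ASCII unit separator (octal 037) never appears in the data. The names `protect_commas` and `restore_commas` are hypothetical.

```sh
#!/bin/sh
# Sketch of the csvquote idea (hypothetical names, not the real tool):
# replace commas that sit inside double quotes with the ASCII unit
# separator (octal 037) so that cut/awk no longer split on them.
protect_commas() {
  awk '{
    out = ""; inq = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") inq = !inq          # toggle on every quote
      else if (c == "," && inq) c = "\037"
      out = out c
    }
    print out
  }'
}

# Map the unit separator back to a comma.
restore_commas() {
  tr '\037' ','
}

# Example: protect_commas < file.csv | cut -d , -f 1,2,5 | restore_commas
```

An escaped quote (`""`) toggles the state twice, so it leaves the inside/outside state unchanged, which is the behavior CSV quoting expects.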
Answer 3
awk -F',' '{print $1 $2 $5}'
Is this what you're looking for?
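As written, `print $1 $2 $5` concatenates the fields with nothing between them. Setting OFS and passing the fields to print as separate arguments keeps a comma between them. Like the original, this splits on every comma, so a field such as "FASTDOMAIN, INC." still shifts the numbering of every field after it. The wrapper name `awk_pick` is hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper: print fields 1, 2 and 5, comma-separated.
# Splits on every comma, so it is only correct when no field contains one.
awk_pick() {
  awk -F',' -v OFS=',' '{ print $1, $2, $5 }' "$@"
}

# Example: awk_pick file.csv
```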
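Answer 4
To get around fields that contain commas, change the field separator to quote+comma; this assumes that the commas inside your fields never appear at the start and/or end of a field.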
$ awk -F'(\",)' '{print $1 $2 $17}' test.txt
Just make sure you escape the quote and wrap the field separator in single quotes to protect it from your shell.
Note: I believe this was run with gawk on Fedora 20.
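A variation on the same idea, assuming every field is quoted: split on the full three-character sequence `","`, strip the opening quote from the first field and the closing quote from the last, and add the quotes back when printing. Fields containing commas then stay intact as long as they never contain `","` themselves. The wrapper name `awk_pick_quoted` is hypothetical, and fields 1, 2 and 5 are taken from the question. The sample lines end in `|`, so their last field would keep a trailing `"|`; that does not affect fields 1, 2 and 5.

```sh
#!/bin/sh
# Hypothetical wrapper: print CSV fields 1, 2 and 5, re-quoted.
# Assumes every field is quoted and no field contains the text ",".
awk_pick_quoted() {
  awk -F'","' -v OFS='","' '{
    sub(/^"/, "", $1)    # opening quote of the first field
    sub(/"$/, "", $NF)   # closing quote of the last field
    print "\"" $1, $2, $5 "\""
  }' "$@"
}

# Example: awk_pick_quoted test.txt
```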