如何从第三个字段打印 CSV

如何从第三个字段打印 CSV

我想捕获第三个字段中的 csv 行,直到双引号 (")

more test

"linux02","PLD26","net2-thrift-netconf","net.driver.memory","2"
"linux02","PLD26","net2-thrift-netconf","net.executor.cores","2"
"linux02","PLD26","net2-thrift-netconf","net.executor.instances","2"
"linux02","PLD26","net2-thrift-netconf","net.executor.memory","2"
"linux02","PLD26","net2-thrift-netconf","net.sql.shuffle.partitions","141"
"linux02","PLD26","net2-thrift-netconf","net.dynamicAllocation.enabled","true"
"linux02","PLD26","net2-thrift-netconf","net.dynamicAllocation.initialExecutors","2"
"linux02","PLD26","net2-thrift-netconf","net.dynamicAllocation.minExecutors","2"
"linux02","PLD26","net2-thrift-netconf","net.dynamicAllocation.maxExecutors","20"

我试过这个

sed s'/,/ /g' test | awk '{print $3","$4","$5}' | sed s'/"//g'
,,
net2-thrift-netconf,net.driver.memory
net2-thrift-netconf,net.executor.cores
net2-thrift-netconf,net.executor.instances
net2-thrift-netconf,net.executor.memory
net2-thrift-netconf,net.sql.shuffle.partitions
net2-thrift-netconf,net.dynamicAllocation.enabled
net2-thrift-netconf,net.dynamicAllocation.initialExecutors
net2-thrift-netconf,net.dynamicAllocation.minExecutors
net2-thrift-netconf,net.dynamicAllocation.maxExecutors
,,

但我的语法有问题,因为此语法还打印“,,”,而第二种语法并不优雅。

预期输出:

net2-thrift-netconf,net.driver.memory,2
net2-thrift-netconf,net.executor.cores,2
net2-thrift-netconf,net.executor.instances,2
net2-thrift-netconf,net.executor.memory,2
net2-thrift-netconf,net.sql.shuffle.partitions,141
net2-thrift-netconf,net.dynamicAllocation.enabled,true
net2-thrift-netconf,net.dynamicAllocation.initialExecutors,2
net2-thrift-netconf,net.dynamicAllocation.minExecutors,2
net2-thrift-netconf,net.dynamicAllocation.maxExecutors,20

答案1

看起来这只是一个问题,或者删除引号然后从第三个字段打印直到行尾:

$ tr -d \" < file | cut -d, -f3-
net2-thrift-netconf,net.driver.memory,2
net2-thrift-netconf,net.executor.cores,2
net2-thrift-netconf,net.executor.instances,2
net2-thrift-netconf,net.executor.memory,2
net2-thrift-netconf,net.sql.shuffle.partitions,141
net2-thrift-netconf,net.dynamicAllocation.enabled,true
net2-thrift-netconf,net.dynamicAllocation.initialExecutors,2
net2-thrift-netconf,net.dynamicAllocation.minExecutors,2
net2-thrift-netconf,net.dynamicAllocation.maxExecutors,20

因此,删除从第三个到最后一个分隔字段的tr -d \"引号并打印。cut -d, -f3-,

答案2

仅与sed

sed -E 's/"//g; s/^([^,]*,){2}//' infile
  • s/"//g,删除所有双引号。
  • ^([^,]*,){2},从行乞开始,删除所有后面跟有逗号的内容,最多重复两次。

或者与awk

awk -F\" '{$1=$2=$3=$4=$5=""}1' OFS="" infile

答案3

您确实应该对 CSV 数据使用合适的 CSV 解析器。这是一种使用红宝石的方法

ruby -rcsv -e '
  CSV.foreach(ARGV.shift) do |row|
    wanted = row.drop(2)   # ignore first 2 fields
    puts CSV.generate_line(wanted, :force_quotes=>false)
  end
' test
net2-thrift-netconf,net.driver.memory,2
net2-thrift-netconf,net.executor.cores,2
net2-thrift-netconf,net.executor.instances,2
net2-thrift-netconf,net.executor.memory,2
net2-thrift-netconf,net.sql.shuffle.partitions,141
net2-thrift-netconf,net.dynamicAllocation.enabled,true
net2-thrift-netconf,net.dynamicAllocation.initialExecutors,2
net2-thrift-netconf,net.dynamicAllocation.minExecutors,2
net2-thrift-netconf,net.dynamicAllocation.maxExecutors,20

或作为单行

ruby -rcsv -e 'CSV.foreach(ARGV.shift) {|r| puts CSV.generate_line(r.drop(2), :force_quotes=>false)}' test

相关内容