Ignoring delimiters inside quotes

I have a file, aa.csv, with the following contents:

"ID0054XX","PT. SUMUT","18 JL.BONJOL","SUMATERA UTARA, NORTH","MEDAN","","ID9856","PDSUIDSAXXX","","","","Y"
"ID00037687","PAN INDONESIA, PT.","JALAN JENDERAL, SUDIRMAN, SENAYAN","","INDIA","","ID566543","PINBIDJAXXX","","0601","","Y"

I have a script that assigns each comma-separated value to its own variable, using , as the delimiter.

The relevant part of the script is:

IFS=,

[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }

while read Key  Name    Address1        Address2        City    State   Country SwiftCode       Nid     Chips   Aba     IsSwitching
do
    echo "-------------------------------------------------------------------"
    echo "From Key : $Key"
    echo "-------------------------------------------------------------------"
    echo "-------------------------------------------------------------------"
    echo "From Name : $Name"

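The problem is that it also splits values that contain a comma inside the quotes, whereas the output I want assigns each quoted value, as a whole, to its own variable.

I tried replacing the comma with IFS=[","] but that did not work. Any suggestions or help would be much appreciated.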

Answer 1

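You are doing a few things wrong here:

  1. You are using the shell to parse text.

    While this is possible, it is very inefficient. It is slow, hard to write, hard to read and very hard to get right. The shell was not designed for this sort of thing.

  2. You are trying to parse a csv file without a csv parser.

    CSV is not a simple format. You can have fields that contain the delimiter, as you do here. You can also have fields that span multiple lines. Trying to parse arbitrary CSV data with simple pattern matching is very complicated and extremely hard to get right.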
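
To make the two points above concrete, here is a minimal Python sketch. It compares a naive split on every comma (roughly what IFS=, does) with a real CSV parser on the first sample row from the question. The multi-line row at the end is made-up data, used only to illustrate a field that spans lines.

```python
import csv
import io

# First sample row from the question.
line = ('"ID0054XX","PT. SUMUT","18 JL.BONJOL","SUMATERA UTARA, NORTH",'
        '"MEDAN","","ID9856","PDSUIDSAXXX","","","","Y"')

# Naive splitting on every comma, roughly what IFS=, does in the shell.
naive = line.split(",")

# A real CSV parser honours the quotes.
parsed = next(csv.reader([line]))

print(len(naive))   # 13: the comma inside "SUMATERA UTARA, NORTH" adds a field
print(len(parsed))  # 12
print(parsed[3])    # SUMATERA UTARA, NORTH

# Hypothetical data (not from the question): a quoted field containing a newline.
multiline = io.StringIO('"K1","line one\nline two"\n', newline="")
rows = list(csv.reader(multiline))
print(len(rows))    # 1: both physical lines belong to a single record
```

A line-by-line reader such as the shell's read would see two records in the last case, which is why line-oriented tools cannot handle arbitrary CSV.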

The bad, hacky solution is to do this:

$ sed 's/","/"|"/g' file.csv | 
    while IFS='|' read -r Key Name Address1 Address2 City \
     State Country SwiftCode Nid Chips Aba IsSwitching; do 
        echo "From Key : $Key"; echo "From Name : $Name"; 
    done
From Key : "ID0054XX"
From Name : "PT. SUMUT"
From Key : "ID00037687"
From Name : "PAN INDONESIA, PT."

This replaces every "," with "|" and then uses | as the delimiter. Of course, it will fail if any of your fields can contain a |.
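
The failure mode can be sketched in a few lines of Python that mimic the sed substitution. The function name hacky_split and the row containing a | are hypothetical, introduced only for illustration; the first row is shortened from the question's data.

```python
def hacky_split(line):
    # Mimics: sed 's/","/"|"/g' followed by splitting on '|'.
    return line.replace('","', '"|"').split("|")

# Shortened from the question's data: works, but the quotes stay in the values.
ok = '"ID00037687","PAN INDONESIA, PT.","INDIA"'

# Hypothetical row whose second field contains the replacement delimiter.
bad = '"ID1","A|B CORP","INDIA"'

print(hacky_split(ok))        # 3 fields, commas inside quotes preserved
print(len(hacky_split(bad)))  # 4: the field "A|B CORP" was cut in two
```

Picking a rarer delimiter only makes the failure less likely; it does not remove it.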

The good, clean way is to use a proper scripting language (not the shell) and a csv parser. For example, in Perl¹:

$ cat file.csv | perl -MText::CSV -le '
    $csv = Text::CSV->new({binary=>1}); 
    while ($row = $csv->getline(STDIN)){ my ($Key, $Name, $Address1, $Address2, $City, $State, $Country, $SwiftCode, $Nid, $Chips, $Aba, $IsSwitching) = @$row;
print "From Key: $Key\nFrom Name: $Name";}' 
From Key: ID0054XX
From Name: PT. SUMUT
From Key: ID00037687
From Name: PAN INDONESIA, PT.
    

Or, as a script:

#!/usr/bin/perl -l
use strict;
use warnings;
use Text::CSV;

open(my $fh, '<', 'file.csv') or die "Cannot open file.csv: $!";
my $csv = Text::CSV->new({binary=>1}); 
while (my $row = $csv->getline($fh)){
    my (
            $Key, $Name, $Address1, $Address2, $City,
            $State, $Country, $SwiftCode, $Nid, $Chips,
            $Aba, $IsSwitching
         ) = @$row;
    print "From Key: $Key\nFrom Name: $Name";
}

Note that you must install the Text::CSV module first (cpanm Text::CSV), and you may need to install cpanm itself (the cpanminus package on most distributions).

Or, in Python 3:

#!/usr/bin/env python3

import csv
with open('file.csv', newline='') as csvfile:
    linereader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in linereader:
        print("From Key: %s\nFrom Name: %s" % (row[0], row[1]))
    

Saving the Python code above as a script and running it on the file prints:

$ foo.py
From Key: ID0054XX
From Name: PT. SUMUT
From Key: ID00037687
From Name: PAN INDONESIA, PT.
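
If you want every field available by name, as in the original shell loop, one possible variant uses csv.DictReader with the question's variable names as field names. This is only a sketch: FIELDS, SAMPLE and read_records are names introduced here for illustration, and the sample data is read from a string rather than from file.csv so that the example is self-contained.

```python
import csv
import io

# Field names taken from the `read` line in the question's script.
FIELDS = ["Key", "Name", "Address1", "Address2", "City", "State",
          "Country", "SwiftCode", "Nid", "Chips", "Aba", "IsSwitching"]

# The two sample rows from the question.
SAMPLE = (
    '"ID0054XX","PT. SUMUT","18 JL.BONJOL","SUMATERA UTARA, NORTH",'
    '"MEDAN","","ID9856","PDSUIDSAXXX","","","","Y"\n'
    '"ID00037687","PAN INDONESIA, PT.","JALAN JENDERAL, SUDIRMAN, SENAYAN",'
    '"","INDIA","","ID566543","PINBIDJAXXX","","0601","","Y"\n'
)

def read_records(fileobj):
    """Return an iterator of dicts, one per CSV row, keyed by FIELDS."""
    return csv.DictReader(fileobj, fieldnames=FIELDS)

records = list(read_records(io.StringIO(SAMPLE, newline="")))
for rec in records:
    print("From Key: %s\nFrom Name: %s" % (rec["Key"], rec["Name"]))
```

To read the real file instead, pass the object returned by open('file.csv', newline='') to read_records.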
    

¹ Yes, I know this is a UUoC (useless use of cat), but it is simpler to write it this way as a one-liner.
