AWK command - change blank "cells" in a CSV to a text value


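I apologize if my problem turns out to be a simple typo, but I have been trying to solve this for a while and unfortunately cannot get it to work. Perhaps I should be using sed, but I have not figured out how to specify a column in sed, and although I am a beginner, I have more experience with awk.

Here is the goal: I have a CSV file, file1, in which one column (14) has null (blank) values in some rows and values in others. I still want all the other columns in the output, but with the blank (empty [null]) cells in column 14 changed to the new value NA.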

Example:

Column 14
Value1
Value2

Value3

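I am trying to use awk to find any blank cells in column 14 and, where one is found, put the new text value NA in the cell.

This is the command I am trying, but my new file still has blank cells in column 14. Any help would be appreciated. Thank you.

Command: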

awk -F"," 'BEGIN {OFS=","} $14 == "" { $14 = "NA" }  {print}' file1 > file2


Goal:

Example:

Column 14
Value1
Value2
NA
Value3

Thank you all for taking the time to read this and help.

Update

As requested, here is some sample data.

"employee_number","employee_login","is_active","send_pkg_email","send_na_email","last_name","first_name","department","title","phone_number","employee_type","email","charge_code","area_code","mailstop","roomid"
"103293","[email protected]","Y","","","Smith","Jessica","","","+1 (650) 3530975","Employee","[email protected]","","LOC0028.03","","03.C.01H"
"103295","[email protected]","Y","","","Long","Fred","","","+1 (415) 9449428","Employee","[email protected]","","LOC0025.01","","01.D.04B"
"103297","[email protected]","Y","","","Cheng","Laura","","","+1 (650) 8623342","Contingent","[email protected]","","","",""
"103307","[email protected]","Y","","","Brown","Chris","","","+1 (512) 9644927","Employee","[email protected]","","ATX0607.16","","16.B.10D"
"103310","[email protected]","Y","","","Williams","Stan","","","+1 (650) 8048591","Employee","[email protected]","","LOC0061.03","","03.D.01B"

Answer 1

$ perl -MText::CSV=csv -e '
  $csv = Text::CSV->new();
  while(my $row = $csv->getline(ARGV)) {
    $row->[13] = "NA" if ($row->[13] eq "");
    $csv->say(STDOUT, $row);
  };' input.csv

Note that Perl arrays start at 0, not 1, so the 14th field is element 13 of the $row arrayref.

employee_number,employee_login,is_active,send_pkg_email,send_na_email,last_name,first_name,department,title,phone_number,employee_type,email,charge_code,area_code,mailstop,roomid
103293,[email protected],Y,,,Smith,Jessica,,,"+1 (650) 3530975",Employee,[email protected],,LOC0028.03,,03.C.01H
103295,[email protected],Y,,,Long,Fred,,,"+1 (415) 9449428",Employee,[email protected],,LOC0025.01,,01.D.04B
103297,[email protected],Y,,,Cheng,Laura,,,"+1 (650) 8623342",Contingent,[email protected],,NA,,
103307,[email protected],Y,,,Brown,Chris,,,"+1 (512) 9644927",Employee,[email protected],,ATX0607.16,,16.B.10D
103310,[email protected],Y,,,Williams,Stan,,,"+1 (650) 8048591",Employee,[email protected],,LOC0061.03,,03.D.01B

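The row with employee_number 103297 now has NA in field 14.

Incidentally, the output fields here are double-quoted only when necessary (for example, when they contain spaces; a field containing a comma would be quoted as well). If you want every field in the output to be quoted, as in the input file, change the $csv = Text::CSV->new(); line to: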

$csv = Text::CSV->new({always_quote => 1});

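Text::CSV has many other options. For example, if you use $csv = Text::CSV->new({always_quote => 1, strict => 1}); it will also raise an error if the input rows have differing numbers of fields. See man Text::CSV for details.


Alternatively, there is a simple fix for the awk script: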

awk -F"," 'BEGIN {OFS=","}; $14 == "\"\"" { $14 = "\"NA\"" };1' input.csv

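This highlights the problem with splitting CSV files on commas alone. It is impossible to distinguish a " character that wraps field data from a " character that is part of the field data, because this simple splitting method makes no such distinction.

Field 14 is not empty when you merely split the input line on commas. It contains two quote characters ("").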
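
To make the point above concrete, here is a minimal sketch that prints field 14 exactly as awk sees it after a plain comma split. The sample record is a made-up, shortened stand-in for the real data (numbers instead of names), and the variable names are only illustrative.

```shell
# Hypothetical 16-field record; field 14 is a quoted empty string, as in the question's data.
line='"1","2","3","4","5","6","7","8","9","10","11","12","13","","15","16"'

# Show field 14 between brackets, together with its length in characters.
field14=$(printf '%s\n' "$line" | awk -F',' '{ printf "[%s] length=%d", $14, length($14) }')
echo "$field14"
```

The field comes back as the two quote characters with a length of 2, which is why the comparison $14 == "" in the original command never matches.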

This awk one-liner will also break if any field contains a comma character. That is another reason why it is better to use a CSV parser.
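
A small sketch of that failure mode, using an invented three-field record whose second field contains a comma (the record and variable names are assumptions for illustration, not taken from the question's data):

```shell
# Hypothetical record: three CSV fields, the second one contains a comma inside its quotes.
line='"103293","Smith, Jessica","Employee"'

# Count the fields awk finds when it splits on every comma.
nf=$(printf '%s\n' "$line" | awk -F',' '{ print NF }')
echo "$nf"
```

awk reports 4 fields instead of 3, so every column after the embedded comma is shifted by one and a test on $14 would look at the wrong data.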

See also: Is there a robust command line tool for processing csv files?

There is also a good awk CSV parser at https://github.com/geoffroy-aubry/awk-csv-parser

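Answer 2

Column 14 does not contain an empty string; it contains "", so you need to test for the two quote characters (which must be escaped):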

awk -F"," 'BEGIN {OFS=","} $14 == "\"\"" { $14 = "NA" } {print}' file1 > file2
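
If the file might mix quoted and unquoted records, one possible variation is to test for both forms and keep each row's quoting style. This is only a sketch under that assumption; the two sample rows below are invented placeholders, and it still splits on every comma, so it shares the limitation with embedded commas.

```shell
# Two hypothetical 16-field rows: the first fully quoted, the second unquoted.
# Field 14 is empty in both.
input='"a","b","c","d","e","f","g","h","i","j","k","l","m","","o","p"
a,b,c,d,e,f,g,h,i,j,k,l,m,,o,p'

result=$(printf '%s\n' "$input" | awk -F',' 'BEGIN { OFS = "," }
  $14 == "\"\"" { $14 = "\"NA\"" }   # quoted empty field: insert a quoted NA
  $14 == ""     { $14 = "NA" }       # truly empty field: insert a bare NA
  { print }')
echo "$result"
```

After the first rule fires, field 14 is no longer empty, so the second rule does not touch the same row again.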
