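Apologies if my problem turns out to be nothing more than a typo, but I have been trying to solve this for a while and cannot get it to work. Maybe I should be using sed, but I haven't figured out how to specify a column in sed, and although I am a beginner, I have more experience with awk.
Here is the goal: I have a CSV file, file1, with a column (14) in which some rows have null (blank) values and other rows have values. I still want all the other columns in the output, but with the blank (null) cells in column 14 changed to the new value NA.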
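Example:
Column 14
Value 1
Value 2
(blank)
Value 3
I tried using awk to find any blank cells in column 14 and, where one is found, put the new text value NA in the cell.
This is the command I am trying, but my new file still has blank cells in column 14. Any help would be appreciated. Thanks.
Command: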
awk -F"," 'BEGIN {OFS=","} $14 == "" { $14 = "NA" } {print}' file1 > file2
Goal:
Example:
Column 14
Value 1
Value 2
NA
Value 3
Thanks to everyone for taking the time to read and help.
Update
As requested, here is some sample data.
"employee_number","employee_login","is_active","send_pkg_email","send_na_email","last_name","first_name","department","title","phone_number","employee_type","email","charge_code","area_code","mailstop","roomid"
"103293","[email protected]","Y","","","Smith","Jessica","","","+1 (650) 3530975","Employee","[email protected]","","LOC0028.03","","03.C.01H"
"103295","[email protected]","Y","","","Long","Fred","","","+1 (415) 9449428","Employee","[email protected]","","LOC0025.01","","01.D.04B"
"103297","[email protected]","Y","","","Cheng","Laura","","","+1 (650) 8623342","Contingent","[email protected]","","","",""
"103307","[email protected]","Y","","","Brown","Chris","","","+1 (512) 9644927","Employee","[email protected]","","ATX0607.16","","16.B.10D"
"103310","[email protected]","Y","","","Williams","Stan","","","+1 (650) 8048591","Employee","[email protected]","","LOC0061.03","","03.D.01B"
Answer 1
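$ perl -MText::CSV=csv -e '
  $csv = Text::CSV->new();
  # Read each record from the file(s) named on the command line.
  while (my $row = $csv->getline(*ARGV)) {
    # Field 14 is index 13; replace it only when it is empty.
    $row->[13] = "NA" if ($row->[13] eq "");
    $csv->say(*STDOUT, $row);
  };' input.csv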
Note that Perl arrays start at 0, not 1, so the 14th field is element 13 of the $row arrayref.
Output:
employee_number,employee_login,is_active,send_pkg_email,send_na_email,last_name,first_name,department,title,phone_number,employee_type,email,charge_code,area_code,mailstop,roomid
103293,[email protected],Y,,,Smith,Jessica,,,"+1 (650) 3530975",Employee,[email protected],,LOC0028.03,,03.C.01H
103295,[email protected],Y,,,Long,Fred,,,"+1 (415) 9449428",Employee,[email protected],,LOC0025.01,,01.D.04B
103297,[email protected],Y,,,Cheng,Laura,,,"+1 (650) 8623342",Contingent,[email protected],,NA,,
103307,[email protected],Y,,,Brown,Chris,,,"+1 (512) 9644927",Employee,[email protected],,ATX0607.16,,16.B.10D
103310,[email protected],Y,,,Williams,Stan,,,"+1 (650) 8048591",Employee,[email protected],,LOC0061.03,,03.D.01B
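The line containing employee_number 103297 now has NA in field 14.
By the way, the output fields here are double-quoted only when necessary (for example, when they contain a space; they would also be quoted if any of them contained a comma). If you want every field in the output to be quoted, as in the input file, change the $csv = Text::CSV->new(); line to: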
$csv = Text::CSV->new({always_quote => 1});
Text::CSV has many other options. For example, if you use $csv = Text::CSV->new({always_quote => 1, strict => 1}); it will also raise an error if any input line has a different number of fields. See man Text::CSV for details.
Alternatively, there is a simple fix for the awk script:
awk -F"," 'BEGIN {OFS=","}; $14 == "\"\"" { $14 = "\"NA\"" };1' input.csv
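This highlights the problem with splitting a CSV file on commas alone. It is impossible to distinguish a " character that wraps field data from a " character that is part of the field data, because this naive splitting method makes no such distinction.
Field 14 is not empty when you simply split the input line on commas. It contains two quote characters ("").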
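One quick way to confirm this is to print field 14 between square brackets. The following is only a sketch: the two input lines are a cut-down, made-up sample shaped like the question's data, and show_field is a hypothetical helper name, not something awk provides.

```shell
# show_field N: print field N of each comma-split input line, wrapped in
# square brackets so that empty and quote-only values are visible.
# (show_field is a hypothetical helper name used only for this illustration.)
show_field() {
    awk -F',' -v n="$1" '{ print "[" $n "]" }'
}

# Two sample lines with 16 quoted fields each. Field 14 is filled in on the
# first line and "empty" (just a pair of quotes) on the second.
printf '%s\n' \
  '"1","2","3","4","5","6","7","8","9","10","11","12","13","LOC0028.03","15","16"' \
  '"1","2","3","4","5","6","7","8","9","10","11","12","13","","15","16"' |
  show_field 14
# prints:
# ["LOC0028.03"]
# [""]
```

The second line of output shows that awk sees a two-character string in the "empty" cell, which is why a comparison against "" never matches.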
This awk one-liner will also break if any field contains a comma character. That is another reason why it is best to use a CSV parser.
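A small sketch of that failure mode, using a made-up record (both the data and the count_fields helper are hypothetical): the record has three CSV fields, but the comma inside the quoted name makes awk count four.

```shell
# count_fields: report how many fields awk sees when splitting on commas alone.
# (count_fields is a hypothetical helper name used only for this illustration.)
count_fields() {
    awk -F',' '{ print NF }'
}

# Three CSV fields, but the quoted second field contains a comma, so naive
# splitting yields four pieces and every later column shifts by one.
printf '%s\n' '"103293","Smith, Jessica","Employee"' | count_fields
# prints: 4
```

With a record like this, $14 would no longer refer to the 14th CSV column, so the replacement could land in the wrong cell.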
There is also a good CSV parser written in awk: https://github.com/geoffroy-aubry/awk-csv-parser
Answer 2
Column 14 does not contain an empty string but "", so you need to check for two quote characters (which must be escaped):
awk -F"," 'BEGIN {OFS=","} $14 == "\"\"" { $14 = "NA" } {print}' file1 > file2
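If a file might mix quoted and unquoted empty cells, one possible variant checks for both and writes the replacement in the matching style. This is a sketch only, and it still assumes that no field contains a comma. The fill_col wrapper and its column-number parameter are hypothetical names introduced for illustration.

```shell
# fill_col N: replace an empty value in comma-separated column N with NA.
# (fill_col is a hypothetical helper name used only for this illustration.)
fill_col() {
    awk -F',' -v n="$1" '
        BEGIN { OFS = "," }
        $n == "\"\""          { $n = "\"NA\"" }  # quoted empty cell -> "NA"
        NF >= n && $n == ""   { $n = "NA" }      # bare empty cell   -> NA
        { print }
    '
}

# Demo on a made-up three-column sample, filling column 2.
printf '%s\n' '"a","","c"' 'd,,f' '"g","h","i"' | fill_col 2
# prints:
# "a","NA","c"
# d,NA,f
# "g","h","i"
```

For the file in the question this could be run as fill_col 14 < file1 > file2. The NF >= n guard is there so that lines with fewer than N fields are passed through unchanged rather than padded.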