I am using the xls2csv binary on my Red Hat Linux machine to convert XLS documents to CSV.
For example (from the man page):
xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "ut8csvfile.csv" -a UTF-8
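However, I noticed the following issues, which cause problems in my Bash script:
The CSV output contains unnecessary spaces (to the left or to the right of a word).
Example of incorrect CSV syntax: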
,"/var/adm/sys ldd/all /Comm/logs ","WORD "," WORD"
Example of correct CSV syntax:
,"/var/adm/sys ldd/all /Comm/logs",WORD,WORD
Quotes appear in the CSV where they are not needed:
Example of incorrect CSV syntax:
," WORD ",
Example of correct CSV syntax:
,WORD,
How can I change the output to produce a "clean" CSV file?
I am looking for an awk/sed/perl one-liner, or any other solution that works in a Bash script.
Example CSV file before the fix:
1,"/var/adm/sys ldd/all /Comm/logs",34356,"234245 ",24245
2,"/var/adm/sys ldd/all
/Comm/debugs.txt"," 45356",435," 578 58976 "
3," add this line in crontab :",34356,"234245 ",24245
4,"1.0348 54 35.5"," 45356"," 435","578 "
4,"1 2 "," 45356 95857 "," 435","578 "
5,"1 2 "," 45356 95857 "," "435","578" "
6,"1.0348 54 35.5"," 45356"," "4""" ""35","578 "
7,"1.0348 54 35.5",""45356",""4"""""35,"578 "
Example of the corrected CSV file (after the fix):
1,"/var/adm/sys ldd/all /Comm/logs",34356,234245,24245
2,"/var/adm/sys ldd/all
/Comm/debugs.txt",45356,435,"578 58976"
3,"add this line in crontab :",34356,234245,24245
4,"1.0348 54 35.5",45356,435,578
4,"1 2","45356 95857",435,578
5,"1 2","45356 95857","435,578"
6,"1.0348 54 35.5",45356,"4""" ""35,578
7,"1.0348 54 35.5",""45356",""4"""""35,578
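Commas cannot appear inside a field.
Note the explicit newline contained within a field on line 2.
When a field is enclosed in double quotes and contains no spaces (for example ""45356" on line 7), those double quotes must not be removed, because the entire field, quotes included, is an encoded password.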
Answer 1
This Perl code produces output that is almost exactly what is expected:
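use strict;
use warnings;
use Text::CSV;

# Loose quotes with no escape character let stray quotes inside a field parse
my $csv = Text::CSV->new({ binary => 1, eol => $/, allow_loose_quotes => 1, escape_char => undef });
open my $io, "<", $ARGV[0] or die "Cannot open $ARGV[0]: $!";
while (my $row = $csv->getline($io)) {
    # Trim leading and trailing whitespace from every field
    my @o = map { s/^\s+//; s/\s+$//; $_ } @{$row};
    $csv->print(\*STDOUT, \@o);
}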
The output is:
1,"/var/adm/sys ldd/all /Comm/logs",34356,234245,24245
2,"/var/adm/sys ldd/all
/Comm/debugs.txt",45356,435,"578 58976"
3,"add this line in crontab :",34356,234245,24245
4,"1.0348 54 35.5",45356,435,578
4,"1 2","45356 95857",435,578
5,"1 2","45356 95857",""435","578""
6,"1.0348 54 35.5",45356,""4""" ""35",578
7,"1.0348 54 35.5",""45356",""4"""""35,"578"
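If Text::CSV is not available, the simpler rows can probably be handled with sed alone. The following is only a rough sketch, not a CSV parser: the function name clean_csv_simple is made up for illustration, and it assumes that spaces next to a double quote are always padding. It trims those spaces and then strips the quotes from fields that contain no space, comma, or quote. Fields that begin with a doubled quote, such as the ""45356" password, are left alone, but rows with quotes embedded next to spaces (rows 5 and 6 of the sample) will not come out as requested.

```shell
#!/bin/bash
# Hypothetical helper: reads CSV on stdin, writes the cleaned CSV to stdout.
clean_csv_simple() {
    # 1. drop spaces that directly follow a double quote
    # 2. drop spaces that directly precede a double quote
    # 3. loop: unquote fields made only of non-space, non-comma, non-quote characters
    sed -E \
        -e 's/" +/"/g' \
        -e 's/ +"/"/g' \
        -e ':a' \
        -e 's/(^|,)"([^" ,]+)"(,|$)/\1\2\3/' \
        -e 'ta'
}
```

It could then be used in a Bash script as clean_csv_simple < raw.csv > clean.csv, where both file names are placeholders.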