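I have a flat file (txt/csv) with a single header line and many records. I want to use awk/sed/Unix tools to strip leading zeros (one or more) from fields 3 and 5, on the records only. I've looked at several solutions, but most don't seem to account for double-quoted values. Example: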
"ACCOUNT","REAL","022000046977525","REAL","00000220000488","ONLINE",......
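I've tried a few things with awk and sed, as well as printf, regular expressions, and so on. Am I missing something that has already been posted? Any ideas? Thanks.
Desired output: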
"ACCOUNT","REAL","22000046977525","REAL","220000488","ONLINE",......
Answer 1
Using awk:
awk -F, '{OFS=","; sub(/"0+/, "\"", $3); sub(/"0+/, "\"", $5)}1'
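Unless fields 3 and 5 of your header actually have leading zeros, the header shouldn't be a problem. If they do, you can skip the first line: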
awk -F, 'NR > 1{OFS=","; sub(/^"0+/, "\"", $3); sub(/^"0+/, "\"", $5)}1'
This replaces the opening quote and all leading zeros in fields 3 and 5 with just the quote (").
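One edge case to be aware of: with the commands above, a field made up entirely of zeros (for example "000") ends up as an empty "". The following is a minimal sketch of one way to keep a single zero in that case. The function name strip_zeros is made up for illustration, and the sketch assumes, like the answer above, that no field contains an embedded comma.

```shell
# Hypothetical helper (name is an assumption, not from the answer above).
# Strips leading zeros from quoted fields 3 and 5 on every line but the first.
# Assumes no quoted field contains a comma.
strip_zeros() {
  awk -F, -v OFS=, '
    # Remove one leading zero at a time, but only while another digit follows,
    # so an all-zero field is reduced to a single 0 instead of becoming empty.
    function strip(s) {
      while (s ~ /^"0[0-9]/) sub(/^"0/, "\"", s)
      return s
    }
    NR > 1 { $3 = strip($3); $5 = strip($5) }  # leave the header line alone
    1                                          # print every line
  ' "$@"
}
```

It can be called as strip_zeros file.csv, or used at the end of a pipeline since it reads standard input when no file is given.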
Answer 2
Using Miller (mlr), given a CSV file with a header:
$ cat file.csv
"000001","000002","000003","000004","000005","000006","000007"
"ACCOUNT","REAL","022000046977525","REAL","00000220000488","ONLINE",......
(silly field names chosen to demonstrate that leading zeros in the header are left alone) then
$ mlr --csv --ofmt '%.0f' --quote-all put '$000003=$000003; $000005=$000005' file.csv
"000001","000002","000003","000004","000005","000006","000007"
"ACCOUNT","REAL","22000046977525","REAL","220000488","ONLINE","......"
Alternatively, using csvformat (from Python's csvkit) and numfmt (from GNU Coreutils):
$ csvformat file.csv | numfmt -d, --header --field 3,5 --format '%.0f' | csvformat -U2
"000001","000002","000003","000004","000005","000006","000007"
"ACCOUNT","REAL","22000046977525","REAL","220000488","ONLINE","......"
Answer 3
Using GNU sed:
$ sed -re '
s/","/\n/4;s//\n/2
s/\n0*([0-9])/","\1/g
' file.csv
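This assumes that all fields are quoted.
It marks the start of the third and fifth fields with newlines and then removes all leading zeros after those markers. If a field consists entirely of zeros, it keeps the last zero rather than making the field's content disappear.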
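As written, the sed command processes every line, including the header. If the header must be left untouched, one possible variation is to restrict the substitutions to lines other than the first. This is only a sketch: the function name strip_zeros_sed is made up for illustration, and it relies on GNU sed (the -E option and \n in the replacement), as the answer does.

```shell
# Hypothetical wrapper (name is an assumption). Requires GNU sed.
# Assumes every field is quoted, so "," only ever appears between fields.
strip_zeros_sed() {
  sed -E '
    1!{
      # Replace the 4th "," separator (just before field 5) with a newline marker
      s/","/\n/4
      # Empty regex reuses the previous one: mark the 2nd separator (before field 3)
      s//\n/2
      # Restore each marker to "," and drop the zeros that follow it,
      # keeping at least one digit
      s/\n0*([0-9])/","\1/g
    }
  ' "$@"
}
```

For example, strip_zeros_sed file.csv prints the header unchanged and turns a field such as "000" into "0".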