Remove leading zeros from selected double-quoted fields in a unix flat file (awk, sed)
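I have a flat file (txt/csv) with a single header and many records. I want to use awk/sed/unix tools to strip leading zeros (there may be one or more) from fields 3 and 5, on the records only. I've looked at several solutions, but most don't seem to account for double-quoted values. Example: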
"ACCOUNT","REAL","022000046977525","REAL","00000220000488","ONLINE",......
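I've tried some awk and sed, as well as printf, regular expressions, and so on. Am I missing something that has already been posted? Any ideas? Thanks.
Desired output: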
"ACCOUNT","REAL","22000046977525","REAL","220000488","ONLINE",......
Answer 1
Using awk:
awk -F, '{OFS=","; sub(/"0+/, "\"", $3); sub(/"0+/, "\"", $5)}1'
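This shouldn't be a problem unless your header actually has leading zeros, but if needed you can restrict the substitution to the records: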
awk -F, 'NR > 1{OFS=","; sub(/^"0+/, "\"", $3); sub(/^"0+/, "\"", $5)}1'
This replaces the opening quote and all the leading zeros that follow it in fields 3 and 5 with just the quote.
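One edge case to be aware of: with the commands above, a field that consists only of zeros, such as "000", ends up as "". The following is a minimal sketch of a variant that keeps a single zero in that case. The helper name `strip_zeros` is hypothetical, and the sketch assumes every field is double-quoted and no field contains an embedded comma (the same assumption `-F,` already makes).

```shell
#!/bin/sh
# strip_zeros is a hypothetical helper name, not part of the answer.
# Assumes every field is double-quoted and no field contains a comma.
strip_zeros() {
  awk -F, -v OFS=, '
    # Return field f with the zeros after its opening quote removed.
    function strip(f) {
      if (f ~ /^"0+"$/) return "\"0\""   # all zeros: keep a single 0
      sub(/^"0+/, "\"", f)               # otherwise drop the leading zeros
      return f
    }
    NR > 1 { $3 = strip($3); $5 = strip($5) }   # skip the header line
    1                                           # print every line
  ' "$@"
}

printf '%s\n' \
  '"000001","000002","000003","000004","000005"' \
  '"ACCOUNT","REAL","022000046977525","REAL","000"' | strip_zeros
# → "000001","000002","000003","000004","000005"
# → "ACCOUNT","REAL","22000046977525","REAL","0"
```

With no arguments the function reads standard input; given a file name, as in `strip_zeros file.csv`, it reads that file instead.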
Answer 2
Using Miller, given a CSV file with a header:
$ cat file.csv
"000001","000002","000003","000004","000005","000006","000007"
"ACCOUNT","REAL","022000046977525","REAL","00000220000488","ONLINE",......
(The silly field names were chosen to show that leading zeros in the header are left untouched.) Then:
$ mlr --csv --ofmt '%.0f' --quote-all put '$000003=$000003; $000005=$000005' file.csv
"000001","000002","000003","000004","000005","000006","000007"
"ACCOUNT","REAL","22000046977525","REAL","220000488","ONLINE","......"
Alternatively, using csvformat (from Python's csvkit) and numfmt (from GNU Coreutils):
$ csvformat file.csv | numfmt -d, --header --field 3,5 --format '%.0f' | csvformat -U2
"000001","000002","000003","000004","000005","000006","000007"
"ACCOUNT","REAL","22000046977525","REAL","220000488","ONLINE","......"
Answer 3
Using GNU sed:
$ sed -re '
s/","/\n/4;s//\n/2
s/\n0*([0-9])/","\1/g
' file.csv
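This assumes all fields are quoted.
It marks the start of the third and fifth fields with a newline, then strips all leading zeros after each marker. If a field is all zeros, it keeps the last zero rather than making the field vanish.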
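To see the all-zero behaviour in action, here is a small sketch that wraps the same script in a function and feeds it one record. The wrapper name `strip_zeros_sed` and the sample record are made up for illustration, and GNU sed is assumed (for `-r` and for `\n` in the pattern and replacement).

```shell
#!/bin/sh
# strip_zeros_sed is a hypothetical wrapper around the sed script above.
# GNU sed is assumed; all fields must be double-quoted.
strip_zeros_sed() {
  sed -r '
    # Replace the 4th "," separator with a newline, then the 2nd one;
    # the empty pattern in s//.../ reuses the previous regex.
    s/","/\n/4;s//\n/2
    # After each marker, drop leading zeros but keep at least one digit,
    # and put the separator back.
    s/\n0*([0-9])/","\1/g
  ' "$@"
}

printf '%s\n' '"A","B","0000","C","00120","D"' | strip_zeros_sed
# → "A","B","0","C","120","D"
```

The fourth separator is marked before the second so that replacing one does not shift the count for the other. Note that, unlike the `NR > 1` awk variant, this script is applied to the header line as well.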