Remove leading zeros from selected double-quoted fields in a Unix flat file (awk, sed)

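I have a flat file (txt/csv) with a single header line and many records. I want to use awk, sed, or other Unix tools to strip leading zeros (there may be one or more) from fields 3 and 5, on the records only. I have looked at several solutions, but most do not seem to account for double-quoted values. Example: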

"ACCOUNT","REAL","022000046977525","REAL","00000220000488","ONLINE",......

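I have tried a few things with awk and sed, as well as printf, regular expressions, and so on. Am I missing something that has already been posted? Any ideas? Thanks.

Desired output: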

"ACCOUNT","REAL","22000046977525","REAL","220000488","ONLINE",......

Answer 1

Using awk:

awk -F, '{OFS=","; sub(/^"0+/, "\"", $3); sub(/^"0+/, "\"", $5)}1'

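Running this over the header should not be a problem unless the header actually has leading zeros in those fields. If it does, skip the first line: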

awk -F, 'NR > 1{OFS=","; sub(/^"0+/, "\"", $3); sub(/^"0+/, "\"", $5)}1'

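This replaces the opening " and any leading zeros that follow it in fields 3 and 5 with just a ". A field that contains nothing but zeros therefore ends up empty ("").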
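The same idea can be wrapped in a small shell function, shown here as a minimal sketch rather than a definitive solution. It sets the separators once in BEGIN, skips the header, and keeps a single zero when a field holds nothing but zeros; that last behavior is an added assumption, not something the question asks for. The names strip_zeros and strip are hypothetical, and like the commands above it assumes no field contains an embedded comma.

```shell
# strip_zeros: hypothetical helper; reads CSV on stdin, writes CSV to stdout.
strip_zeros() {
  awk '
    BEGIN { FS = OFS = "," }

    # Remove zeros that directly follow the opening quote.
    # If that leaves an empty quoted field, the value was all zeros,
    # so put a single 0 back (assumed behavior, see the text above).
    function strip(f) {
      if (sub(/^"0+/, "\"", f) && f == "\"\"")
        f = "\"0\""
      return f
    }

    # Records only (NR > 1), and only lines that really have a field 5.
    NR > 1 && NF >= 5 { $3 = strip($3); $5 = strip($5) }

    # Print every line, header included.
    1
  '
}
```

It would be used as: strip_zeros < file.csv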

Answer 2

Using Miller (mlr), given a CSV file with a header:

$ cat file.csv
"000001","000002","000003","000004","000005","000006","000007"
"ACCOUNT","REAL","022000046977525","REAL","00000220000488","ONLINE",......

(The silly field names were chosen to show that leading zeros in the header are left alone.) Then:

$ mlr --csv --ofmt '%.0f' --quote-all put '$000003=$000003; $000005=$000005' file.csv
"000001","000002","000003","000004","000005","000006","000007"
"ACCOUNT","REAL","22000046977525","REAL","220000488","ONLINE","......"

Alternatively, using csvformat (from Python's csvkit) and numfmt (from GNU Coreutils):

$ csvformat file.csv | numfmt -d, --header --field 3,5 --format '%.0f' | csvformat -U2
"000001","000002","000003","000004","000005","000006","000007"
"ACCOUNT","REAL","22000046977525","REAL","220000488","ONLINE","......"

Answer 3

Using GNU sed:

$ sed -re '
   s/","/\n/4;s//\n/2
   s/\n0*([0-9])/","\1/g
' file.csv

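This assumes that every field is quoted.

It marks the start of the third and fifth fields with a newline and then removes all leading zeros after each marker. If a field consists entirely of zeros, it keeps the last zero rather than making the field's value disappear.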
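As written, the sed script also runs on the header line. A minimal sketch of a variant that leaves the header alone by restricting the commands to lines 2 onward is shown below. The function name strip_zeros_sed is hypothetical, and the sketch still relies on GNU sed (for \n in the replacement) and on every field being quoted.

```shell
# strip_zeros_sed: hypothetical helper; reads CSV on stdin, writes CSV to stdout.
strip_zeros_sed() {
  sed -re '2,$ {
    # Replace the 4th "," separator (just before field 5) with a newline marker.
    s/","/\n/4
    # Empty regex reuses the previous one: mark the 2nd separator (before field 3).
    s//\n/2
    # Drop zeros after each marker but keep the final digit, then restore the separator.
    s/\n0*([0-9])/","\1/g
  }'
}
```

It would be used as: strip_zeros_sed < file.csv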
