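I have several CSV files, each containing many records. Every row should have 134 columns, but across my files each row has its own number of columns (anywhere from 15 to 200). I need to sort the rows by their number of columns.
I was able to count the columns in a file with: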
$ awk -F"," '{print NF}' file # 1.csv
...which gives something like:
134
134
134
5
25
133
...
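Now I would like to add these numbers to each line so that I can later sort the lines by them. How can I put this count at the start of each line and then sort on it?
I would also like to split the lines with value=134 off into a separate file, according to their respective count.
Small sample input file (3 lines in total):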
2,"A.B.C.D",50,"SDf3oa701-ab73-a0pcs90","7012218969217-1413752517-32448","SDf3oa701-ab73-a0pcs90","SIP",,"<[email protected]>;tag=70122","<[email protected]>",17,0,"00:01:57.827 GMT Oct 20 2014","00:00:00.000 UTC Jan 01 1970","00:01:57.870 GMT Oct 20 2014",3,"sp3",1904,"sp3",1904,"realm_IN","realmTERM_OUT",,,,"::",0,"::",0,,"::",0,"::",0,0,0,0,0,0,0,0,0,0,0,,,,"::",0,"::",0,,"::",0,"::",0,0,0,0,0,0,0,0,0,0,0,,,,"::",0,"::",0,,"::",0,"::",0,0,0,0,0,0,0,0,0,0,0,,,,"::",0,"::",0,,"::",0,"::",0,0,0,0,0,0,0,0,0,0,0,,,"Sw-buildabcd","GMT-03:00",0,"[email protected]",,,,,,"X.Y.Z.W:50","A.S.D.F:50","A.S.D.F:50","A.S.D.F:50",,1,2,1,404,"[email protected]",,,4493101
2,"A.B.C.D",50,,,,4493105
2,"A.B.C.D",50,,"[email protected]",,,4493106
Answer 1
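I think this is what you want.
Add -F, for comma-separated input, and set OFS as well so that the rebuilt line keeps its commas, e.g. awk -F, -v OFS=, '$(NF+1)=NF' file
Adding the count to the end of each line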
awk '$(NF+1)=NF' file
Input
1
1 2 3
1 2
1 2 3 4 5 6
a b
Output
1 1
1 2 3 3
1 2 2
1 2 3 4 5 6 6
a b 2
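The question asks for the count at the start of each line rather than at the end. A minimal sketch for comma-separated input, assuming that layout is what is wanted; the function name prepend_count is made up for illustration, and OFS is set to a comma so the count is joined to the line with a comma:

```shell
# Hypothetical helper: prefix each line with its comma-separated field count.
# -F, splits on commas; OFS="," joins the count and the line with a comma.
prepend_count() {
    awk -F, -v OFS=, '{ print NF, $0 }' "$@"
}

printf '%s\n' 'a,b,c' 'a' 'a,b' | prepend_count
```

For the three demo lines this prints 3,a,b,c then 1,a then 2,a,b.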
Sorting the lines
awk '{a[NF]=a[NF]?a[NF]"\n"$0:$0;x=x<NF?NF:x}END{for(i=1;i<=x;i++)if(i in a)print a[i]}'
Input
1
1 2 3
1 2
1 2 3 4 5 6
a b
Output
1
1 2
a b
1 2 3
1 2 3 4 5 6
Printing to different files
For example, using a field count of 4; change it to 134 or whatever value you want
awk '{print > (NF>=4?"LargeFile.txt":"SmallFile.txt")}' file
Input
1
1 2 3
1 2
1 2 3 4 5 6
a b
Output
LargeFile.txt
1 2 3 4 5 6
SmallFile.txt
1
1 2 3
1 2
a b
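If "split by their respective count" means one output file per column count, the file name can be built from NF. A rough sketch under that assumption; the cols_ prefix, the split_by_count name and the demo input are all invented for illustration:

```shell
# Hypothetical helper: write each line to a file named after its field count,
# e.g. a line with 134 fields goes to cols_134.csv in the current directory.
split_by_count() {
    awk -F, '{
        out = "cols_" NF ".csv"   # target file for this field count
        print > out               # ">" truncates once, then appends within this run
    }' "$@"
}

# Demo in a scratch directory so no existing files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 'a,b,c' 'a' 'x,y,z' | split_by_count
cat cols_3.csv
```

Some awk implementations limit how many files can be open at once; with many distinct counts it may be necessary to append with >> and call close(out) after each write.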
Answer 2
Similar to @terdon's answer, but with sed:
{ seq -s, 10; seq -s, 5; seq -s, 15; } |
tee - -
That is my infile - it looks like:
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
Then I can do:
sed 'h;s/[^,]*//g;G;s/\n/ /' | sort -t' ' -nk1,1
...which gets...
,,,, 1,2,3,4,5
,,,, 1,2,3,4,5
,,,, 1,2,3,4,5
,,,,,,,,, 1,2,3,4,5,6,7,8,9,10
,,,,,,,,, 1,2,3,4,5,6,7,8,9,10
,,,,,,,,, 1,2,3,4,5,6,7,8,9,10
,,,,,,,,,,,,,, 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
,,,,,,,,,,,,,, 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
,,,,,,,,,,,,,, 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
The numbers aren't there, but the count effectively is, as the run of leading commas. To strip the leading commas, I can do:
PIPELINE | sed 's/,* //'
...which gets...
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
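Perhaps not the most auspicious answer so far, but the main reason I decided to write it is that you mention wanting to write the lines containing 134 comma-separated entries to another file. As it happens, that is a simple thing for sed. For example, suppose I wanted to write the lines with 10 fields from the sequence above to a file named file2: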
PIPELINE | sed '/^\([^,]*,[^,]*\)\{9\}$/w file2'
cat file2
Output
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10
I use \{9\} above because it specifies 9 occurrences of that pattern, and 9 delimiters make for 10 delimited fields. Ranges are handled just as simply:
PIPELINE | sed '/^\([^,]*,[^,]*\)\{4,9\}$/w file2'
cat file2
Output
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10
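Since the repeat count is always one less than the number of fields, the expression can be generated for any field count, 134 included. A small sketch; write_n_fields is a made-up name, and it assumes n is at least 2 (with n=1 the pattern would only match empty lines):

```shell
# Hypothetical helper: copy lines with exactly n comma-separated fields
# (n >= 2) from stdin into the file named by the second argument.
write_n_fields() {
    n=$1 dest=$2
    # n fields means n-1 commas, hence the repeat count n-1.
    # -n suppresses the normal output; only the w command writes anything.
    sed -n "/^\([^,]*,[^,]*\)\{$((n - 1))\}\$/w $dest"
}

dir=$(mktemp -d) || exit 1
printf '%s\n' '1,2,3' 'a,b' '4,5,6,7' | write_n_fields 3 "$dir/three.csv"
cat "$dir/three.csv"
```

Calling it as write_n_fields 134 file2 would collect the 134-field lines the question is after.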
Answer 3
This adds the number of (comma-separated) fields to the beginning of each line, prints the line, and then sorts everything:
awk -F"," '{print NF,$0}' *csv | sort -nk1,1
The -n gives a numeric sort, and -k1,1 ensures it sorts on the first field only. To remove the field count after sorting, use:
awk -F"," '{print NF,$0}' *csv | sort -nk1,1 | cut -d ' ' -f 2-
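Note: depending on your actual data, this can easily break. Can there be commas inside fields? Can there be fields that span multiple lines? This is a very naive approach that handles neither case.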
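One of those cases, commas inside double-quoted fields, can be worked around in plain awk by removing the quoted parts before counting. This is only a sketch: count_csv_fields is a hypothetical name, and it assumes each record sits on a single line with balanced quotes, so multi-line fields are still not handled.

```shell
# Hypothetical helper: print "count line", ignoring commas inside double quotes.
count_csv_fields() {
    awk '{
        line = $0
        gsub(/"[^"]*"/, "", line)        # drop quoted fields, commas included
        n = gsub(/,/, ",", line) + 1     # remaining commas + 1 = field count
        print n, $0
    }' "$@"
}

# Decorate with the count, sort numerically on it, then strip it again.
printf '%s\n' '1,"a,b",3' 'x,y' | count_csv_fields | sort -n -k1,1 | cut -d ' ' -f 2-
```

The copy in line is only used for counting; the original line is printed untouched. For data with embedded newlines a real CSV parser is the safer choice.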