I have a sample CSV file with the following contents:
$ cat SAMPLE.CSV
compid,active,tagno
-2147483646,1,"1"
-2147483645,0,"10000"
-2147483644,0,"1002"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"
-2147468087,1,"76245"
-2147466194,1,"1361B.2"
-2147466195,1,"1361B.1"
-2147466196,1,"1361B"
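I want to sort by the third column, tagno, but I want the sort to respect the alphanumeric values in that column.
The desired result should look like this: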
compid,active,tagno
-2147483646,1,"1"
-2147483644,0,"1002"
-2147466196,1,"1361B"
-2147466195,1,"1361B.1"
-2147466194,1,"1361B.2"
-2147483645,0,"10000"
-2147468087,1,"76245"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"
I tried the following:
$ sort -t'"' -k2n SAMPLE.CSV
compid,active,tagno
-2147483646,1,"1"
-2147483644,0,"1002"
-2147466194,1,"1361B.2"
-2147466195,1,"1361B.1"
-2147466196,1,"1361B"
-2147483645,0,"10000"
-2147468087,1,"76245"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"
But as you can see, 1361B, 1361B.1, and 1361B.2 come out in almost reverse order.
Answer 1
Use the --version-sort option of sort.
If you look at the manual (man sort), sort has an option for sorting by version number. Here is the entry:
-V, --version-sort
Sort version numbers. The input lines are treated as file
names in form PREFIX VERSION SUFFIX, where SUFFIX matches
the regular expression "(.([A-Za-z~][A-Za-z0-9~]*)?)*". The
files are compared by their prefixes and versions (leading
zeros are ignored in version numbers, see example below).
If an input string does not match the pattern, then it is
compared using the byte compare function. All string com-
parisons are performed in C locale, the locale environment
setting is ignored.
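This seems to respect alphanumeric values better than -n or -g sorting. Both of those read only the leading number of each key, so 1361B, 1361B.1, and 1361B.2 all compare as 1361 and tie; sort then falls back to comparing the whole line, which is why those three rows came out ordered by compid instead.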
Using the -V flag on the third column, you get the desired result:
$ sort -t'"' -k2V SAMPLE.CSV
compid,active,tagno
-2147483646,1,"1"
-2147483644,0,"1002"
-2147466196,1,"1361B"
-2147466195,1,"1361B.1"
-2147466194,1,"1361B.2"
-2147483645,0,"10000"
-2147468087,1,"76245"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"
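
The command above relies on the header line having no double quote, so its sort key is empty and it happens to sort first. As a minimal sketch, assuming GNU sort (for -V) and a file whose first line is the header, the header can be set aside explicitly and the key limited to the quoted field alone. The function name sort_by_tagno is hypothetical, chosen only for illustration:

```shell
# Hypothetical helper: sort a CSV file by its quoted third column using
# version sort, keeping the header line on top.
# Assumes GNU sort (-V) and that the first line of the file is the header.
sort_by_tagno() {
    # Print the header line unchanged.
    head -n 1 "$1"
    # Sort the remaining rows. With '"' as the field separator, field 2 is
    # the text between the quotes; -k2,2V restricts the key to that field.
    tail -n +2 "$1" | sort -t'"' -k2,2V
}
```

Running sort_by_tagno SAMPLE.CSV on the file from the question should give the same ordering as the sort -t'"' -k2V output shown above.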