Sorting a CSV file by a column that contains alphanumeric values

I have a sample CSV file with the following contents:

$ cat SAMPLE.CSV 
compid,active,tagno
-2147483646,1,"1"
-2147483645,0,"10000"
-2147483644,0,"1002"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"
-2147468087,1,"76245"
-2147466194,1,"1361B.2"
-2147466195,1,"1361B.1"
-2147466196,1,"1361B"

I want to sort by the third column, tagno, but I want the sort to respect the alphanumeric values in that column.

The expected result should look like this:

compid,active,tagno
-2147483646,1,"1"
-2147483644,0,"1002"
-2147466196,1,"1361B"
-2147466195,1,"1361B.1"
-2147466194,1,"1361B.2"
-2147483645,0,"10000"
-2147468087,1,"76245"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"

I tried the following:

$ sort -t'"' -k2n SAMPLE.CSV
compid,active,tagno
-2147483646,1,"1"
-2147483644,0,"1002"
-2147466194,1,"1361B.2"
-2147466195,1,"1361B.1"
-2147466196,1,"1361B"
-2147483645,0,"10000"
-2147468087,1,"76245"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"

But as you can see, 1361B, 1361B.1, and 1361B.2 come out in almost reverse order.

Answer 1

Use the --version-sort option of sort

If you look at the manual (man sort), sort has an option for sorting by version number. Here is the entry:

-V, --version-sort
             Sort version numbers.  The input lines are treated as file
             names in form PREFIX VERSION SUFFIX, where SUFFIX matches
             the regular expression "(.([A-Za-z~][A-Za-z0-9~]*)?)*".  The
             files are compared by their prefixes and versions (leading
             zeros are ignored in version numbers, see example below).
             If an input string does not match the pattern, then it is
             compared using the byte compare function.  All string com-
             parisons are performed in C locale, the locale environment
             setting is ignored.

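This seems to respect alphanumeric values better than -n or -g sorting. Both of those read only the leading number of the key, so 1361B, 1361B.1, and 1361B.2 all compare as 1361, and their relative order is then decided by sort's fallback comparison of the whole line.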
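A small sketch of why -n puts those rows in an unexpected order, assuming a sort that supports -s (stable), as GNU and BSD sort do. The three rows and the variable names are made up for illustration. LC_ALL=C is set only so that the whole-line fallback comparison is a plain byte comparison.

```shell
# Three illustrative rows whose key shares the numeric prefix 1361.
# Input order is c, a, b.
rows='c,"1361B"
a,"1361B.2"
b,"1361B.1"'

# Default: the keys tie under -n, so sort falls back to comparing
# the whole line, and the first column ends up deciding the order.
default_order=$(printf '%s\n' "$rows" | LC_ALL=C sort -t'"' -k2,2n)

# With -s the fallback is disabled, so tied rows keep their input order.
stable_order=$(printf '%s\n' "$rows" | LC_ALL=C sort -s -t'"' -k2,2n)

printf 'default:\n%s\nstable:\n%s\n' "$default_order" "$stable_order"
```

The default run lists the rows as a, b, c, while the stable run keeps c, a, b. That shows the numeric keys are equal and that the order in the question's output comes from the compid column, not from tagno.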

Applying the -V flag to the key that holds the third column gives the desired result:

$ sort -t'"' -k2V SAMPLE.CSV
compid,active,tagno
-2147483646,1,"1"
-2147483644,0,"1002"
-2147466196,1,"1361B"
-2147466195,1,"1361B.1"
-2147466194,1,"1361B.2"
-2147483645,0,"10000"
-2147468087,1,"76245"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"
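A slightly more defensive variant, offered as a sketch and not as the only way to do it. It assumes a sort that supports -V and that tagno is the only quoted field. The function name sort_by_tagno and the output file SORTED.CSV are illustrative. It prints the header line separately so it can never be mixed into the data, and it uses -k2,2V so the key is exactly the text between the quotes and does not run to the end of the line.

```shell
# Recreate the sample file from the question so the sketch runs on its own.
cat > SAMPLE.CSV <<'EOF'
compid,active,tagno
-2147483646,1,"1"
-2147483645,0,"10000"
-2147483644,0,"1002"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"
-2147468087,1,"76245"
-2147466194,1,"1361B.2"
-2147466195,1,"1361B.1"
-2147466196,1,"1361B"
EOF

# Print the header untouched, then version-sort only the data rows.
# With '"' as the delimiter, field 2 is the tagno value without quotes.
sort_by_tagno() {
  head -n 1 "$1"
  tail -n +2 "$1" | sort -t'"' -k2,2V
}

sort_by_tagno SAMPLE.CSV > SORTED.CSV
cat SORTED.CSV
```

On the sample data this yields the same order as the expected result in the question.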
