Multi-column alphabetic and numeric sort

I want to sort:

  1. By file name.
  2. Where the file name prefixes match and the names end in a number, numerically by that number.

The following command

cat /tmp/foo.txt | sort -t/ -k3,3 -k3,3n

does 1, but not 2.
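A small demonstration of why the second key has no effect, assuming a POSIX-style `sort` run in the C locale. The two sample paths and the helper name `sort_attempt` are made up for illustration. The first key already compares the whole file name as text, and the numeric key reads that same field, which does not start with a digit.

```shell
# sort_attempt wraps the command from the question; LC_ALL=C is an added
# assumption so that the text comparison is plain byte order.
sort_attempt() {
    LC_ALL=C sort -t/ -k3,3 -k3,3n
}

# Two made-up paths, not taken from /tmp/foo.txt.
# Key 1 (-k3,3) compares the third field as text, so "test-10.txt" sorts
# before "test-2.txt" ("1" < "2"), and the two lines never tie.
# Key 2 (-k3,3n) would read the same field as a number, but "test-..." has
# no leading digits, so it evaluates to 0 for every line.
printf '%s\n' d/c/test-2.txt d/c/test-10.txt | sort_attempt
```

This prints `d/c/test-10.txt` before `d/c/test-2.txt`, which is the text order rather than the numeric one.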

Input, /tmp/foo.txt:

dirA/catA/apple.txt
dirA/catA/addition.txt
dirA/catA/difference
dirA/catB/binary.txt
dirA/catB/carry.txt
dirA/catB/digit
dirA/catC/test-10.txt
dirA/catC/test-100.txt
dirA/catC/test-1000.txt
dirA/catC/test-11.txt
dirA/catC/test-2.txt
dirA/catC/test-20.txt
dirA/catC/test-25.txt
dirA/catC/test-5.txt
dirA/catC/test-50.txt
dirA/catC/test-500.txt
dirA/catC/test-7.txt
dirA/catC/test-75.txt
dirA/catC/test-8.txt
dirA/catC/abc-test-9.txt
dirA/catC/abc-test-999.txt
dirA/catC/abc-test-75.txt
dirA/catC/abc-test-8.txt

Desired output:

dirA/catC/abc-test-8.txt
dirA/catC/abc-test-9.txt
dirA/catC/abc-test-75.txt
dirA/catC/abc-test-999.txt
dirA/catA/addition.txt
dirA/catA/apple.txt
dirA/catB/binary.txt
dirA/catB/carry.txt
dirA/catA/difference
dirA/catB/digit
dirA/catC/test-2.txt
dirA/catC/test-5.txt
dirA/catC/test-7.txt
dirA/catC/test-8.txt
dirA/catC/test-10.txt
dirA/catC/test-11.txt
dirA/catC/test-20.txt
dirA/catC/test-25.txt
dirA/catC/test-50.txt
dirA/catC/test-75.txt
dirA/catC/test-100.txt
dirA/catC/test-500.txt
dirA/catC/test-1000.txt

Answer 1

Perl to the rescue!

perl -e '
    print for sort { (($a =~ m{.*/([^0-9]*)})[0] cmp ($b =~ m{.*/([^0-9]*)})[0])
                     ||
                     (($a =~ /-([0-9]+)/)[0] <=> ($b =~ /-([0-9]+)/)[0]) } <>
' -- /tmp/foo.txt
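
  • <> reads the input lines.
  • sort sorts the list according to the given code block.
  • m{.*/([^0-9]*)} extracts the base name up to the first digit (if there is one).
  • cmp does a string comparison.
  • If the two strings are equal, || ("or") falls through to the second comparison, in which:
  • /-([0-9]+)/ extracts the number, and
  • <=> does a numeric comparison.
  • The (...)[0] construct is needed because a match returns the list of captures (corresponding to $1, $2, etc.), and it does so only in list context. We are interested only in the first capture (there are no others).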

Answer 2

awk '
    BEGIN {FS = "[-/.]"; OFS = "\t"}
    {n = 0}
    $(NF-1) ~ /^[0-9]+$/ {n = $(NF-1)}
    {print $3, n, $0}
' foo.txt \
| sort -k1,1 -k2,2n \
| cut -f3-

This is a Schwartzian transform:

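  • The awk program puts the first word of the file name and the file number as columns in front of the file path.
  • The data is sorted by name, then by number.
  • The added columns are then removed.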
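The three steps above can be made visible by stopping after the first one. This sketch wraps the same awk program in a helper (the name `decorate` is made up here) and shows the tab-separated columns that sort and cut then work on.

```shell
# Step 1 (decorate): prepend "first word of the name" and "number" columns.
decorate() {
    awk '
        BEGIN {FS = "[-/.]"; OFS = "\t"}
        {n = 0}                                 # default when no number
        $(NF-1) ~ /^[0-9]+$/ {n = $(NF-1)}      # field before the extension
        {print $3, n, $0}
    '
}

# Decorated lines only: name <TAB> number <TAB> original path.
printf '%s\n' dirA/catC/test-10.txt dirA/catA/difference | decorate

# Steps 2 and 3 (sort, undecorate) applied to two names sharing a prefix.
printf '%s\n' dirA/catC/test-10.txt dirA/catC/test-2.txt |
    decorate | sort -k1,1 -k2,2n | cut -f3-
```

The first command shows `test`, `10` and `difference`, `0` in front of the paths; the second prints `test-2.txt` before `test-10.txt`, because the tie on `test` is broken numerically.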

Output:

dirA/catC/abc-test-8.txt
dirA/catC/abc-test-9.txt
dirA/catC/abc-test-75.txt
dirA/catC/abc-test-999.txt
dirA/catA/addition.txt
dirA/catA/apple.txt
dirA/catB/binary.txt
dirA/catB/carry.txt
dirA/catA/difference
dirA/catB/digit
dirA/catC/test-2.txt
dirA/catC/test-5.txt
dirA/catC/test-7.txt
dirA/catC/test-8.txt
dirA/catC/test-10.txt
dirA/catC/test-11.txt
dirA/catC/test-20.txt
dirA/catC/test-25.txt
dirA/catC/test-50.txt
dirA/catC/test-75.txt
dirA/catC/test-100.txt
dirA/catC/test-500.txt
dirA/catC/test-1000.txt

The same procedure as a Perl one-liner (except that you read the Perl statements "bottom-up"):

perl -e '
  print join "",
        map  { $_->[2] }
        sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] }
        map  { [m{.*/(\D+)(\d*)}, $_] }
        <>;
' foo.txt

Answer 3

Using sed:

cat /tmp/foo.txt | sed "s/[[:alnum:]-]*\/[[:alnum:]-]*\/\([[:alpha:]-]*\)\([[:digit:]]*\).*/\0|\1|\2 /"|sort -t"|" -k2,2 -k3n|sed "s/\([^|]*\).*/\1/"

The trick is to temporarily append the required fields to the end of the line.
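A sketch of what a line looks like after that first sed, before sort sees it. A few things here are my own choices rather than the answer's: the helper name `decorate_sed`, `#` as the delimiter (so the slashes need no escaping), `&` for the whole match (the answer's `\0` is, as far as I know, a GNU sed spelling of the same thing), and dropping the trailing space from the replacement.

```shell
# Append "|name-without-digits|digits" to each path of the form dir/dir/file.
decorate_sed() {
    sed 's#[^/]*/[^/]*/\([^[:digit:]]*\)\([[:digit:]]*\).*#&|\1|\2#'
}

printf '%s\n' dirA/catC/test-10.txt dirA/catA/apple.txt | decorate_sed
```

The first path gains the fields `test-` and `10`; the second gains `apple.txt` and an empty number field, which `sort -t"|" -k2,2 -k3n` can then use as its keys.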

Oops, this is better:

cat source | sed "s/[^/]*\/[^/]*\/\([^[:digit:]]*\)\([[:digit:]]*\).*/\0|\1|\2 /"|sort -t"|" -k2,2 -k3n|sed "s/\([^|]*\).*/\1/"

I modified the original problem slightly: sort by the last name in the path (the file name), without its digits.

dirA/catC/abc-test-8.txt
dirA/catC/abc-test-9.txt
dirA/catC/abc-test-75.txt
dirA/catC/abc-test-999.txt
dirA/catA/addition.txt
dirA/catA/apple.txt
dirA/catB/binary.txt
dirA/catB/carry.txt
dirA/catA/difference
dirA/catB/digit
dirA/catC/test-2.txt
dirA/catC/test-5.txt
dirA/catC/test-7.txt
dirA/catC/test-8.txt
dirA/catC/test-10.txt
dirA/catC/subdir/test-11.txt
dirA/catC/test-11.txt
dirA/cat C/subdir/test-12.txt
dirA/catC/test-20.txt
dirA/catC/test-25.txt
dirA/catC/test-50.txt
dirA/catC/test-75.txt
dirA/catC/test-100.txt
dirA/catC/test-500.txt
dirA/catC/test-1000.txt

cat /tmp/foo.txt | sed "s/\([^/]*\/\)\+\([^[:digit:]]*\)\([[:digit:]]*\)\(.*\)/\0|\2\4|\3 /"|sort -t"|" -k2,2 -k3n|sed "s/\([^|]*\).*/\1/"

Output:

dirA/catC/abc-test-8.txt
dirA/catC/abc-test-9.txt
dirA/catC/abc-test-75.txt
dirA/catC/abc-test-999.txt
dirA/catA/addition.txt
dirA/catA/apple.txt
dirA/catB/binary.txt
dirA/catB/carry.txt
dirA/catA/difference
dirA/catB/digit
dirA/catC/test-2.txt
dirA/catC/test-5.txt
dirA/catC/test-7.txt
dirA/catC/test-8.txt
dirA/catC/test-10.txt
dirA/catC/subdir/test-11.txt
dirA/catC/test-11.txt
dirA/cat C/subdir/test-12.txt
dirA/catC/test-20.txt
dirA/catC/test-25.txt
dirA/catC/test-50.txt
dirA/catC/test-75.txt
dirA/catC/test-100.txt
dirA/catC/test-500.txt
dirA/catC/test-1000.txt

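Explanation: \([^/]*\/\)\+ consumes the whole directory path => \1

\([^[:digit:]]*\) is the part of the file name before the digits => \2

\([[:digit:]]*\) is the digits => \3, and \(.*\) is the extension => \4

\0|\2\4|\3 prints the whole line | the first part of the file name plus the extension | the number

sort -t"|" -k2,2 -k3n | sed "s/\([^|]*\).*/\1/" sorts, then cuts off the parts that are no longer needed.

Instead of the last sed, cut -d "|" -f1 would also work.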
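A quick check that the two ways of removing the temporary fields agree. The decorated sample line is written by hand in the path|name|number layout described above, and the helper names are made up for this sketch.

```shell
# Two interchangeable "undecorate" steps: keep everything before the first |.
strip_with_sed() { sed 's/\([^|]*\).*/\1/'; }
strip_with_cut() { cut -d '|' -f1; }

# A hand-written decorated line: path|name-with-extension|number.
line='dirA/catC/test-10.txt|test-.txt|10'
printf '%s\n' "$line" | strip_with_sed
printf '%s\n' "$line" | strip_with_cut
```

Both commands print the bare path. Either way, this relies on no path containing a | character of its own.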
