Is jq's internal sort slower than GNU sort?

2024-5-16

While filtering this JSON file, I ran a benchmark and found that using jq's internal sort and unique methods is actually 25% slower than sort --unique!

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `jq "[.[].category] \| sort \| unique" channels.json` | 172.0 ± 2.6 | 167.8 | 176.8 | 1.25 ± 0.06 |
| `jq "[.[].category \| select((. != null) and (. != \"XXX\"))] \| sort \| unique" channels.json` | 151.9 ± 4.1 | 146.5 | 163.9 | 1.11 ± 0.06 |
| `jq ".[].category" channels.json \| sort -u` | 137.2 ± 6.6 | 131.8 | 156.6 | 1.00 |

Summary
  'jq ".[].category" channels.json | sort -u' ran
    1.11 ± 0.06 times faster than 'jq "[.[].category | select((. != null) and (. != \"XXX\"))] | sort | unique" channels.json'
    1.25 ± 0.06 times faster than 'jq "[.[].category] | sort | unique" channels.json'

Test command:

hyperfine --warmup 3 \
    'jq "[.[].category] | sort | unique" channels.json'  \
    'jq "[.[].category | select((. != null) and (. != \"XXX\"))] | sort | unique" channels.json' \
    'jq ".[].category" channels.json | sort -u'
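One thing worth checking in the commands above: the jq manual documents that `unique` already returns its elements in sorted order, so `sort | unique` sorts the array twice. Here is a minimal sketch to confirm that dropping the explicit `sort` does not change the result. It uses a small hypothetical sample file in place of channels.json; the data and the temp-file handling are my assumptions, not part of the original benchmark.

```shell
#!/usr/bin/env bash
# Sketch: does removing the `sort` before `unique` change jq's output?
# The sample data is hypothetical and only stands in for channels.json.
set -euo pipefail

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

cat > "$sample" <<'EOF'
[
  {"category": "news"},
  {"category": "sports"},
  {"category": null},
  {"category": "XXX"},
  {"category": "news"}
]
EOF

# Filter from the benchmark: explicit sort, then unique
with_sort=$(jq -c '[.[].category] | sort | unique' "$sample")
# Same filter without the extra sort; unique returns a sorted array by itself
without_sort=$(jq -c '[.[].category] | unique' "$sample")

echo "sort | unique: $with_sort"
echo "unique only:   $without_sort"
```

If the two outputs also match on the real file, `jq "[.[].category] | unique" channels.json` could be added as another hyperfine command. I have not timed it, so no speedup is claimed here.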

If I benchmark only the sort (without unique), jq is again 9% slower than sort:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `jq "[.[].category] \| sort" channels.json` | 133.9 ± 1.6 | 131.1 | 138.2 | 1.09 ± 0.02 |
| `jq ".[].category" channels.json \| sort` | 123.0 ± 1.3 | 120.5 | 125.7 | 1.00 |

Summary
  'jq ".[].category" channels.json | sort' ran
    1.09 ± 0.02 times faster than 'jq "[.[].category] | sort" channels.json'
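The two sides of this comparison also do not produce the same output. `jq "[.[].category] | sort"` prints a single JSON array ordered by jq's own rules (per the jq manual, `null` sorts before any string, and strings sort by Unicode codepoint), while `jq ".[].category" | sort` sorts the JSON-encoded text lines, quotes included, using the current locale's collation. A minimal sketch with hypothetical sample data, forcing `LC_ALL=C` so the byte order is reproducible; under the locale used for the benchmark the GNU sort order may differ again:

```shell
#!/usr/bin/env bash
# Sketch: compare the ordering produced by jq's sort with GNU sort on the
# same values. Sample data is hypothetical; LC_ALL=C makes sort compare bytes.
set -euo pipefail

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

cat > "$sample" <<'EOF'
[
  {"category": "news"},
  {"category": "sports"},
  {"category": null},
  {"category": "XXX"},
  {"category": "news"}
]
EOF

# jq's order, printed one element per line so it can be compared with sort
jq_order=$(jq -c '[.[].category] | sort | .[]' "$sample")
# GNU sort orders the JSON text: '"' (0x22) sorts before 'n' (0x6E),
# so the bare `null` line ends up last instead of first
sort_order=$(jq -c '.[].category' "$sample" | LC_ALL=C sort)

printf 'jq sort:\n%s\n\nGNU sort:\n%s\n' "$jq_order" "$sort_order"
```

So the timings measure slightly different work, and it is worth deciding which ordering is actually needed before picking one approach over the other.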

Versions:

jq-1.5-1-a5b5cbe
sort (GNU coreutils) 8.28

I expected jq's internal functions to be faster than piping to an external application, which itself has to be spawned. Am I using jq poorly?

Update: I just repeated this experiment on a host with flash storage, an Arm CPU, and the following versions:

jq-1.6
sort (GNU coreutils) 8.32

Results:

Benchmark #1: jq "[.[].category] | sort" channels.json
  Time (mean ± σ):     587.8 ms ±   3.9 ms    [User: 539.5 ms, System: 44.2 ms]
  Range (min … max):   582.8 ms … 594.2 ms    10 runs
 
Benchmark #2: jq ".[].category" channels.json | sort
  Time (mean ± σ):     606.0 ms ±   8.6 ms    [User: 569.5 ms, System: 49.0 ms]
  Range (min … max):   589.6 ms … 616.2 ms    10 runs
 
Summary
  'jq "[.[].category] | sort" channels.json' ran
    1.03 ± 0.02 times faster than 'jq ".[].category" channels.json | sort'

Now jq's sort runs 3% faster than GNU sort :D
