How to parameterize a "jq" expression to return a selection or its complement?

Question 1

Pass a boolean value into the jq expression, and use an if statement to switch between returning the selected set and its complement:

$ jq -n --argjson yes true '$ARGS.positional[] | select( . > 2 | if $yes then . else not end )' --jsonargs 1 2 3 4 5
3
4
5

$ jq -n --argjson yes false '$ARGS.positional[] | select( . > 2 | if $yes then . else not end )' --jsonargs 1 2 3 4 5
1
2

In a more complex jq expression, modify the condition of the if statement in the same way:

# Match the pathnames given as positional command line arguments
# against the computed pathnames in the "tmp_paths" array in
# each message.  Depending on the $yes boolean variable, extract
# or discard matching messages.
JOIN(
        INDEX($ARGS.positional[]; .);
        .[];
        .tmp_paths[];
        if (.[1:] | any | if $yes then . else not end) then
                .[0]
        else
                empty
        end
)

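Note that if $yes then . else not end lets the variable $yes act as a "toggle" between wanting a set and wanting its complement. In both the simplified select() and the more complex JOIN(), this if statement acts on the result of the boolean test that determines whether an element should be part of the result set.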
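
The toggle can also be wrapped in a small helper function so that the filter reads the same wherever it is used. The following is only a minimal sketch: the helper name toggle and its parameter $keep are hypothetical (neither is a jq builtin), and the results are collected into an array purely to keep the output on one line.

```shell
# toggle/1 is a hypothetical helper, not part of jq: it passes a boolean
# through unchanged when $keep is true and negates it otherwise.
filter='
  def toggle($keep): if $keep then . else not end;
  [ $ARGS.positional[] | select(. > 2 | toggle($yes)) ]
'

# Same filter text, two different values of $yes.
selected=$(jq -n -c --argjson yes true "$filter" --jsonargs 1 2 3 4 5)
complement=$(jq -n -c --argjson yes false "$filter" --jsonargs 1 2 3 4 5)

printf 'selected:   %s\n' "$selected"     # [3,4,5]
printf 'complement: %s\n' "$complement"   # [1,2]
```

The per-item if test is still evaluated inside toggle, so this changes readability only, not the amount of work done.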

Question 2

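The solution described by @Kusalananda is arguably the best way to handle all the usual cases, especially occasional ones, because it is simple, readable, compact, and quite fast.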

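If you use this toggling behaviour often enough to warrant a stable setup, or if you are willing to put in some extra effort to gain a little speed, you might consider a different approach.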

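In fact, the nice and simple if ... then ... else ... end approach has one drawback: it adds an extra comparison for every object of the stream. That comparison is somewhat wasteful, because its result is always known in advance: it is static input from the command line and never changes during execution.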

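One possible way to eliminate that comparison is to use a function defined in a module, and then choose which module provides it on the command line.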

Consider:

# let's set the thing up
$ mkdir dot && echo 'def dot_or_not: .;' > dot/.jq
$ mkdir not && echo 'def dot_or_not: not;' > not/.jq
# now let's use it
$ seq 5 | jq 'include "./"; select ( . > 2 | dot_or_not )' -Ldot
3
4
5
$ seq 5 | jq 'include "./"; select ( . > 2 | dot_or_not )' -Lnot
1
2

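In some simple benchmarks on a single-processor VM, this approach was on average 5 times faster than the bare if ... then ... else ... end approach, although it will likely not change the "economy" of the larger computation you showed.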

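I would probably not go this far myself for such a simple operation... because, on top of other possible (and more valuable) modules, every additional "switch" becomes increasingly cumbersome and unwieldy. In fact, I would rather use modules for genuinely different variants of a computation... but still.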

For completeness, at the other end of the spectrum, another approach might look like this:

$ seq 5 | jq 'select( . > 2 | [not,.][$yes] )' --argjson yes 1
3
4
5
$ seq 5 | jq 'select( . > 2 | [not,.][$yes] )' --argjson yes 0
1
2

Or its cousin variant:

$ seq 5 | jq 'select( . > 2 | {(tostring):1}[$yes] )' --arg yes true
3
4
5
$ seq 5 | jq 'select( . > 2 | {(tostring):1}[$yes] )' --arg yes false
1
2

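Compact as these approaches look, they unfortunately also happen to be much slower than the bare if ... then ... else ... end approach (2 to 4 times slower in my simple benchmarks), because each of them adds the construction of 2 volatile objects plus a lookup.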
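
To see what these compact variants build for every input item before the lookup happens, the intermediate values can be printed on their own. This is only an illustrative sketch of the values involved, not a measurement; the shell variable names array_forms and object_forms are made up for the example.

```shell
# Array variant: each boolean test result becomes a two-element array,
# which [$yes] then indexes with 0 (negated value) or 1 (value as is).
array_forms=$(jq -n -c 'true, false | [not, .]')

# Object variant: each boolean test result becomes a one-key object,
# which [$yes] then looks up with the string "true" or "false".
object_forms=$(jq -n -c 'true, false | {(tostring): 1}')

printf '%s\n' "$array_forms" "$object_forms"
# [false,true]
# [true,false]
# {"true":1}
# {"false":1}
```

In the object variant a lookup of a missing key yields null, which select() treats as false, so only items whose test result matches the string in $yes are kept.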

Answer