Finding the intersection between a search text and a jsonl file

Question

Given a json file with content like yours:

example.json

{"text": "Alice goes to market"}

Using this grep command seems to work:

grep -Fo -f <(echo  "Alice goes to school" | xargs -n1) <(jq -r '.text' < example.json) | xargs

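Where the grep options are: -F treats the patterns as fixed strings, -o prints only the matching parts, and -f reads the patterns from a file.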

And jq -r prints the output as a raw string rather than as JSON text. So instead of getting "Alice goes to market" you get: Alice goes to market

About <(echo "Alice goes to school" | xargs -n1): that is called process substitution, and I use it instead of passing a file.
This command: echo "Alice goes to school" | xargs -n1 prints the following:

Alice
goes
to
school
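
To make the "instead of passing a file" part concrete, here is a minimal sketch (assuming bash, since process substitution is not available in plain sh). It compares process substitution with an explicit temporary file. The printf call only stands in for the jq output so the snippet runs without jq, and the variable names are my own:

```shell
#!/usr/bin/env bash
# Split the search text into one word per line, as in the command above.
words=$(echo "Alice goes to school" | xargs -n1)

# Variant 1: hand the word list to grep via process substitution.
# printf stands in for the jq output here (an assumption, to avoid needing jq).
via_subst=$(grep -Fo -f <(echo "$words") <(printf '%s\n' "Alice goes to market"))

# Variant 2: the same search, with the patterns written to a real temporary file.
patterns=$(mktemp)
echo "$words" > "$patterns"
via_file=$(printf '%s\n' "Alice goes to market" | grep -Fo -f "$patterns")
rm -f "$patterns"

echo "$via_subst"
echo "$via_file"
```

Both variants should print the same matches. Process substitution just saves creating and cleaning up the temporary file.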

I also used process substitution here: <(jq -r '.text' < example.json) to get the content of the json key text. So jq -r '.text' < example.json prints:

Alice goes to market

Basically, the full grep command searches for each of the words Alice, goes, to, school in the string "Alice goes to market".

Finally I pipe the output to xargs to get the following output:

Alice goes to

If you don't use the pipe ( | xargs), you get the output as separate lines:

Alice
goes
to
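
One thing to keep in mind: grep -F matches substrings, so a search word like "to" would also match inside "tomato". Below is a minimal sketch of a whole-word variant, assuming bash and a grep that supports -w (GNU and BSD grep do). The function name common_words is hypothetical, and the second argument stands in for the jq output:

```shell
#!/usr/bin/env bash
# common_words SEARCH_TEXT TARGET_TEXT  (hypothetical helper name)
# Prints the words of SEARCH_TEXT that occur as whole words in TARGET_TEXT.
common_words() {
    # -w restricts matches to whole words; the rest mirrors the grep pipeline.
    grep -Fow -f <(echo "$1" | xargs -n1) <(printf '%s\n' "$2") | xargs
}

common_words "Alice goes to school" "Alice goes to market"
common_words "Alice goes to school" "a tomato for Alice"
```

With -w, the second call should report only Alice, because "to" appears there only as part of "tomato".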

Other cases

If your json file contains the following:

[
   {"text": "Alice goes to the market"}
]
[
   {"text": "Alice went to the market"}
]

Using the code above will fail. Here, since the text key is at the first position (index 0), you can simply use:

grep -Fo -f <(echo  "Alice goes to school" | xargs -n1) <(jq -r '.[0].text' < example2.json) | sort -u | xargs

Note that I used sort -u before piping to xargs ( | xargs). That is because grep prints duplicate words, since the json above contains two records. If you remove sort -u you get:

Alice goes to Alice to
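
Note that sort -u also reorders the words alphabetically, which happens to match the original order in this example. If the order of first appearance matters, awk '!seen[$0]++' is a common alternative. Here is a rough sketch, assuming bash. The texts function is a hypothetical stand-in for the jq output of the two records, and the search text is chosen so that the two orders differ:

```shell
#!/usr/bin/env bash
# Stand-in for: jq -r '.[0].text' < example2.json  (two records, one per line)
texts() {
    printf '%s\n' "Alice goes to the market" "Alice went to the market"
}

search="Alice found the market"

# sort -u: removes duplicates, output ends up in sorted order.
sorted=$(grep -Fo -f <(echo "$search" | xargs -n1) <(texts) | sort -u | xargs)

# awk '!seen[$0]++': removes duplicates, keeps the order of first appearance.
ordered=$(grep -Fo -f <(echo "$search" | xargs -n1) <(texts) | awk '!seen[$0]++' | xargs)

echo "sort -u: $sorted"
echo "awk:     $ordered"
```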

You can also use the comm command to get the intersection. But the files (lines) must be sorted for that to work:

comm -12 <(echo "Alice goes to school" | xargs -n1 | sort) <(jq -r '.text' < example.json | xargs -n1 | sort)  | xargs

Here comm -12 prints only the lines present in both file1 and file2 (where file1 and file2 stand for the process substitutions <(code...)).
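
A small sketch of the comm approach, assuming bash. The helper name words is hypothetical, and the second string stands in for the output of jq -r '.text'. Because comm compares whole lines, each word has to match exactly, and the result comes out in sorted order:

```shell
#!/usr/bin/env bash
# words TEXT: one word per line, sorted and de-duplicated (comm needs sorted input).
words() {
    echo "$1" | xargs -n1 | sort -u
}

# The second string stands in for the jq output (an assumption, to avoid needing jq).
common=$(comm -12 <(words "Alice goes to school") <(words "Alice goes to market") | xargs)
echo "$common"
```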

Answer 1