awk script to remove blocks from JSON

I have a newline-delimited JSON file containing entries like these:

{"id":"eprints.ulster.ac.uk/view/year/2015.html","title":"Items where Year is 2015 - Ulster Institutional Repository","url":"eprints.ulster.ac.uk/view/year/2015.html"}
{"id":"eprints.ulster.ac.uk/view/year/2016.html","title":"Items where Year is 2016 - Ulster Institutional Repository","url":"eprints.ulster.ac.uk/view/year/2016.html"}
{"id":"eprints.ulster.ac.uk/view/year/2017.html","title":"Items where Year is 2017 - Ulster Institutional Repository","url":"eprints.ulster.ac.uk/view/year/2017.html"}
{"id":"eprints.ulster.ac.uk/10386/","title":"Structural performance of rotationally restrained steel columns in fire - Ulster Institutional Repos","url":"eprints.ulster.ac.uk/10386/"}
{"id":"eprints.ulster.ac.uk/10387/","title":"Determining the Effective Length of Fixed End Steel Columns in Fire - Ulster Institutional Repositor","url":"eprints.ulster.ac.uk/10387/"}

I only want to keep the blocks whose .id does not start with "eprints.ulster.ac.uk/view/".

So if the script were run on the snippet above, the first 3 blocks would be removed and the remaining blocks would be:

{"id":"eprints.ulster.ac.uk/10386/","title":"Structural performance of rotationally restrained steel columns in fire - Ulster Institutional Repos","url":"eprints.ulster.ac.uk/10386/"}
{"id":"eprints.ulster.ac.uk/10387/","title":"Determining the Effective Length of Fixed End Steel Columns in Fire - Ulster Institutional Repositor","url":"eprints.ulster.ac.uk/10387/"}

Could someone help me write an awk script to do this?

Answer 1

Since you specifically asked for an Awk solution:

awk -F\" '$4 !~ /^eprints\.ulster\.ac\.uk\/view\//' file > newfile
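
The one-liner above splits each line on double quotes, so the fourth field is the value of the first key. A minimal sketch of the same idea follows, assuming "id" is always the first key on every line, as it is in the sample data. The function name keep_non_view is hypothetical, and index() is swapped in for the regex match so that the prefix is compared literally and only at the start of the field.

```shell
# Hypothetical helper: print only the lines whose id does NOT start with the prefix.
# Assumes "id" is the first key on each line, so that splitting on '"' yields
#   $1 = {    $2 = id    $3 = :    $4 = <the id value>
keep_non_view() {
  # index($4, prefix) returns 1 only when $4 begins with prefix
  awk -F'"' -v prefix='eprints.ulster.ac.uk/view/' 'index($4, prefix) != 1' "$@"
}

# Usage with two shortened sample lines fed on stdin:
printf '%s\n' \
  '{"id":"eprints.ulster.ac.uk/view/year/2015.html","title":"a"}' \
  '{"id":"eprints.ulster.ac.uk/10386/","title":"b"}' |
  keep_non_view
```

If the key order can vary between lines, this field-position approach is likely to break, and a JSON-aware tool is the safer choice.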

Answer 2

A solution with jq:

cat test.json| jq 'select(.id|startswith("eprints.ulster.ac.uk/view/")|not )'

If you are familiar with pipes, the syntax is very simple.

For example,

.id|startswith("eprints.ulster.ac.uk/view/")|not

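means: take the .id field of each object and pipe it into startswith, which returns a boolean, and that boolean is then negated. select keeps only the objects for which the result is true.

See the jq manual for more operators and selectors.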
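
To make the pipeline stages concrete, here is a small sketch that prints the value leaving each stage for a single object. It assumes jq is installed; the function name jq_stages is hypothetical and the input objects are shortened versions of the sample lines.

```shell
# Hypothetical helper: show what flows out of each stage of the filter
# for one JSON object passed as $1.
jq_stages() {
  printf '%s\n' "$1" | jq -c '
    .id,                                                    # stage 1: the id string
    (.id | startswith("eprints.ulster.ac.uk/view/")),       # stage 2: a boolean
    (.id | startswith("eprints.ulster.ac.uk/view/") | not)  # stage 3: the negated boolean
  '
}

jq_stages '{"id":"eprints.ulster.ac.uk/view/year/2015.html"}'
jq_stages '{"id":"eprints.ulster.ac.uk/10386/"}'
```

For the first object the last stage yields false, so select drops it; for the second it yields true, so the object is kept.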

Answer 3

For anyone interested, I used the following command to write the output of Raphael's solution to a new JSON file:

cat uir-index.json| jq 'select(.id|startswith("eprints.ulster.ac.uk/view/")|not )' > cleaned-uir-index.json

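The output was pretty-printed, with each object spread over multiple lines again. So I ran the same jq command with the "--compact-output / -c" option, as follows: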

cat uir-index.json| jq -c 'select(.id|startswith("eprints.ulster.ac.uk/view/")|not )' > cleaned-uir-index.json

This writes the cleaned file in newline-delimited format, one object per line.
