I have a newline-delimited JSON file containing entries like these:
{"id":"eprints.ulster.ac.uk/view/year/2015.html","title":"Items where Year is 2015 - Ulster Institutional Repository","url":"eprints.ulster.ac.uk/view/year/2015.html"}
{"id":"eprints.ulster.ac.uk/view/year/2016.html","title":"Items where Year is 2016 - Ulster Institutional Repository","url":"eprints.ulster.ac.uk/view/year/2016.html"}
{"id":"eprints.ulster.ac.uk/view/year/2017.html","title":"Items where Year is 2017 - Ulster Institutional Repository","url":"eprints.ulster.ac.uk/view/year/2017.html"}
{"id":"eprints.ulster.ac.uk/10386/","title":"Structural performance of rotationally restrained steel columns in fire - Ulster Institutional Repos","url":"eprints.ulster.ac.uk/10386/"}
{"id":"eprints.ulster.ac.uk/10387/","title":"Determining the Effective Length of Fixed End Steel Columns in Fire - Ulster Institutional Repositor","url":"eprints.ulster.ac.uk/10387/"}
I only want the blocks whose .id does not start with "eprints.ulster.ac.uk/view/".
So if the script is run on the snippet above, the first 3 blocks should be removed, leaving:
{"id":"eprints.ulster.ac.uk/10386/","title":"Structural performance of rotationally restrained steel columns in fire - Ulster Institutional Repos","url":"eprints.ulster.ac.uk/10386/"}
{"id":"eprints.ulster.ac.uk/10387/","title":"Determining the Effective Length of Fixed End Steel Columns in Fire - Ulster Institutional Repositor","url":"eprints.ulster.ac.uk/10387/"}
Can someone help me write an awk script to do this?
Answer 1
Since you specifically asked for an Awk solution:
awk -F\" '$4 !~ /^eprints\.ulster\.ac\.uk\/view\//' file > newfile
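Why $4? With " as the field separator, a line such as {"id":"X",... splits into $1 = {, $2 = id, $3 = :, $4 = X, so $4 is the id value. That only holds if "id" is the first key on every line and there is no whitespace around the colon, which is true of the sample data but is an assumption about the rest of the file. The self-contained sketch below runs the filter on a shortened copy of the question's records (the variable names input and filtered are just for illustration). It anchors the pattern with ^ so that only ids that *start with* the prefix are dropped, and escapes the dots so they match literally.

```shell
#!/bin/sh
# Hypothetical sample: the question's records with titles shortened.
input='{"id":"eprints.ulster.ac.uk/view/year/2015.html","title":"Items where Year is 2015","url":"eprints.ulster.ac.uk/view/year/2015.html"}
{"id":"eprints.ulster.ac.uk/view/year/2016.html","title":"Items where Year is 2016","url":"eprints.ulster.ac.uk/view/year/2016.html"}
{"id":"eprints.ulster.ac.uk/10386/","title":"Structural performance of rotationally restrained steel columns in fire","url":"eprints.ulster.ac.uk/10386/"}
{"id":"eprints.ulster.ac.uk/10387/","title":"Determining the Effective Length of Fixed End Steel Columns in Fire","url":"eprints.ulster.ac.uk/10387/"}'

# Splitting on " turns {"id":"X",... into $1={  $2=id  $3=:  $4=X,
# so $4 holds the id value as long as "id" is the first key on the line.
# ^ anchors the match at the start of the id; \. matches a literal dot.
# A pattern with no action prints every line for which it is true.
filtered=$(printf '%s\n' "$input" |
  awk -F'"' '$4 !~ /^eprints\.ulster\.ac\.uk\/view\//')

printf '%s\n' "$filtered"
```

Only the two item records (10386 and 10387) should survive. If the real file might have keys in a different order, a JSON-aware tool such as jq is the safer choice.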
Answer 2
Answer 3
For anyone interested, I used the following command to write the output of Raphael's solution to a new JSON file:
jq 'select(.id | startswith("eprints.ulster.ac.uk/view/") | not)' uir-index.json > cleaned-uir-index.json
However, the output was pretty-printed, with each object spread across several lines. So I ran the same jq command with the --compact-output (-c) option:
jq -c 'select(.id | startswith("eprints.ulster.ac.uk/view/") | not)' uir-index.json > cleaned-uir-index.json
This writes the cleaned file in newline-delimited format, one object per line.
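To see the difference the -c flag makes, here is a small sketch that runs the same select filter on two hypothetical records, once with jq's default output and once with -c. It assumes jq 1.5 or later (startswith is not available in older releases). The variable names input, filter, pretty and compact are only for illustration.

```shell
#!/bin/sh
# Two hypothetical records in newline-delimited JSON.
input='{"id":"eprints.ulster.ac.uk/view/year/2015.html","title":"Items where Year is 2015"}
{"id":"eprints.ulster.ac.uk/10386/","title":"Structural performance"}'

# Keep only objects whose .id does NOT start with the "view" prefix.
filter='select(.id | startswith("eprints.ulster.ac.uk/view/") | not)'

# Default output: each surviving object is pretty-printed over several lines.
pretty=$(printf '%s\n' "$input" | jq "$filter")

# -c / --compact-output: one object per line, i.e. newline-delimited JSON again.
compact=$(printf '%s\n' "$input" | jq -c "$filter")

printf '%s\n' "$pretty"
printf '%s\n' "$compact"
```

The pretty form spreads the single surviving object over four lines, while the compact form prints it as {"id":"eprints.ulster.ac.uk/10386/","title":"Structural performance"} on one line, which is what a newline-delimited file needs.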