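I have searched and can't tell what I'm doing wrong, but I cannot find an answer to this problem.
I have a file in which all the text is stored on a single line. I need to find a pattern and delete everything before and after it, up to the surrounding delimiters.
Example file: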
[{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something":false,"more":"abc","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"}]
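Keep in mind that this is a single line containing multiple records. I am trying to find "abc" and delete everything between the previous record and the next record, that is, the whole record containing it.
The expected result should look like this: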
[{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"}]
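I have been trying but can't figure this out; any help would be greatly appreciated.
Answer 1
As already pointed out, jq is the right tool for this kind of data. However, jq does impose certain syntax constraints, such as "a list of objects needs to be in an array delimited by square brackets".
If you cannot be sure that the file is already valid JSON, you can preprocess it with sed. The command below adds the opening bracket and turns a trailing comma into the closing bracket. We start with a plain run through jq because the result is easier to read and it also checks validity: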
$ sed 's/^/[/; s/,$/]/' data.txt | jq -r '.[]'
{
  "something": false,
  "more": "123",
  "moresamerecord": "otherstuff"
}
{
  "something": false,
  "more": "abc",
  "moresamerecord": "otherstuff"
}
{
  "something2": false,
  "more": "def",
  "moresamerecord": "otherstuff"
}
{
  "something2": false,
  "more": "456",
  "moresamerecord": "otherstuff"
}
Now let's modify the jq command to drop any object that matches "more": "abc":
$ sed 's/^/[/; s/,$/]/' data.txt | jq -r '.[] | select(.more != "abc")'
{
  "something": false,
  "more": "123",
  "moresamerecord": "otherstuff"
}
{
  "something2": false,
  "more": "def",
  "moresamerecord": "otherstuff"
}
{
  "something2": false,
  "more": "456",
  "moresamerecord": "otherstuff"
}
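Finally, it looks like you also need a post-processing step that squeezes the result back onto a single line, with comma separators and no whitespace: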
$ sed 's/^/[/; s/,$/]/' data.txt | jq -r '.[] | select(.more != "abc")' | sed 's/}$/},/' | tr -d ' \n'
{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"},
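If the file is already valid JSON, as the bracketed sample in the question appears to be, the sed pre- and post-processing may not be needed at all. A minimal sketch, assuming jq is installed; the input variable and its shortened records are illustrative, not the real file:

```shell
# Illustrative input: a shortened version of the bracketed array from the question.
input='[{"something":false,"more":"123"},{"something":false,"more":"abc"},{"something2":false,"more":"def"}]'

# map(select(...)) filters the elements but keeps the array wrapper;
# -c prints the result compactly on a single line.
result=$(printf '%s\n' "$input" | jq -c 'map(select(.more != "abc"))')
printf '%s\n' "$result"
```

Unlike tr -d ' \n', the -c option leaves spaces inside string values alone, so it is probably the safer way to get back to one line.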
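Answer 2
The basic idea is to extend the pattern out to the delimiters, and no further.
So, to match from the nearest preceding { up to "abc", look for a { followed by any characters that are not {. Likewise, extend the match from "abc" to the nearest following } by looking for characters that are not }, followed by a }.
Then there are a few edge cases to handle for the commas.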
sed 's/{[^{]*"abc"[^}]*}//;s/,,/,/;s/,$//;s/^,//'
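The sample in the question is wrapped in [ and ], so when the first or last record is removed the stray comma ends up next to a bracket rather than at the start or end of the line. A hedged sketch of how the comma clean-up might be adapted for that case; the data variable and its shortened records are illustrative:

```shell
# Illustrative input: a bracketed array whose first record is the one to drop.
data='[{"something":false,"more":"abc"},{"something2":false,"more":"def"}]'

# Remove the record, then tidy commas: doubled commas in the middle,
# a comma right after "[" and a comma right before "]".
result=$(printf '%s\n' "$data" |
  sed 's/{[^{]*"abc"[^}]*}//; s/,,/,/; s/\[,/[/; s/,]/]/')
printf '%s\n' "$result"
```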
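If your data is more complex than what you have shown, in particular if { and } can be nested, then you will probably need to switch to a real parser. Regular expressions "can't count", so while you can write a pattern that handles any particular finite depth (3, say), you cannot handle arbitrary depth.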
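To make the nesting problem concrete, here is a small sketch with made-up data (the nested variable and its field names are hypothetical). The record containing "abc" also holds a nested object, and the [^}]* part stops at the inner closing brace:

```shell
# Hypothetical record with a nested object after the matched value.
nested='[{"id":1,"more":"abc","sub":{"x":1},"tail":"t"},{"id":2}]'

# The match ends at the first "}" after "abc", which belongs to the
# nested object, so only part of the record is removed.
broken=$(printf '%s\n' "$nested" | sed 's/{[^{]*"abc"[^}]*}//')
printf '%s\n' "$broken"
# prints: [,"tail":"t"},{"id":2}]
```

The leftover is no longer valid JSON, which is the kind of damage a real parser avoids.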
Using jq, as suggested in the comments, is certainly worth trying instead of sed.
Answer 3
If jq is not an option, I would suggest:
# Instead of a single line pattern matching,
# make the "records" one per line
# then delete the line with the pattern
# finally get everything again to a single line
sed -e 's:,{:\n{:g;s:,$::' file | sed '/abc/d' | tr '\n' ','
Step by step:
$ sed -e 's:,{:\n{:g;s:,$::' file
{"something":false,"more":"123","moresamerecord":"otherstuff"}
{"something":false,"more":"abc","moresamerecord":"otherstuff"}
{"something2":false,"more":"def","moresamerecord":"otherstuff"}
{"something2":false,"more":"456","moresamerecord":"otherstuff"}
$ sed -e 's:,{:\n{:g;s:,$::' file | sed '/abc/d'
{"something":false,"more":"123","moresamerecord":"otherstuff"}
{"something2":false,"more":"def","moresamerecord":"otherstuff"}
{"something2":false,"more":"456","moresamerecord":"otherstuff"}
$ sed -e 's:,{:\n{:g;s:,$::' file | sed '/abc/d' | tr '\n' ','
{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"},
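Note that the final tr leaves a trailing comma, and /abc/ matches that text anywhere in a record. A possible refinement, sketched under the assumption of GNU sed (for \n in the replacement) and with an illustrative records variable: match the full "more":"abc" pair and strip the last comma afterwards.

```shell
# Illustrative input without enclosing brackets, like the output shown above.
records='{"more":"123"},{"more":"abc"},{"more":"def"}'

# One record per line, delete the record with the exact key/value pair,
# join the lines with commas, then drop the trailing comma that tr adds.
result=$(printf '%s\n' "$records" |
  sed -e 's:,{:\n{:g' |
  sed '/"more":"abc"/d' |
  tr '\n' ',' |
  sed 's/,$//')
printf '%s\n' "$result"
```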
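Answer 4
awk '
  BEGIN { FS = "},{" }   # records are separated by "},{"
  {
    k = 0
    for (i = 1; i <= NF; i++)
      if ($i !~ /"abc"/)                           # keep fields without "abc"
        printf "%s%s", (k++ ? FS : ""), $i         # re-join kept fields with FS
    $0 = ""
  }
  1                      # $0 is empty, so this only prints the final newline
' file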
$ cat file \
| sed -e 's/},{/}\n{/g' \
| sed -E '/([{:,])"abc"([,:}])/d' \
| paste -sd, - \
;
- Split the records, one per line.
- Delete any record that contains "abc".
- Stitch the records back together with commas (,).
Output:
{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"}
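If the line is wrapped in [ and ] as in the question, the same three steps could be put in a small helper that strips the brackets first and restores them at the end. This is only a sketch: drop_records is a hypothetical name, GNU sed is assumed for \n in the replacement, and the sample data is shortened.

```shell
# drop_records STRING: hypothetical helper, reads the single line from stdin.
drop_records() {
  # Strip the outer brackets and put one record per line,
  # drop records containing the fixed string, re-join with commas,
  # then put the brackets back.
  sed -e 's/^\[//' -e 's/]$//' -e 's/},{/}\n{/g' |
    grep -vF -- "$1" |
    paste -sd, - |
    sed -e 's/^/[/' -e 's/$/]/'
}

sample='[{"more":"123"},{"more":"abc"},{"more":"def"}]'
result=$(printf '%s\n' "$sample" | drop_records '"more":"abc"')
printf '%s\n' "$result"
```

Like the other line-based approaches here, this assumes records are flat and that },{ never appears inside a string value.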