Find a string and delete everything between two delimiters

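I have searched, and I don't know what I am doing wrong, but I cannot find an answer to this question.

I have a file in which all of the text is stored as a single line. I need to find a pattern and delete all the text before and after it, up to the nearest delimiters.

Example file: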

[{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something":false,"more":"abc","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"}]

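Keep in mind that this is a single line containing multiple records. I am trying to find "abc" and delete the whole record that contains it, that is, everything between the previous record and the next one.

The expected result should look like this: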

[{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"}]

I have been trying but cannot figure this out. Any help would be greatly appreciated.

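Answer 1

As already pointed out, jq is the tool for this kind of data. However, jq does impose certain syntax constraints, such as "a list of objects needs to be in an array, denoted by square brackets".

If you cannot be sure that the file is already valid JSON, you can preprocess it with sed. The sed command below adds an opening [ at the start of the line and turns a trailing comma into a closing ]; skip it if the file is already wrapped in square brackets, as in the question, because it would add a second [. (We start with a plain jq run because the result is easier to read, and it also checks that the input is well formed.)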

$ sed 's/^/[/; s/,$/]/' data.txt | jq -r '.[]'
{
  "something": false,
  "more": "123",
  "moresamerecord": "otherstuff"
}
{
  "something": false,
  "more": "abc",
  "moresamerecord": "otherstuff"
}
{
  "something2": false,
  "more": "def",
  "moresamerecord": "otherstuff"
}
{
  "something2": false,
  "more": "456",
  "moresamerecord": "otherstuff"
}

Now let's modify the jq command to drop any object matching "more": "abc":

$ sed 's/^/[/; s/,$/]/' data.txt | jq -r '.[] | select(.more != "abc")'
{
  "something": false,
  "more": "123",
  "moresamerecord": "otherstuff"
}
{
  "something2": false,
  "more": "def",
  "moresamerecord": "otherstuff"
}
{
  "something2": false,
  "more": "456",
  "moresamerecord": "otherstuff"
}

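Finally, it looks like you also need a post-processing step to squeeze the result back into a single line with comma separators and no whitespace (note that tr -d ' \n' would also remove any spaces inside string values):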

$ sed 's/^/[/; s/,$/]/' data.txt | jq -r '.[] | select(.more != "abc")' | sed 's/}$/},/' | tr -d ' \n'
{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"},

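Answer 2

The basic idea is to extend the pattern out to the delimiters, and no further.

So, to extend the match back from "abc" to the nearest preceding {, you can look for a { followed by any characters that are not {. Similarly, you can extend it forward from "abc" to the nearest following } by matching any characters that are not }, followed by a }.

Then there are a few edge cases to handle for the commas.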

sed 's/{[^{]*"abc"[^}]*}//;s/,,/,/;s/,$//;s/^,//'
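
The sample in the question is wrapped in square brackets, so a comma left next to [ or ] also needs cleaning up. The following is a minimal sketch of the same idea under that assumption; `data`, `result` and the helper name `remove_abc` are illustrative only and not part of the original answer.

```shell
# Sample data from the question: one line, wrapped in [ ].
data='[{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something":false,"more":"abc","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"}]'

# Hypothetical helper: reads one line on stdin, removes the first
# {...} record containing "abc", then repairs the leftover commas.
remove_abc() {
  sed -e 's/{[^{]*"abc"[^}]*}//' \
      -e 's/,,/,/' \
      -e 's/^\[,/[/' \
      -e 's/,]$/]/'
}

result=$(printf '%s\n' "$data" | remove_abc)
printf '%s\n' "$result"
```

The three cleanup expressions cover a record removed from the middle (`,,`), from the start (`[,`) and from the end (`,]`). Like the original command, this removes only the first matching record on the line and assumes that braces are not nested.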

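If your data is more complex than what you have shown, and in particular if { and } can be nested, you will probably need to switch to a real parser. Regular expressions "cannot count", so while you can write a pattern that handles any specific finite depth (for example 3), you cannot handle arbitrary depth.

It is certainly worth trying jq, as suggested in the comments, rather than sed.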

Answer 3

If jq is not an option, I would suggest:

# Instead of a single line pattern matching,
# make the "records" one per line
# then delete the line with the pattern
# finally get everything again to a single line
sed -e 's:,{:\n{:g;s:,$::' file | sed '/abc/d' | tr '\n' ','

Step by step:

$ sed -e 's:,{:\n{:g;s:,$::' file
{"something":false,"more":"123","moresamerecord":"otherstuff"}
{"something":false,"more":"abc","moresamerecord":"otherstuff"}
{"something2":false,"more":"def","moresamerecord":"otherstuff"}
{"something2":false,"more":"456","moresamerecord":"otherstuff"}
$ sed -e 's:,{:\n{:g;s:,$::' file | sed '/abc/d'
{"something":false,"more":"123","moresamerecord":"otherstuff"}
{"something2":false,"more":"def","moresamerecord":"otherstuff"}
{"something2":false,"more":"456","moresamerecord":"otherstuff"}
$ sed -e 's:,{:\n{:g;s:,$::' file | sed '/abc/d' | tr '\n' ','
{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"},
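
The final output above ends with a trailing comma, because tr also turns the last newline into a comma, and the pipeline does not account for the enclosing [ ] shown in the question. The following sketch applies the same split, delete, rejoin idea under two assumptions: the input is wrapped in square brackets, and sed is GNU sed (needed for `\n` in the replacement). The variable names `data` and `result` are illustrative only.

```shell
# Sample data from the question: one line, wrapped in [ ].
data='[{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something":false,"more":"abc","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"}]'

result=$(
  printf '%s\n' "$data" |
    # Strip the enclosing brackets, then put one record on each line.
    sed -e 's/^\[//' -e 's/]$//' -e 's/},{/}\n{/g' |
    # Drop every line (record) that contains "abc".
    grep -v '"abc"' |
    # Join the remaining lines with commas, without a trailing comma.
    paste -sd, - |
    # Put the enclosing brackets back.
    sed -e 's/^/[/' -e 's/$/]/'
)
printf '%s\n' "$result"
```

Using `paste -sd, -` instead of `tr '\n' ','` is what avoids the trailing comma, since paste only inserts the delimiter between lines.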

Answer 4

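awk '
  # Use the text },{ between records as the field separator.
  BEGIN { FS = "},{" }
  { k=0
    # Print every field (record) that does not contain "abc",
    # putting the separator back between the fields that are kept.
    for (i=1; i<=NF; i++)
      if ($i !~ /"abc"/)
        printf "%s%s", (k++?FS:""), $i
    # Empty $0 so that the trailing 1 only prints the final newline.
    $0=""
  }1
' file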
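
In the script above, the opening `[{` and closing `}]` travel with the first and last fields, so they would be lost if the record to delete were the first or last one. The following is a sketch of one way to handle that, assuming the input is a single line of the form `[{...},{...}]` with no nested braces; `remove_record`, `pat`, `data` and `result` are hypothetical names chosen for illustration.

```shell
# Sample data from the question: one line, wrapped in [ ].
data='[{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something":false,"more":"abc","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"}]'

# Hypothetical helper: remove_record STRING
# Drops every record that contains "STRING" (including the quotes).
remove_record() {
  awk -v pat="\"$1\"" '
    {
      # Strip the leading [{ and the trailing }] so that every
      # record is reduced to its inner text.
      sub(/^\[[{]/, ""); sub(/[}]\]$/, "")
      n = split($0, rec, /[}],[{]/)
      out = ""; k = 0
      for (i = 1; i <= n; i++)
        if (index(rec[i], pat) == 0)      # keep records without pat
          out = out (k++ ? "},{" : "") rec[i]
      # Re-wrap the kept records; print an empty array if none remain.
      print (k ? "[{" out "}]" : "[]")
    }'
}

result=$(printf '%s\n' "$data" | remove_record abc)
printf '%s\n' "$result"
```

Here `index()` does a plain substring test, so the search string is not interpreted as a regular expression.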

$ cat file \
| sed -e 's/},{/}\n{/g'           \
| sed -E '/([{:,])"abc"([,:}])/d' \
| paste -sd, -                    \
;
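  • Split the records so that there is one per line.
  • Delete every record that contains "abc" as a complete key or value.
  • Stitch the remaining records back together with commas.

Output: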

{"something":false,"more":"123","moresamerecord":"otherstuff"},{"something2":false,"more":"def","moresamerecord":"otherstuff"},{"something2":false,"more":"456","moresamerecord":"otherstuff"}
