I have some text like this:
Sentence #1 (n tokens):
Blah Blah Blah
[...
...
...]
( #start first set here
... (other possible parens and text here)
) #end first set here
(...)
(...)
Sentence #2 (n tokens):
I want to extract the second set of parentheses (including everything between them), i.e.,
(
... (other possible parens here)
)
Is there a way to do this in bash? I tried a simple
's/(\(.*\))/\1/'
Answer 1
This will do it. There may be a better way, but this is the first one that came to mind:
echo 'Sentence #1 (n tokens):
Blah Blah Blah
[...
...
...]
(
... (other possible parens here)
)
(...)
(...)
Sentence #2 (n tokens):
' | perl -0777 -nE '
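    $wanted = 2;    # which top-level group to print
    $level = 0;     # current nesting depth
    $text = "";     # characters collected for the current group
    for $char (split //) {
        $level++ if $char eq "(";
        $text .= $char if $level > 0;
        if ($char eq ")") {
            if (--$level == 0) {
                # a top-level group just closed
                if (++$n == $wanted) {
                    say $text;
                    exit;
                }
                $text = "";
            }
        }
    }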
'
Output
(
... (other possible parens here)
)
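Answer 2
Glenn's answer is good (and probably faster for large inputs), but for the record, what Glenn suggests is entirely possible in bash as well. Porting his answer to pure bash was fairly straightforward and took only a few minutes: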
s='Sentence #1 (n tokens):
Blah Blah Blah
[...
...
...]
(
... (other possible parens here)
)
(...)
(...)
Sentence #2 (n tokens):
'
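wanted=2   # which top-level group to print
level=0    # current nesting depth
n=0        # top-level groups closed so far
text=""
for (( i=0; i<${#s}; i++ )); do
    char="${s:i:1}"
    if [[ $char == "(" ]]; then (( level++ )); fi
    if (( level > 0 )); then text+="$char"; fi
    if [[ $char == ")" ]]; then
        if (( --level == 0 )); then
            if (( ++n == wanted )); then
                printf '%s\n' "$text"
                break   # stop scanning without exiting the calling shell
            fi
            text=""
        fi
    fi
done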
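If you need this more than once, the same loop can be wrapped in a function that reads standard input and takes the group number as an argument. This is only a sketch: the name nth_paren_group and its interface are made up for illustration, and it assumes the input fits comfortably in a shell variable. Unlike the loop above, it ignores a stray ")" that appears outside any group, and it returns a non-zero status when there are fewer groups than requested.

```bash
#!/usr/bin/env bash
# nth_paren_group N: print the Nth top-level parenthesized group from stdin.
# Hypothetical helper name; not part of either answer above.
nth_paren_group() {
    local wanted=$1 level=0 n=0 text="" s char i
    # Slurp all of stdin; read returns non-zero at EOF, which is expected.
    IFS= read -r -d '' s || true
    for (( i = 0; i < ${#s}; i++ )); do
        char=${s:i:1}
        if [[ $char == "(" ]]; then level=$(( level + 1 )); fi
        if (( level > 0 )); then text+=$char; fi
        # Only close a group if one is open, so a stray ")" is skipped.
        if [[ $char == ")" ]] && (( level > 0 )); then
            level=$(( level - 1 ))
            if (( level == 0 )); then
                n=$(( n + 1 ))
                if (( n == wanted )); then
                    printf '%s\n' "$text"
                    return 0
                fi
                text=""
            fi
        fi
    done
    return 1   # fewer than N top-level groups were found
}
```

Usage would look like `nth_paren_group 2 < input.txt`, where input.txt is a placeholder for your own file. With the sample text from the question, group 1 is "(n tokens)" on the first line, which is why the multi-line block is group 2.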