How do I extract the content of a div using grep?

Question

Answer 1

Use grep -A:

$ grep -A 2 'class="col-6"' test.html | sed -n 2p
        <p>One of three columns</p>

From man grep:

-A NUM, --after-context=NUM
    Print NUM lines of trailing context after matching lines.

Or use awk:

$ awk '/class="col-6"/{getline; print $0}' test.html
        <p>One of three columns</p>

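Note: both commands only work if the file is laid out exactly like your test input, with the <p> on the line directly after the line containing class="col-6". In general, I would always prefer a proper XML/HTML parser.

For example, Python with BeautifulSoup: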

$ python3 -c '
from bs4 import BeautifulSoup
with open("test.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")  # name the parser explicitly
print(soup.find_all("div", {"class": "col-6"})[0].find_all("p")[0])'
<p>One of three columns</p>

Or use xmlstarlet like this:

$ xmlstarlet sel -t -m '//div[@class="col-6"]' -c './p' -n test.html
<p>One of three columns</p>
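
If neither BeautifulSoup nor xmlstarlet is available, the same parser-based idea can be sketched with Python's standard-library html.parser. This is only a minimal illustration: the ParagraphCollector class, the extract_paragraphs helper, and the SAMPLE markup are all hypothetical stand-ins (the real test.html is not shown here), and the code collects the text of every <p> found inside a div whose class list contains col-6.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of <p> elements nested inside a div with a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.div_depth = 0    # div nesting depth inside the matching div; 0 means outside
        self.in_p = False     # True while inside a <p> that we are collecting
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # Start counting at the matching div, keep counting for nested divs.
            if self.div_depth or self.wanted_class in classes:
                self.div_depth += 1
        elif tag == "p" and self.div_depth:
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


def extract_paragraphs(html, wanted_class="col-6"):
    parser = ParagraphCollector(wanted_class)
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs]


# Hypothetical sample input, assumed to resemble test.html.
SAMPLE = """
<div class="row">
  <div class="col-6">
    <p>One of three columns</p>
  </div>
  <div class="col-3"><p>Another column</p></div>
</div>
"""

print(extract_paragraphs(SAMPLE))
```

With the sample above this prints ['One of three columns']. Unlike the grep and awk one-liners, the result does not depend on how the markup is split across lines, so it also works when the div and the p sit on the same line.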
