sed syntax for removing XML

Answer 1

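You really should use a proper HTML or XML parsing tool. Regular expressions cannot reliably handle markup, and trying to parse it with them leads to madness.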
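As a small illustration of why the regex approach is fragile, here is a sketch that runs the usual tag-stripping substitution on a made-up line (the `sample` string is hypothetical, not taken from the real page) whose attribute value contains a literal `>`:

```shell
# Hypothetical one-line sample; the title attribute contains a literal '>'.
sample='<a title="x > y" href="#">link</a>'

# The pattern <[^>]*> treats the '>' inside the attribute as the end of the tag,
# so part of the attribute text leaks into the output.
stripped=$(printf '%s\n' "$sample" | sed 's/<[^>]*>//g')
printf '%s\n' "$stripped"
```

The output still contains the tail of the opening tag rather than just `link`, which is the kind of case a real parser handles and a regex does not.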

For simple cases, though:

curl --silent www.brainyquote.com | grep 'span class="body' | sed -n '6,7{s/<[^>]*>//g;p}'

For OS X, whose sed needs each part of the braced block passed as its own -e expression:

curl --silent www.brainyquote.com | grep 'span class="body' | sed -n -e '6,7{' -e 's/<[^>]*>//g' -e 'p' -e '}'
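To try the line-range-plus-strip idea without hitting the network, here is a minimal sketch on made-up input. The `sample` lines are hypothetical stand-ins for what grep would pass along, and the range is 2,3 rather than 6,7 only because the sample is short. It assumes that a `;` before the closing `}` is enough to keep the one-liner acceptable to BSD sed as well as GNU sed.

```shell
# Hypothetical stand-in for the lines that the grep stage would pass along.
sample='<span class="body">first</span>
<span class="body">second <b>quote</b></span>
<span class="body">third</span>
<span class="body">fourth</span>'

# -n suppresses default output; for lines 2-3 only, strip tags and print.
one_liner=$(printf '%s\n' "$sample" | sed -n '2,3{s/<[^>]*>//g;p;}')

# Same script split across -e expressions, one command per expression.
multi_e=$(printf '%s\n' "$sample" | sed -n -e '2,3{' -e 's/<[^>]*>//g' -e 'p' -e '}')

printf '%s\n' "$one_liner"
printf '%s\n' "$multi_e"
```

Both forms should print the same two tag-free lines.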

This worked for mjb:

curl --silent www.brainyquote.com | grep -E '(span class="body")|(span class="bodybold")' | sed -n '6p; 7p; ' | sed -e 's/<[^>]*>//g'

Answer 2

For completeness, a solution using HTML Tidy and xmlstarlet:

# note: use recent versions of tidy and xmlstarlet
curl -s www.brainyquote.com | 
tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:td[@align='center' and @valign='top' and @width='300']/x:span[@class='body']" -v '.' -n \
-m "//x:html/x:body/x:div/x:table/x:tr[position()=2]/x:td[@align='center' and @valign='top' and @width='300']/x:span[@class='bodybold']" -v '.' -n
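The namespace binding is the part of the command above that is easiest to get wrong, so here is a minimal offline sketch of it. The `doc` fragment is hypothetical and stands in for tidy's XHTML output, and the XPath is simplified to match on the class attribute alone. It assumes the binary is installed under the name `xmlstarlet` (some systems ship it as `xml`) and skips the query otherwise.

```shell
# Hypothetical XHTML fragment standing in for the output of tidy -asxml.
doc='<html xmlns="http://www.w3.org/1999/xhtml"><body>
<span class="body">A quote</span>
<span class="other">skipped</span>
<span class="bodybold">An author</span>
</body></html>'

if command -v xmlstarlet >/dev/null 2>&1; then
  # -N binds the prefix x to the XHTML namespace; without the prefix,
  # the element names in the XPath would not match namespaced elements.
  quotes=$(printf '%s\n' "$doc" |
    xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t \
      -m "//x:span[@class='body' or @class='bodybold']" -v '.' -n)
  printf '%s\n' "$quotes"
else
  echo "xmlstarlet not installed; skipping" >&2
fi
```

Only the text of the two matching spans should be printed, one per line.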
