Extracting a string from an HTML file

Answer 1

Use awk with multiple field separators (-F):

searchfor="vodlocker"
wget -q -O- http://pastebin.com/raw/VbrXHEYd | awk -F'SRC="|"' '/SRC/ && /'"$searchfor"'/  {print $4}'

Example output:

$ searchfor="vodlocker"; wget -q -O- http://pastebin.com/raw/VbrXHEYd | awk -F'SRC="|"' '/SRC/ && /'"$searchfor"'/  {print $4}' 
http://vodlocker.com/embed-wrdlm4dbigu4-850x450.html
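
Which field holds the URL depends on how many quotes precede SRC= on the line, so it can help to try the same awk program on a local line first. The following is a minimal sketch under stated assumptions: the sample line is made up (the real paste may be laid out differently), and the search term is passed with -v rather than spliced into the awk program, which sidesteps quoting problems if the term contains special characters.

```shell
#!/bin/sh
# Hypothetical sample line; the real paste may be laid out differently.
sample='<div id="player"><IFRAME SRC="http://vodlocker.com/embed-wrdlm4dbigu4-850x450.html" FRAMEBORDER=0></IFRAME></div>'

searchfor="vodlocker"

# -F sets the field separator to a regex: either SRC=" or a bare ".
# -v hands the search term to awk as a variable; index() does a literal
# substring test, so the term is not interpreted as a regex.
url=$(printf '%s\n' "$sample" |
  awk -v s="$searchfor" -F'SRC="|"' '/SRC/ && index($0, s) {print $4}')

printf '%s\n' "$url"
```

In this sample the URL lands in field 4 only because two quotes (around the id value) come before SRC=; with a different layout the field number changes.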

Answer 2

Use grep with PCRE (-P):

grep -Po 'SRC="\K[^"]+(?=")' testfile.txt

And with sed:

sed -nr 's/.*SRC="([^"]+)".*/\1/p' testfile.txt

Both extract the desired string, which is enclosed in double quotes and preceded by SRC=.

Example:

% wget -q -O- http://pastebin.com/raw/VbrXHEYd | grep -Po 'SRC="\K[^"]+(?=")'      
http://vodlocker.com/embed-wrdlm4dbigu4-850x450.html

% wget -q -O- http://pastebin.com/raw/VbrXHEYd | sed -nr 's/.*SRC="([^"]+)".*/\1/p'
http://vodlocker.com/embed-wrdlm4dbigu4-850x450.html
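
For trying both patterns without touching the network, here is a rough sketch that runs them against a temporary file. The file contents are a hypothetical stand-in for testfile.txt, and the sed call uses -E, which GNU sed treats the same as -r and which BSD sed also accepts.

```shell
#!/bin/sh
# Hypothetical stand-in for testfile.txt; the real paste may differ.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<html><body>
<IFRAME SRC="http://vodlocker.com/embed-wrdlm4dbigu4-850x450.html" FRAMEBORDER=0></IFRAME>
</body></html>
EOF

# grep: \K drops the already-matched SRC=" from the reported match,
# [^"]+ takes everything up to the next quote, and (?=") checks for that
# closing quote without consuming it. -o prints only the matched part.
# This needs a grep built with PCRE support (GNU grep -P).
from_grep=$(grep -Po 'SRC="\K[^"]+(?=")' "$tmp")

# sed: the whole line is replaced by capture group 1; -n together with
# the p flag prints only lines where the substitution succeeded.
from_sed=$(sed -nE 's/.*SRC="([^"]+)".*/\1/p' "$tmp")

printf 'grep: %s\nsed:  %s\n' "$from_grep" "$from_sed"
rm -f "$tmp"
```

The two are not fully interchangeable: if a line holds several SRC attributes, grep -o prints each match on its own line, while the greedy .* in the sed expression keeps only the last one.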

Answer 3

I just discovered pup, and it's great!

$ curl -s https://pastebin.com/raw/VbrXHEYd | pup 'iframe attr{src}'

Result:

http://vodlocker.com/embed-wrdlm4dbigu4-850x450.html

Answer 4

You can also use html2 together with sed:

$ curl -s http://pastebin.com/raw/VbrXHEYd | html2 | sed '/iframe\/@src=/!d;s/^.*src=//'
http://vodlocker.com/embed-wrdlm4dbigu4-850x450.html
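
html2 flattens the document into one line per node in path=value form, which is what keeps the sed filter so short. The sketch below does not call html2 at all; it feeds sed a few hand-written lines in that style (an assumption about what html2 would emit for this page, not captured output) to show the two sed commands in isolation.

```shell
#!/bin/sh
# Hand-written lines in html2's path=value style; assumed, not captured output.
flat='/html/body/iframe/@src=http://vodlocker.com/embed-wrdlm4dbigu4-850x450.html
/html/body/iframe/@frameborder=0
/html/body/iframe/@width=850'

# /iframe\/@src=/!d  deletes every line that is not an iframe src attribute.
# s/^.*src=//        strips the path prefix, leaving only the value.
url=$(printf '%s\n' "$flat" | sed '/iframe\/@src=/!d;s/^.*src=//')

printf '%s\n' "$url"
```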
