Application to run a regex search on all pages linked from a specified HTML page

Question

Answer 1

I'm feeling like your Mechanical Turk today, so I wrote a few lines of bash.

Get all links in $MAINPAGE:

wget -q "$MAINPAGE" -O - | sed 's%<a%\n&%g' | sed -n "s%^<a[[:space:]][^>]*href=[\"']\([^\"']*\)[\"'].*%\1%p"
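To see what the link-extraction step should produce without touching the network, here is a minimal sketch that wraps it in a function and feeds it a made-up page. The function name `extract_links` and the sample HTML are my own illustration, not part of the original answer; it assumes GNU sed (for `\n` in the replacement) and href values wrapped in quotes.

```shell
#!/usr/bin/env bash

# Print the href value of every <a ...> tag read from stdin, one per line.
# Assumptions: GNU sed, quoted href values, and each tag on a single line.
extract_links() {
  # Step 1: start a new line before every "<a " so each anchor begins a line.
  # Step 2: keep only lines starting with an anchor tag and print the href value.
  sed 's%<a[[:space:]]%\n&%g' |
    sed -n "s%^<a[[:space:]][^>]*href=[\"']\([^\"']*\)[\"'].*%\1%p"
}

# Hypothetical sample page, used only for illustration.
extract_links <<'EOF'
<p><a href="http://example.com/one.html">One</a> and
<a class='x' href='http://example.com/two.html'>Two</a></p>
EOF
```

Relative links come out exactly as written in the page, so they would still need to be turned into absolute URLs before being handed to wget.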

Loop over them and search for the regex:

for LINK in $(wget -q "$MAINPAGE" -O - | sed 's%<a%\n&%g' | sed -n "s%^<a[[:space:]][^>]*href=[\"']\([^\"']*\)[\"'].*%\1%p"); do
  # Abort grepping after the first match (-m 1) and return the
  # count (-c; the number of matching lines, which is then 0 or 1).
  # If the count is > 0, print the LINK URL.
  if [ "$(wget -q "$LINK" -O - | grep -c -m 1 -e 'I_AM_A_REGEX')" -gt 0 ]; then
    echo "$LINK"
  fi
done
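The same filtering idea can also be written as a function that reads URLs from stdin, which makes it easier to try out. This is only a sketch: `filter_links` and the `FETCH` variable are hypothetical names I am introducing here, and it relies on grep's exit status (`-q`) rather than the match count.

```shell
#!/usr/bin/env bash

# FETCH is a hypothetical knob: the command that writes a page to stdout.
# It defaults to wget; set FETCH=cat to try the function on local files.
FETCH=${FETCH:-wget -q -O -}

# Read one URL per line from stdin and print those whose content
# matches the (basic) regular expression given as $1.
filter_links() {
  local regex=$1 link
  while IFS= read -r link; do
    # $FETCH is intentionally unquoted so "wget -q -O -" splits into words.
    # grep -q stops at the first match and reports the result via its exit status.
    if $FETCH "$link" </dev/null 2>/dev/null | grep -q -e "$regex"; then
      printf '%s\n' "$link"
    fi
  done
}

# Usage (any command that prints one URL per line can feed it):
#   some_link_producer | filter_links 'I_AM_A_REGEX'
```

Reading the links with `while read` instead of `for ... in $(...)` avoids word splitting and globbing on unusual URLs, at the cost of a slightly longer script.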

P.S.: Untested!
