How do I get URLs from a JSON page, where the user searches for a word and all sites containing that word are displayed?

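I'm trying to write a bash script that returns the URLs currently on a web page. What I have so far returns every URL, but the link you are looking for has to be hard-coded. I'd like the user to pass a word and get back every URL containing that word, like this: ./reddit.sh Linux, which would then print the URLs containing that word. This is my code so far: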

wget -qO- http://reddit.com/ | grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" | sort | uniq

Answer 1

Complete solution:

Tools used: bash, wget, xmllint, sed, sort

reddit.sh script:

#!/bin/bash

search_word="$1"

wget -qO - --follow-tags=a "http://reddit.com/search?q=${search_word}" \
|  xmllint --html --xpath '//a[contains(@href, "'"${search_word}"'")]' - 2>/dev/null \
| sed 's/<\/a>/&\n/g' | sort -u

Usage:

$ bash reddit.sh linux

Output (shortened):

<a href="https://fossbytes.com/firefox-quantum-57-is-here-to-kill-google-chrome-download-for-windows-mac-linux/" class="search-link may-blank">https://fossbytes.com/firefox-quantum-57-is-here-to-kill-google-chrome-download-for-windows-mac-linux/</a>
<a href="https://www.change.org/p/lenovo-demand-that-lenovo-provide-bios-update-to-enable-linux-installation">https://www.change.org/p/lenovo-demand-that-lenovo-provide-bios-update-to-enable-linux-installation</a>
<a href="https://www.gamingonlinux.com/articles/atari-are-launching-a-new-gaming-system-the-ataribox-and-it-runs-linux.10418" class="search-link may-blank">https://www.gamingonlinux.com/articles/atari-are-launching-a-new-gaming-system-the-ataribox-and-it-runs-linux.10418</a>
<a href="https://www.reddit.com/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" data-inbound-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=1" data-href-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" class="search-comments may-blank">2,315 comments</a>
<a href="https://www.reddit.com/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" data-inbound-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=1" data-href-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" class="search-title may-blank">Every time I try out linux</a>
<a href="https://www.reddit.com/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" data-inbound-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=14" data-href-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" class="search-comments may-blank">269 comments</a>
<a href="https://www.reddit.com/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" data-inbound-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=14" data-href-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" class="search-title may-blank">20170825: Happy Birthday Linux</a>
...

Additional test case, searching for python:

$ bash reddit.sh python

Output (shortened):

<a href="https://developers.slashdot.org/story/17/12/15/1133217/microsoft-considers-adding-python-as-an-official-scripting-language-in-excel" class="search-link may-blank">https://developers.slashdot.org/story/17/12/15/1133217/microsoft-considers-adding-python-as-an-official-scripting-language-in-excel</a>
<a href="https://www.reddit.com/r/ATBGE/comments/7bjnxs/check_out_this_python/" data-inbound-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=7" data-href-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/" class="search-comments may-blank">302 comments</a>
<a href="https://www.reddit.com/r/ATBGE/comments/7bjnxs/check_out_this_python/" data-inbound-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=7" data-href-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/" class="search-title may-blank">Check out this python!</a>
<a href="https://www.reddit.com/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" data-inbound-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=8" data-href-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" class="search-comments may-blank">1,364 comments</a>
<a href="https://www.reddit.com/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" data-inbound-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=8" data-href-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" class="search-title may-blank">Monty Python Life Of Brian is still relevant today</a>
...
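
The output above consists of whole anchor elements, while the question asks for bare URLs. A minimal sketch of one way to reduce that output to the href values follows. The function name extract_hrefs is hypothetical, and it assumes each href is double-quoted and preceded by a space, as in the sample output.

```shell
#!/bin/bash
# extract_hrefs (hypothetical helper): read HTML anchor elements on stdin
# and print only their href values, one per line, without duplicates.
# The leading space in the pattern keeps attributes such as
# data-href-url from being picked up.
extract_hrefs() {
  grep -Eo ' href="[^"]*"' \
    | sed -e 's/^ href="//' -e 's/"$//' \
    | sort -u
}
```

With the function defined in the current shell, it could be appended to the script call, for example: bash reddit.sh linux | extract_hrefs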

Answer 2

Have you tried something like this?

target="reddit"; wget -qO- http://reddit.com/ | grep -Po "http.*?(?=\")" | grep -i $target | sort | uniq

Edit: extended along the same lines as @RomanPerekhrest's answer

target="linux"; wget -qO- "http://reddit.com/search?q=${target}" | grep -Po "http.*?(?=\")" | grep $target | sort -u

Edit: handling multiple words, as in @nxnev's answer

target="arch linux"; url="http://reddit.com/search?q=$target"; search=$(echo "$target" | sed 's/ /|/g'); wget -qO- "$url" | grep -Po "http.*?(?=\")" | grep -Eh "$search" | sort -u
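
The one-liners above hard-code the search word in target, whereas the question wants it passed on the command line. Below is a rough sketch of a filter that takes the words as arguments. The name filter_urls is made up for illustration, the URL pattern is borrowed from the question, and words are matched as fixed strings regardless of case.

```shell
#!/bin/bash
# filter_urls WORD... (hypothetical helper): read HTML on stdin and print
# the unique URLs that contain at least one WORD (case-insensitive).
filter_urls() {
  if [ "$#" -eq 0 ]; then
    echo "usage: filter_urls WORD..." >&2
    return 1
  fi
  local word
  local -a patterns=()
  for word in "$@"; do
    patterns+=(-e "$word")   # one fixed-string pattern per word
  done
  grep -Eo 'https?://[a-zA-Z0-9./?=_-]*' \
    | grep -iF "${patterns[@]}" \
    | sort -u
}

# Example call (needs network access), left commented out:
# wget -qO- "http://reddit.com/search?q=linux" | filter_urls linux
```

Saved as reddit.sh with a final line such as wget -qO- "http://reddit.com/search?q=$1" | filter_urls "$@", this would give the ./reddit.sh Linux interface from the question.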

Answer 3

If you want to display Reddit's search results (URLs only) without using their API, this should do the job:

reddit() {
  local 'search_term' 'user_agent'
  user_agent='your_user_agent'
  for search_term; do
    curl \
      --data-urlencode "q=${search_term}" \
      --get \
      --header "User-Agent: ${user_agent}" \
      --silent \
      "https://www.reddit.com/search" \
    | grep -P -o -e '<a [^>]*? class="search-title may-blank" >.*?<\/a>' \
    | grep -P -o -e '(?<=href=")(.*?)(?=")' \
    | tail -n '+4'
  done
}

Example:

$ reddit 'arch linux'
https://www.reddit.com/r/linux/comments/6pepav/someone_got_offended_by_a_hostname_of_an/
https://www.reddit.com/r/linux/comments/6g6xsu/the_arch_linux_wiki_is_awesome_and_i_would_like/
https://www.reddit.com/r/linuxmasterrace/comments/7ikqxs/my_new_macbook_pro_has_been_made_glorious_by_the/
https://www.reddit.com/r/linux/comments/5sx15b/arch_linux_pulls_the_plug_on_32bit/
https://www.reddit.com/r/archlinux/comments/7a4sgv/almost_no_one_on_campus_got_it_but_i_dressed_up/
https://www.reddit.com/r/archlinux/comments/7blg7w/arch_linux_news_the_end_of_i686_support/
https://www.reddit.com/r/archlinux/comments/7g53jg/here_is_a_screenshot_of_a_music_player_ive_been/
https://www.reddit.com/r/thinkpad/comments/7k704w/my_beloved_x1_carbon_5th_gen_running_arch_linux/
https://www.reddit.com/r/pcmasterrace/comments/39hl6h/im_thoroughly_enjoying_arch_linux_60fps/
https://www.reddit.com/r/linux/comments/3qsmk4/twitch_installs_arch_linux_similar_to_twitch/
https://www.reddit.com/r/linuxmasterrace/comments/7aai76/i_am_using_archlinux/
https://www.reddit.com/r/archlinux/comments/7j2zhl/fully_encrypted_archlinux_with_secure_boot_on/
https://www.reddit.com/r/linuxmasterrace/comments/5dbgku/my_experience_with_arch_linux_so_far/
https://www.reddit.com/r/linux/comments/4m0r93/why_did_archlinux_embrace_systemd/
https://www.reddit.com/r/unixporn/comments/7iss7b/xfce_arch_linux_satisfaction/
https://www.reddit.com/r/archlinux/comments/5ndu7r/my_manual_to_install_arch_linux_the_minimal_way_i/
https://www.reddit.com/r/archlinux/comments/73g3vz/librem_5_will_support_arch_linux/
https://www.reddit.com/r/haskell/comments/7jyie0/the_arch_linux_community_does_not_look_very_about/
https://www.reddit.com/r/linux_gaming/comments/4xep1o/no_mans_sky_running_on_wine_in_64_bit_arch_linux/
https://www.reddit.com/r/archlinux/comments/7bjp8j/hexadecimal_arch_linux_calendar_for_2018/
https://www.reddit.com/r/linux/comments/3r1mdv/twitch_installs_arch_linux_lasts_only_a_few_hours/
https://www.reddit.com/r/archlinux/comments/7hfb9m/farch_functional_arch_linux_system_management/
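
The function above relies on grep -P, a PCRE-backed option that not every grep build provides. As a rough alternative, the same extraction can be sketched with extended regular expressions and sed. The name search_title_urls is hypothetical, and the sketch assumes the markup shown in Answer 1's output (an anchor whose class contains search-title), which Reddit may since have changed.

```shell
#!/bin/bash
# search_title_urls (hypothetical helper): read HTML on stdin and print the
# href of every <a> element whose class attribute contains "search-title".
search_title_urls() {
  # Step 1: isolate the opening tags of the matching anchors.
  # Step 2: keep only the value of the href attribute of each tag.
  grep -Eo '<a [^>]*class="[^"]*search-title[^"]*"[^>]*>' \
    | sed -n 's/.* href="\([^"]*\)".*/\1/p'
}

# Example call (needs network access), left commented out:
# curl --silent --get --data-urlencode "q=arch linux" \
#   "https://www.reddit.com/search" | search_title_urls
```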
