Save multiple URL targets to text files

Answer 1

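Use the -i option:

wget -i ./url.txt

From man wget:

-i file

--input-file=file

Read URLs from a local or external file. If - is specified as file, URLs are read from the standard input. (Use ./- to read from a file literally named -.) If this function is used, no URLs need be present on the command line. If there are URLs both on the command line and in an input file, those on the command line will be the first ones to be retrieved. If --force-html is not specified, then file should consist of a series of URLs, one per line.

However, if you specify --force-html, the document will be regarded as html. In that case you may have problems with relative links, which you can solve either by adding "<base href="url">" to the documents or by specifying --base=url on the command line.

If the file is an external one, the document will be automatically treated as html if the Content-Type matches text/html. Furthermore, the file's location will be implicitly used as base href if none was specified.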
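
Since -i accepts - for standard input, a list can be filtered before wget sees it. The following is only a sketch under assumptions: the example.com URLs, the comment line in the list, the clean_urls helper and the RUN_DOWNLOAD switch are all made up for illustration, and the download step is skipped unless that switch is set.

```shell
# Sample list; the example.com URLs, comment line, and blank line are made up.
list=$(mktemp)
printf '%s\n' \
  '# mirrors to fetch' \
  'http://example.com/a.html' \
  '' \
  'http://example.com/b.html' > "$list"

# Hypothetical helper: drop blank lines and lines starting with '#'.
clean_urls() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

cleaned=$(clean_urls "$list")
printf '%s\n' "$cleaned"

# RUN_DOWNLOAD is a hypothetical switch so the sketch stays offline by default.
if [ "${RUN_DOWNLOAD:-0}" = 1 ]; then
  clean_urls "$list" | wget -i -   # "-" tells wget to read the URLs from stdin
fi
```

Run with RUN_DOWNLOAD=1 in the environment to actually fetch the filtered URLs.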

Answer 2

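wget has an option for exactly this:

wget --input-file url.txt

This reads one URL per line from url.txt and downloads each in turn into the current directory.

More generally, you can use xargs for this sort of thing, combined with wget or curl:

xargs wget < url.txt
xargs -n 1 curl -O < url.txt

xargs reads whitespace-separated items from its input and passes them as arguments to the command you give it. Here that command is wget or curl -O, both of which download a URL and save it in the current directory. The -n 1 makes xargs run curl once per URL, which matters because a single -O applies to only one URL. < url.txt supplies the contents of url.txt as input to xargs.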
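
To see how xargs groups arguments before pointing it at a real downloader, echo can stand in for the command. This is a minimal sketch; the example.com URLs and the temporary directory are placeholders, not part of the original answer. By default xargs puts as many arguments as fit on one command line, while -n 1 runs the command once per argument. A single curl -O names an output file for only one URL, so the per-URL form is the safer one for curl.

```shell
# Build a small sample list (the example.com URLs are placeholders).
tmpdir=$(mktemp -d)
printf '%s\n' \
  'http://example.com/a.html' \
  'http://example.com/b.html' \
  'http://example.com/c.html' > "$tmpdir/url.txt"

# Default: xargs packs all URLs into a single invocation.
batched=$(xargs echo wget < "$tmpdir/url.txt")
printf '%s\n' "$batched"

# With -n 1: one invocation per URL, so every URL gets its own -O.
per_url=$(xargs -n 1 echo curl -O < "$tmpdir/url.txt")
printf '%s\n' "$per_url"
```

Dropping the echo turns either preview into the real command.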

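The problem with your Python code is that urllib returns bytes, and printing that data straight to a file stringifies the bytes as b'abc\00\0a...' (the way a bytes literal is written). Writing the bytes to a file opened in binary mode, or decoding them to text first, avoids this.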

Answer 3

With w3m:

echo 'http://unix.stackexchange.com/questions/148670/save-html-to-text-file' |
tee - - - | 
xargs -n1 w3m -dump | 
sed '/Save html/!d;N;N;N;N;N;N;N'

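It seems to me that xargs isn't even necessary here; surely there is a setting for passing several URLs at once, but I can't work it out at the moment. In any case, xargs works: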

Save html to text file

            I'd like to save some (plain HTML) web pages to text file, from URL
            stored in text files as well.

            Here's an exemple of the input file containing the URLs:

            ~$: head -3 url.txt
Save html to text file

            I'd like to save some (plain HTML) web pages to text file, from URL
            stored in text files as well.

            Here's an exemple of the input file containing the URLs:

            ~$: head -3 url.txt
Save html to text file

            I'd like to save some (plain HTML) web pages to text file, from URL
            stored in text files as well.

            Here's an exemple of the input file containing the URLs:

            ~$: head -3 url.txt
Save html to text file

            I'd like to save some (plain HTML) web pages to text file, from URL
            stored in text files as well.

            Here's an exemple of the input file containing the URLs:

            ~$: head -3 url.txt

Answer 4

There are two other ways:

wget $(<file)

and

while read -r link; do wget "$link"; done < file
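
The two forms differ in how they treat the file. $(<file) is shorthand found in bash, ksh and zsh, and because it is unquoted the shell splits the contents on whitespace and applies filename expansion before wget runs. The loop reads one line at a time. Below is a slightly more defensive variant of the loop, offered as a sketch: fetch is a hypothetical stand-in that only prints what it would download, and the example.com URLs are placeholders. It skips blank lines and still processes a last line that lacks a trailing newline.

```shell
# Hypothetical stand-in for the downloader; swap the body for: wget "$1"
fetch() {
  printf 'would fetch: %s\n' "$1"
}

# Sample list with a blank line and no newline after the last URL.
list=$(mktemp)
printf 'http://example.com/a.html\n\nhttp://example.com/b.html' > "$list"

log=$(
  # "|| [ -n "$link" ]" keeps a final line that has no trailing newline.
  while IFS= read -r link || [ -n "$link" ]; do
    [ -n "$link" ] || continue   # skip blank lines
    fetch "$link"
  done < "$list"
)
printf '%s\n' "$log"
```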
