我有以下命令来复制网站,
当它尝试访问 sun.com 时,连接超时了。
我希望 wget 排除 sun.com,以便 wget 能够继续执行下一步。
存在的问题
$ wget --recursive --page-requisites --adjust-extension --span-hosts --convert-links --restrict-file-names=windows http://pt.jikos.cz/garfield/
.
.
2021-08-09 03:28:28 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]
2021-08-09 03:28:30 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]
.
Location: https : //packages. debian. org /robots.txt [following]
--2021-08-09 03:28:33-- https : //packages. debian. org /robots.txt
Connecting to packages.debian.org (packages.debian.org)|128.0.10.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24 [text/plain]
Saving to: ‘packages.debian.org/robots.txt’
packages.debian.org 100%[===================>] 24 --.-KB/s in 0s
2021-08-09 03:28:34 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]
Loading robots.txt; please ignore errors.
--2021-08-09 03:28:34-- http ://wwws. sun. com/ robots.txt
Resolving wwws.sun.com (wwws.sun.com)... 137.254.16.75
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:28:56-- (try: 2) http ://wwws. sun. com/ robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:29:19-- (try: 3) http ://wwws. sun. com/ robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:29:43-- (try: 4) http ://wwws. sun. com/ robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:30:08-- (try: 5) http ://wwws. sun. com/ robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:30:34-- (try: 6) http ://wwws. sun. com/ robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:31:01-- (try: 7) http ://wwws. sun. com/ robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
预期 $wget 能够保存整个网站而不超时,如果超时,则 wget 将跳过超时连接。
答案1
你可能想要的是:
wget \
--recursive \ # Download the whole site.
--page-requisites \ # Get all assets/elements (CSS/JS/images).
--adjust-extension \ # Save files with .html on the end.
--span-hosts \ # Include necessary assets from offsite as well.
--convert-links \ # Update links to still work in the static version.
--restrict-file-names=windows \ # Modify filenames to work in Windows as well.
--domains yoursite.com \ # Do not follow links outside this domain.
--no-parent \ # Don't follow links outside the directory you pass in.
yoursite.com/whatever/path # The URL to download