How can I skip certain pages when downloading a site with wget?

2024-6-19

What I am currently doing is

wget www.example.com -m --warc-file="example.com"

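This works fine for most sites, but the particular site I am saving has more than a thousand redundant pages, such as www.example.com/eventsf[0]=event_calendar5. How can I exclude those while still keeping the main www.example.com/events page?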

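If you are using a relatively recent version of Wget (released less than 6 years ago), you can use the --accept-regex or --reject-regex options to filter, by regular expression, which URLs you actually want to download.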
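As a rough sketch of how that could look for the site in the question: the pattern below is an assumption based only on the single example URL given (it targets the `f[0]=` part, and also allows for a percent-encoded form of the brackets in case the URL shows up that way). The helper names `would_reject` and `mirror_site` are made up for illustration. The `grep -E` check is only an approximation of how Wget applies its default POSIX regex type, so treat it as a quick preview rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical pattern: matches URLs containing "f[0]=" (or "f%5B0%5D=").
# Adjust it to whatever the redundant URLs on the real site look like.
pattern='f(\[|%5B)0(\]|%5D)='

# Preview helper (illustrative): succeeds if the pattern matches the URL,
# i.e. if the URL would be filtered out by --reject-regex.
would_reject() {
  printf '%s\n' "$1" | grep -Eq "$pattern"
}

would_reject 'www.example.com/eventsf[0]=event_calendar5' && echo "rejected"
would_reject 'www.example.com/events' || echo "kept"

# The mirror command from the question with the filter added.
# Wrapped in a function so that nothing is downloaded until it is called.
mirror_site() {
  wget www.example.com -m --warc-file="example.com" --reject-regex "$pattern"
}
```

Calling `mirror_site` then starts the actual download. Since the regex is tested against the whole URL, it can be worth running a few real URLs from the site through `would_reject` first.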
