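As some of you may have noticed on other sites in the network, I have a couple of scripts that fix broken images and links on Stack Exchange. Most of these scripts run automatically as cron jobs on my Raspberry Pi 4.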
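I noticed something peculiar about links to jstor.org. I can visit such a link in a browser on both my Mac and my RPi. The script (which fetches web pages in a way similar to curl) gets blocked by a reCAPTCHA when it runs on the RPi, but not when it runs on the Mac. It is logical for the site to have some scraping protection, but this is the first time I have seen a difference between machines (which are on the same home network).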
Here is a concrete example; the request is taken from the Chromium developer tools on my Raspberry Pi:
curl 'https://www.jstor.org/stable/2533862' \
-H 'accept-encoding: deflate, gzip' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36' \
-H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
-H 'accept-language: en-US,en;q=0.9' \
--compressed -v
(Note: I removed some sec- headers because they aren't relevant.)
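In a terminal, this command works on my Mac but not on the Raspberry Pi. Using the Mac's user agent makes no difference. This is the full output of curl on the Raspberry Pi, including the generated HTML: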
pi@raspberrypi:~ $ curl 'https://www.jstor.org/stable/2533862' \
> -H 'accept-encoding: deflate, gzip' \
> -H 'upgrade-insecure-requests: 1' \
> -H 'user-agent: Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36' \
> -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
> -H 'accept-language: en-US,en;q=0.9' \
> --compressed -v
* Expire in 0 ms for 6 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 3 ms for 1 (transfer 0x1e5a950)
* Expire in 3 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 3 ms for 1 (transfer 0x1e5a950)
* Expire in 3 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 5 ms for 1 (transfer 0x1e5a950)
* Trying 151.101.36.152...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x1e5a950)
* Connected to www.jstor.org (151.101.36.152) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=US; ST=New York; L=New York; O=Ithaka Harbors, Inc.; CN=jstor.org
* start date: Apr 12 15:57:42 2022 GMT
* expire date: May 14 15:57:41 2023 GMT
* subjectAltName: host "www.jstor.org" matched cert's "*.jstor.org"
* issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 OV TLS CA 2022 Q2
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x1e5a950)
> GET /stable/2533862 HTTP/2
> Host: www.jstor.org
> accept-encoding: deflate, gzip
> upgrade-insecure-requests: 1
> user-agent: Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
> accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
> accept-language: en-US,en;q=0.9
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 403
< server: Varnish
< retry-after: 0
< content-type: text/html
< accept-ranges: bytes
< date: Mon, 18 Apr 2022 07:47:18 GMT
< via: 1.1 varnish
< set-cookie: _pxhd=vW1nDMNFFFI3tkNkjQYWOgLI99ajK-hT6LI4ua0sZy38e1p4v9XUHY6a2DoRXv2CFRxDjEnHFZYyof3sUytsZw==:h1LPuATkQi5XBRiv7qid2Y8pMCDr93JembEBMBbV9Cjwzp3HvjzErajD8VCWHMVi0Cc0FTRhPNO6W3t4pYHs/wawxsyE89qgcX4Ci7BGRyI=; Expires=Tue, 18 Apr 2023 07:47:18 GMT; path=/;
< x-served-by: cache-ams21078-AMS
< x-cache: MISS
< x-cache-hits: 0
< content-length: 3468
<
<!DOCTYPE html>
<html class="popup no-js" lang="en">
<head>
<meta name="robots" content="noarchive,NOODP" />
<meta name="description" content="JSTOR is a digital library of academic journals, books, and primary sources." />
<meta name="viewport" content="width=device-width" />
<meta charset="UTF-8"/>
<link rel="stylesheet" href="/assets/global_20171026T1134/build/global/css/popup.css" />
<link rel="apple-touch-icon" href="/assets/global_20171026T1134/build/images/apple-touch-icon.png" />
<title>JSTOR: Access Check</title>
<!-- Custom CSS -->
</head>
<body>
<div class="logo-container">
<a href="/"><img src="/assets/global_20171026T1134/build/images/jstor-logo.png" srcset="/assets/global_20171026T1134/build/images/jstor-logo.png" class="non-responsive" alt="JSTOR Home" width="65" height="90" /></a>
</div>
<div id="content" role="main" class="row content brdra">
<div class="small-12 columns paxl mtxl">
<div class="row popup-inner">
<div class="small-12 columns noGlobalSrch">
<h2>Access Check</h2>
<p>Our systems have detected unusual traffic activity from your network. Please complete this reCAPTCHA to demonstrate that it's
you making the requests and not a robot. If you are having trouble seeing or completing this challenge,
<a href="https://support.jstor.org/hc/en-us/articles/115011068868-Troubleshooting-CAPTCHA-" target="_blank" title="This link opens in a new window">this page</a> may help.
If you continue to experience issues, you can <a href="https://support.jstor.org/" target="_blank" title="This link opens in a new window">contact JSTOR support</a>.</p>
<div id="px-captcha"> </div>
<p>Block Reference: #c5d172ad-beeb-11ec-8c24-556c625a4161<br/>
VID: #<br/>
IP: [my IP address]<br/>
Date and time: Mon, 18 Apr 2022 07:47:18 GMT<br/>
<noscript>Javascript is disabled</noscript></p>
<p>Go back to <a href="/" title="Go back to JSTOR">JSTOR</a></p>
</div>
</div>
</div>
</div>
<div class="row">
<div class="small-12 columns pts">
<small>©2000-<script type="text/javascript">document.write(new Date().getFullYear());</script> ITHAKA. All Rights Reserved. JSTOR®, the JSTOR logo, JPASS®, and ITHAKA® are registered trademarks of ITHAKA.</small>
</div>
</div>
<!-- Px --> <script> window._pxAppId = 'PXu4K0s8nX'; window._pxJsClientSrc = '/u4K0s8nX/init.js'; window._pxFirstPartyEnabled = true; window._pxVid = ''; window._pxUuid = 'c5d172ad-beeb-11ec-8c24-556c625a4161'; window._pxHostUrl = '/u4K0s8nX/xhr'; </script>
<script> var s = document.createElement('script'); s.src = '/u4K0s8nX/captcha/captcha.js?a=c&u=c5d172ad-beeb-11ec-8c24-556c625a4161&v=&m=0'; var p = document.getElementsByTagName('head')[0]; p.insertBefore(s, null); if (true ){s.onerror = function () {s = document.createElement('script'); var suffixIndex = '/u4K0s8nX/captcha/captcha.js?a=c&u=c5d172ad-beeb-11ec-8c24-556c625a4161&v=&m=0'.indexOf('/captcha.js'); var temperedBlockScript = '/u4K0s8nX/captcha/captcha.js?a=c&u=c5d172ad-beeb-11ec-8c24-556c625a4161&v=&m=0'.substring(suffixIndex); s.src = '//captcha.px-cdn.net/PXu4K0s8nX' + temperedBlockScript; p.parentNode.insertBefore(s, p);};}</script>
<!-- Custom Script -->
</body>
* Connection #0 to host www.jstor.org left intact
</html>
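For what it's worth, the block page is easy to recognise mechanically, so a script can at least tell a blocked response from a real one. This is only a minimal sketch: it assumes the block page always contains the px-captcha container or the "JSTOR: Access Check" title seen in the response above, and the function name and the file name page.html are made up for illustration.

```shell
#!/bin/sh
# Sketch: decide whether a saved response body is the "Access Check" page.
# Assumption: the block page always carries one of the two markers below,
# as in the response shown above. is_access_check is a hypothetical helper.

# is_access_check FILE -> exit status 0 if FILE looks like the block page
is_access_check() {
  grep -Eq 'id="px-captcha"|<title>JSTOR: Access Check</title>' "$1"
}

# Usage (page.html is a hypothetical file name):
#   curl -s --compressed -o page.html 'https://www.jstor.org/stable/2533862'
#   if is_access_check page.html; then echo "blocked"; else echo "ok"; fi
```

Checking the HTTP status (403 here versus 200 on the Mac) would work as well; matching on the body just avoids relying on the status code alone.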
For reference, this is what I get on my Mac (the HTML output is omitted because of its length, but it is what I expect):
glorfindel@Glorfindels-MacBook ~ % curl 'https://www.jstor.org/stable/2533862' \
-H 'accept-encoding: deflate, gzip' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36' \
-H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
-H 'accept-language: en-US,en;q=0.9' \
--compressed -v
* Trying 151.101.36.152:443...
* Connected to www.jstor.org (151.101.36.152) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
* CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=US; ST=New York; L=New York; O=Ithaka Harbors, Inc.; CN=jstor.org
* start date: Apr 12 15:57:42 2022 GMT
* expire date: May 14 15:57:41 2023 GMT
* subjectAltName: host "www.jstor.org" matched cert's "*.jstor.org"
* issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 OV TLS CA 2022 Q2
* SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x156012200)
> GET /stable/2533862 HTTP/2
> Host: www.jstor.org
> accept-encoding: deflate, gzip
> upgrade-insecure-requests: 1
> user-agent: Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
> accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
> accept-language: en-US,en;q=0.9
>
< HTTP/2 200
< server: Apache/2.4.29 (Ubuntu)
< x-frame-options: SAMEORIGIN
< set-cookie: AccessSession=H4sIAAAAAAAAAK2RSW_UQBCF7_kVls_pUS_VG7exiRPEkXBCKCqXu8HIw4y8RIIo_x2vAxPIjWN_9er1q6qnqyRJ6ypN3iSpQOmdJUtRauBOlpXDCCijJl9yYdPrSUyrukZiiCjEgr-tOCsKlRX-Lejc5pksPOdaFFxxpcw-g3xRt6taQXBWUhXKoIGCcpJLKwyAsRX4qBb1MKxyD8YBj4o56zQDkJ45owNTngyVshRe4NKCQ_91aonYdGEmj9gsLsLo8ROnDFjD51J9WuYHtRPW7oQzOyFgNaoO9fdLp647XoKeLt9I1HcT-pQ8je8_Nsw597PvyNY4KWwgbFN6KJXymkWP45RecuYcBiZ9FZ1wFFWFW0__4xTmptv2OJzO1mecYVfTRW0cpz7UP0PR4JdJ0rdDGCvP1__Iql5mlX9lDY7bKhCLypYMyGqG5CTjJNyYnkqO_P9nTT4vh-jP9zRW_75ng68VhlcK2PftfLF1B3k-J7r5uCXJ72cQ2kNojg-3Nxv_sPB393f79_t0ynb1_AtWUGjGVAMAAA; Path=/; SameSite=Lax; Secure
< set-cookie: AccessSessionSignature=3322ae2ad6c2aca1491af2e0e493b5ab6c9533cd0d0024b488f8cb4904e4b6a3; Path=/; SameSite=Lax; Secure
< set-cookie: AccessSessionTimedSignature=aa6b942b5efb553254984f1935040fef7c65023499c190c6884e3cc79a66b75a; Path=/; SameSite=Lax; Secure
< set-cookie: UUID=946840f3-8785-4429-865e-39c6cb2b191a; expires=Thu, 17 Apr 2025 07:39:25 GMT; Max-Age=94608000; Path=/; SameSite=None; Secure
< set-cookie: csrftoken=BtBwhZoFKH61vmo7vv3CZAs9DrDmaPkXski76lA478b0kEYpLO8P35H0M2ymzpA4; expires=Mon, 17 Apr 2023 07:39:25 GMT; Max-Age=31449600; Path=/; SameSite=Lax; Secure
< set-cookie: ReferringRequestId=excelsior:3ebf19131196bae82406e55730913657; Path=/; SameSite=Lax; Secure
< content-encoding: gzip
< content-type: text/html; charset=utf-8
< x-jstor-restarts: 2
< accept-ranges: bytes
< date: Mon, 18 Apr 2022 07:39:25 GMT
< via: 1.1 varnish
< set-cookie: _pxhd=4X5A9pQYXcrgxAXzUOZVi-aK2X5V-aHyliphZo8MwnOdMZDI-s0-wFgAEPOOhZwLs2bHY6gFurYQD-XHQ8LKTg==:29IO778AT925teKlLC1rJlVwEP2U/dhPyCHtvFGriTKChA-n8uiCGYCX5scjIwh5sTZ478ZG8SGwxd4lmCJM/DO1SZTeMfI/pjaeDtq44OQ=; Expires=Tue, 18 Apr 2023 07:39:25 GMT; path=/;
< x-served-by: cache-ams21083-AMS
< x-cache: MISS
< x-cache-hits: 0
< vary: Cookie,Accept-Encoding,Fastly-SSL,Origin,X-Requested-Host
<
<!DOCTYPE html>
<html class="no-js" lang="en">
</html>
* Connection #0 to host www.jstor.org left intact
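One difference I can see between the two traces is the negotiated cipher suite (TLS_AES_256_GCM_SHA384 on the Pi, AEAD-CHACHA20-POLY1305-SHA256 on the Mac). Whether that matters to the bot protection is only a guess on my part. To compare the traces without all the noise, something like the sketch below could be used; the function name and the file names pi.log and mac.log are hypothetical.

```shell
#!/bin/sh
# Sketch: pull the negotiated TLS/ALPN lines out of a saved `curl -v` trace
# so that two traces can be compared side by side.
# Assumption: the trace uses the "* ..." info-line format shown above.

# tls_summary FILE -> prints the cipher and ALPN result lines from FILE
tls_summary() {
  grep -E '^\* (SSL connection using|ALPN, server accepted)' "$1"
}

# Usage (curl -v writes the trace to stderr; file names are hypothetical):
#   curl -v -o /dev/null 'https://www.jstor.org/stable/2533862' 2> pi.log
#   tls_summary pi.log
#   tls_summary mac.log
```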
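My actual script runs on Java but has the same problem, so I guess something in the Raspberry Pi or its operating system causes this, but what? I'm running "Raspbian GNU/Linux 10 (buster)" according to /etc/os-release, and curl --version gives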
curl 7.64.0 (arm-unknown-linux-gnueabihf) libcurl/7.64.0 OpenSSL/1.1.1n zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.8.0 nghttp2/1.36.0 librtmp/2.3
Release-Date: 2019-02-06
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL
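(The curl on my Mac is a bit newer (7.79.1), but since the behaviour seems to be independent of the tool, I don't think that's the problem.) A moderator on Raspberry Pi Stack Exchange, where I originally asked this question, said that curl also fails for them on Fedora.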