What causes my cURL request to fail on some Unix systems while it works on others?

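As some of you may have noticed on other sites in the network, I have a couple of scripts that fix broken images and links on Stack Exchange. Most of these scripts run automatically as cron jobs on my Raspberry Pi 4.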

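I noticed something peculiar about links to jstor.org. I can open such a link in a browser on both my Mac and my RPi. The script (which fetches web pages in a way similar to curl) gets blocked by a reCAPTCHA when it runs on the RPi, but not when it runs on the Mac. It is logical for the site to have some scraping protection, but this is the first time I have seen a difference between machines (they are on the same home network).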

Here is a concrete example; the request was taken from the Chromium developer tools on my Raspberry Pi:

curl 'https://www.jstor.org/stable/2533862' \
  -H 'accept-encoding: deflate, gzip' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'user-agent: Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'accept-language: en-US,en;q=0.9' \
  --compressed -v

(Note: I removed some sec- headers because they are not relevant.)

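In the terminal, this command works on my Mac but not on the Raspberry Pi. Using the Mac's user agent makes no difference. This is the resulting HTML:

[screenshot: the rendered HTML of the response]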

This is the full output of curl:

pi@raspberrypi:~ $ curl 'https://www.jstor.org/stable/2533862' \
>   -H 'accept-encoding: deflate, gzip' \
>   -H 'upgrade-insecure-requests: 1' \
>   -H 'user-agent: Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36' \
>   -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
>   -H 'accept-language: en-US,en;q=0.9' \
>   --compressed -v
* Expire in 0 ms for 6 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 0 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 1 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 2 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 3 ms for 1 (transfer 0x1e5a950)
* Expire in 3 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 3 ms for 1 (transfer 0x1e5a950)
* Expire in 3 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 4 ms for 1 (transfer 0x1e5a950)
* Expire in 5 ms for 1 (transfer 0x1e5a950)
*   Trying 151.101.36.152...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x1e5a950)
* Connected to www.jstor.org (151.101.36.152) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=New York; L=New York; O=Ithaka Harbors, Inc.; CN=jstor.org
*  start date: Apr 12 15:57:42 2022 GMT
*  expire date: May 14 15:57:41 2023 GMT
*  subjectAltName: host "www.jstor.org" matched cert's "*.jstor.org"
*  issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 OV TLS CA 2022 Q2
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x1e5a950)
> GET /stable/2533862 HTTP/2
> Host: www.jstor.org
> accept-encoding: deflate, gzip
> upgrade-insecure-requests: 1
> user-agent: Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
> accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
> accept-language: en-US,en;q=0.9
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 403 
< server: Varnish
< retry-after: 0
< content-type: text/html
< accept-ranges: bytes
< date: Mon, 18 Apr 2022 07:47:18 GMT
< via: 1.1 varnish
< set-cookie: _pxhd=vW1nDMNFFFI3tkNkjQYWOgLI99ajK-hT6LI4ua0sZy38e1p4v9XUHY6a2DoRXv2CFRxDjEnHFZYyof3sUytsZw==:h1LPuATkQi5XBRiv7qid2Y8pMCDr93JembEBMBbV9Cjwzp3HvjzErajD8VCWHMVi0Cc0FTRhPNO6W3t4pYHs/wawxsyE89qgcX4Ci7BGRyI=; Expires=Tue, 18 Apr 2023 07:47:18 GMT; path=/;
< x-served-by: cache-ams21078-AMS
< x-cache: MISS
< x-cache-hits: 0
< content-length: 3468
< 
<!DOCTYPE html>
<html class="popup no-js" lang="en">
  <head>
    <meta name="robots" content="noarchive,NOODP" />
    <meta name="description" content="JSTOR is a digital library of academic journals, books, and primary sources." />
    <meta name="viewport" content="width=device-width" />
    <meta charset="UTF-8"/>
    <link rel="stylesheet" href="/assets/global_20171026T1134/build/global/css/popup.css" />
    <link rel="apple-touch-icon" href="/assets/global_20171026T1134/build/images/apple-touch-icon.png" />
    <title>JSTOR: Access Check</title>
    <!-- Custom CSS --> 
  </head>
  <body>
    <div class="logo-container">
      <a href="/"><img src="/assets/global_20171026T1134/build/images/jstor-logo.png" srcset="/assets/global_20171026T1134/build/images/jstor-logo.png" class="non-responsive" alt="JSTOR Home" width="65" height="90" /></a>
    </div>
    <div id="content" role="main" class="row content brdra">
      <div class="small-12 columns paxl mtxl">
        <div class="row popup-inner">
          <div class="small-12 columns noGlobalSrch">
            <h2>Access Check</h2>
            <p>Our systems have detected unusual traffic activity from your network. Please complete this reCAPTCHA to demonstrate that it's
               you making the requests and not a robot. If you are having trouble seeing or completing this challenge,
               <a href="https://support.jstor.org/hc/en-us/articles/115011068868-Troubleshooting-CAPTCHA-" target="_blank" title="This link opens in a new window">this page</a> may help.
               If you continue to experience issues, you can <a href="https://support.jstor.org/" target="_blank" title="This link opens in a new window">contact JSTOR support</a>.</p>
            <div id="px-captcha"> </div>
            <p>Block Reference: #c5d172ad-beeb-11ec-8c24-556c625a4161<br/>
               VID: #<br/>
               IP: [my IP address]<br/>
               Date and time: Mon, 18 Apr 2022 07:47:18 GMT<br/>
               <noscript>Javascript is disabled</noscript></p>
            <p>Go back to <a href="/" title="Go back to JSTOR">JSTOR</a></p>
          </div>
        </div>
      </div>
    </div>
    <div class="row">
      <div class="small-12 columns pts">
        <small>&copy;2000-<script type="text/javascript">document.write(new Date().getFullYear());</script> ITHAKA. All Rights Reserved. JSTOR&reg;, the JSTOR logo, JPASS&reg;, and ITHAKA&reg; are registered trademarks of ITHAKA.</small>
      </div>
    </div>
    <!-- Px --> <script> window._pxAppId = 'PXu4K0s8nX'; window._pxJsClientSrc = '/u4K0s8nX/init.js'; window._pxFirstPartyEnabled = true; window._pxVid = ''; window._pxUuid = 'c5d172ad-beeb-11ec-8c24-556c625a4161'; window._pxHostUrl = '/u4K0s8nX/xhr'; </script>
    <script> var s = document.createElement('script'); s.src = '/u4K0s8nX/captcha/captcha.js?a=c&u=c5d172ad-beeb-11ec-8c24-556c625a4161&v=&m=0'; var p = document.getElementsByTagName('head')[0]; p.insertBefore(s, null); if (true ){s.onerror = function () {s = document.createElement('script'); var suffixIndex = '/u4K0s8nX/captcha/captcha.js?a=c&u=c5d172ad-beeb-11ec-8c24-556c625a4161&v=&m=0'.indexOf('/captcha.js'); var temperedBlockScript = '/u4K0s8nX/captcha/captcha.js?a=c&u=c5d172ad-beeb-11ec-8c24-556c625a4161&v=&m=0'.substring(suffixIndex); s.src = '//captcha.px-cdn.net/PXu4K0s8nX' + temperedBlockScript; p.parentNode.insertBefore(s, p);};}</script>
    <!-- Custom Script --> 
  </body>
* Connection #0 to host www.jstor.org left intact
</html>
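
The blocked response above has a recognizable shape: status 403 plus a `px-captcha` element in the body. A script could check for that before treating the page as real content. The following is only a minimal sketch: the function name `is_blocked` and the file name `response.txt` are hypothetical, and the two markers are assumptions drawn from this single response, not a documented contract of the site.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a saved response (status line and
# headers first, then the body, as written by `curl -i`) looks like the
# block page shown above.
# Assumption: a block is a 403 status combined with the px-captcha div.
is_blocked() {
    file=$1
    # The first line is the status line, e.g. "HTTP/2 403"; the second
    # whitespace-separated field is the numeric status code.
    status=$(head -n 1 "$file" | awk '{print $2}')
    if [ "$status" = "403" ] && grep -q 'id="px-captcha"' "$file"; then
        return 0
    fi
    return 1
}
```

For example, after saving headers and body with `curl -s -i --compressed -o response.txt 'https://www.jstor.org/stable/2533862'`, running `is_blocked response.txt && echo blocked` would report the block.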

For reference, this is what I get on my Mac (the HTML output is omitted because of its length, but it is what I expect):

glorfindel@Glorfindels-MacBook ~ % curl 'https://www.jstor.org/stable/2533862' \
  -H 'accept-encoding: deflate, gzip' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'user-agent: Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'accept-language: en-US,en;q=0.9' \
  --compressed -v
*   Trying 151.101.36.152:443...
* Connected to www.jstor.org (151.101.36.152) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=New York; L=New York; O=Ithaka Harbors, Inc.; CN=jstor.org
*  start date: Apr 12 15:57:42 2022 GMT
*  expire date: May 14 15:57:41 2023 GMT
*  subjectAltName: host "www.jstor.org" matched cert's "*.jstor.org"
*  issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 OV TLS CA 2022 Q2
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x156012200)
> GET /stable/2533862 HTTP/2
> Host: www.jstor.org
> accept-encoding: deflate, gzip
> upgrade-insecure-requests: 1
> user-agent: Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36
> accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
> accept-language: en-US,en;q=0.9
> 
< HTTP/2 200 
< server: Apache/2.4.29 (Ubuntu)
< x-frame-options: SAMEORIGIN
< set-cookie: AccessSession=H4sIAAAAAAAAAK2RSW_UQBCF7_kVls_pUS_VG7exiRPEkXBCKCqXu8HIw4y8RIIo_x2vAxPIjWN_9er1q6qnqyRJ6ypN3iSpQOmdJUtRauBOlpXDCCijJl9yYdPrSUyrukZiiCjEgr-tOCsKlRX-Lejc5pksPOdaFFxxpcw-g3xRt6taQXBWUhXKoIGCcpJLKwyAsRX4qBb1MKxyD8YBj4o56zQDkJ45owNTngyVshRe4NKCQ_91aonYdGEmj9gsLsLo8ROnDFjD51J9WuYHtRPW7oQzOyFgNaoO9fdLp647XoKeLt9I1HcT-pQ8je8_Nsw597PvyNY4KWwgbFN6KJXymkWP45RecuYcBiZ9FZ1wFFWFW0__4xTmptv2OJzO1mecYVfTRW0cpz7UP0PR4JdJ0rdDGCvP1__Iql5mlX9lDY7bKhCLypYMyGqG5CTjJNyYnkqO_P9nTT4vh-jP9zRW_75ng68VhlcK2PftfLF1B3k-J7r5uCXJ72cQ2kNojg-3Nxv_sPB393f79_t0ynb1_AtWUGjGVAMAAA; Path=/; SameSite=Lax; Secure
< set-cookie: AccessSessionSignature=3322ae2ad6c2aca1491af2e0e493b5ab6c9533cd0d0024b488f8cb4904e4b6a3; Path=/; SameSite=Lax; Secure
< set-cookie: AccessSessionTimedSignature=aa6b942b5efb553254984f1935040fef7c65023499c190c6884e3cc79a66b75a; Path=/; SameSite=Lax; Secure
< set-cookie: UUID=946840f3-8785-4429-865e-39c6cb2b191a; expires=Thu, 17 Apr 2025 07:39:25 GMT; Max-Age=94608000; Path=/; SameSite=None; Secure
< set-cookie: csrftoken=BtBwhZoFKH61vmo7vv3CZAs9DrDmaPkXski76lA478b0kEYpLO8P35H0M2ymzpA4; expires=Mon, 17 Apr 2023 07:39:25 GMT; Max-Age=31449600; Path=/; SameSite=Lax; Secure
< set-cookie: ReferringRequestId=excelsior:3ebf19131196bae82406e55730913657; Path=/; SameSite=Lax; Secure
< content-encoding: gzip
< content-type: text/html; charset=utf-8
< x-jstor-restarts: 2
< accept-ranges: bytes
< date: Mon, 18 Apr 2022 07:39:25 GMT
< via: 1.1 varnish
< set-cookie: _pxhd=4X5A9pQYXcrgxAXzUOZVi-aK2X5V-aHyliphZo8MwnOdMZDI-s0-wFgAEPOOhZwLs2bHY6gFurYQD-XHQ8LKTg==:29IO778AT925teKlLC1rJlVwEP2U/dhPyCHtvFGriTKChA-n8uiCGYCX5scjIwh5sTZ478ZG8SGwxd4lmCJM/DO1SZTeMfI/pjaeDtq44OQ=; Expires=Tue, 18 Apr 2023 07:39:25 GMT; path=/;
< x-served-by: cache-ams21083-AMS
< x-cache: MISS
< x-cache-hits: 0
< vary: Cookie,Accept-Encoding,Fastly-SSL,Origin,X-Requested-Host
< 
<!DOCTYPE html>
<html class="no-js" lang="en">
    
</html>
* Connection #0 to host www.jstor.org left intact
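
One visible difference between the two verbose logs is the negotiated cipher: `TLS_AES_256_GCM_SHA384` on the Raspberry Pi versus `AEAD-CHACHA20-POLY1305-SHA256` on the Mac. Whether that has anything to do with the block is not established here. To compare this detail across machines without reading the whole log, a small helper like the one below could be used; the name `tls_summary` and the log file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the "TLS version / cipher" part of a saved
# `curl -v` log. curl writes its verbose output to stderr, so the log is
# assumed to have been captured with something like `2> pi.log`.
tls_summary() {
    # Keep only the line curl prints after the handshake and strip its prefix.
    sed -n 's/^\* SSL connection using //p' "$1"
}
```

With the logs saved as `pi.log` and `mac.log`, running `tls_summary pi.log` and `tls_summary mac.log` prints one line each, which makes the two handshakes easy to compare side by side.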

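My actual script runs on Java but has the same problem, so I guess something in the Raspberry Pi or its operating system is causing this, but what? I am running "Raspbian GNU/Linux 10 (buster)" according to /etc/os-release, and curl --version gives: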

curl 7.64.0 (arm-unknown-linux-gnueabihf) libcurl/7.64.0 OpenSSL/1.1.1n zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.8.0 nghttp2/1.36.0 librtmp/2.3
Release-Date: 2019-02-06
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL

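(curl on the Mac is slightly newer (7.79.1), but since the behavior seems to be independent of the tool, I don't think that is the issue.) A moderator on Raspberry Pi Stack Exchange, where I originally asked this question, said that curl also fails for them on Fedora.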
