How to use quoting and escaping in bash

I received a list of URLs that Google reports as generating 404 errors.

I can test a single URL with curl from the command line, like this:

curl -k --user-agent "Googlebot/2.1 (+http://www.google.com/bot.html)" https://MYURLHERE

This works exactly as I expect. I want to put it in a script so I can go through the whole list, and this is what I have:

#!/usr/bin/bash
url=$1

curlcmd="curl -k --user-agent \"Googlebot/2.1 (+http://www.google.com/bot.html)\""
$curlcmd $url

But it doesn't work. I keep getting:

curl: (1) Protocol "(+http" not supported or disabled in libcurl

I can't figure out how to get past this and make it work. Any suggestions?

Answer 1

Quote the variable $1, or you could use something like this:

$ touch $$
$ echo 'http://www.google.com' >> $$
$ echo 'http://www.yahoo.com' >> $$
$ for url in $(cat $$); do curl -I $url ; done
HTTP/1.1 200 OK
Date: Wed, 22 Nov 2017 15:57:19 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2017-11-22-15; expires=Fri, 22-Dec-2017 15:57:19 GMT; path=/; domain=.google.com
Set-Cookie: NID=117=CaOUCOyr9TPjs64tqyz1MuqHsASzL_3eO5n-NE4ubqAikITGbs7QY0aegNByOWX1Vaf9SsUVQDJ1wdaIOZwXoiqfVZ9ISLtta7tvcDH6LFM52OGFKRH4J5Clde2EX8oG; expires=Thu, 24-May-2018 15:57:19 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Age: 0
Transfer-Encoding: chunked
Via: 1.1 localhost.localdomain

HTTP/1.1 200 OK
Date: Wed, 22 Nov 2017 15:57:19 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2017-11-22-15; expires=Fri, 22-Dec-2017 15:57:19 GMT; path=/; domain=.google.com
Set-Cookie: NID=117=VRrA0-bCESlSCoerEK0n1hxXfldwpQI4cisiKrEgnKVph9HkfQJu-tbur3ZBiLh3-RFKZ0kbWUWsBwJKzsi_aPUuJzztM1rCuDfljZLxqjaHanZxiCx7qch4P2WCoDDC; expires=Thu, 24-May-2018 15:57:19 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Age: 0
Transfer-Encoding: chunked
Via: 1.1 localhost.localdomain

HTTP/1.1 200 OK
Date: Wed, 22 Nov 2017 15:57:19 GMT
Via: http/1.1 media-router-fp56.prod.media.ne1.yahoo.com (ApacheTrafficServer [c s f ]), 1.1 localhost.localdomain
Server: ATS
Cache-Control: no-store, no-cache, max-age=0, private
Content-Type: text/html
Content-Language: en
Expires: -1
X-Frame-Options: SAMEORIGIN
Content-Length: 12
Age: 0

$ 

Answer 2

You can modify it like this:

#!/usr/bin/bash
url="$1"

curlcmd='curl -k --user-agent "Googlebot/2.1 (+http://www.google.com/bot.html)"'
$curlcmd "$url"

The message you received indicates that http (the default) is not supported. Use https instead:

./test.sh https://www.somepage.com

Answer 3

To query a list of URLs given as command-line arguments:

#!/bin/sh

USER_AGENT="Googlebot/2.1 (+http://www.google.com/bot.html)"

curl_with_ua(){
  curl -k --user-agent "$USER_AGENT" "$1"
}

for url in "$@"; do
  curl_with_ua "$url"
done

Answer 4

I can't reproduce your problem... I only get this meaningless-sounding warning, and otherwise it works:

curl: (3) URL rejected: Port number was not a decimal number between 0 and 65535

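Although this error is hard to understand, this is really the wrong way to do things. Running a command from an unquoted variable is almost always a bad idea. And simply adding quotes fails too: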

url=https://startpage.com
curlcmd="curl -k --user-agent \"Googlebot/2.1 (+http://www.google.com/bot.html)\""
"$curlcmd" "$url"

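Because the quotes are used wrongly above, the error looks like this: the whole command string, including all of its spaces, is treated as the name of a single command with no arguments: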

bash: curl -k --user-agent "Googlebot/2.1 (+http://www.google.com/bot.html)": No such file or directory

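The cause of the problem is probably the space between "2.1" and "(". The quotes stored inside the variable are ordinary characters by the time it is expanded, so they do not protect that space. With quotes around the expansion, everything, spaces included, is merged into one big meaningless argument. Without quotes, the words are separated, but the space after "2.1" also splits the user agent into two arguments.

Alternatively, you could use eval to make the escaped quotes around that space actually take effect... but I don't recommend eval.

I like to use a function for this kind of thing (this probably works in sh too):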
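To make the splitting visible, here is a small sketch that prints the words bash produces from the question's variable instead of passing them to curl. The helper name show_args is made up for this illustration.

```shell
#!/usr/bin/bash
curlcmd="curl -k --user-agent \"Googlebot/2.1 (+http://www.google.com/bot.html)\""

# show_args is a hypothetical helper: print each argument on its own line inside <>
show_args() {
    printf '<%s>\n' "$@"
}

# Unquoted expansion: split on whitespace; the embedded quotes stay literal characters
words=($curlcmd)
echo "unquoted expansion gives ${#words[@]} words:"
show_args "${words[@]}"

# Quoted expansion: the whole string stays a single word
echo "quoted expansion gives one word:"
show_args "$curlcmd"
```

The unquoted case yields five words, the last two being <"Googlebot/2.1> and <(+http://www.google.com/bot.html)">. curl therefore takes the fifth word as a URL, which matches the "(+http" protocol complaint in the question.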

#!/usr/bin/bash
url="$1"

mycurl() {
    curl -k --user-agent "Googlebot/2.1 (+http://www.google.com/bot.html)" "$@"
}

mycurl "$url"

Or an array also works (but not in sh; don't be fooled by systems where sh is really bash --posix):

#!/usr/bin/bash
url="$1"

curlcmd=(curl -k --user-agent "Googlebot/2.1 (+http://www.google.com/bot.html)")

# an example of the main reason I generally use an array... you can modify the command dynamically
if [ "$DEBUG" = 1 ]; then
    curlcmd+=(-v)
fi

# and this runs it... fully protected variables in quotes, but also easy, unlike all the other escaping methods
"${curlcmd[@]}" "$url"

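Since the question is about going through a whole list, here is a minimal sketch that combines the array approach with a read loop. It assumes the URLs are stored one per line in a file whose name is passed as the first argument. The function name check_urls and the added -sS and -I options (quiet output, headers only) are choices made for this illustration, not part of the original script.

```shell
#!/usr/bin/bash
# The command is kept in an array so the user agent stays a single argument
curlcmd=(curl -k -sS -I --user-agent "Googlebot/2.1 (+http://www.google.com/bot.html)")

# check_urls (hypothetical name): request the headers for every URL listed in a file
check_urls() {
    local file=$1 url
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue    # skip blank lines
        printf '== %s\n' "$url"
        "${curlcmd[@]}" "$url"
    done < "$file"
}

# Only run when a file name was given on the command line
if [ "$#" -gt 0 ]; then
    check_urls "$1"
fi
```

It would be run as something like ./check.sh urls.txt, where both file names are placeholders.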