I'm writing an HTTP proxy in C++. When a client sends a GET or CONNECT request to the proxy, the proxy parses the HTTP header of the request, resolves the hostname in it, opens another socket to the destination server, and sends the client's request. The proxy then relays the server's response back to the client.
Here is, for example, what the proxy sends to the server when it receives a GET request or a CONNECT request from the client:
GET http://www.gstatic.com/generate_204 HTTP/1.1
CONNECT cr-input.getspeakit.com:443 HTTP/1.1
But when I parse the server's response to the GET request, I find a 400 status code, i.e. Bad Request. According to Wikipedia, this indicates:
a malformed request syntax, invalid request message framing, or deceptive request routing.
Am I sending the wrong arguments to the server in the GET request?
Answer 1
When an HTTP client sends a GET request directly to a server, the destination hostname is usually not in the request URI. That is, instead of sending
GET http://www.gstatic.com/generate_204 HTTP/1.1
an HTTP 1.1 client sends:
GET /generate_204 HTTP/1.1
Host: www.gstatic.com
Since the client "knows" that it needs to resolve the DNS name "www.gstatic.com" to an IP address and send the HTTP request to that IP address, it doesn't really need to include the hostname again as part of the requested path. The Host header is a hint to the server of the originally requested hostname.
Note that the above semantics are covered by RFC 7230, Section 5.3. That section does state that the "absolute form" of the request target may include the scheme and hostname; what I described above is the "origin form". If your origin/destination server returns "400 Bad Request" for the "absolute form" that your proxy is using, it suggests either a) that the server does not handle the "absolute form", or b) that something else is wrong (a missing Host request header?).
This means that your HTTP proxy shouldn't rely solely on the destination hostname being in the first line of the HTTP request, since an HTTP client could use the "origin form" of the request target. Instead, your proxy should look for the Host header if it needs that information. And to avoid the 400 Bad Request from the origin server, I recommend that your proxy send e.g.:
GET /generate_204 HTTP/1.1
Host: www.gstatic.com
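As a rough illustration of that rewrite, here is a minimal sketch of turning an absolute-form request line into an origin-form request line plus a Host value. The names ForwardTarget and to_origin_form are hypothetical, not from any library. The sketch assumes a lowercase "http://" scheme, no userinfo in the URI, and a request line without the trailing CRLF; a real proxy would also need to handle the remaining headers and the body.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: what the proxy needs in order to forward the request.
struct ForwardTarget {
    std::string host;          // value for the Host header, e.g. "www.gstatic.com"
    std::string request_line;  // origin-form request line, without trailing CRLF
};

// Converts "GET http://host/path HTTP/1.1" into "GET /path HTTP/1.1" and stores
// the host separately. Returns false if the line is not "METHOD TARGET VERSION"
// with a target that starts with "http://" (assumption: lowercase scheme only).
bool to_origin_form(const std::string& line, ForwardTarget& out) {
    const std::size_t sp1 = line.find(' ');
    const std::size_t sp2 = line.rfind(' ');
    if (sp1 == std::string::npos || sp2 == sp1) return false;

    const std::string method = line.substr(0, sp1);
    const std::string target = line.substr(sp1 + 1, sp2 - sp1 - 1);
    const std::string version = line.substr(sp2 + 1);

    const std::string scheme = "http://";
    if (target.compare(0, scheme.size(), scheme) != 0) return false;

    // The authority (host[:port]) ends at the first '/' or '?', or at the end.
    const std::size_t auth_end = target.find_first_of("/?", scheme.size());
    const std::string host =
        auth_end == std::string::npos
            ? target.substr(scheme.size())
            : target.substr(scheme.size(), auth_end - scheme.size());
    if (host.empty()) return false;

    // An empty path becomes "/"; a bare query string gets a leading "/".
    std::string path =
        auth_end == std::string::npos ? "/" : target.substr(auth_end);
    if (path[0] == '?') path = "/" + path;

    out.host = host;
    out.request_line = method + " " + path + " " + version;
    return true;
}
```

With the request line from the question, this should yield the host "www.gstatic.com" and the request line "GET /generate_204 HTTP/1.1"; the proxy would then send that line followed by "Host: www.gstatic.com". When the function returns false (for example, the client already used the origin form), the proxy can fall back to the client's own Host header.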
For the semantics of the CONNECT method, see RFC 7231, Section 4.3.6. There, we see that the request target must consist of only the hostname and port. Any 2xx response to the CONNECT request indicates success; any other response code indicates that the requested "tunnel" was not set up. The rest of that section is worth reading for other edge cases and behaviors.
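Since the CONNECT target is just "host:port", the proxy can split it before opening the TCP connection. Below is a minimal sketch of that split; ConnectTarget and parse_connect_target are hypothetical names chosen for illustration. It assumes the target has already been extracted from the request line, and it leaves the brackets around an IPv6 literal such as "[::1]" in place.

```cpp
#include <cstddef>
#include <string>

// Hypothetical holder for the two parts of a CONNECT request target.
struct ConnectTarget {
    std::string host;
    int port;
};

// Splits "host:port" at the last ':' and validates the port (1..65535).
// Returns false if the colon, the host, or the port is missing or malformed.
bool parse_connect_target(const std::string& target, ConnectTarget& out) {
    const std::size_t colon = target.rfind(':');
    if (colon == std::string::npos || colon == 0 ||
        colon + 1 == target.size()) {
        return false;
    }

    int port = 0;
    for (std::size_t i = colon + 1; i < target.size(); ++i) {
        const char c = target[i];
        if (c < '0' || c > '9') return false;  // digits only
        port = port * 10 + (c - '0');
        if (port > 65535) return false;        // out of range, also avoids overflow
    }
    if (port == 0) return false;

    out.host = target.substr(0, colon);
    out.port = port;
    return true;
}
```

For the target in the question, "cr-input.getspeakit.com:443", this should produce the host "cr-input.getspeakit.com" and the port 443. As I read the RFC, the proxy then connects to that host and port itself and answers the client with a 2xx response, after which it relays bytes in both directions without interpreting them.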
Hope this helps!