How to properly send GET and CONNECT requests from proxy to server?

I'm making an HTTP proxy in C++. When a client sends a GET or CONNECT request to the proxy, the proxy parses the HTTP header of the request, resolves the hostname in it, opens another socket to the destination server and sends the client's request. Then the proxy sends the server's response back to the client.

Here is, for example, what the proxy sends to the server when it gets a GET request or a CONNECT request from the client:

GET http://www.gstatic.com/generate_204 HTTP/1.1

CONNECT cr-input.getspeakit.com:443 HTTP/1.1

But when I parse the server's response to the GET request, I get a 400 status code, i.e. Bad Request, which according to Wikipedia indicates:

a malformed request syntax, invalid request message framing, or deceptive request routing.

Am I sending the wrong arguments to the server in the GET request?

Answer 1

When an HTTP client sends a GET request directly to a server, the destination hostname is usually not in the request URI. That is, instead of sending

GET http://www.gstatic.com/generate_204 HTTP/1.1

an HTTP/1.1 client sends:

GET /generate_204 HTTP/1.1
Host: www.gstatic.com

Since the client "knows" that it needs to resolve the DNS name "www.gstatic.com" to an IP address and send the HTTP request to that IP address, it doesn't need to repeat the hostname in the requested path. The Host header tells the server which hostname was originally requested, so that one IP address can serve many sites. In HTTP/1.1 the Host header is mandatory: a server must respond with 400 Bad Request to any HTTP/1.1 request that lacks it.
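
A proxy that takes the destination from the Host header has to locate that field in the header block. Header field names are case-insensitive (RFC 7230, Section 3.2), so a lookup should compare them that way. The following is a minimal sketch, assuming the request line and header lines have already been read up to the blank line; findHeader, iequals and trim are hypothetical helper names, not part of any library.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <sstream>
#include <string>

// Case-insensitive ASCII comparison of two header field names.
bool iequals(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

// Strip optional whitespace (spaces and tabs) around a field value.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Return the value of the first header named `name` in `head`
// (request line followed by CRLF-terminated header lines).
std::optional<std::string> findHeader(const std::string& head, const std::string& name) {
    std::istringstream in(head);
    std::string line;
    std::getline(in, line);  // skip the request line
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) break;  // blank line ends the header section
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;  // malformed line: ignored in this sketch
        if (iequals(line.substr(0, colon), name)) return trim(line.substr(colon + 1));
    }
    return std::nullopt;
}
```

For example, findHeader(head, "Host") returns "www.gstatic.com" for a request containing the line "host:  www.gstatic.com", and an empty optional when no Host header is present.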

Note that the above semantics are covered by RFC 7230, Section 5.3. There, it does state that the "absolute form" of the request target may include the scheme and hostname; the "origin form" is the one I described above. If your origin/destination server returns "400 Bad Request" for the "absolute form" that your proxy is using, it suggests either a) that the server does not support the "absolute form" (even though RFC 7230 says servers must accept it), or b) something else is wrong (a missing Host request header?).
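
To make the distinction between the forms concrete, here is a rough classifier for a request target, following the four forms named in RFC 7230, Section 5.3. It is a sketch under loose assumptions: TargetForm and classifyTarget are hypothetical names, and the absolute-form check only looks for "://" instead of parsing the full URI grammar.

```cpp
#include <string>

enum class TargetForm { Origin, Absolute, Authority, Asterisk, Invalid };

// Loosely classify an HTTP/1.1 request-target (RFC 7230, Section 5.3).
TargetForm classifyTarget(const std::string& method, const std::string& target) {
    if (target.empty()) return TargetForm::Invalid;
    // CONNECT uses authority-form: "host:port" with no path.
    if (method == "CONNECT")
        return target.find('/') == std::string::npos ? TargetForm::Authority : TargetForm::Invalid;
    // "*" is only meaningful for a server-wide OPTIONS request.
    if (target == "*") return method == "OPTIONS" ? TargetForm::Asterisk : TargetForm::Invalid;
    // Origin-form starts with an absolute path, e.g. "/generate_204?x=1".
    if (target.front() == '/') return TargetForm::Origin;
    // Absolute-form carries the scheme and host, e.g. "http://host/path".
    if (target.find("://") != std::string::npos) return TargetForm::Absolute;
    return TargetForm::Invalid;
}
```

A proxy could run this on the request line it receives to decide whether the target has to be rewritten before the request is forwarded.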

This means that your HTTP proxy shouldn't really rely on the destination hostname being in the first line of the HTTP request, since an HTTP client could use the "origin form" of the request target; instead, your proxy should look for the Host header if you need to know that information. And to avoid the 400 Bad Request from the origin server, I recommend that your proxy send, e.g.:

GET /generate_204 HTTP/1.1
Host: www.gstatic.com
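
One way to produce that request from the absolute-form target the proxy received is to split the URI into the Host value and the origin-form path. This is a minimal sketch, assuming a lowercase "http://" scheme and no userinfo or fragment in the URI; OriginTarget, toOriginForm and originRequestHead are hypothetical names. An empty path becomes "/", as RFC 7230, Section 5.3.1 requires for origin-form.

```cpp
#include <optional>
#include <string>

struct OriginTarget {
    std::string host;  // Host header value, including ":port" if present
    std::string path;  // origin-form target: absolute path plus optional query
};

// Split e.g. "http://www.gstatic.com/generate_204" into host and path.
// Returns nullopt for other schemes or an empty authority.
std::optional<OriginTarget> toOriginForm(const std::string& absoluteTarget) {
    const std::string scheme = "http://";
    if (absoluteTarget.compare(0, scheme.size(), scheme) != 0) return std::nullopt;
    const auto authorityStart = scheme.size();
    const auto pathStart = absoluteTarget.find_first_of("/?", authorityStart);
    OriginTarget out;
    if (pathStart == std::string::npos) {
        out.host = absoluteTarget.substr(authorityStart);
        out.path = "/";  // empty path is sent as "/"
    } else {
        out.host = absoluteTarget.substr(authorityStart, pathStart - authorityStart);
        out.path = absoluteTarget.substr(pathStart);
        if (out.path.front() == '?') out.path.insert(0, "/");  // "http://h?q" -> "/?q"
    }
    if (out.host.empty()) return std::nullopt;
    return out;
}

// Request line plus Host header that the proxy would send upstream.
std::string originRequestHead(const std::string& method, const OriginTarget& t) {
    return method + " " + t.path + " HTTP/1.1\r\nHost: " + t.host + "\r\n";
}
```

The proxy would then append the client's remaining header lines and the terminating blank line. When a request arrives in absolute form, RFC 7230, Section 5.4 requires the proxy to ignore any Host header the client sent and use the host from the request target instead, so the client's own Host line should be dropped rather than copied.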

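For the semantics of the CONNECT method, see RFC 7231, Section 4.3.6. There, we see that the request target must consist of only the hostname and port (the "authority form"). Note that CONNECT is addressed to the proxy itself: your proxy should not forward the CONNECT line to the destination server, but open a TCP connection to that host and port and answer the client itself. Any 2xx response from the proxy indicates that the tunnel is established, after which the proxy blindly forwards bytes in both directions; any other response code indicates that the requested "tunnel" is not set up. The rest of that section is worth reading for other edge cases and behaviors.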
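
Per RFC 7231, Section 4.3.6, the recipient of a CONNECT request (here, the proxy) sets up the tunnel and sends the 2xx response itself. A minimal sketch of the parsing side, assuming the target is "name:port" or a bracketed IPv6 literal such as "[::1]:443"; Authority, parseAuthority and kConnectEstablished are hypothetical names, and the reason phrase "Connection established" is conventional rather than required.

```cpp
#include <optional>
#include <string>

struct Authority {
    std::string host;
    std::string port;
};

// Parse an authority-form CONNECT target such as "cr-input.getspeakit.com:443".
std::optional<Authority> parseAuthority(const std::string& target) {
    const auto colon = target.rfind(':');  // last colon separates the port
    if (colon == std::string::npos || colon == 0 || colon + 1 == target.size()) return std::nullopt;
    Authority a{target.substr(0, colon), target.substr(colon + 1)};
    // Strip the brackets from an IPv6 literal like "[::1]".
    if (a.host.front() == '[' && a.host.back() == ']') a.host = a.host.substr(1, a.host.size() - 2);
    for (char c : a.port)
        if (c < '0' || c > '9') return std::nullopt;  // port must be numeric
    return a;
}

// Sent by the proxy to the client once its connection to host:port succeeds;
// after the blank line both sides switch to tunnel mode.
const char* const kConnectEstablished = "HTTP/1.1 200 Connection established\r\n\r\n";
```

The host and port strings can be passed directly to getaddrinfo() as its node and service arguments. After connecting, the proxy writes the 200 response to the client and then copies bytes between the two sockets (for example with poll() or select()) until either side closes; if the connection fails, it should reply with a non-2xx status such as 502 Bad Gateway instead.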

Hope this helps!
