HTTP Introduction

Dr. Greg Bernstein

Updated March 18th, 2021

The HTTP Protocol

References

Wikipedia: HTTP
RFC7230 Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing
RFC7231 Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
Proxy Servers

Request/Response Protocol

HTTP is a stateless request/response protocol that operates by exchanging messages across a reliable transport or session-layer “connection”.

Meaning?

Stateless: The protocol itself provides “no memory” about previous requests or responses. Each request is “new”.
Request/Response: All exchanges are initiated by the client making a request. Servers cannot independently send updates to clients; use WebSockets for that.

HTTP Messages

How is a message different from a packet?

A message in this case is an application layer concept. Large HTTP messages are broken into smaller pieces by the transport layer (usually TCP), which are then sent over IP as packets.
There are only two types of HTTP messages: request and response.
“Messages are passed in a format similar to that used by Internet mail [RFC5322] and the Multipurpose Internet Mail Extensions (MIME) [RFC2045]”

Client

From RFC7230:

An HTTP client is a program that establishes a connection to a server for the purpose of sending one or more HTTP requests.

Server

From RFC7230:

An HTTP server is a program that accepts connections in order to service HTTP requests by sending HTTP responses.

The same computer can host multiple clients and servers.
A single program may have both client and server functionality.

Typical Clients & Servers

We typically think of a web browser as the client, but we can and will make HTTP requests programmatically for testing and other purposes.
Web servers come in many “flavors” depending on deployment context, features, performance, etc…
However, intermediate systems called proxies may also take part.
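To make "programmatic requests" and "one program can be both client and server" concrete, here is a minimal sketch using only Python's standard library (Node.js's http module plays the same role in this course). The handler class name, the body text, and the use of 127.0.0.1 with an OS-chosen port are illustrative assumptions, not part of any course code.

```python
import http.client
import http.server
import threading

class HelloHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"                # reply with HTTP/1.1 status lines

    def do_GET(self):                            # called for every GET request
        body = b"Hello from the server"
        self.send_response(200)                  # status line: HTTP/1.1 200 OK
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                       # writes the empty line ending the headers
        self.wfile.write(body)

    def log_message(self, format, *args):        # keep the console quiet
        pass

# Server side: port 0 asks the OS for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the same program now acts as an HTTP client.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/")
resp = conn.getresponse()
status, reason, body = resp.status, resp.reason, resp.read()
conn.close()

server.shutdown()
server.server_close()
print(status, reason, body)
```

The server runs in a background thread so the main thread is free to act as the client; both talk over a real TCP connection on the local machine.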

HTTP Flavors

HTTP/1.1 is a text-based protocol that uses CRLF (\r\n) to separate the various parts of a message.
HTTP/2 keeps most of the high-level interface that we will learn but encodes messages in binary and transmits them much more efficiently.
Both flavors are supported by Node.js and Express.js; we will only use HTTP/1.1 for simplicity.

HTTP Requests

Message Format

begins with a request-line that includes a method, URI, and protocol version
followed by header fields containing request modifiers, client information, and representation metadata
an empty line to indicate the end of the header section,
and finally a message body containing the payload body if any
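The four parts above can be assembled by hand. This is a minimal Python sketch assuming a hypothetical helper name (build_request), host (example.com), and path (/login); it only builds the bytes and does not send them anywhere.

```python
def build_request(method, target, headers, body=b""):
    """Assemble an HTTP/1.1 request message as raw bytes (illustrative helper)."""
    lines = [f"{method} {target} HTTP/1.1"]      # 1. request-line
    fields = dict(headers)
    if body:
        fields["Content-Length"] = str(len(body))  # tell the server how long the body is
    for name, value in fields.items():
        lines.append(f"{name}: {value}")          # 2. one header field per line
    head = "\r\n".join(lines) + "\r\n\r\n"        # 3. empty line ends the header section
    return head.encode("ascii") + body            # 4. message body (may be empty)

msg = build_request(
    "POST", "/login",
    {"Host": "example.com", "Content-Type": "text/plain"},
    b"hello",
)
print(msg)
```

Note that every line, including the blank one that ends the headers, is terminated by \r\n rather than a bare \n.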

Request Line

method request-target HTTP/1.1\r\n

Method types: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE. See RFC7231
The request-target is usually only the path and query part of the URI (everything after the host name); requests sent to a proxy use the entire (absolute) URI.
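A small sketch of the two common request-target forms, using Python's urllib.parse. The function name origin_form and the path /a/b.html?x=1 are made up for illustration.

```python
from urllib.parse import urlsplit

def origin_form(url):
    """Return the path-and-query request-target used when talking directly to a server."""
    parts = urlsplit(url)
    target = parts.path or "/"        # an empty path becomes "/"
    if parts.query:
        target += "?" + parts.query   # the query string travels with the path
    return target                     # any #fragment is never sent to the server

url = "http://www.grotto-networking.com/a/b.html?x=1"
print(f"GET {origin_form(url)} HTTP/1.1")   # sent directly to the origin server
print(f"GET {url} HTTP/1.1")                # absolute form, as sent to a proxy
```

When talking to the origin server directly, the host name moves into the Host header instead of the request line.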

All general-purpose servers MUST support the methods GET and HEAD. All other methods are OPTIONAL.

HTTP Header Format

The start line of a message (request or response) is followed by header fields (an HTTP/1.1 request must include at least Host). Each header has the form:

Header-Name: Information \r\n
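A minimal Python sketch of splitting one header line (with its CRLF already removed). The helper name parse_header_line is hypothetical. It splits only at the first colon, because values such as localhost:3000 may contain colons, and it lowercases the name because header names are case-insensitive.

```python
def parse_header_line(line):
    """Split one 'Header-Name: value' line into (name, value). Illustrative helper."""
    name, sep, value = line.partition(":")       # split at the FIRST colon only
    if not sep or not name or name != name.strip():
        # No colon, empty name, or whitespace around the name is not allowed.
        raise ValueError(f"malformed header line: {line!r}")
    # Names are case-insensitive; optional spaces/tabs around the value are dropped.
    return name.lower(), value.strip(" \t")

print(parse_header_line("Host: www.grotto-networking.com"))
print(parse_header_line("Host: localhost:3000"))
```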

Request Header Fields

Controls: Host, Cache-Control, Expect, …
Content Negotiation: Accept, Accept-Charset, Accept-Encoding, Accept-Language
Authentication Credentials: Authorization, Proxy-Authorization
Request Context: From, Referer, User-Agent
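Content negotiation headers such as Accept carry optional q ("quality") weights, as in the example request below. This Python sketch ranks media types by q, assuming a hypothetical helper name (parse_accept). It is a simplification: it ignores the extra precedence RFC7231 gives to more specific media ranges.

```python
def parse_accept(header):
    """Return (media-range, q) pairs from an Accept header, most preferred first."""
    ranked = []
    for item in header.split(","):
        media, *params = [p.strip() for p in item.split(";")]
        q = 1.0                                   # default weight when no q= is given
        for param in params:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val)
        ranked.append((media, q))
    # sorted() is stable (even with reverse=True), so ties keep the client's order
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
for media, q in parse_accept(accept):
    print(q, media)
```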

Example

request to www.grotto-networking.com:

GET / HTTP/1.1\r\n
Host: www.grotto-networking.com\r\n
Connection: keep-alive\r\n
Upgrade-Insecure-Requests: 1\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\n
Accept-Encoding: gzip, deflate, sdch\r\n
Accept-Language: en-US,en;q=0.8\r\n
\r\n
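Going the other way, a server has to take such a message apart. This is a simplified Python sketch (hypothetical helper parse_request_head, shortened copy of the request above); it stores headers in a plain dict, so a repeated header name would keep only its last value.

```python
def parse_request_head(raw):
    """Parse the request-line and header fields of a raw HTTP/1.1 request."""
    head, _, body = raw.partition(b"\r\n\r\n")      # the empty line ends the header section
    lines = head.decode("ascii").split("\r\n")
    method, target, version = lines[0].split(" ")   # request-line has exactly three parts
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # case-insensitive names
    return method, target, version, headers, body

raw = (b"GET / HTTP/1.1\r\n"
       b"Host: www.grotto-networking.com\r\n"
       b"Connection: keep-alive\r\n"
       b"Accept-Language: en-US,en;q=0.8\r\n"
       b"\r\n")
method, target, version, headers, body = parse_request_head(raw)
print(method, target, version, headers["host"])
```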

Don’t Panic

The clients and servers we’ll use take care of most of the headers for us.
We will add a few headers (client or server side) for extra functionality.

HTTP Response

Response Message

begins with a status line that includes the protocol version, a success or error code, and textual reason phrase
possibly followed by header fields containing server information, resource metadata, and representation metadata
an empty line to indicate the end of the header section,
finally a message body containing the payload body, if any
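A response is built the same way, starting with a status line instead of a request-line. In this Python sketch the helper name build_response and the HTML body are illustrative; email.utils.formatdate produces a date in the GMT format used by the Date header.

```python
from email.utils import formatdate

def build_response(status, reason, body, content_type="text/html"):
    """Assemble an HTTP/1.1 response message as raw bytes (illustrative helper)."""
    status_line = f"HTTP/1.1 {status} {reason}"
    headers = [
        f"Date: {formatdate(usegmt=True)}",   # e.g. "Wed, 12 Apr 2017 20:08:38 GMT"
        f"Content-Type: {content_type}",       # representation metadata
        f"Content-Length: {len(body)}",        # lets the client find the end of the body
    ]
    head = "\r\n".join([status_line, *headers]) + "\r\n\r\n"
    return head.encode("ascii") + body

resp = build_response(404, "Not Found", b"<h1>Not here</h1>")
print(resp)
```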

Response Start Line

HTTP/1.1 status-code reason-phrase \r\n

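status-code: a three-digit number. RFC7231 defines codes from 100 to 505, and the first digit gives the class: 1xx informational, 2xx successful, 3xx redirection, 4xx client error, 5xx server error.
reason-phrase: human-readable text explaining the status code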
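Splitting a status line and finding its class takes only a few lines. This Python sketch uses a hypothetical helper (parse_status_line) and table (STATUS_CLASSES); note the split limit of 2, since a reason phrase such as "Not Found" contains spaces.

```python
STATUS_CLASSES = {
    1: "informational", 2: "successful", 3: "redirection",
    4: "client error", 5: "server error",
}

def parse_status_line(line):
    """Split 'HTTP/1.1 status-code reason-phrase' and report the status class."""
    version, code, reason = line.split(" ", 2)   # reason-phrase may contain spaces
    status = int(code)
    return version, status, reason, STATUS_CLASSES[status // 100]  # first digit = class

print(parse_status_line("HTTP/1.1 404 Not Found"))
```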

Popular Status Codes

See RFC7231

200 OK: Things worked!
400 Bad Request: the request is malformed, so the server cannot process it.
401 Unauthorized, 403 Forbidden: Permission/Authorization issues
404 Not Found: Client asking for something that doesn’t exist or we put something in the wrong place on the server.
500 Internal Server Error: A problem with the server code.
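Rather than hard-coding reason phrases, code can look them up. Python's standard library keeps this table in http.HTTPStatus; the wrapper name reason_phrase below is just for illustration.

```python
from http import HTTPStatus

def reason_phrase(code):
    """Look up the standard reason phrase for a status code."""
    return HTTPStatus(code).phrase

for code in (200, 400, 401, 403, 404, 500):
    print(code, reason_phrase(code))
```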

Response Headers

Control Data: Age, Cache-Control, Expires, Date, Location,…
Validator Fields: ETag, Last-Modified
Authentication Challenges: WWW-Authenticate, Proxy-Authenticate
Response Context: Accept-Ranges, Allow, Server

Meta-Data Headers

Used to describe the body content (see RFC7231):

Content-Type, Content-Encoding
Content-Language, Content-Location
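As an illustration of Content-Type versus Content-Encoding, this Python sketch compresses a made-up HTML page with gzip on the "server" side and undoes it on the "client" side. Both sides live in one script here; a real exchange would of course go over the network.

```python
import gzip

page = b"<html><body><h1>Hello</h1></body></html>"

# Server side: compress the body and describe it with metadata headers.
wire_body = gzip.compress(page)
headers = {
    "Content-Type": "text/html",     # what kind of data the body holds
    "Content-Encoding": "gzip",      # how the body bytes were transformed
}

# Client side: undo the content coding before using the body.
received = wire_body
if headers.get("Content-Encoding") == "gzip":
    received = gzip.decompress(received)

print(received == page)
```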

Example Response

From www.grotto-networking.com

HTTP/1.1 200 OK\r\n
Server: nginx\r\n
Date: Wed, 12 Apr 2017 20:08:38 GMT\r\n
Content-Type: text/html\r\n
Transfer-Encoding: chunked\r\n
Connection: keep-alive\r\n
Vary: Accept-Encoding\r\n
Last-Modified: Thu, 16 Mar 2017 02:56:28 GMT\r\n
ETag: W/"5581190-2b5e-54ad0351c774c"\r\n
Content-Encoding: gzip\r\n
\r\n
[Page content gzipped]
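The response above uses Transfer-Encoding: chunked instead of Content-Length: the body arrives as a series of chunks, each preceded by its size in hexadecimal, and a zero-size chunk marks the end. A minimal Python decoder sketch (hypothetical name decode_chunked; it ignores chunk extensions and trailer fields). For the page above, the reassembled bytes would then still need gzip decompression because of Content-Encoding: gzip.

```python
def decode_chunked(data):
    """Reassemble a body sent with 'Transfer-Encoding: chunked'."""
    parts = []
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)             # end of the chunk-size line
        size_field = data[pos:line_end].split(b";")[0]  # drop any chunk extensions
        size = int(size_field, 16)                      # chunk size is hexadecimal
        pos = line_end + 2
        if size == 0:                                   # last chunk: "0\r\n"
            return b"".join(parts)                      # (trailer fields ignored)
        parts.append(data[pos:pos + size])
        pos += size + 2                                 # skip chunk data and its CRLF

chunked = b"7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n"
print(decode_chunked(chunked))
```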

Page Loading

From MDN HTTP Overview

Simple page example:

Wireshark HTTP trace to www.grotto-networking.com:

Proxies

Proxies 1

IETF Definition
- An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients.

Proxy Types

From Wikipedia Proxy Server

A forward proxy is an Internet-facing proxy used to retrieve data from a wide range of sources. Used for monitoring, content filtering, bypassing filters and censorship, caching, and more.
A reverse proxy is an internal-facing proxy used as a front-end to control access to servers on a private network. Common tasks include: load-balancing, authentication, decryption or caching.

Proxies in Development

We will want to use proxy functionality in development:

Application Layer Switching

Servers and proxies will perform different actions on request messages based on:

The URL or portions of the URL
The HTTP Method (GET, POST, etc…)
We can think of this as application layer switching. High-end servers such as NGINX and Apache 2 provide elaborate configuration options for this, and almost all servers provide at least some of these capabilities.
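Application layer switching boils down to a lookup on (method, path). Here is a tiny Python router sketch; the route table, the /users paths, and the handler names are all hypothetical and only returned as strings so the example stays self-contained.

```python
import re

# Hypothetical route table: (HTTP method, path pattern, handler name)
ROUTES = [
    ("GET",    r"/",            "home_page"),
    ("GET",    r"/users/(\d+)", "show_user"),
    ("POST",   r"/users",       "add_user"),
    ("DELETE", r"/users/(\d+)", "delete_user"),
]

def switch(method, path):
    """Pick a handler based on the HTTP method and the URL path."""
    for route_method, pattern, handler in ROUTES:
        match = re.fullmatch(pattern, path)             # whole path must match
        if route_method == method and match:
            return handler, match.groups()              # captured path parts, e.g. an id
    return "not_found", ()   # no route matched; a server would typically answer 404

print(switch("GET", "/users/42"))
print(switch("DELETE", "/users/42"))
print(switch("PUT", "/users/42"))
```

The same path can map to different handlers depending on the method, which is exactly the switching described above.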