HTTP Introduction

Dr. Greg Bernstein

Updated March 18th, 2021

The HTTP Protocol


Request/Response Protocol

From RFC7230:

HTTP is a stateless request/response protocol that operates by exchanging messages across a reliable transport or session-layer “connection”.

Meaning?

  • Stateless: The protocol itself provides “no memory” about previous requests or responses. Each request is “new”.

  • Request/Response: All exchanges are initiated by the client making a request. Servers cannot independently send updates to clients; use WebSockets for that.

HTTP Messages

How is a message different from a packet?

  • A message in this case is an application layer concept. Large HTTP messages are broken into smaller packets by the transport layer (usually TCP) and then sent over IP.

  • There are only two types of HTTP messages: request and response.

  • “Messages are passed in a format similar to that used by Internet mail [RFC5322] and the Multipurpose Internet Mail Extensions (MIME) [RFC2045]”

Client

From RFC7230:

An HTTP client is a program that establishes a connection to a server for the purpose of sending one or more HTTP requests.

Server

From RFC7230:

An HTTP server is a program that accepts connections in order to service HTTP requests by sending HTTP responses.

  • The same computer can host multiple clients and servers.

  • A single program may have both client and server functionality.

Typical Clients & Servers

  • We typically think of a web browser as the client, but we can and will make HTTP requests programmatically for testing and other purposes.

  • Web servers come in many “flavors” depending on deployment context, features, performance, etc…

  • However, intermediate systems called proxies may also take part in an exchange.

HTTP Flavors

  • HTTP 1.1 is a text-based protocol that uses CRLF (\r\n) to separate the parts of a message.

  • HTTP/2 keeps most of the high-level interface that we will learn, but encodes messages in binary and transmits them much more efficiently.

  • Node.js supports both flavors, but we will use only HTTP 1.1 for simplicity.

HTTP Requests

Message Format

  • begins with a request-line that includes a method, URI, and protocol version
  • followed by header fields containing request modifiers, client information, and representation metadata
  • an empty line to indicate the end of the header section,
  • and finally a message body containing the payload body if any
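To make the four-part layout concrete, here is a minimal sketch that assembles a request message as plain text. The function name buildRequest, the /api/echo path, and the localhost:3000 host are hypothetical; real clients (browsers, Node.js) do this for us. It also assumes an ASCII body so that the string length equals the byte count sent in Content-Length.

```typescript
// Hypothetical helper type: header name -> header value.
type HeaderMap = { [name: string]: string };

// Sketch only: build an HTTP 1.1 request message as text.
function buildRequest(method: string, target: string, headers: HeaderMap, body: string = ""): string {
  const CRLF = "\r\n";
  // 1. Request line: method, request-target, protocol version.
  let message = method + " " + target + " HTTP/1.1" + CRLF;
  // 2. Header fields, one "Name: value" per line.
  const all: HeaderMap = { ...headers };
  if (body.length > 0) {
    // Assumes an ASCII body, so string length equals byte length.
    all["Content-Length"] = String(body.length);
  }
  for (const fieldName of Object.keys(all)) {
    message += fieldName + ": " + all[fieldName] + CRLF;
  }
  // 3. An empty line ends the header section; 4. the optional body follows.
  return message + CRLF + body;
}

const req = buildRequest("POST", "/api/echo", {
  Host: "localhost:3000",
  "Content-Type": "text/plain",
}, "hello");
console.log(JSON.stringify(req)); // JSON.stringify makes the \r\n visible
```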

Request Line

method request-target HTTP/1.1 \r\n
  • Method types: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE. See RFC7231

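  • The request-target is usually either the entire URI (absolute form, used when sending a request to a proxy) or only the path and query that follow the host name, such as /index.html (origin form)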

All general-purpose servers MUST support the methods GET and HEAD. All other methods are OPTIONAL.
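A sketch of how a server might split a request line into its three parts. The name parseRequestLine is hypothetical, and KNOWN_METHODS is simply the list above; a real server would also accept registered extension methods (and, per the rule above, a general-purpose server may implement only GET and HEAD).

```typescript
// The methods listed above (RFC 7231). Method names are case-sensitive.
const KNOWN_METHODS = ["GET", "HEAD", "POST", "PUT", "DELETE", "CONNECT", "OPTIONS", "TRACE"];

interface RequestLine {
  method: string;
  target: string;
  version: string;
}

// Sketch only: split "method request-target HTTP/1.1\r\n" into its parts.
function parseRequestLine(line: string): RequestLine {
  // Drop the trailing CRLF, then split on single spaces.
  const parts = line.replace(/\r\n$/, "").split(" ");
  if (parts.length !== 3) {
    throw new Error("malformed request line: " + JSON.stringify(line));
  }
  const [method, target, version] = parts;
  if (KNOWN_METHODS.indexOf(method) === -1) {
    throw new Error("unsupported method: " + method);
  }
  if (!/^HTTP\/\d\.\d$/.test(version)) {
    throw new Error("bad protocol version: " + version);
  }
  return { method, target, version };
}

// Origin form (path and query only):
console.log(parseRequestLine("GET /index.html?x=1 HTTP/1.1\r\n"));
// Absolute form, as a client would send it to a proxy:
console.log(parseRequestLine("GET http://www.grotto-networking.com/ HTTP/1.1\r\n"));
```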

HTTP Header Format

The start line of a message (request or response) is followed by zero or more header fields (an HTTP 1.1 request must include at least Host). These headers have the form:

Header-Name: Information \r\n
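A minimal sketch of splitting one header line into a name/value pair (parseHeaderLine is a hypothetical name). Two details matter: split on the first colon only, since values like localhost:3000 contain colons, and treat field names as case-insensitive (lower-cased here).

```typescript
// Sketch only: parse "Header-Name: Information\r\n" into [name, value].
function parseHeaderLine(line: string): [string, string] {
  const text = line.replace(/\r\n$/, "");
  // Split on the FIRST colon only; the value may contain colons too.
  const colon = text.indexOf(":");
  if (colon <= 0) {
    throw new Error("malformed header field: " + JSON.stringify(line));
  }
  const fieldName = text.slice(0, colon);
  if (/\s/.test(fieldName)) {
    // RFC 7230 forbids whitespace between the field name and the colon.
    throw new Error("whitespace in field name: " + JSON.stringify(fieldName));
  }
  // Names are case-insensitive; optional whitespace around the value is trimmed.
  return [fieldName.toLowerCase(), text.slice(colon + 1).trim()];
}

console.log(parseHeaderLine("Host: localhost:3000\r\n"));
```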

Request Header Fields

  • Controls: Host, Cache-Control, Expect, …
  • Content Negotiation: Accept, Accept-Charset, Accept-Encoding, Accept-Language
  • Authentication Credentials: Authorization, Proxy-Authorization
  • Request Context: From, Referer, User-Agent

Example

Request to www.grotto-networking.com:

GET / HTTP/1.1\r\n
Host: www.grotto-networking.com\r\n
Connection: keep-alive\r\n
Upgrade-Insecure-Requests: 1\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\n
Accept-Encoding: gzip, deflate, sdch\r\n
Accept-Language: en-US,en;q=0.8\r\n
\r\n
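The Accept header above is an example of content negotiation: each media range may carry a quality value q (default 1.0). A simplified sketch (parseAccept is a hypothetical name) that ranks the ranges from the example request; it ignores other parameters and, unlike a full implementation, does not prefer more specific ranges when q values tie.

```typescript
interface MediaRange {
  range: string;
  q: number;
}

// Sketch only: rank Accept media ranges by q, keeping header order on ties.
function parseAccept(value: string): MediaRange[] {
  const ranges = value.split(",").map((item, index) => {
    const params = item.trim().split(";");
    let q = 1.0; // default quality when no q parameter is given
    for (let i = 1; i < params.length; i++) {
      const [key, val] = params[i].trim().split("=");
      if (key === "q") q = parseFloat(val);
    }
    return { range: params[0].trim(), q, index };
  });
  // Higher q first; for equal q, the original header order decides.
  ranges.sort((a, b) => b.q - a.q || a.index - b.index);
  return ranges.map(({ range, q }) => ({ range, q }));
}

const accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
console.log(parseAccept(accept));
```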

Don’t Panic

  • The clients and servers we’ll use take care of most of the headers for us.
  • We will add a few headers (client or server side) for extra functionality.

HTTP Response

Response Message

  • begins with a status line that includes the protocol version, a success or error code, and textual reason phrase
  • possibly followed by header fields containing server information, resource metadata, and representation metadata
  • an empty line to indicate the end of the header section,
  • finally a message body containing the payload body, if any

Response Start Line

HTTP/1.1 status-code reason-phrase \r\n
  • status-code: a three-digit number whose first digit gives the class of response; RFC 7231 defines codes from 100 to 505
  • reason-phrase: a short text description of the status code
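A sketch of parsing a status line and classifying the code by its first digit, following the five classes RFC 7231 uses (1xx Informational through 5xx Server Error). The name parseStatusLine is hypothetical. Note that the reason phrase may itself contain spaces, so a regular expression is used instead of a plain split.

```typescript
const STATUS_CLASSES: { [digit: string]: string } = {
  "1": "Informational",
  "2": "Successful",
  "3": "Redirection",
  "4": "Client Error",
  "5": "Server Error",
};

// Sketch only: "HTTP/1.1 404 Not Found\r\n" -> version, code, reason, class.
function parseStatusLine(line: string): { version: string; code: number; reason: string; category: string } {
  // Reason phrases such as "Not Found" contain spaces, so capture the rest of the line.
  const match = /^(HTTP\/\d\.\d) (\d{3}) (.*)$/.exec(line.replace(/\r\n$/, ""));
  if (match === null) {
    throw new Error("malformed status line: " + JSON.stringify(line));
  }
  const code = parseInt(match[2], 10);
  const category = STATUS_CLASSES[match[2].charAt(0)];
  if (category === undefined) {
    throw new Error("status code out of range: " + code);
  }
  return { version: match[1], code, reason: match[3], category };
}

console.log(parseStatusLine("HTTP/1.1 200 OK\r\n"));
console.log(parseStatusLine("HTTP/1.1 404 Not Found\r\n"));
```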

Response Headers

  • Control Data: Age, Cache-Control, Expires, Date, Location,…
  • Validator Fields: ETag, Last-Modified
  • Authentication Challenges: WWW-Authenticate, Proxy-Authenticate
  • Response Context: Accept-Ranges, Allow, Server

Meta-Data Headers

Used to describe the body content (RFC7231):

  • Content-Type, Content-Encoding
  • Content-Language, Content-Location
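Content-Type often carries parameters, for example text/html; charset=UTF-8. A simplified sketch of splitting it into the media type and its parameters (parseContentType is a hypothetical name). It lower-cases the case-insensitive parts and strips surrounding quotes, but does not handle a ";" inside a quoted value.

```typescript
// Sketch only: "text/html; charset=UTF-8" -> media type plus parameters.
function parseContentType(value: string): { mediaType: string; params: { [key: string]: string } } {
  const pieces = value.split(";");
  const params: { [key: string]: string } = {};
  for (let i = 1; i < pieces.length; i++) {
    const eq = pieces[i].indexOf("=");
    if (eq === -1) continue; // skip malformed parameters
    // Parameter names are case-insensitive; quoted values are unquoted.
    const key = pieces[i].slice(0, eq).trim().toLowerCase();
    const val = pieces[i].slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    params[key] = val;
  }
  // Type and subtype are case-insensitive too.
  return { mediaType: pieces[0].trim().toLowerCase(), params };
}

console.log(parseContentType('text/html; charset="UTF-8"'));
```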

Example Response

From www.grotto-networking.com

HTTP/1.1 200 OK\r\n
Server: nginx\r\n
Date: Wed, 12 Apr 2017 20:08:38 GMT\r\n
Content-Type: text/html\r\n
Transfer-Encoding: chunked\r\n
Connection: keep-alive\r\n
Vary: Accept-Encoding\r\n
Last-Modified: Thu, 16 Mar 2017 02:56:28 GMT\r\n
ETag: W/"5581190-2b5e-54ad0351c774c"\r\n
Content-Encoding: gzip\r\n
\r\n
[Page content gzipped]
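The response above uses Transfer-Encoding: chunked, so the body arrives as a series of chunks, each preceded by its size in hexadecimal, ending with a chunk of size 0 (RFC 7230, section 4.1). A simplified sketch of decoding it (decodeChunked is a hypothetical name): it treats the body as a string of single-byte characters and ignores chunk extensions and trailers. In the example above the decoded content would still need to be gunzipped, since Content-Encoding: gzip applies to the content carried inside the chunks.

```typescript
// Sketch only: decode a "Transfer-Encoding: chunked" body.
function decodeChunked(raw: string): string {
  let body = "";
  let pos = 0;
  while (true) {
    const lineEnd = raw.indexOf("\r\n", pos);
    if (lineEnd === -1) throw new Error("truncated chunk size line");
    // Drop any ";extension" after the size, then read the size as hex.
    const size = parseInt(raw.slice(pos, lineEnd).split(";")[0], 16);
    if (isNaN(size)) throw new Error("bad chunk size");
    if (size === 0) return body; // last chunk; trailers (if any) are ignored
    const dataStart = lineEnd + 2;
    body += raw.slice(dataStart, dataStart + size);
    // Every chunk's data is followed by CRLF.
    if (raw.slice(dataStart + size, dataStart + size + 2) !== "\r\n") {
      throw new Error("missing CRLF after chunk data");
    }
    pos = dataStart + size + 2;
  }
}

const wire = "7\r\nMozilla\r\n9\r\nDeveloper\r\n7\r\nNetwork\r\n0\r\n\r\n";
console.log(decodeChunked(wire)); // MozillaDeveloperNetwork
```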

Page Loading

From MDN HTTP Overview:

[Figure: fetching a page]

Simple page example:

Wireshark HTTP trace to www.grotto-networking.com:

[Figure: packet capture]

Proxies

Proxies 1

  • IETF Definition
    • An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients.
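One way to see the "both a server and a client" role: a client sends a forward proxy an absolute-form request-target, and the proxy, acting as a client, connects to that host and sends an origin-form target instead. A simplified sketch of just that rewriting step (toOriginForm is a hypothetical name); it handles only http:// URIs and does no networking.

```typescript
// Sketch only: rewrite "GET http://host/path HTTP/1.1" for the origin server.
function toOriginForm(requestLine: string): { host: string; line: string } {
  const match = /^(\S+) http:\/\/([^\/\s]+)(\/\S*)? (HTTP\/\d\.\d)$/.exec(requestLine);
  if (match === null) {
    throw new Error("not an absolute-form http request line: " + requestLine);
  }
  const [, method, host, path, version] = match;
  // host tells the proxy where to connect (and what to put in Host).
  // An empty path must be sent as "/" (RFC 7230, section 5.3.1).
  return { host, line: method + " " + (path || "/") + " " + version };
}

console.log(toOriginForm("GET http://www.grotto-networking.com/index.html HTTP/1.1"));
```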

Proxy Types

From Wikipedia Proxy Server

  • A forward proxy is an Internet-facing proxy used to retrieve data from a wide range of sources. Used for monitoring, content filtering, bypassing filters and censorship, caching, and more.

  • A reverse proxy is an internal-facing proxy used as a front-end to control access to servers on a private network. Common tasks include: load-balancing, authentication, decryption or caching.
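Load balancing, one of the reverse proxy tasks listed above, can be as simple as handing requests to back-end servers in turn. A minimal round-robin sketch; the class name RoundRobin and the back-end addresses are hypothetical, and no requests are actually forwarded.

```typescript
// Sketch only: choose a back-end server for each request in rotation.
class RoundRobin {
  private backends: string[];
  private next = 0;

  constructor(backends: string[]) {
    if (backends.length === 0) throw new Error("need at least one back end");
    this.backends = backends.slice();
  }

  pick(): string {
    const chosen = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length; // wrap around
    return chosen;
  }
}

// Hypothetical private-network addresses behind the reverse proxy.
const balancer = new RoundRobin(["10.0.0.11:3000", "10.0.0.12:3000"]);
for (let i = 0; i < 3; i++) {
  console.log("request " + i + " -> " + balancer.pick());
}
```

Real reverse proxies also weigh server health and load; round robin is just the simplest policy.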

Proxies in Development

We will want to use proxy functionality in development:

Application Layer Switching

Servers and proxies will perform different actions on request messages based on:

  • The URL or portions of the URL

  • The HTTP Method (GET, POST, etc…)

  • We can think of this as application layer switching. High-end servers such as NGINX and Apache 2 provide elaborate configuration options for this, and almost all servers provide at least some of these capabilities.
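A minimal sketch of application layer switching: pick an action from the HTTP method and a URL path prefix, first match wins. The routes and action strings are hypothetical; NGINX expresses the same idea with location blocks, and Express.js with app.get(), app.post(), and so on.

```typescript
interface Route {
  method: string;     // exact HTTP method to match
  pathPrefix: string; // matched against the start of the path
  action: string;     // hypothetical description of what to do
}

// Order matters: more specific prefixes are listed first.
const routes: Route[] = [
  { method: "GET", pathPrefix: "/api/", action: "read from API server" },
  { method: "POST", pathPrefix: "/api/", action: "write via API server" },
  { method: "GET", pathPrefix: "/", action: "serve static file" },
];

// Sketch only: switch on method and path (the query string is ignored).
function route(method: string, url: string): string {
  const path = url.split("?")[0];
  for (const r of routes) {
    if (r.method === method && path.indexOf(r.pathPrefix) === 0) {
      return r.action;
    }
  }
  return "no matching rule";
}

console.log(route("GET", "/api/users?id=3"));
console.log(route("GET", "/index.html"));
```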
