E2E Load Testing #190

Open
AltayAkkus wants to merge 6 commits into internetarchive:master from AltayAkkus:master

@AltayAkkus AltayAkkus commented Apr 18, 2026

Hi there :),
there are quite a few PRs that have been open in gowarc for a pretty long time. In #186, @NGTmeaty noted:

Both are pretty large changes to how gowarc operates and need to be done carefully to ensure we don't break anything

which I fully understand: gowarc needs to handle millions of requests to a diverse set of web servers.
I hope this PR lets us test gowarc in realistic traffic scenarios, so that passing tests give us the confidence we need to ship changes to gowarc.

TL;DR

I wrote some utils to mock real webservers with all their quirks (delayed responses, mid-stream connection resets).

They allow you to write end-to-end tests that perform thousands of requests against these servers using gowarc's client.
You can assert that:

  • the client forwarded the HTTP request to our testing server correctly
  • the HTTP response gowarc returns matches the expected response
  • the WARC file contains the expected requests/responses

Core concept: EchoServer

In order to test gowarc on the noisy reality of the web, we need a mechanism that is

a) flexible enough to emulate a variety of different servers and resources (POST JSON APIs, file servers, traditional HTML pages)
b) verifiable, so that we can assert that the HTTP request is correctly received by the target server, AND the resulting HTTP response is correctly returned and correctly stored in the WARC file.

To achieve this, I wrote a simple EchoServer, based on httptest.Server.
Every request must contain an X-Echo-Imperative header holding a JSON document with this schema:

{
  "request": {
    "path": "/api/items/908", 
    "headers": {
      "Content-Type": "multipart/form-data" // optional additional headers
    },
    "method": "POST",
    "body": {
      "length": 26797,
      "encoding": "utf-8", // either utf-8 or binary
      "seed": 534203 // seed used to randomly generate the body
    }
  },
  "response": {
    "statusCode": 418,
    "headers": {
      "Content-Type": "application/json", // e.g.
    },
    "body": {
      "length": 106545,
      "encoding": "binary",
      "seed": 250538,
      "compress": "gzip" // empty, gzip, deflate or zstd
      //"filepath": "testdata/1GB.pdf"
      // length, encoding and seed are ignored if the filepath is set
    }
  },
  "transport": {
    "delay": 5,     // instructs the server to wait n milliseconds before writing the body
    "abort": false  // instructs the server to drop the connection after writing the headers
  }
}

The request object specifies an HTTP request; the EchoServer handler (e2e/echoserver/server.go) validates every incoming request against it.

The response object specifies the HTTP response that the EchoServer answers with.
The body logic is shared between the request and response fields of the Imperative.
A given combination of length + encoding + seed always produces the same body ([]byte).
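To make the determinism property concrete, here is a minimal sketch of how a body could be derived from length, encoding and seed. `generateBody` and its alphabet are hypothetical and are not the generator shipped in this PR; the only point is that a privately seeded source yields the same bytes on every call.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// generateBody is a hypothetical helper (not the PR's actual generator).
// The output depends only on (length, encoding, seed).
func generateBody(length int, encoding string, seed int64) []byte {
	rng := rand.New(rand.NewSource(seed)) // private seeded source, no global state
	body := make([]byte, length)
	if encoding == "binary" {
		rng.Read(body) // fill with pseudo-random bytes
		return body
	}
	// "utf-8": stick to ASCII characters so the result is always valid UTF-8.
	const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "
	for i := range body {
		body[i] = alphabet[rng.Intn(len(alphabet))]
	}
	return body
}

func main() {
	a := generateBody(26797, "utf-8", 534203)
	b := generateBody(26797, "utf-8", 534203)
	// Same inputs, same bytes: this is what lets both sides rebuild a body
	// from the Imperative instead of shipping the expected bytes around.
	fmt.Println(len(a), bytes.Equal(a, b))
}
```

Because the server and the test share the generator, only the three parameters need to travel in the header.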

The transport object allows injecting delays and/or connection drops.

echoserve/imperative/generation.go contains functions that generate http.Request values and bodies ([]byte) from Imperatives. assertions.go contains functions that validate an http.Request or http.Response against an Imperative.
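As a rough illustration of the generation side, the sketch below turns an Imperative into an http.Request and attaches the JSON under the X-Echo-Imperative header. The type names, field layout, and `buildRequest` are assumptions modelled on the schema above and may not match the PR's code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Hypothetical, trimmed-down mirror of the JSON schema; the real types may differ.
type BodySpec struct {
	Length   int    `json:"length"`
	Encoding string `json:"encoding"`
	Seed     int64  `json:"seed"`
}

type RequestSpec struct {
	Path    string            `json:"path"`
	Method  string            `json:"method"`
	Headers map[string]string `json:"headers,omitempty"`
	Body    BodySpec          `json:"body"`
}

type Imperative struct {
	Request RequestSpec `json:"request"`
}

const imperativeHeader = "X-Echo-Imperative"

// buildRequest creates a request aimed at baseURL. The body is passed in so
// the sketch stays independent of how bodies are generated.
func buildRequest(baseURL string, imp Imperative, body []byte) (*http.Request, error) {
	raw, err := json.Marshal(imp)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(imp.Request.Method, baseURL+imp.Request.Path, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	for k, v := range imp.Request.Headers {
		req.Header.Set(k, v)
	}
	req.Header.Set(imperativeHeader, string(raw)) // the server validates against this
	return req, nil
}

func main() {
	imp := Imperative{Request: RequestSpec{
		Path:    "/api/items/908",
		Method:  "POST",
		Headers: map[string]string{"Content-Type": "multipart/form-data"},
		Body:    BodySpec{Length: 4, Encoding: "utf-8", Seed: 534203},
	}}
	req, err := buildRequest("http://127.0.0.1:8080", imp, []byte("abcd"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get(imperativeHeader))
}
```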

The EchoServer always echoes the Imperative in its HTTP responses (under the same header key as in the request).

We can validate every response returned by client.Do(req) and in the WARC file by reading the header, building the body from the Imperative's specification, and comparing it against the actual body returned.
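The echo-and-verify loop can be sketched with nothing but the standard library. Everything here is a simplified assumption: the handler ignores request validation, compression and the transport options, `expectedBody` is a stand-in generator, and the plain http.DefaultClient takes the place of gowarc's client.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

const imperativeHeader = "X-Echo-Imperative"

// Hypothetical subset of the schema: only what this sketch needs.
type imperative struct {
	Response struct {
		StatusCode int `json:"statusCode"`
		Body       struct {
			Length int   `json:"length"`
			Seed   int64 `json:"seed"`
		} `json:"body"`
	} `json:"response"`
}

// expectedBody is a stand-in generator shared by server and verifier.
func expectedBody(length int, seed int64) []byte {
	b := make([]byte, length)
	for i := range b {
		b[i] = byte('a' + (seed+int64(i))%26)
	}
	return b
}

func echoHandler(w http.ResponseWriter, r *http.Request) {
	raw := r.Header.Get(imperativeHeader)
	var imp imperative
	if err := json.Unmarshal([]byte(raw), &imp); err != nil {
		http.Error(w, "bad imperative", http.StatusBadRequest)
		return
	}
	w.Header().Set(imperativeHeader, raw) // echo the Imperative back
	w.WriteHeader(imp.Response.StatusCode)
	w.Write(expectedBody(imp.Response.Body.Length, imp.Response.Body.Seed))
}

// verify rebuilds the expected body from the echoed header and compares it.
func verify(resp *http.Response) error {
	var imp imperative
	if err := json.Unmarshal([]byte(resp.Header.Get(imperativeHeader)), &imp); err != nil {
		return fmt.Errorf("decode echoed imperative: %w", err)
	}
	got, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	if resp.StatusCode != imp.Response.StatusCode {
		return fmt.Errorf("status %d, want %d", resp.StatusCode, imp.Response.StatusCode)
	}
	if !bytes.Equal(got, expectedBody(imp.Response.Body.Length, imp.Response.Body.Seed)) {
		return fmt.Errorf("body mismatch")
	}
	return nil
}

// roundTrip sends one request carrying raw as its Imperative and verifies the reply.
func roundTrip(raw string) error {
	srv := httptest.NewServer(http.HandlerFunc(echoHandler))
	defer srv.Close()
	req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
	if err != nil {
		return err
	}
	req.Header.Set(imperativeHeader, raw)
	resp, err := http.DefaultClient.Do(req) // stand-in for gowarc's client
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return verify(resp)
}

func main() {
	err := roundTrip(`{"response":{"statusCode":418,"body":{"length":64,"seed":7}}}`)
	fmt.Println("verification error:", err)
}
```

The same `verify` step applies to a response read back from a WARC record, since the echoed header is stored alongside the body.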

Both generation from and validation against Imperatives are thoroughly covered by tests (assertions_test.go and generation_test.go).

WARCValidator

The WARCValidator package first checks the validity of all digests present in the WARC files.
It also verifies that revisits point to an existing record and that their payload digests match.
For each record it reads the X-Echo-Imperative header and asserts that the specified headers are present in the record, as well as that the body is byte-equal to the specified body.
Detected errors are appended to a []ValidationError.
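A digest check of this kind might look like the sketch below. It assumes digests in the common `sha1:<base32>` form; the PR does not state which algorithms the validator handles, and the `ValidationError` fields and `checkDigest` are hypothetical.

```go
package main

import (
	"crypto/sha1"
	"encoding/base32"
	"fmt"
	"strings"
)

// ValidationError is a hypothetical shape; the PR's type may carry other fields.
type ValidationError struct {
	RecordID string
	Reason   string
}

// checkDigest compares a "sha1:<base32>" digest value against the payload.
// It returns nil when the digest matches.
func checkDigest(recordID, header string, payload []byte) *ValidationError {
	algo, want, ok := strings.Cut(header, ":")
	if !ok || !strings.EqualFold(algo, "sha1") {
		return &ValidationError{RecordID: recordID, Reason: "unsupported digest: " + header}
	}
	sum := sha1.Sum(payload)
	got := base32.StdEncoding.EncodeToString(sum[:])
	if got != want {
		return &ValidationError{
			RecordID: recordID,
			Reason:   fmt.Sprintf("digest mismatch: got %s, want %s", got, want),
		}
	}
	return nil
}

func main() {
	records := []struct {
		id, digest string
		payload    []byte
	}{
		// SHA-1 of the empty payload.
		{"rec-1", "sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ", nil},
		// Deliberately wrong: same digest, non-empty payload.
		{"rec-2", "sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ", []byte("x")},
	}
	var errs []ValidationError
	for _, r := range records {
		if e := checkDigest(r.id, r.digest, r.payload); e != nil {
			errs = append(errs, *e) // collect instead of failing fast
		}
	}
	for _, e := range errs {
		fmt.Printf("%s: %s\n", e.RecordID, e.Reason)
	}
}
```

Collecting errors rather than stopping at the first one means a single run reports every broken record.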

These per-record checks cannot tell us whether all expected records are present in the WARC file. For that, the validator also returns a Tracker (map[string]Counts).
The key is the hash of the Imperative. Example output:

ImpHash: 4c75925205d8381c788f08756fcfcdcc141d63a10cc37bbef9, Count: {Request:11 Response:1 Revisit:10}
ImpHash: 2440882cfffcf8c9f560b99f051c5af5bda89d456f869f585d, Count: {Request:1 Response:1 Revisit:0}
ImpHash: bdbf010e8b80cc174fe1e2e3567f499bb71b05e1e9cf7b2ff7a, Count: {Request:1 Response:1 Revisit:0}

Proxy

A very simple SOCKS5 proxy, built on things-go/go-socks5.
Additionally, this proxy records the number of requests. Our load_test.go uses that counter to validate that the requests were actually routed through the proxy.

load_test.go

This test builds >1k Imperatives with varying methods, request/response bodies, delays, and aborts. A real stress test.
When creating a new EchoServer you can pick between plain HTTP and TLS, HTTP/1.1 and HTTP/2, and IPv4 and IPv6.
We create 8 servers covering every combination and run the >1k requests against all of them concurrently.
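The 8 servers come from three binary choices (2 × 2 × 2). The sketch below enumerates that matrix and fans requests out concurrently; `ServerConfig`, `allConfigs`, and `runAll` are hypothetical names, and the `do` callback stands in for sending one request and validating its response.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ServerConfig is a hypothetical stand-in for the EchoServer options.
type ServerConfig struct {
	TLS   bool
	HTTP2 bool
	IPv6  bool
}

// allConfigs enumerates every combination of the three switches.
func allConfigs() []ServerConfig {
	var out []ServerConfig
	for _, tls := range []bool{false, true} {
		for _, h2 := range []bool{false, true} {
			for _, v6 := range []bool{false, true} {
				out = append(out, ServerConfig{TLS: tls, HTTP2: h2, IPv6: v6})
			}
		}
	}
	return out
}

// runAll calls do(cfg, i) perServer times for each config, all concurrently,
// and returns how many calls reported an error.
func runAll(configs []ServerConfig, perServer int, do func(ServerConfig, int) error) int64 {
	var failures atomic.Int64
	var wg sync.WaitGroup
	for _, cfg := range configs {
		for i := 0; i < perServer; i++ {
			wg.Add(1)
			go func(cfg ServerConfig, i int) {
				defer wg.Done()
				if err := do(cfg, i); err != nil {
					failures.Add(1)
				}
			}(cfg, i)
		}
	}
	wg.Wait()
	return failures.Load()
}

func main() {
	cfgs := allConfigs()
	// The callback here is a no-op placeholder for "send request i to the
	// server built from cfg and validate the response".
	failed := runAll(cfgs, 1000, func(ServerConfig, int) error { return nil })
	fmt.Printf("%d configs, %d failed requests\n", len(cfgs), failed)
}
```

A real load test would probably bound the number of in-flight requests (for example with a semaphore) rather than start one goroutine per request.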

The EchoServer checks that each request was transmitted correctly by gowarc,
the load test checks that each returned response matches its specification,
and finally we validate all >2k WARC records, checking their existence, headers and bodies.
The whole run is performed twice: once with NewWARCWritingHTTPClient using a proxy, and once without.

You can define a local proxy for the load test, using the GOWARC_E2E_PROXY environment variable.

[Video attachment: gowarc-load.mp4]

You can use mitmproxy or Proxyman to intercept the requests.
Flipping a single bit in any of the thousands of requests will fail our test :)

Usage in Zeno

I deliberately built the e2e/echoserver package to be reusable, so that we can (also) use its capabilities to add extensive e2e testing to Zeno.

Using the Imperative schema, we could emulate servers that e.g. host huge PDF files, get slower request by request, and occasionally drop the connection.
