Hi there :),

there are quite a few PRs that have been open in gowarc for a pretty long time. In #186 @NGTmeaty commented on this, and I fully understand: gowarc needs to handle millions of requests to a diverse set of distinct web servers.
I hope this PR enables us to test gowarc in real-traffic scenarios, with tests that, when they pass, give us the confidence we need to ship changes to gowarc.

TL;DR
I wrote some utilities to mock real web servers with all their quirks (delayed responses, mid-stream connection resets).
They allow you to write end-to-end tests that perform thousands of requests against these servers using gowarc's client. You can then assert that what gowarc returns matches the expected response.

Core concept: EchoServer
In order to test gowarc against the noisy reality of the web, we need a mechanism that is:
a) flexible enough to emulate a variety of different servers and resources (POST JSON APIs, file servers, traditional HTML pages);
b) verifiable, so that we can assert that the HTTP request is correctly received by the target server, AND that the resulting HTTP response is correctly returned and correctly stored in the WARC file.
To achieve this, I wrote a simple EchoServer based on httptest.Server.
Every request must contain an X-Echo-Imperative header holding a JSON document (the Imperative) that follows a fixed schema.
The request object specifies an HTTP request, and the EchoServer handler (e2e/echoserver/server.go) validates every incoming request against it.
The response object specifies the HTTP response that the EchoServer answers with.
The body logic is shared between the request and response fields of the Imperative: a given combination of length + encoding + seed always produces the same body []byte.
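The determinism property can be illustrated with a minimal sketch that uses a seeded math/rand source. The function generateBody and the encoding names "raw" and "hex" are hypothetical. The actual generation.go may derive bodies differently.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math/rand"
)

// generateBody is a hypothetical stand-in for the Imperative body logic:
// the same (length, encoding, seed) triple always yields the same bytes,
// so sender and validator can regenerate a body independently.
func generateBody(length int, encoding string, seed int64) ([]byte, error) {
	// A private, seeded source keeps the output deterministic and
	// independent of any other use of math/rand in the process.
	rng := rand.New(rand.NewSource(seed))
	switch encoding {
	case "raw":
		buf := make([]byte, length)
		rng.Read(buf) // (*rand.Rand).Read always fills buf and returns a nil error
		return buf, nil
	case "hex":
		// Draw enough random bytes, hex-encode them, trim to the exact length.
		raw := make([]byte, (length+1)/2)
		rng.Read(raw)
		return []byte(hex.EncodeToString(raw))[:length], nil
	default:
		return nil, fmt.Errorf("unknown encoding %q", encoding)
	}
}
```

Because the body is a pure function of its inputs, the sender and the validator only need to share the Imperative. They never need to share the body itself.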
The transport object allows injecting delays and/or connection drops.
echoserve/imperative/generation.go contains functions to generate http.Request values and bodies ([]byte) from Imperatives, and assertions.go contains functions to validate http.Request and http.Response values against Imperatives.
The EchoServer always echoes the Imperative in its HTTP responses (under the same header key as in the request).
This lets us validate every response, both as returned by client.Do(req) and as stored in the WARC file. We read the echoed header, build the body from the Imperative's specification, and compare it against the actual body.
Both generation from Imperatives and validation against them are thoroughly covered by tests (assertions_test.go and generation_test.go).

WARCValidator
The WARCValidator package first checks the validity of all digests present in the WARC files.
It also verifies that revisits point to an existing record and that their payload digests match.
For each record, it reads the X-Echo-Imperative header and asserts that the specified headers are present in the record, and that the body is byte-equal to the specified body. The detected errors are appended to a []ValidationError.
This alone does not tell us whether all expected records are present in the WARC file. To check that every transmitted request made it into the WARC file, the validator also returns a Tracker (map[string]Counts).
The key is the hash value of the Imperative.
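One way to derive such a key is sketched below, using SHA-256 over the raw header value. The hash function, the Counts field names and the record helper are all hypothetical. The PR's Tracker may count differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// Counts and Tracker follow the shape described above (map[string]Counts);
// the field names are hypothetical.
type Counts struct {
	Requests  int
	Responses int
}

type Tracker map[string]Counts

// imperativeKey hashes the raw Imperative so that identical Imperatives
// map to the same key. SHA-256 is an assumption of this sketch.
func imperativeKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// record counts one WARC record of the given WARC-Type for an Imperative;
// other record types are ignored here.
func (t Tracker) record(warcType, rawImperative string) {
	key := imperativeKey(rawImperative)
	c := t[key]
	switch warcType {
	case "request":
		c.Requests++
	case "response":
		c.Responses++
	}
	t[key] = c
}
```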
Proxy
A very simple SOCKS5 proxy, using things-go/go-socks5.
Additionally, this proxy records the number of requests. Our load_test.go uses that counter to validate that the requests were actually routed through the proxy.

load_test.go
This test builds more than 1k Imperatives with different methods, request/response bodies, delays and aborts: a real stress test.
When creating a new EchoServer, you can choose between plain HTTP and TLS, HTTP/1.1 and HTTP/2, and IPv4 and IPv6.
We create 8 servers, one for each possible combination, and run the >1k requests against all of them concurrently.
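The eight variants are every combination of three binary choices (2 x 2 x 2 = 8). The enumeration below is hypothetical: serverConfig and allConfigs are illustrative names, not the EchoServer constructor's actual options.

```go
package main

import "fmt"

// serverConfig is a hypothetical description of one EchoServer variant.
type serverConfig struct {
	TLS   bool
	HTTP2 bool
	IPv6  bool
}

// listenAddr picks a loopback address with an OS-assigned port.
func (c serverConfig) listenAddr() string {
	if c.IPv6 {
		return "[::1]:0"
	}
	return "127.0.0.1:0"
}

func (c serverConfig) String() string {
	scheme := "http"
	if c.TLS {
		scheme = "https"
	}
	proto := "HTTP/1.1"
	if c.HTTP2 {
		proto = "HTTP/2"
	}
	return fmt.Sprintf("%s %s %s", scheme, proto, c.listenAddr())
}

// allConfigs enumerates all 2 x 2 x 2 = 8 combinations.
func allConfigs() []serverConfig {
	var out []serverConfig
	for _, tls := range []bool{false, true} {
		for _, h2 := range []bool{false, true} {
			for _, v6 := range []bool{false, true} {
				out = append(out, serverConfig{TLS: tls, HTTP2: h2, IPv6: v6})
			}
		}
	}
	return out
}
```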
The EchoServer checks that each request was properly transmitted via gowarc, the load test checks that each returned response matches the specification,
and finally we validate all >2k WARC records, checking their existence, headers and bodies.
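The existence check could compare the Tracker against the Imperatives that were sent, as in the hypothetical sketch below. It assumes each Imperative should yield exactly one request and one response record per run. Counts, Tracker and incomplete are illustrative names, not the PR's API.

```go
package main

import (
	"fmt"
	"sort"
)

type Counts struct{ Requests, Responses int }

type Tracker map[string]Counts

// incomplete describes every expected key that does not have exactly one
// request and one response record, plus every key that was not expected.
func incomplete(expected []string, t Tracker) []string {
	var problems []string
	want := make(map[string]bool, len(expected))
	for _, k := range expected {
		want[k] = true
		if c := t[k]; c.Requests != 1 || c.Responses != 1 {
			problems = append(problems,
				fmt.Sprintf("%s: %d request(s), %d response(s)", k, c.Requests, c.Responses))
		}
	}
	for k := range t {
		if !want[k] {
			problems = append(problems, fmt.Sprintf("%s: unexpected record", k))
		}
	}
	sort.Strings(problems) // map iteration order is random; sort for stable output
	return problems
}
```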
This is done twice: once with NewWARCWritingHTTPClient configured with a proxy, and once without.
You can define a local proxy for the load test using the GOWARC_E2E_PROXY environment variable.
You can use mitmproxy or Proxyman to intercept the requests.
Flipping a single bit in any of the thousands of requests will fail our test :)
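A single flipped bit is caught because any change to the payload changes its digest. The minimal sketch below follows the common WARC convention of a SHA-1 digest written as "sha1:" followed by base32. warcSHA1Digest and flipBit are hypothetical helpers.

```go
package main

import (
	"crypto/sha1"
	"encoding/base32"
)

// warcSHA1Digest formats a digest in the common WARC style "sha1:<BASE32>".
// A 20-byte SHA-1 sum encodes to exactly 32 base32 characters, no padding.
func warcSHA1Digest(payload []byte) string {
	sum := sha1.Sum(payload)
	return "sha1:" + base32.StdEncoding.EncodeToString(sum[:])
}

// flipBit returns a copy of b with a single bit inverted; b is not modified.
func flipBit(b []byte, bit int) []byte {
	out := append([]byte(nil), b...)
	out[bit/8] ^= byte(1) << uint(bit%8)
	return out
}
```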
Usage in Zeno
I deliberately built the e2e/echoserver package to be reusable, so that we can (also) use its capabilities to add extensive e2e testing to Zeno.
Using the Imperative schema, we could emulate servers that, e.g., host huge PDF files, get slower with every request, and occasionally drop the connection.