Harden Accept-Encoding parsing to prevent NPE#10307
Harden Accept-Encoding parsing to prevent NPE#10307metsw24-max wants to merge 2 commits intogwtproject:mainfrom
Conversation
There was a problem hiding this comment.
This reads like it might be a fix for a security issue ("harden", "unsafe", the naming and javadoc of the test, etc), but you never come right out and say that it is. Can you confirm one way or another?
Is there a web browser (or other JS runtime) that exists which can load content from an http server that does not conform to the existing assumptions? Nothing about your description is wrong per se, but
- The header is never absent in any js client i'm aware of
- gzip is never disabled (or this would have stopped working, and I don't see an associated ticket?)
- gzip is always explicitly listed (using "wildcard" gets into the situation of "sure, the server can send anything that I don't know about, which means I probably can't read it since I didn't list it")
- adversarial requests are
- unlikely (bound to loopback by default),
- unproductive (they could have just gotten real content with a real request,
- and benign (the worst the attacker can do is cause a NPE which ... ensures they get no data?
- "Relied on brittle substring matching instead of structured parsing" - that at least has merit, if there were ever an encoding contributed that has gzip as a substring - which I think is unlikely at best.
Nevertheless, there is a spec, and we're probably better off following it than not. Two notes there that I feel you missed:
- The current implementation of the server cannot (or will not) support IDENTITY encoding, which is ostensibly the correct fallback if gzip is not present or disabled ("If a non-empty Accept-Encoding header field is present in a request and none of the available representations for the response have a content coding that is listed as acceptable, the origin server SHOULD send a response without any content coding unless the identity coding is indicated as unacceptable.")
- If we fail a request due to encoding issues, the server should be sending 415, not 405 ("Servers that fail a request due to an unsupported content coding ought to respond with a 415 (Unsupported Media Type) status and include an Accept-Encoding header field in that response...")
All told, I don't think this change is necessary, but it isn't wrong either. It feels a bit like churn for its own sake, with no actual problem being solved, but it also isn't keeping to the spec rigorously enough to seem like an actual "fix". What do you think?
| public class WebServerSecurityTest extends TestCase { | ||
|
|
||
| public void testAcceptsGzipEncodingRejectsNullOrEmptyHeader() { | ||
| assertFalse(WebServer.acceptsGzipEncoding(null)); |
There was a problem hiding this comment.
This is wrong, isn't it?
https://httpwg.org/specs/rfc9110.html#field.accept-encoding
If no Accept-Encoding header field is in the request, any content coding is considered acceptable by the user agent.
| public void testAcceptsGzipEncodingRejectsMalformedQValue() { | ||
| assertFalse(WebServer.acceptsGzipEncoding("gzip;q=not-a-number")); | ||
| } | ||
| } |
There was a problem hiding this comment.
As per your note about structured parsing, this should include a test for something with gzip as a substring of its name - or as an alternative "metadata" for the named value?
There was a problem hiding this comment.
Please also test the supported variations in blank spaces, other valid characters
| double qValue = 1.0; | ||
| for (int i = 1; i < parts.length; i++) { | ||
| String parameter = parts[i].trim().toLowerCase(Locale.ROOT); | ||
| if (parameter.startsWith("q=")) { |
There was a problem hiding this comment.
This seems to imply that other parameters are supported, is that correct? By spec it doesn't seem allowed.
| qValue = Double.parseDouble(qValueText); | ||
| } catch (NumberFormatException e) { | ||
| qValue = 0.0; |
There was a problem hiding this comment.
Allowed range is explicitly [0-1], and precision is <=3 decimal places https://httpwg.org/specs/rfc9110.html#quality.values - we should perhaps fail if outside that range?
|
you were right to call this out. I’ve updated the change to focus on correctness and RFC 9110 compliance for Accept-Encoding handling, rather than framing it as a security fix. This revision addresses several issues in the previous implementation:
I also:
Please let me know if you’d prefer stricter handling of invalid parameters or further alignment with the spec. |
| @@ -0,0 +1,70 @@ | |||
| /* | |||
| * Copyright 2026 The GWT Project Authors. | |||
There was a problem hiding this comment.
| * Copyright 2026 The GWT Project Authors. | |
| * Copyright 2026 The GWT Project Authors |
| double qValue = qValueText == null ? 1.0 : Double.parseDouble(qValueText); | ||
|
|
||
| if (encoding.equals("gzip")) { | ||
| gzipQValue = qValue; |
There was a problem hiding this comment.
can't we return early here? do we need to track the q values at all after we check a given entry?
if the encoding is valid but isn't * or gzip, we ignore it. if it is either * or gzip and has a q value greater than zero, we return true. if no * or gzip had a value greater than zero, we return false.
what am I missing that we can't substantially trim this down?
| // RFC 9110: if Accept-Encoding is absent, any content coding is acceptable. | ||
| return true; | ||
| } | ||
| if (acceptEncodingHeader.trim().isEmpty()) { |
There was a problem hiding this comment.
| if (acceptEncodingHeader.trim().isEmpty()) { | |
| if (acceptEncodingHeader.isBlank()) { |
that said, if i'm right about simplifying, this can be included in the final return false, no need for this "present but nothing in it" check.
| for (String encodingSpec : acceptEncodingHeader.split(",")) { | ||
| Matcher matcher = ACCEPT_ENCODING_SPEC.matcher(encodingSpec); | ||
| if (!matcher.matches()) { | ||
| continue; |
There was a problem hiding this comment.
contents don't adhere to the spec, can we continue parsing at all? should this return false instead?
There was a problem hiding this comment.
this current implementation takes a more tolerant approach: it ignores malformed entries and continues parsing valid ones so a single bad token doesn’t cause the entire Accept-Encoding header to be rejected.
i will align with whichever behavior you prefer here:
- reject entire header on any malformed entry or
- ignore invalid entries and continue parsing
Let me know which direction you’d like
|
|
||
| private static final Pattern ACCEPT_ENCODING_SPEC = Pattern.compile( | ||
| "^\\s*([!#$%&'*+.^_`|~0-9A-Za-z-]+|\\*)\\s*(?:;\\s*q\\s*=\\s*" | ||
| + "(0(?:\\.\\d{0,3})?|1(?:\\.0{0,3})?))?\\s*$"); |
There was a problem hiding this comment.
technically you could use two capture groups instead of one containing two non-capture groups. then, you can use their presence/absence to decide "is it zero" or "is it anything valid other than zero", and skip the Double.parseDouble entirely, leaving the work to the state machine.
| public void testAcceptsGzipEncodingRejectsMalformedQValue() { | ||
| assertFalse(WebServer.acceptsGzipEncoding("gzip;q=not-a-number")); | ||
| } | ||
| } |
There was a problem hiding this comment.
Please also test the supported variations in blank spaces, other valid characters
| private static final Pattern CACHE_JS_FILE = Pattern.compile("/(" + STRONG_NAME + ").cache.js$"); | ||
|
|
||
| private static final Pattern ACCEPT_ENCODING_SPEC = Pattern.compile( | ||
| "^\\s*([!#$%&'*+.^_`|~0-9A-Za-z-]+|\\*)\\s*(?:;\\s*q\\s*=\\s*" |
There was a problem hiding this comment.
OWS doesn't appear to be allowed at any point after q in the weight (and "BWS" is apparently only supported in specific cases around auth and transfer-encoding, TIL). Note also that all forms of "whitespace" in the spec are required to be limited to SP and TAB, while \s includes others including CR, LF, NBSP.
https://httpwg.org/specs/rfc9110.html#quality.values
https://httpwg.org/specs/rfc9110.html#whitespace
Or is that a concern that other servers deal with in the wild?
This regex is not easy to read - can you break it down into smaller segments to assemble it to be easier to validate?
|
@metsw24-max I'd suggest using less AI and more of your own words in pull request descriptions. The PR description should make it clear why you are suggesting these changes, not just what the theoretical benefits of the contribution are. The reasoning for the null pointer fix is clear -- that would warrant a one line change. The rest looks like a lot of low level code that partially solves a problem that's only theoretical. If we decide to ignore the problem, we can just delete that code (fine by me). If we decide to solve it properly, we should use a higher level solution that handles all edge-cases. boolean acceptsGzip(HttpServletRequest request) {
QuotedQualityCSV qsv = new QuotedQualityCSV(); // org.eclipse.jetty.http.QuotedQualityCSV
String encoding = request.getHeader("Accept-Encoding");
qsv.addValue(encoding);
return encoding == null || qsv.getValues().stream().anyMatch(v -> v.equals("gzip") || v.equals("*"));
}(untested) |
|
@zbynek I verified Jetty It consistently excludes entries with The only remaining ambiguity is precedence for Is that acceptable for this change, or would you prefer explicit rejection when |
If a client sends that header, I'd consider that an error on the client side (thought the syntax is OK, this allows server to send made up encodings the client can't possibly decode), so let's not worry about the case. |
This change hardens handling of the
Accept-Encodingrequest header in the CodeServer by replacing unsafe substring matching with structured parsing logic.Previously gzip negotiation relied on:
request.getHeader("Accept-Encoding").contains("gzip")
This approach had multiple issues:
gzip;q=0*)Introduced a dedicated helper:
acceptsGzipEncoding(String header)
This method:
q=)q=0*) fallback