Skip to content

Harden Accept-Encoding parsing to prevent NPE#10307

Open
metsw24-max wants to merge 2 commits intogwtproject:mainfrom
metsw24-max:harden-accept-encoding-gzip-handling
Open

Harden Accept-Encoding parsing to prevent NPE#10307
metsw24-max wants to merge 2 commits intogwtproject:mainfrom
metsw24-max:harden-accept-encoding-gzip-handling

Conversation

@metsw24-max
Copy link
Copy Markdown

This change hardens handling of the Accept-Encoding request header in the CodeServer by replacing unsafe substring matching with structured parsing logic.

Previously gzip negotiation relied on:

request.getHeader("Accept-Encoding").contains("gzip")

This approach had multiple issues:

  • Could throw a NullPointerException when the header is absent
  • Incorrectly accepted gzip even when explicitly disabled e.g., gzip;q=0
  • Ignored wildcard encodings (*)
  • Did not handle malformed or adversarial header inputs safely
  • Relied on brittle substring matching instead of structured parsing

Introduced a dedicated helper:

acceptsGzipEncoding(String header)

This method:

  • Safely handles null and empty headers
  • Parses encoding tokens and parameters
  • Correctly interprets q-values (q=)
  • Rejects gzip when q=0
  • Supports wildcard (*) fallback
  • Handles malformed inputs defensively (fails closed)

Copy link
Copy Markdown
Member

@niloc132 niloc132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads like it might be a fix for a security issue ("harden", "unsafe", the naming and javadoc of the test, etc), but you never come right out and say that it is. Can you confirm one way or another?

Is there a web browser (or other JS runtime) that exists which can load content from an http server that does not conform to the existing assumptions? Nothing about your description is wrong per se, but

  • The header is never absent in any js client i'm aware of
  • gzip is never disabled (or this would have stopped working, and I don't see an associated ticket?)
  • gzip is always explicitly listed (using "wildcard" gets into the situation of "sure, the server can send anything that I don't know about, which means I probably can't read it since I didn't list it")
  • adversarial requests are
    • unlikely (bound to loopback by default),
    • unproductive (they could have just gotten real content with a real request,
    • and benign (the worst the attacker can do is cause a NPE which ... ensures they get no data?
  • "Relied on brittle substring matching instead of structured parsing" - that at least has merit, if there were ever an encoding contributed that has gzip as a substring - which I think is unlikely at best.

Nevertheless, there is a spec, and we're probably better off following it than not. Two notes there that I feel you missed:

  • The current implementation of the server cannot (or will not) support IDENTITY encoding, which is ostensibly the correct fallback if gzip is not present or disabled ("If a non-empty Accept-Encoding header field is present in a request and none of the available representations for the response have a content coding that is listed as acceptable, the origin server SHOULD send a response without any content coding unless the identity coding is indicated as unacceptable.")
  • If we fail a request due to encoding issues, the server should be sending 415, not 405 ("Servers that fail a request due to an unsupported content coding ought to respond with a 415 (Unsupported Media Type) status and include an Accept-Encoding header field in that response...")

All told, I don't think this change is necessary, but it isn't wrong either. It feels a bit like churn for its own sake, with no actual problem being solved, but it also isn't keeping to the spec rigorously enough to seem like an actual "fix". What do you think?

public class WebServerSecurityTest extends TestCase {

public void testAcceptsGzipEncodingRejectsNullOrEmptyHeader() {
assertFalse(WebServer.acceptsGzipEncoding(null));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong, isn't it?

https://httpwg.org/specs/rfc9110.html#field.accept-encoding

If no Accept-Encoding header field is in the request, any content coding is considered acceptable by the user agent.

Comment thread dev/codeserver/javatests/com/google/gwt/dev/codeserver/WebServerSecurityTest.java Outdated
public void testAcceptsGzipEncodingRejectsMalformedQValue() {
assertFalse(WebServer.acceptsGzipEncoding("gzip;q=not-a-number"));
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per your note about structured parsing, this should include a test for something with gzip as a substring of its name - or as an alternative "metadata" for the named value?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also test the supported variations in blank spaces, other valid characters

Comment thread dev/codeserver/BUILD Outdated
double qValue = 1.0;
for (int i = 1; i < parts.length; i++) {
String parameter = parts[i].trim().toLowerCase(Locale.ROOT);
if (parameter.startsWith("q=")) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to imply that other parameters are supported, is that correct? By spec it doesn't seem allowed.

Comment on lines +573 to +575
qValue = Double.parseDouble(qValueText);
} catch (NumberFormatException e) {
qValue = 0.0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowed range is explicitly [0-1], and precision is <=3 decimal places https://httpwg.org/specs/rfc9110.html#quality.values - we should perhaps fail if outside that range?

Comment thread dev/codeserver/javatests/com/google/gwt/dev/codeserver/WebServerSecurityTest.java Outdated
@metsw24-max
Copy link
Copy Markdown
Author

you were right to call this out.

I’ve updated the change to focus on correctness and RFC 9110 compliance for Accept-Encoding handling, rather than framing it as a security fix.

This revision addresses several issues in the previous implementation:

  • Missing or empty Accept-Encoding headers are now treated as accepting gzip (per RFC 9110)
  • Substring matching has been replaced with token-based parsing (avoids cases like "xgzip")
  • q-values are validated within the [0,1] range, with malformed values treated as not acceptable
  • Wildcard handling and precedence between explicit encodings and "*" now follow expected behavior
  • When no acceptable representation is available, the response returns 415 with an Accept-Encoding header

I also:

  • Renamed the tests to remove the security framing
  • Added coverage for substring edge cases, malformed q-values, wildcard handling, and missing headers
  • Removed Bazel BUILD changes to stay aligned with the project’s build setup

Please let me know if you’d prefer stricter handling of invalid parameters or further alignment with the spec.

@@ -0,0 +1,70 @@
/*
* Copyright 2026 The GWT Project Authors.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Copyright 2026 The GWT Project Authors.
* Copyright 2026 The GWT Project Authors

double qValue = qValueText == null ? 1.0 : Double.parseDouble(qValueText);

if (encoding.equals("gzip")) {
gzipQValue = qValue;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we return early here? do we need to track the q values at all after we check a given entry?

if the encoding is valid but isn't * or gzip, we ignore it. if it is either * or gzip and has a q value greater than zero, we return true. if no * or gzip had a value greater than zero, we return false.

what am I missing that we can't substantially trim this down?

// RFC 9110: if Accept-Encoding is absent, any content coding is acceptable.
return true;
}
if (acceptEncodingHeader.trim().isEmpty()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (acceptEncodingHeader.trim().isEmpty()) {
if (acceptEncodingHeader.isBlank()) {

that said, if i'm right about simplifying, this can be included in the final return false, no need for this "present but nothing in it" check.

for (String encodingSpec : acceptEncodingHeader.split(",")) {
Matcher matcher = ACCEPT_ENCODING_SPEC.matcher(encodingSpec);
if (!matcher.matches()) {
continue;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contents don't adhere to the spec, can we continue parsing at all? should this return false instead?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this current implementation takes a more tolerant approach: it ignores malformed entries and continues parsing valid ones so a single bad token doesn’t cause the entire Accept-Encoding header to be rejected.

i will align with whichever behavior you prefer here:

  • reject entire header on any malformed entry or
  • ignore invalid entries and continue parsing

Let me know which direction you’d like


private static final Pattern ACCEPT_ENCODING_SPEC = Pattern.compile(
"^\\s*([!#$%&'*+.^_`|~0-9A-Za-z-]+|\\*)\\s*(?:;\\s*q\\s*=\\s*"
+ "(0(?:\\.\\d{0,3})?|1(?:\\.0{0,3})?))?\\s*$");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically you could use two capture groups instead of one containing two non-capture groups. then, you can use their presence/absence to decide "is it zero" or "is it anything valid other than zero", and skip the Double.parseDouble entirely, leaving the work to the state machine.

public void testAcceptsGzipEncodingRejectsMalformedQValue() {
assertFalse(WebServer.acceptsGzipEncoding("gzip;q=not-a-number"));
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also test the supported variations in blank spaces, other valid characters

private static final Pattern CACHE_JS_FILE = Pattern.compile("/(" + STRONG_NAME + ").cache.js$");

private static final Pattern ACCEPT_ENCODING_SPEC = Pattern.compile(
"^\\s*([!#$%&'*+.^_`|~0-9A-Za-z-]+|\\*)\\s*(?:;\\s*q\\s*=\\s*"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OWS doesn't appear to be allowed at any point after q in the weight (and "BWS" is apparently only supported in specific cases around auth and transfer-encoding, TIL). Note also that all forms of "whitespace" in the spec are required to be limited to SP and TAB, while \s includes others including CR, LF, NBSP.

https://httpwg.org/specs/rfc9110.html#quality.values
https://httpwg.org/specs/rfc9110.html#whitespace

Or is that a concern that other servers deal with in the wild?

This regex is not easy to read - can you break it down into smaller segments to assemble it to be easier to validate?

@zbynek
Copy link
Copy Markdown
Collaborator

zbynek commented Apr 9, 2026

@metsw24-max I'd suggest using less AI and more of your own words in pull request descriptions. The PR description should make it clear why you are suggesting these changes, not just what the theoretical benefits of the contribution are.

The reasoning for the null pointer fix is clear -- that would warrant a one line change.

The rest looks like a lot of low level code that partially solves a problem that's only theoretical. If we decide to ignore the problem, we can just delete that code (fine by me). If we decide to solve it properly, we should use a higher level solution that handles all edge-cases.
Since this part of code depends on Jetty, you should be able to do something like

boolean acceptsGzip(HttpServletRequest request) {
            QuotedQualityCSV qsv = new QuotedQualityCSV(); // org.eclipse.jetty.http.QuotedQualityCSV
            String encoding = request.getHeader("Accept-Encoding");
            qsv.addValue(encoding);
            return encoding == null || qsv.getValues().stream().anyMatch(v -> v.equals("gzip") || v.equals("*"));
}

(untested)

@metsw24-max
Copy link
Copy Markdown
Author

@zbynek I verified Jetty QuotedQualityCSV behavior for the edge cases.

It consistently excludes entries with q=0, and uses wildcard as fallback when present.

The only remaining ambiguity is precedence for gzip;q=0,*;q=1, which currently resolves to wildcard-based acceptance in this Jetty-backed implementation.

Is that acceptable for this change, or would you prefer explicit rejection when gzip appears with q=0 even if wildcard exists?

@zbynek
Copy link
Copy Markdown
Collaborator

zbynek commented Apr 15, 2026

precedence for gzip;q=0,*;q=1,

If a client sends that header, I'd consider that an error on the client side (thought the syntax is OK, this allows server to send made up encodings the client can't possibly decode), so let's not worry about the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants