Harden Accept-Encoding parsing to prevent NPE by metsw24-max · Pull Request #10307 · gwtproject/gwt

metsw24-max · 2026-04-09T06:33:21Z

This change hardens handling of the Accept-Encoding request header in the CodeServer by replacing unsafe substring matching with structured parsing logic.

Previously gzip negotiation relied on:

request.getHeader("Accept-Encoding").contains("gzip")

This approach had multiple issues:

Could throw a NullPointerException when the header is absent
Incorrectly accepted gzip even when explicitly disabled e.g., gzip;q=0
Ignored wildcard encodings (*)
Did not handle malformed or adversarial header inputs safely
Relied on brittle substring matching instead of structured parsing

Introduced a dedicated helper:

acceptsGzipEncoding(String header)

This method:

Safely handles null and empty headers
Parses encoding tokens and parameters
Correctly interprets q-values (q=)
Rejects gzip when q=0
Supports wildcard (*) fallback
Handles malformed inputs defensively (fails closed)

niloc132

This reads like it might be a fix for a security issue ("harden", "unsafe", the naming and javadoc of the test, etc), but you never come right out and say that it is. Can you confirm one way or another?

Is there a web browser (or other JS runtime) that exists which can load content from an http server that does not conform to the existing assumptions? Nothing about your description is wrong per se, but

The header is never absent in any js client i'm aware of
gzip is never disabled (or this would have stopped working, and I don't see an associated ticket?)
gzip is always explicitly listed (using "wildcard" gets into the situation of "sure, the server can send anything that I don't know about, which means I probably can't read it since I didn't list it")
adversarial requests are
- unlikely (bound to loopback by default),
- unproductive (they could have just gotten real content with a real request,
- and benign (the worst the attacker can do is cause a NPE which ... ensures they get no data?
"Relied on brittle substring matching instead of structured parsing" - that at least has merit, if there were ever an encoding contributed that has gzip as a substring - which I think is unlikely at best.

Nevertheless, there is a spec, and we're probably better off following it than not. Two notes there that I feel you missed:

The current implementation of the server cannot (or will not) support IDENTITY encoding, which is ostensibly the correct fallback if gzip is not present or disabled ("If a non-empty Accept-Encoding header field is present in a request and none of the available representations for the response have a content coding that is listed as acceptable, the origin server SHOULD send a response without any content coding unless the identity coding is indicated as unacceptable.")
If we fail a request due to encoding issues, the server should be sending 415, not 405 ("Servers that fail a request due to an unsupported content coding ought to respond with a 415 (Unsupported Media Type) status and include an Accept-Encoding header field in that response...")

All told, I don't think this change is necessary, but it isn't wrong either. It feels a bit like churn for its own sake, with no actual problem being solved, but it also isn't keeping to the spec rigorously enough to seem like an actual "fix". What do you think?

niloc132 · 2026-04-09T13:01:27Z

+public class WebServerSecurityTest extends TestCase {
+
+  public void testAcceptsGzipEncodingRejectsNullOrEmptyHeader() {
+    assertFalse(WebServer.acceptsGzipEncoding(null));


This is wrong, isn't it?

https://httpwg.org/specs/rfc9110.html#field.accept-encoding

If no Accept-Encoding header field is in the request, any content coding is considered acceptable by the user agent.

niloc132 · 2026-04-09T13:04:06Z

+  public void testAcceptsGzipEncodingRejectsMalformedQValue() {
+    assertFalse(WebServer.acceptsGzipEncoding("gzip;q=not-a-number"));
+  }
+}


As per your note about structured parsing, this should include a test for something with gzip as a substring of its name - or as an alternative "metadata" for the named value?

Please also test the supported variations in blank spaces, other valid characters

niloc132 · 2026-04-09T13:06:32Z

+      double qValue = 1.0;
+      for (int i = 1; i < parts.length; i++) {
+        String parameter = parts[i].trim().toLowerCase(Locale.ROOT);
+        if (parameter.startsWith("q=")) {


This seems to imply that other parameters are supported, is that correct? By spec it doesn't seem allowed.

niloc132 · 2026-04-09T13:08:30Z

+            qValue = Double.parseDouble(qValueText);
+          } catch (NumberFormatException e) {
+            qValue = 0.0;


Allowed range is explicitly [0-1], and precision is <=3 decimal places https://httpwg.org/specs/rfc9110.html#quality.values - we should perhaps fail if outside that range?

metsw24-max · 2026-04-09T15:14:11Z

you were right to call this out.

I’ve updated the change to focus on correctness and RFC 9110 compliance for Accept-Encoding handling, rather than framing it as a security fix.

This revision addresses several issues in the previous implementation:

Missing or empty Accept-Encoding headers are now treated as accepting gzip (per RFC 9110)
Substring matching has been replaced with token-based parsing (avoids cases like "xgzip")
q-values are validated within the [0,1] range, with malformed values treated as not acceptable
Wildcard handling and precedence between explicit encodings and "*" now follow expected behavior
When no acceptable representation is available, the response returns 415 with an Accept-Encoding header

I also:

Renamed the tests to remove the security framing
Added coverage for substring edge cases, malformed q-values, wildcard handling, and missing headers
Removed Bazel BUILD changes to stay aligned with the project’s build setup

Please let me know if you’d prefer stricter handling of invalid parameters or further alignment with the spec.

niloc132 · 2026-04-09T16:06:28Z

@@ -0,0 +1,70 @@
+/*
+ * Copyright 2026 The GWT Project Authors.


Suggested change

* Copyright 2026 The GWT Project Authors.

* Copyright 2026 The GWT Project Authors

niloc132 · 2026-04-09T16:13:13Z

+      double qValue = qValueText == null ? 1.0 : Double.parseDouble(qValueText);
+
+      if (encoding.equals("gzip")) {
+        gzipQValue = qValue;


can't we return early here? do we need to track the q values at all after we check a given entry?

if the encoding is valid but isn't * or gzip, we ignore it. if it is either * or gzip and has a q value greater than zero, we return true. if no * or gzip had a value greater than zero, we return false.

what am I missing that we can't substantially trim this down?

niloc132 · 2026-04-09T16:14:34Z

+      // RFC 9110: if Accept-Encoding is absent, any content coding is acceptable.
+      return true;
+    }
+    if (acceptEncodingHeader.trim().isEmpty()) {


Suggested change

if (acceptEncodingHeader.trim().isEmpty()) {

if (acceptEncodingHeader.isBlank()) {

that said, if i'm right about simplifying, this can be included in the final return false, no need for this "present but nothing in it" check.

niloc132 · 2026-04-09T16:16:01Z

+    for (String encodingSpec : acceptEncodingHeader.split(",")) {
+      Matcher matcher = ACCEPT_ENCODING_SPEC.matcher(encodingSpec);
+      if (!matcher.matches()) {
+        continue;


contents don't adhere to the spec, can we continue parsing at all? should this return false instead?

this current implementation takes a more tolerant approach: it ignores malformed entries and continues parsing valid ones so a single bad token doesn’t cause the entire Accept-Encoding header to be rejected.

i will align with whichever behavior you prefer here:

reject entire header on any malformed entry or

ignore invalid entries and continue parsing

Let me know which direction you’d like

niloc132 · 2026-04-09T16:20:05Z


+  private static final Pattern ACCEPT_ENCODING_SPEC = Pattern.compile(
+      "^\\s*([!#$%&'*+.^_`|~0-9A-Za-z-]+|\\*)\\s*(?:;\\s*q\\s*=\\s*"
+          + "(0(?:\\.\\d{0,3})?|1(?:\\.0{0,3})?))?\\s*$");


technically you could use two capture groups instead of one containing two non-capture groups. then, you can use their presence/absence to decide "is it zero" or "is it anything valid other than zero", and skip the Double.parseDouble entirely, leaving the work to the state machine.

niloc132 · 2026-04-09T16:22:24Z

+  public void testAcceptsGzipEncodingRejectsMalformedQValue() {
+    assertFalse(WebServer.acceptsGzipEncoding("gzip;q=not-a-number"));
+  }
+}


Please also test the supported variations in blank spaces, other valid characters

niloc132 · 2026-04-09T20:12:00Z

  private static final Pattern CACHE_JS_FILE = Pattern.compile("/(" + STRONG_NAME + ").cache.js$");

+  private static final Pattern ACCEPT_ENCODING_SPEC = Pattern.compile(
+      "^\\s*([!#$%&'*+.^_`|~0-9A-Za-z-]+|\\*)\\s*(?:;\\s*q\\s*=\\s*"


OWS doesn't appear to be allowed at any point after q in the weight (and "BWS" is apparently only supported in specific cases around auth and transfer-encoding, TIL). Note also that all forms of "whitespace" in the spec are required to be limited to SP and TAB, while \s includes others including CR, LF, NBSP.

https://httpwg.org/specs/rfc9110.html#quality.values
https://httpwg.org/specs/rfc9110.html#whitespace

Or is that a concern that other servers deal with in the wild?

This regex is not easy to read - can you break it down into smaller segments to assemble it to be easier to validate?

zbynek · 2026-04-09T21:40:32Z

@metsw24-max I'd suggest using less AI and more of your own words in pull request descriptions. The PR description should make it clear why you are suggesting these changes, not just what the theoretical benefits of the contribution are.

The reasoning for the null pointer fix is clear -- that would warrant a one line change.

The rest looks like a lot of low level code that partially solves a problem that's only theoretical. If we decide to ignore the problem, we can just delete that code (fine by me). If we decide to solve it properly, we should use a higher level solution that handles all edge-cases.
Since this part of code depends on Jetty, you should be able to do something like

boolean acceptsGzip(HttpServletRequest request) {
            QuotedQualityCSV qsv = new QuotedQualityCSV(); // org.eclipse.jetty.http.QuotedQualityCSV
            String encoding = request.getHeader("Accept-Encoding");
            qsv.addValue(encoding);
            return encoding == null || qsv.getValues().stream().anyMatch(v -> v.equals("gzip") || v.equals("*"));
}

(untested)

metsw24-max · 2026-04-10T03:30:01Z

@zbynek I verified Jetty QuotedQualityCSV behavior for the edge cases.

It consistently excludes entries with q=0, and uses wildcard as fallback when present.

The only remaining ambiguity is precedence for gzip;q=0,*;q=1, which currently resolves to wildcard-based acceptance in this Jetty-backed implementation.

Is that acceptable for this change, or would you prefer explicit rejection when gzip appears with q=0 even if wildcard exists?

zbynek · 2026-04-15T22:05:01Z

precedence for gzip;q=0,*;q=1,

If a client sends that header, I'd consider that an error on the client side (thought the syntax is OK, this allows server to send made up encodings the client can't possibly decode), so let's not worry about the case.

Harden Accept-Encoding parsing to prevent NPE

64729b6

niloc132 reviewed Apr 9, 2026

View reviewed changes

correctness and RFC 9110 compliance for Accept-Encoding handling

21240b7

niloc132 reviewed Apr 9, 2026

View reviewed changes

	* Copyright 2026 The GWT Project Authors.
	* Copyright 2026 The GWT Project Authors

	if (acceptEncodingHeader.trim().isEmpty()) {
	if (acceptEncodingHeader.isBlank()) {

Conversation

metsw24-max commented Apr 9, 2026

Uh oh!

niloc132 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

metsw24-max commented Apr 9, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zbynek commented Apr 9, 2026

Uh oh!

metsw24-max commented Apr 10, 2026

Uh oh!

zbynek commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

niloc132 left a comment •

edited

Loading