Skip to content

[IonJava] Optimize text read/write hot paths (+18.1% aggregate JMH)#1151

Open
ganschen wants to merge 1 commit into
amazon-ion:masterfrom
ganschen:perf/text-hot-paths
Open

[IonJava] Optimize text read/write hot paths (+18.1% aggregate JMH)#1151
ganschen wants to merge 1 commit into
amazon-ion:masterfrom
ganschen:perf/text-hot-paths

Conversation

@ganschen
Copy link
Copy Markdown

Description

Targeted performance optimizations for the hottest text parsing and writing methods in com.amazon.ion.impl, benchmarked via a new TextHotPathBenchmark. Reader-side improvements dominate; writer-side changes are neutral.

Changes

Reader side

  • IonReaderTextRawTokensX.load_double_quoted_string / load_single_quoted_string: ASCII fast path that bypasses prohibited-char checks, UTF-8 decoding, and surrogate handling for the common case (printable ASCII, excluding the terminating quote and backslash).
  • IonReaderTextRawTokensX.load_symbol_identifier / skip_over_symbol_identifier: Read directly from the stream for valid symbol characters, skipping the newline check in read_char().
  • IonReaderTextRawTokensX.skip_over_blob / read_base64_byte_helper / read_base64_getchar_helper: Inline the whitespace skip so base64 data doesn't pay a function call per byte.
  • IonReaderTextUserX.isIonVersionMarker: Replace java.util.regex with character-by-character validation (drops Matcher allocation and regex engine dispatch).

Writer side

  • IonWriterSystemText.writeSeparator: Cache IonTextUtils.isAllWhitespace(_separator_character) in _separator_is_whitespace, refreshed whenever the separator changes.
  • OutputStreamFastAppendable.appendAscii(CharSequence) on non-String sequences: Batch writes instead of per-character buffer checks.
  • OutputStreamFastAppendable.appendAscii(char): Use the compile-time constant MAX_BYTES_LEN for the length check so JIT can fold it.

Correctness notes

  • load_double_quoted_string: The prohibited-char check must exclude \r (0x0D) and \n (0x0A) so they fall through to line_count(). Without this exclusion, multi-line double-quoted strings would throw "invalid character". Caught by SpotBugs flagging the downstream if (c == '\r' || c == '\n') as unreachable.
  • printCodePoints keeps the original single escapes[c] != null check rather than layering a lookup table on top. An extra NEVER_ESCAPED_ASCII table regressed long ASCII strings by 63% in an early draft, so it was not included.

Benchmark

Host: Amazon Corretto JDK 21.0.11, JMH 1.36, @BenchmarkMode(Mode.AverageTime), 1 fork, 3 warmup × 3s, 5 measurement × 3s.

Scenario Baseline (µs/op) Treatment (µs/op) Improvement
Read many ASCII double-quoted strings 874 382 +56%
Read a 32 KB base64 blob with whitespace 507 235 +54%
Read text with repeated $ion_1_0 IVMs 156 121 +23%
Read many ASCII symbol identifiers 924 778 +16%
Write 2000 decimals 180 186 −3% (noise)
Write 50 copies of a 28 KB ASCII string 958 966 −1% (noise)
Write 500 structs × 20 identifier fields 1364 1396 −2% (noise)
Write identifiers containing U+00E4 47 46 +1% (noise)

Tests

./gradlew test passes with ion-tests submodule initialized. ./gradlew spotbugsMain passes against the upstream baseline.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Targeted optimizations for the hottest text parsing and writing methods
in `com.amazon.ion.impl`, benchmarked via a new `TextHotPathBenchmark`.

## Changes

### Reader side

- `IonReaderTextRawTokensX.load_double_quoted_string` /
  `load_single_quoted_string`: ASCII fast path that bypasses
  prohibited-char checks, UTF-8 decoding, and surrogate handling for the
  common case (printable ASCII, excluding the terminating quote and
  backslash).
- `IonReaderTextRawTokensX.load_symbol_identifier` /
  `skip_over_symbol_identifier`: read directly from the stream for
  valid symbol characters, skipping the newline check in `read_char`.
- `IonReaderTextRawTokensX.skip_over_blob` /
  `read_base64_byte_helper` / `read_base64_getchar_helper`: inline the
  whitespace skip so base64 data doesn't pay a function call per byte.
- `IonReaderTextUserX.isIonVersionMarker`: replace `java.util.regex`
  with character-by-character validation (drops `Matcher` allocation
  and regex engine dispatch).

### Writer side

- `IonWriterSystemText.writeSeparator`: cache
  `IonTextUtils.isAllWhitespace(_separator_character)` in
  `_separator_is_whitespace`, refreshed whenever the separator changes.
- `OutputStreamFastAppendable.appendAscii(CharSequence)` on non-String
  sequences: batch writes instead of per-character buffer checks.
- `OutputStreamFastAppendable.appendAscii(char)`: use the compile-time
  constant `MAX_BYTES_LEN` for the length check so JIT can fold it.

## Correctness notes

- `load_double_quoted_string`: the prohibited-char check must exclude
  `\r` (0x0D) and `\n` (0x0A) so they fall through to `line_count()`.
  Without this exclusion, multi-line double-quoted strings would throw
  "invalid character". Caught by SpotBugs flagging the downstream
  `if (c == '\r' || c == '\n')` as unreachable.
- `printCodePoints` keeps the original single `escapes[c] != null`
  check rather than layering a lookup table on top. An extra
  `NEVER_ESCAPED_ASCII` table regressed long ASCII strings by 63% in
  an early draft, so it was not included.

## Deliberately not included

Two writer-side optimizations from earlier drafts were reverted after
JMH showed they regressed their target scenarios on JDK 21:

- A `tryWriteAsIdentifier` fast path in `writeSymbolToken` that combined
  classification + output in one pass. It hurt all writer scenarios
  (−2% to −5%) because most symbols ARE valid identifiers, so the
  "fast path" validation happens every time and then `_output.appendAscii`
  runs the same character walk again under the hood.
- A `SMALL_INT_STRINGS` cache for `Integer.toString` in `printDecimal`.
  It regressed `writeDecimalHeavyText` by 4.7%; the cache lookup +
  range-check cost more than JDK 21's already-fast `Integer.toString`
  for small values.
- Churn in `config/spotbugs/baseline.xml`. The draft regenerated the
  file with every bytecode-offset / source-line shift from the edits.
  After fixing the `\r`/`\n` correctness bug above, the only real new
  findings disappear, so the file reverts cleanly to upstream.

## Benchmark

Host: `dev-dsk-ganschen-2b`, Corretto JDK 21.0.10
(OpenJDK 64-Bit Server VM 21.0.10+7-LTS), JMH 1.36,
`-Mode AverageTime`, 1 fork, 3 warmup x 3s, 5 measurement x 3s.
Positive % = faster in this CR vs the pre-CR baseline at commit
`1e84f39b`.

| Scenario | Baseline (µs/op) | Treatment (µs/op) | Faster by |
|---|---:|---:|---:|
| Read many ASCII double-quoted strings | 874 | 377 | **+56.8%** |
| Read a 32 KB base64 blob with whitespace | 502 | 234 | **+53.4%** |
| Read text with repeated `$ion_1_0` IVMs | 158 | 120 | **+23.5%** |
| Read many ASCII symbol identifiers | 925 | 791 | **+14.5%** |
| Write 2000 decimals with scale/exp 0..19 | 181 | 180 | **+0.7%** |
| Write 50 copies of a 28 KB ASCII string | 991 | 994 | -0.3% |
| Write 500 structs x 20 identifier fields | 1,396 | 1,411 | -1.1% |
| Write identifiers containing U+00E4 | 46 | 46 | -0.3% |
| **Sum of all scenarios** | **5,072** | **4,152** | **+18.1%** |

Reader-side speedups dominate on real profiles where Ion text sits on
the parse path; reads account for ~60% of Amazon Profiler Flattener
self-CPU within `com.amazon.ion.impl`. The three remaining write-side
deltas are all ≤1.1% — within benchmark noise on a shared devdesk.

## Tests

`./gradlew test` passes (ion-tests submodule initialized);
`./gradlew spotbugsMain` passes against the upstream baseline.xml.

cr: https://code.amazon.com/reviews/CR-270698079
@codecov
Copy link
Copy Markdown

codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 79.03226% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.98%. Comparing base (3c1b6b1) to head (531753b).
⚠️ Report is 148 commits behind head on master.

Files with missing lines Patch % Lines
...a/com/amazon/ion/impl/IonReaderTextRawTokensX.java 82.45% 2 Missing and 8 partials ⚠️
...n/java/com/amazon/ion/impl/IonReaderTextUserX.java 60.86% 3 Missing and 6 partials ⚠️
...om/amazon/ion/impl/OutputStreamFastAppendable.java 80.00% 1 Missing and 2 partials ⚠️
.../java/com/amazon/ion/impl/IonWriterSystemText.java 81.81% 2 Missing ⚠️
.../com/amazon/ion/impl/_Private_IonTextAppender.java 88.88% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1151      +/-   ##
============================================
+ Coverage     67.23%   67.98%   +0.75%     
- Complexity     5484     5662     +178     
============================================
  Files           159      160       +1     
  Lines         23025    23375     +350     
  Branches       4126     4201      +75     
============================================
+ Hits          15481    15892     +411     
+ Misses         6262     6178      -84     
- Partials       1282     1305      +23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant