perf: test vectorized varint algo#811

Open
anthony-swirldslabs wants to merge 3 commits into main from 810-vectorVarInt

@anthony-swirldslabs anthony-swirldslabs commented May 1, 2026

Description:
This PR introduces a vectorized LEB128 algorithm for reading varint values. It uses a fully unrolled loop and a "negative limit" trick to avoid explicit limit checks. It's 4x faster for 1-byte varints than our current implementation. It's consistently and equally fast for 2-, 3-, 4-, and 5-byte varints as well: 2.4x faster for 2-byte varints and 2x-9x faster for longer encodings.
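For readers unfamiliar with unrolled LEB128 decoding, here is a minimal sketch of the general idea. It is not the code from this PR: the class name `UnrolledVarIntReader` and its methods are hypothetical, and it does not attempt to reproduce the "negative limit" trick. Instead, it assumes a single up-front check that at least 5 bytes remain, and falls back to a plain loop near the end of the buffer.

```java
/** Hypothetical illustration only; not the PBJ implementation. */
public final class UnrolledVarIntReader {
    private final byte[] buf;
    private final int limit; // exclusive end of the readable data
    private int pos;         // index of the next byte to read

    public UnrolledVarIntReader(byte[] buf, int offset, int limit) {
        this.buf = buf;
        this.pos = offset;
        this.limit = limit;
    }

    public int position() {
        return pos;
    }

    /** Reads an unsigned LEB128 varint of at most 5 bytes and advances the position. */
    public long readVarInt() {
        final int p = pos;
        if (limit - p < 5) {
            return readSlow(); // too close to the limit for the unchecked fast path
        }
        // Fast path: one limit check above, then up to five unrolled steps.
        // A byte with its high (continuation) bit clear is non-negative in Java.
        long b = buf[p];
        if (b >= 0) { pos = p + 1; return b; }
        long v = b & 0x7F;
        b = buf[p + 1];
        if (b >= 0) { pos = p + 2; return v | (b << 7); }
        v |= (b & 0x7F) << 7;
        b = buf[p + 2];
        if (b >= 0) { pos = p + 3; return v | (b << 14); }
        v |= (b & 0x7F) << 14;
        b = buf[p + 3];
        if (b >= 0) { pos = p + 4; return v | (b << 21); }
        v |= (b & 0x7F) << 21;
        b = buf[p + 4];
        if (b >= 0) { pos = p + 5; return v | (b << 28); }
        throw new IllegalArgumentException("varint longer than 5 bytes");
    }

    /** Byte-at-a-time fallback that checks the limit before every read. */
    private long readSlow() {
        long v = 0;
        int p = pos;
        for (int shift = 0; shift < 35 && p < limit; shift += 7) {
            long b = buf[p++];
            v |= (b & 0x7F) << shift;
            if (b >= 0) { pos = p; return v; }
        }
        throw new IllegalArgumentException("truncated or overlong varint");
    }
}
```

Note that every successful return path stores the new position, so consecutive calls decode consecutive varints rather than the same one repeatedly.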

A varint.md is added to describe the algorithm, so that we don't have to repeat the lengthy doc in every implementation. The core PBJ implementations will be replaced in a separate PR in the future.

Also, a unit test is added to verify the correctness of the algorithm.

UPDATE ON 5/4/2026 morning: A bug in the benchmark implementation has been discovered where the position wasn't properly updated after finishing reading a varint. The first table below is updated with the new results, which look rather disappointing now. The second table is removed as the old results are no longer relevant.

UPDATE ON 5/4/2026 afternoon: After more tweaking, and after restoring fair conditions between the new algo and the existing pbj implementation (both now use the long type and actually check the limit against the length of the buffer), here are the updated results on Mac aarch64:

Benchmark results:

Benchmark                               (range)   Mode  Cnt     Score    Error   Units
VarIntByteArrayReadBench.pbj                  1  thrpt   15  1352.686 ±  5.331  ops/us
VarIntByteArrayReadBench.pbj                  2  thrpt   15   355.533 ±  2.242  ops/us
VarIntByteArrayReadBench.pbj                  3  thrpt   15   413.225 ±  1.534  ops/us
VarIntByteArrayReadBench.pbj                  4  thrpt   15   293.223 ±  1.596  ops/us
VarIntByteArrayReadBench.pbj                  5  thrpt   15   320.100 ±  6.319  ops/us
VarIntByteArrayReadBench.vector_zigZag        1  thrpt   15  1529.252 ±  9.868  ops/us
VarIntByteArrayReadBench.vector_zigZag        2  thrpt   15   972.393 ± 12.976  ops/us
VarIntByteArrayReadBench.vector_zigZag        3  thrpt   15   596.073 ±  2.460  ops/us
VarIntByteArrayReadBench.vector_zigZag        4  thrpt   15   581.045 ±  3.294  ops/us
VarIntByteArrayReadBench.vector_zigZag        5  thrpt   15   442.047 ±  1.256  ops/us

The numbers may not look as impressive as the ones from the old, broken benchmark, but we still get a performance boost of about 13% for 1-byte varints. For 2-byte varints the boost is 2.7x, which looks pretty good. For 3- and 5-byte varints the improvement is about 40%, and for 4-byte varints it is close to 2x, which again isn't bad at all.
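The speedup figures can be rechecked directly from the throughput scores in the table. The snippet below is only a convenience sketch; the class name `VarIntSpeedup` is hypothetical, and the arrays simply copy the `Score` column (ops/us) for ranges 1 through 5.

```java
/** Hypothetical helper that recomputes speedups from the benchmark table above. */
public final class VarIntSpeedup {
    // Score column (ops/us) for VarIntByteArrayReadBench.pbj, ranges 1..5
    private static final double[] PBJ = {1352.686, 355.533, 413.225, 293.223, 320.100};
    // Score column (ops/us) for VarIntByteArrayReadBench.vector_zigZag, ranges 1..5
    private static final double[] VECTOR = {1529.252, 972.393, 596.073, 581.045, 442.047};

    /** Ratio of new throughput to current throughput for a varint of the given byte length. */
    public static double speedup(int bytes) {
        return VECTOR[bytes - 1] / PBJ[bytes - 1];
    }

    public static void main(String[] args) {
        for (int bytes = 1; bytes <= 5; bytes++) {
            System.out.printf("%d-byte: %.2fx%n", bytes, speedup(bytes));
        }
    }
}
```

The ratios come out at roughly 1.13x, 2.74x, 1.44x, 1.98x, and 1.38x for 1- to 5-byte varints. They ignore the reported error margins, which are small relative to the scores.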

I'll share results on an AMD once I get them.

Related issue(s):

Fixes #810

Notes for reviewer:
All tests should pass.

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

Signed-off-by: Anthony Petrov <anthony@swirldslabs.com>
@anthony-swirldslabs anthony-swirldslabs self-assigned this May 1, 2026
@anthony-swirldslabs anthony-swirldslabs requested review from a team as code owners May 1, 2026 23:18
github-actions Bot commented May 1, 2026

JUnit Test Report

    81 files  ±0      81 suites  ±0   3m 16s ⏱️ ±0s
 1 519 tests ±0   1 515 ✅ ±0   4 💤 ±0  0 ❌ ±0 
10 407 runs  ±0  10 379 ✅ ±0  28 💤 ±0  0 ❌ ±0 

Results for commit ff9a81c. ± Comparison against base commit b629795.

♻️ This comment has been updated with latest results.

github-actions Bot commented May 1, 2026

Integration Test Report

    420 files  +1      420 suites  +1   17m 7s ⏱️ - 10m 47s
114 984 tests +2  114 984 ✅ +2  0 💤 ±0  0 ❌ ±0 
115 226 runs  +2  115 226 ✅ +2  0 💤 ±0  0 ❌ ±0 

Results for commit ff9a81c. ± Comparison against base commit b629795.

This pull request removes 3 and adds 5 tests. Note that renamed tests count towards both.
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [1] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000000009b935f0@2a8f10c8
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [2] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000000009b93838@61a537ae
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [3] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x0000000009b93a80@376e7549
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [1] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x000000007cc5eaf0@2bbec358
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [2] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x000000007cc5ed38@58f5f9ca
com.hedera.pbj.integration.test.ParserNeverWrapsTest ‑ [3] com.hedera.pbj.integration.test.ParserNeverWrapsTest$$Lambda/0x000000007cc5ef80@23e54450
com.hedera.pbj.integration.test.VectorVarIntTest ‑ [1] true
com.hedera.pbj.integration.test.VectorVarIntTest ‑ [2] false

♻️ This comment has been updated with latest results.
