Fix O(N²) parsing of large decimal Natural literals by nikita-volkov · Pull Request #2732 · dhall-lang/dhall-haskell

nikita-volkov · 2026-04-26T00:07:55Z

I ran into this issue when trying to load large resolved files: Dhall would get stuck in the loading phase. I then fed a reproduction of the problem to an LLM, and what follows is what it came up with. I have confirmed that the fix works for my use case.

The previous 'decimal' parser in 'naturalLiteral' (Token.hs) read digits one by one via 'many (satisfy digit)' and then converted with:

foldl' (\acc x -> acc * 10 + x) 0 digits

For an N-digit number this performs N big-integer multiplications, where the k-th multiplication costs O(k) because the accumulator already holds about k digits (multiplying a big integer by the small constant 10 is linear in its size), so the total is O(1+2+…+N) = O(N²). For a 1.26 M-digit literal this caused 0.79 s of parse time per number; with 8+ such literals in a single file, parsing alone took ~13 s out of the observed ~39 s total.
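To make the quadratic behaviour concrete, here is a minimal standalone sketch of the left-fold conversion. It works on a plain `String` rather than on the parser's token stream, and the name `naiveDecimal` is hypothetical, not taken from Token.hs.

```haskell
import Data.Char (digitToInt)
import Data.List (foldl')
import Numeric.Natural (Natural)

-- | Naive decimal conversion (hypothetical name, for illustration only).
-- Assumes the input contains only the characters '0'..'9'.
-- At step k the accumulator already holds about k digits, so the
-- multiplication by 10 touches O(k) limbs; summed over N steps that is O(N²).
naiveDecimal :: String -> Natural
naiveDecimal = foldl' step 0
  where
    step :: Natural -> Char -> Natural
    step acc c = acc * 10 + fromIntegral (digitToInt c)
```

Each individual step is cheap; the cost comes from repeating a linear-time operation once per digit on an ever-growing accumulator.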

Fix

Replace the naive left-fold with a divide-and-conquer conversion:

1. Capture all digits in one shot using 'takeWhileP' (single O(N) scan).
2. Recursively split the digit string in half: convert the high half and the low half independently, then combine as
       hi_value * 10^lo_len + lo_value
   For strings ≤ 18 digits the plain left-fold is used (fits in 64-bit).

This reduces the work to O(M(N)·log N), where M(N) is the cost of one N-digit multiplication (at most O(N^1.585) with Karatsuba; GMP switches to asymptotically faster algorithms for very large operands), which is far better than O(N²).
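The split-and-combine step can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: it operates on a `String` instead of the chunk returned by 'takeWhileP', and the names `decimalDC`, `smallLimit` and `smallFold` are hypothetical.

```haskell
import Data.Char (digitToInt)
import Data.List (foldl')
import Data.Word (Word64)
import Numeric.Natural (Natural)

-- | Chunks of at most this many digits use the plain left fold:
-- any 18-digit decimal number is below 10^18 and fits in 64 bits.
smallLimit :: Int
smallLimit = 18

-- | Divide-and-conquer decimal conversion (hypothetical name).
-- Assumes the input contains only the characters '0'..'9'.
decimalDC :: String -> Natural
decimalDC digits = go (length digits) digits
  where
    -- The length is threaded through so it is computed only once.
    go :: Int -> String -> Natural
    go len ds
      | len <= smallLimit = fromIntegral (smallFold ds)
      | otherwise =
          let loLen    = len `div` 2
              hiLen    = len - loLen
              (hi, lo) = splitAt hiLen ds
          in  -- hi_value * 10^lo_len + lo_value
              go hiLen hi * 10 ^ loLen + go loLen lo

    -- Fixed-width fold for short chunks; no big-integer arithmetic involved.
    smallFold :: String -> Word64
    smallFold = foldl' (\acc c -> acc * 10 + fromIntegral (digitToInt c)) 0
```

With this shape, only the few multiplications near the top of the recursion involve very large operands, which is where a fast big-integer multiplication pays off.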

Measurements (aarch64-osx, GHC 9.12.2):
Single 1.26 M-digit literal: 0.79 s → 0.17 s (4.6×)
All 120 large literals: 13.2 s → 2.0 s (6.6×)
Full resolved.dhall (type): ~39 s → 7.4 s (5.3×)

Regression test and benchmark

* Added 'largeNaturalLiteralParsing' to Dhall.Test.Regression: parses a 100,000-digit literal and asserts completion within 10 seconds.
* Added 'Large natural number literal (1M digits)' to the parser benchmark so future regressions are visible in benchmark runs.
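A time-bounded check of this kind might look roughly like the sketch below. It is not the test added in this PR: `finishesWithin` is a hypothetical helper, and the `convert` argument stands in for the real parser entry point, which is not shown here.

```haskell
import Control.Exception (evaluate)
import Data.Maybe (isJust)
import Numeric.Natural (Natural)
import System.Timeout (timeout)

-- | Reports whether converting an n-digit literal finishes within the given
-- number of seconds (hypothetical helper). 'convert' is a stand-in for the
-- parser under test.
finishesWithin :: Int -> Int -> (String -> Natural) -> IO Bool
finishesWithin seconds n convert = do
  let literal = replicate n '7'
  -- 'timeout' takes microseconds; 'evaluate' forces the result so the
  -- conversion actually runs inside the time limit.
  result <- timeout (seconds * 1000000) (evaluate (convert literal))
  pure (isJust result)
```

For example, `finishesWithin 10 100000 someConverter` mirrors the 100,000-digit, 10-second bound described above, where `someConverter` is whichever conversion function is being checked.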

nikita-volkov force-pushed the fix-load-resolved branch from 75ce831 to e4dd12e on April 26, 2026 00:12

nikita-volkov wants to merge 1 commit into dhall-lang:main from nikita-volkov:fix-load-resolved
