Replace per-block Mmap with pread, ~300x apply speedup on Darwin by jverkoey · Pull Request #790 · drolbr/Overpass-API

jverkoey · 2026-04-19T03:27:30Z

Profiling update_from_dir with macOS sample(1) for 30 seconds on an Apple Silicon M-series showed 350 of 361 on-CPU samples (97%) inside the __mmap syscall. The hot path is:

File_Blocks::read_block_ -> Mmap::Mmap -> mmap

called once per compressed block read. On macOS each mmap syscall costs ~0.25 ms of kernel overhead (virtual range alloc, page-table setup, fault-in, teardown on munmap). Across thousands of block reads per minute-diff the syscall tax dominates wall time and prevents apply from keeping pace with the 1-diff-per-minute fetch rate. Linux mmap is cheaper so this is invisible on Linux.
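One way to check the per-call cost on a given machine is a small microbenchmark that reads the same blocks once through an mmap/munmap pair and once through pread. The sketch below is not part of this PR; `Read_Result`, `read_blocks_mmap` and `read_blocks_pread` are hypothetical names, and it assumes the block size is a multiple of the page size so that every mmap offset is page-aligned.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper type: elapsed wall time plus a byte checksum, so the
// two strategies can be compared and the reads cannot be optimized away.
struct Read_Result
{
  double seconds;
  std::uint64_t checksum;
};

// Map, touch and unmap each block separately: one mmap/munmap pair per block.
Read_Result read_blocks_mmap(int fd, std::size_t block_size, std::size_t block_count)
{
  // mmap offsets must be page-aligned, so block_size must be a page multiple.
  assert(block_size % static_cast<std::size_t>(::sysconf(_SC_PAGESIZE)) == 0);
  auto start = std::chrono::steady_clock::now();
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < block_count; ++i)
  {
    void* p = ::mmap(nullptr, block_size, PROT_READ, MAP_PRIVATE, fd,
                     static_cast<off_t>(i * block_size));
    if (p == MAP_FAILED)
      throw std::runtime_error("mmap failed");
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    for (std::size_t j = 0; j < block_size; ++j)
      sum += bytes[j];
    ::munmap(p, block_size);
  }
  std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return {elapsed.count(), sum};
}

// Read each block with pread into a heap buffer (reused here for brevity).
Read_Result read_blocks_pread(int fd, std::size_t block_size, std::size_t block_count)
{
  std::vector<unsigned char> buf(block_size);
  auto start = std::chrono::steady_clock::now();
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < block_count; ++i)
  {
    std::size_t done = 0;
    while (done < block_size)  // pread may return fewer bytes than requested
    {
      ssize_t n = ::pread(fd, buf.data() + done, block_size - done,
                          static_cast<off_t>(i * block_size + done));
      if (n <= 0)
        throw std::runtime_error("pread failed or hit end of file");
      done += static_cast<std::size_t>(n);
    }
    for (std::size_t j = 0; j < block_size; ++j)
      sum += buf[j];
  }
  std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return {elapsed.count(), sum};
}
```

Both functions should return the same checksum for the same file; dividing `seconds` by `block_count` gives a rough per-block cost. The result depends heavily on the platform and on whether the file is already in the page cache.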

Analysis

For Overpass's access pattern there's no benefit to a memory mapping: each compressed block is read once, decompressed into a separate buffer by Zlib/LZ4 Inflate, and never revisited. The Mmap class exists only to own the read buffer for decompression. Replacing the mmap/munmap pair with a pread into a heap buffer keeps Mmap::ptr() pointer-compatible with every caller and eliminates the syscall tax. Linux performance is unaffected — pread hits the same page cache that mmap would have.
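The replacement described above can be sketched as a small class that owns a heap buffer, fills it with pread, and exposes it through `ptr()`. This is an illustration only, not the code in this PR: `Pread_Buffer` and `read_range` are hypothetical names, `std::runtime_error` stands in for the project's `File_Error`, and the `void*` return type of `ptr()` is an assumption.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for the buffer-owning role of Mmap; not the real class.
class Pread_Buffer
{
public:
  Pread_Buffer(int fd, off_t offset, std::size_t length, const std::string& origin)
    : data_(length)
  {
    assert(fd >= 0);
    std::size_t done = 0;
    while (done < length)  // loop: pread may return a short count
    {
      ssize_t n = ::pread(fd, data_.data() + done, length - done,
                          offset + static_cast<off_t>(done));
      if (n < 0)
      {
        if (errno == EINTR)
          continue;  // interrupted by a signal, retry
        throw std::runtime_error(origin + ": pread failed");
      }
      if (n == 0)
        throw std::runtime_error(origin + ": unexpected end of file");
      done += static_cast<std::size_t>(n);
    }
  }

  // Same shape of accessor as a mapping would offer: a pointer to the bytes.
  void* ptr() { return data_.data(); }
  std::size_t size() const { return data_.size(); }

private:
  std::vector<std::uint8_t> data_;  // freed automatically, no munmap needed
};

// Usage helper: read `length` bytes at `offset` from `path` into a string.
std::string read_range(const char* path, off_t offset, std::size_t length)
{
  int fd = ::open(path, O_RDONLY);
  if (fd < 0)
    throw std::runtime_error(std::string(path) + ": open failed");
  try
  {
    Pread_Buffer buf(fd, offset, length, path);
    std::string out(static_cast<const char*>(buf.ptr()), buf.size());
    ::close(fd);
    return out;
  }
  catch (...)
  {
    ::close(fd);
    throw;
  }
}
```

Because callers only ever see the pointer, a decompressor that previously consumed a mapped region can consume this buffer unchanged. The cost is one heap allocation and one copy per block, which is small next to decompression.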

Measured effect

Single-diff apply (diff 7076518, 6550 ops) against a live 291 GB database on Apple Silicon:

| | wall time | CPU% | notes |
|---|---|---|---|
| before | 9 min 01 s | 100% | 97% samples in `__mmap` |
| after | 1.79 s | 67% | now compute-bound |

~300x speedup. The NO_COMPRESSION branch of File_Blocks::read_block_ already uses pread via data_file.read(); this brings the compressed path to parity.
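For reference, the headline figures follow directly from the numbers reported above (9 min 01 s is 541 s):

$$
\text{speedup} = \frac{541\ \text{s}}{1.79\ \text{s}} \approx 302, \qquad
\text{share of samples in } \texttt{\_\_mmap} = \frac{350}{361} \approx 0.97
$$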

Notes

- Public interface (Mmap::ptr()) unchanged: no caller needs adjusting.
- Exception behavior preserved: throws File_Error on I/O failure with the same arguments.
- No Linux regression expected (pread and mmap both use the unified page cache), but Linux benchmarks from a reviewer with access would be welcome.

Companion PRs #788 (off64_t alias) and #789 (sun_len fix) address other Darwin-specific issues hit while bringing osm-3s up on Apple Silicon natively.


This was referenced Apr 19, 2026

Darwin: alias off64_t to off_t in NATIVE_LARGE_FILES block #788

Open

Darwin: fix sockaddr_un sun_len causing silent WRITE_START drops #789

Open

jverkoey wants to merge 1 commit into drolbr:master from ClutchEngineering:pr-mmap-pread
