Skip to content

prune_previous: avoid pruning immediate versions to better support concurrency#3028

Draft
jamesblackburn wants to merge 2 commits into
masterfrom
fix-prune-previous
Draft

prune_previous: avoid pruning immediate versions to better support concurrency#3028
jamesblackburn wants to merge 2 commits into
masterfrom
fix-prune-previous

Conversation

@jamesblackburn
Copy link
Copy Markdown
Collaborator

@jamesblackburn jamesblackburn commented Apr 14, 2026

Fix prune_previous data-loss race; bound prune/delete via TOMBSTONE_ALL high-water mark

Problem

Concurrent writers appending to the same base version could lose data: writer 1's
prune_previous physically deleted the base version's data segments before writer 2
had finished referencing them.

Fix — two-phase mark/sweep

  • Mark (version_map::write_and_prune_previous): write the new head plus an
    individual TOMBSTONE for every live previous version in one journal entry. Pruned
    versions become present-but-invisible above the TOMBSTONE_ALL line, so their data
    stays in storage and remains findable. Returns the loaded entry for reuse.
  • Sweep (LocalVersionedEngine::delete_unreferenced_pruned_indexes): reuse that entry
    to collect the above-line tombstoned set, retain the anchor (just-superseded previous
    head) plus anything inside the PrunePreviousProtectionSecs window, physically delete the
    rest (with snapshot/dedup/append-inheritance protection), then advance the TOMBSTONE_ALL
    line past the removed block.

The bounded UNDELETED_ONLY load stops at the line, so prune/delete_all cost is
proportional to retained data, not total history — a constant number of storage ops per
call. Under delayed_deletes the physical delete is skipped but the line still advances so
the chain stays bounded.

Also

  • delete_all_versions reclaims prune-retained anchors via finalize_tombstone_all_result
    (filtered to individually-tombstoned keys), preventing a leak when a whole symbol is deleted.
  • get_version_at_time now finds snapshot-protected deleted versions when a live version also
    exists — prune leaves these behind more often now.

Behavioural change ⚠️

prune_previous no longer physically removes the immediately-previous version straight away.
It is tombstoned (invisible to list_versions) but its data is retained as an anchor —
reclaimed on a later prune once aged past VersionStore.PrunePreviousProtectionSecs
(default 600s; set to 0 to disable the window, anchor still applies). Steady-state on-disk
footprint is anchor + latest rather than just latest.

API

No signature changes to V1/V2 public APIs. Docstrings on all prune_previous_versions entry
points updated to describe deferred deletion. New config: VersionStore.PrunePreviousProtectionSecs.

Tests

New test_prune_previous.py (window retention, anchor retention, burst reclaim,
append/dedup dangling-reference safety, delete-all reclaim + snapshot safety); updated C++
VersionMap/rapidcheck expectations; storage-op-count test now asserts constant per-prune
ops instead of a hardcoded number.

@jamesblackburn jamesblackburn added bug Something isn't working minor Feature change, should increase minor version labels Apr 14, 2026
@man-group man-group deleted a comment from github-actions Bot Apr 14, 2026
Comment thread cpp/arcticdb/version/version_map.hpp Outdated
Comment thread cpp/arcticdb/version/version_map.hpp Outdated
@jamesblackburn jamesblackburn changed the title Fix prune previous Fix prune_previous_version race condition causing data loss under concurrent writes Apr 14, 2026
@jamesblackburn jamesblackburn marked this pull request as draft April 14, 2026 08:11
@man-group man-group deleted a comment from claude Bot Apr 14, 2026
@jamesblackburn jamesblackburn force-pushed the fix-prune-previous branch 3 times, most recently from b613b4b to 50ab49d Compare April 14, 2026 20:08
@jamesblackburn
Copy link
Copy Markdown
Collaborator Author

Issue with as_of reads when a version is only held by snapshots: #3034

@jamesblackburn jamesblackburn force-pushed the fix-prune-previous branch 2 times, most recently from 7bb7dd6 to 52a32be Compare June 1, 2026 19:10
@jamesblackburn jamesblackburn changed the title Fix prune_previous_version race condition causing data loss under concurrent writes prune_previous: avoid pruning immediate versions to better support concurrency Jun 1, 2026
A write with prune_previous_version=True (or append/update/etc.) used to
physically delete the previous versions' data as soon as it tombstoned them.
A concurrent writer mid-append on one of those versions inherits its data
segments; if the prune deleted them first, the appender committed a version
referencing deleted data and became unreadable.

Decouple tombstoning (immediate, unconditional - pruned versions vanish from
readers right away, matching user intuition) from physical deletion (gated for
concurrent-writer safety). The mechanism is a single-write fold:

- Phase 1 (version_map::write_and_prune_previous): in one journal entry write
  the new TABLE_INDEX head, an individual TOMBSTONE for each retained previous
  version, and a single TOMBSTONE_ALL high-water mark. Retained = the anchor
  (latest undeleted previous version - the base an appender builds on, kept
  regardless of age) plus anything inside the PrunePreviousProtectionSecs
  window (default 600s). The TOMBSTONE_ALL is placed at floor-1 (floor = oldest
  retained), burying only versions strictly below it. The head stays the new
  TABLE_INDEX so the symbol ref's first key is an index key and the LATEST read
  fast-path is preserved. The anchor is computed from the loaded undeleted
  entry, not threaded from the caller, whose previous key can point at a
  tombstoned former head after a delete_version.
- Phase 2 (local_versioned_engine::delete_unreferenced_pruned_indexes): reuses
  the phase-1 entry (no reload), splits the tombstoned indexes by the persisted
  line, physically deletes the buried block (except snapshotted keys), and
  protects the retained block. No version-map write. Skipped under
  delayed_deletes.

The high-water line keeps prune/delete bounded: UNDELETED_ONLY loads stop at
the line, so each prune reads only the retained region plus the live head, never
total history. Aged-out retained versions are reclaimed by a later prune once
the floor rises past them.

The explicit admin prune_previous_versions() keeps master's aggressive behaviour
(tombstone everything below latest + delete immediately, no protection window) -
use the per-write flag when concurrent-writer safety is needed.

Docs (VERSIONING.md, runtime_config.md, library.py/_store.py docstrings) updated
to describe the two prune flavours and the single-write fold.
New tests:
- test_prune_previous.py: protection-window and anchor behaviour, data-remains-
  in-storage-during-window, delete-all reclaims prune-retained versions, and
  snapshot safety.
- test_concurrent_read_and_write.py::test_concurrent_read_write_eager_prune: a
  concurrent reader never sees NoDataFoundException while another process writes
  with prune_previous_version=True on the eager (non-delayed-delete) store. The
  read path does not retry on a missing key, so reader safety depends on the
  protection window; the test enables the 600s window before forking so the
  children inherit it (fails with window=0, passes with window=600).
- test_version_map.cpp: PrunePreviousTombstonesAllPreExisting and the folded
  single-entry chain structure; version_map_model.hpp tombstone model.

conftest.py adds the session-autouse _disable_prune_protection_window fixture
(PrunePreviousProtectionSecs=0) so the existing suite exercises the anchor-only
retention deterministically.

Updated key-count / visibility assertions across the version-store tests to
match the single-write fold: visible versions go back to 1 after a prune while
storage-key counts reflect the retained anchor; the admin prune_previous_versions
API prunes aggressively (latest only).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working minor Feature change, should increase minor version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant