
[fix][broker] Wait for orphan schema ledger cleanup before retry#27

Open
Denovo1998 wants to merge 1 commit into master from schema_orphan_ledger_cleanup

Denovo1998 commented Apr 25, 2026

Main Issue: apache#25514

Motivation

PR apache#25514 added cleanup of orphan schema ledgers for the case where concurrent schema creation loses the compare-and-set (CAS) race on the schema locator. However, that cleanup called the callback-style asyncDeleteLedger inside whenComplete without waiting for its callback, so the schema storage retry could proceed before BookKeeper had finished deleting the ledger.

This makes the test assertion that no orphan ledgers remain timing-sensitive, and weakens the cleanup guarantee that schema creation is expected to provide. This PR waits for the orphan ledger delete callback to complete, then rethrows the original CAS exception so the existing retry flow continues as before.
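The race described in this section can be reproduced in miniature. The sketch below is not the Pulsar code: asyncDeleteLedger here is a hypothetical stand-in for a callback-style delete such as bookKeeper.asyncDeleteLedger, and it queues its callback until drainCallbacks() is called, which simulates the BookKeeper callback arriving later. Because the delete is only started inside whenComplete, the stage that drives the retry completes before the delete has happened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

public class FireAndForgetCleanup {
    // Hypothetical stand-in for a callback-style delete such as bookKeeper.asyncDeleteLedger:
    // the callback is queued and only fires when drainCallbacks() runs, which simulates
    // the BookKeeper callback arriving later on another thread.
    static final List<Runnable> pendingCallbacks = new ArrayList<>();

    static void asyncDeleteLedger(long ledgerId, IntConsumer callback) {
        pendingCallbacks.add(() -> callback.accept(0)); // 0 = success in this sketch
    }

    static void drainCallbacks() {
        List<Runnable> batch = new ArrayList<>(pendingCallbacks);
        pendingCallbacks.clear();
        batch.forEach(Runnable::run);
    }

    // Returns true only if the orphan ledger was already deleted when the retry step ran.
    static boolean deletedBeforeRetry() {
        AtomicBoolean deleted = new AtomicBoolean(false);
        CompletableFuture<Void> casAttempt =
                CompletableFuture.failedFuture(new IllegalStateException("BadVersion"));

        CompletableFuture<Void> afterCleanup = casAttempt.whenComplete((ignored, ex) -> {
            if (ex != null) {
                // The delete is only started here; whenComplete does not wait for its callback.
                asyncDeleteLedger(42L, rc -> deleted.set(true));
            }
        });

        // Stand-in for the retry flow: it runs as soon as afterCleanup completes.
        boolean seenByRetry = afterCleanup.handle((ignored, ex) -> deleted.get()).join();
        drainCallbacks(); // the delete callback only fires now, after the retry step
        return seenByRetry;
    }

    public static void main(String[] args) {
        System.out.println("deleted before retry: " + deletedBeforeRetry());
    }
}
```

Running main prints "deleted before retry: false": the retry step observes the failure while the orphan ledger still exists.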

Modifications

  • Added a deleteLedgerAsync helper in BookkeeperSchemaStorage to wrap bookKeeper.asyncDeleteLedger with a CompletableFuture.
  • Updated the initial schema creation CAS-failure path to wait for orphan ledger deletion before rethrowing the original AlreadyExistsException or BadVersionException.
  • Updated the schema locator update CAS-failure path with the same cleanup behavior, keeping initial creation and update paths consistent.
  • Kept deletion failure behavior compatible: delete failures are logged but do not fail schema creation.
  • Refactored schema concurrency test helpers and added coverage for concurrent compatible schema updates to verify failed CAS attempts do not leave extra schema ledgers.
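The cleanup pattern described in the list above can be sketched as follows, under the same assumptions as a toy model: asyncDeleteLedger and drainCallbacks are hypothetical stand-ins for the BookKeeper callback API, return code 0 for success is an assumption of this sketch, and cleanupThenRethrow is a hypothetical name. The deleteLedgerAsync shown here only illustrates the idea of wrapping a callback in a CompletableFuture; it is not the actual BookkeeperSchemaStorage helper. The CAS failure is held until the delete future completes, a delete failure is logged and swallowed, and the caller sees the original CAS exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.IntConsumer;

public class AwaitedCleanup {
    // Assumption for this sketch: 0 means success, anything else is a failure code.
    static final int RC_OK = 0;
    static int nextDeleteRc = RC_OK;
    static final List<Runnable> pendingCallbacks = new ArrayList<>();
    static final List<Long> deletedLedgers = new ArrayList<>();

    // Hypothetical callback-style delete standing in for bookKeeper.asyncDeleteLedger.
    // The callback is queued and only fires when drainCallbacks() runs.
    static void asyncDeleteLedger(long ledgerId, IntConsumer callback) {
        int rc = nextDeleteRc;
        pendingCallbacks.add(() -> {
            if (rc == RC_OK) {
                deletedLedgers.add(ledgerId);
            }
            callback.accept(rc);
        });
    }

    static void drainCallbacks() {
        List<Runnable> batch = new ArrayList<>(pendingCallbacks);
        pendingCallbacks.clear();
        batch.forEach(Runnable::run);
    }

    // Adapts the callback API into a CompletableFuture.
    static CompletableFuture<Void> deleteLedgerAsync(long ledgerId) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        asyncDeleteLedger(ledgerId, rc -> {
            if (rc == RC_OK) {
                future.complete(null);
            } else {
                future.completeExceptionally(new IllegalStateException("delete failed, rc=" + rc));
            }
        });
        return future;
    }

    // Hypothetical helper: on a CAS failure, wait for the orphan-ledger delete,
    // log (but do not propagate) a delete failure, then fail with the original error.
    static <T> CompletableFuture<T> cleanupThenRethrow(CompletableFuture<T> casAttempt, long orphanLedgerId) {
        return casAttempt.<CompletableFuture<T>>handle((value, ex) -> {
            if (ex == null) {
                return CompletableFuture.completedFuture(value);
            }
            Throwable cause = (ex instanceof CompletionException && ex.getCause() != null) ? ex.getCause() : ex;
            return deleteLedgerAsync(orphanLedgerId)
                    .exceptionally(deleteEx -> {
                        System.err.println("Failed to delete orphan ledger " + orphanLedgerId + ": " + deleteEx);
                        return null;
                    })
                    .thenCompose(ignored -> CompletableFuture.<T>failedFuture(cause));
        }).thenCompose(inner -> inner);
    }

    public static void main(String[] args) {
        CompletableFuture<Long> casAttempt =
                CompletableFuture.failedFuture(new IllegalStateException("BadVersion"));
        CompletableFuture<Long> result = cleanupThenRethrow(casAttempt, 42L);
        System.out.println("done before delete callback: " + result.isDone());
        drainCallbacks();
        System.out.println("deleted ledgers: " + deletedLedgers);
        System.out.println("completed exceptionally: " + result.isCompletedExceptionally());
    }
}
```

Running main prints "done before delete callback: false", then "deleted ledgers: [42]", then "completed exceptionally: true". Composing on the delete future, rather than firing the delete from whenComplete, is what makes the retry wait; swallowing the delete error in exceptionally keeps the original CAS exception as the one that drives the retry.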

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added coverage for concurrent compatible schema updates, verifying that failed CAS attempts do not leave extra schema ledgers.

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Matching PR in forked repository

PR in forked repository: apache#25579