fix(search): restore live settings on per-entity promote path #27920
Conversation
DefaultRecreateHandler exposes two finalization paths:
- finalizeReindex(...) — centralized end-of-job promotion. Calls
applyLiveServingSettings + maybeForceMerge
before the alias swap, reverting the bulk
overrides (refresh_interval=-1, replicas=0,
async translog) back to live values
(refresh=1s, replicas=1, durable translog).
- promoteEntityIndex(ctx, ok) — per-entity promotion. Used by the distributed
search-indexer's "promote as soon as all
partitions for an entity complete" callback
(DistributedSearchIndexExecutor.promoteEntityIndex).
Swaps the alias and cleans up old indices —
but never restores live settings. (The two
settings profiles involved are sketched below.)
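For reference, a minimal sketch of the two settings profiles these paths toggle between, using the values named above. The `index.`-prefixed keys are the standard OS/ES settings names; the class itself is illustrative, not code from this PR.

```java
import java.util.Map;

/** Reference sketch of the bulk-build vs. live-serving settings profiles. */
final class IndexSettingsProfiles {
  // Bulk-build overrides applied while reindexing (write-optimized).
  static final Map<String, Object> BULK = Map.of(
      "index.refresh_interval", "-1",         // disable refresh during bulk writes
      "index.number_of_replicas", 0,          // no replication while rebuilding
      "index.translog.durability", "async");  // fsync the translog asynchronously

  // Live serving values finalizeReindex reverts to before the alias swap.
  static final Map<String, Object> LIVE = Map.of(
      "index.refresh_interval", "1s",            // make writes searchable again
      "index.number_of_replicas", 1,
      "index.translog.durability", "request");   // durable per-request fsync
}
```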
When an entity finishes its partitions before the final reconciliation
(typically the smallest entities — e.g. knowledge `page` with ~11 rows),
its index is promoted via the per-entity path, the alias swap succeeds,
and the bulk-build overrides become the new live settings. refresh_interval
stays at -1 in production, so live writes after the reindex are buffered in
the translog and never reach searchable segments until a manual _refresh.
Externally this surfaces as "create an article, hierarchy is empty until I
re-trigger reindex" — exactly the user-reported bug.
Mirror the finalizeReindex sequence by calling applyLiveServingSettings
(and maybeForceMerge for parity) at the top of the promote block in
promoteEntityIndex, before the alias swap.
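A minimal sketch of the resulting order inside the promote block. The method names are the ones this PR discusses; the simplified parameters and stub bodies are illustrative, not the real signatures (the actual method takes a context object).

```java
/** Sketch of the per-entity promote order after the fix; bodies are stubs. */
class PromoteEntityIndexSketch {
  void promoteEntityIndex(String stagedIndex, String canonicalIndex, boolean ok) {
    if (!ok) {
      return; // partitions incomplete or doc-count gate failed; nothing to promote
    }
    applyLiveServingSettings(stagedIndex); // revert bulk overrides FIRST
    maybeForceMerge(stagedIndex);          // optional, for parity with finalizeReindex
    swapAliases(canonicalIndex, stagedIndex); // only now make the staged index live
    cleanupOldIndices(canonicalIndex);        // then drop superseded rebuild indices
  }

  void applyLiveServingSettings(String index) { /* PUT _settings with live values */ }
  void maybeForceMerge(String index) { /* POST _forcemerge when configured */ }
  void swapAliases(String canonical, String staged) { /* atomic alias move */ }
  void cleanupOldIndices(String canonical) { /* delete old *_rebuild_* indices */ }
}
```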
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DefaultRecreateHandler.applyLiveServingSettings reads from the handler's jobData field (live + bulk index-settings overrides on the EventPublisherJob). The per-entity distributed-promotion path in DistributedSearchIndexExecutor created its own DefaultRecreateHandler instance and never called withJobData(jobData) on it. With jobData=null, buildRevertJson returns null and applyLiveServingSettings silently no-ops — meaning the previous fix (b272de8) never actually re-applied live settings on the per-entity promote path, even though the call was reached.

currentJob.getJobConfiguration() is the EventPublisherJob the strategy created. Wire it into the new handler at construction time, mirroring the withJobData call DistributedIndexingStrategy already makes on the strategy's own handler instance.

With this change, the per-entity promote path now logs "Applying live serving settings to staged index '...' for entity 'page': {\"number_of_replicas\":1,\"refresh_interval\":\"1s\", ...}" before the alias swap, and post-promotion `_settings` show refresh_interval=1s instead of the stuck -1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
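The wiring change, sketched with stubbed types. Only the withJobData call and the jobData field reflect this commit; everything else is scaffolding for illustration.

```java
/** Sketch of the wiring fix; types are stubbed so the shape is self-contained. */
class WiringSketch {
  interface JobData {}

  static class DefaultRecreateHandler {
    JobData jobData; // applyLiveServingSettings silently no-ops while this is null
    DefaultRecreateHandler withJobData(JobData jd) { this.jobData = jd; return this; }
  }

  DefaultRecreateHandler buildPerEntityHandler(JobData jobConfiguration) {
    // Before the fix: new DefaultRecreateHandler() with jobData left null,
    // so buildRevertJson returned null and no settings were ever reverted.
    return new DefaultRecreateHandler().withJobData(jobConfiguration);
  }
}
```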
Asserts that DefaultRecreateHandler.promoteEntityIndex calls searchClient.updateIndexSettings with the live-revert JSON (refresh_interval=1s, replicas=1, translog.durability=request) before swapping the alias, given a handler with bulk overrides wired through withJobData.

Without the two preceding fixes the assertion fails with "Wanted but not invoked" — applyLiveServingSettings was never reached on the per-entity promotion path, so the staged index inherited refresh_interval=-1 and post-promotion live writes never became searchable until a manual _refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Fixes the distributed per-entity search-index promotion path so a promoted staged index is returned to live-serving settings before it starts serving reads/writes. This sits in the search reindexing flow and aligns the per-entity distributed promote path with the existing full finalizeReindex behavior.
Changes:
- Restores live-serving index settings and optional force-merge in promoteEntityIndex before alias promotion.
- Passes distributed job configuration into the recreate handler so per-entity promotion can resolve bulk/live index settings.
- Adds a regression test around DefaultRecreateHandler.promoteEntityIndex.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/search/DefaultRecreateHandler.java | Adds live-settings restore + force-merge to the per-entity promotion path. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedSearchIndexExecutor.java | Wires currentJob configuration into the recreate handler used by distributed per-entity promotion. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/DefaultRecreateHandlerTest.java | Adds a regression test for live-settings restoration during per-entity promotion. |
DefaultRecreateHandlerTest.PromoteEntityIndexTests:
- testPromoteEntityIndexAppliesSettingsBeforeAliasSwap: InOrder
  verification that updateIndexSettings runs BEFORE swapAliases (sketched
  after this list). Catches a swap-then-revert misordering (which would
  briefly serve live reads against refresh=-1 settings).
- testPromoteEntityIndexForceMergesWhenConfigured: forceMerge(staged, 1) is
invoked when bulkIndexSettings.forceMergeOnPromote=true. Catches a
regression where the force-merge call gets dropped without anyone
noticing.
- testPromoteEntityIndexSkipsSettingsWithoutJobData: locks in the safe no-op
behavior when a handler is constructed without withJobData. Documents that
no-jobData → no settings call (vs. crash or silent revert to defaults).
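A minimal Mockito shape for the ordering verification in the first test above. The SearchClient interface and the literal arguments are stand-ins for the real fixture, which builds a full handler with bulk overrides wired through withJobData.

```java
import static org.mockito.Mockito.*;

import org.mockito.InOrder;

/** Sketch of the InOrder check; SearchClient stands in for the real client mock. */
class InOrderSketchTest {
  interface SearchClient {
    void updateIndexSettings(String index, String settingsJson);
    void swapAliases(String from, String to);
  }

  void verifySettingsBeforeSwap(SearchClient searchClient) {
    InOrder inOrder = inOrder(searchClient);
    // The settings revert must be observed first...
    inOrder.verify(searchClient)
        .updateIndexSettings(eq("staged_index"), contains("\"refresh_interval\":\"1s\""));
    // ...and only then the alias swap. Two independent verify() calls would
    // still pass if a refactor swapped first and reverted settings afterwards.
    inOrder.verify(searchClient).swapAliases(anyString(), anyString());
  }
}
```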
DistributedSearchIndexExecutorTest:
- initializeEntityTrackerWiresJobDataIntoDefaultRecreateHandler: triggers
the private initializeEntityTracker with currentJob holding a populated
jobConfiguration and verifies recreateHandler.withJobData(jobData) is
called on the per-entity handler. This catches the second half of the
original regression: even if applyLiveServingSettings is reached on
promoteEntityIndex, jobData=null makes it a silent no-op. Future edits
that drop the wiring or move handler construction elsewhere will fail
here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Triggers SearchIndexingApplication with bulkIndexSettings configured (refresh_interval=-1, number_of_replicas=0, translog.durability=async), waits for the run to terminate, then queries _settings on the promoted table_search_index alias against the real OpenSearch/Elasticsearch container (via TestSuiteBootstrap.createSearchClient()). Asserts that each concrete index resolved by the alias has the live values applied (refresh=1s, replicas=1, translog.durability=request) and not the bulk overrides.

This is the end-to-end counterpart to the unit-level regression test in DefaultRecreateHandlerTest. It catches the same class of bug at the layer where it actually surfaced in production: an alias swap that completed successfully according to logs but left the new live index unsearchable because refresh was disabled and writes were buffered indefinitely.

Modeled on SearchIndexingFieldsParityIT for run-trigger / poll structure; adds the post-completion _settings verification step that no other IT performs today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
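The post-completion assertion at the heart of this IT, sketched under the assumption that settings arrive as a flattened map per concrete index; the real test resolves them from the running container.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.Map;

/** Sketch of the post-completion check; the flattened-settings shape is assumed. */
class AliasSettingsAssertionSketch {
  // settingsByIndex: concrete index name -> flattened settings for that index.
  void assertLiveSettings(Map<String, Map<String, String>> settingsByIndex) {
    for (Map<String, String> settings : settingsByIndex.values()) {
      assertEquals("1s", settings.get("index.refresh_interval"));          // not -1
      assertEquals("1", settings.get("index.number_of_replicas"));         // not 0
      assertEquals("request", settings.get("index.translog.durability"));  // not async
    }
  }
}
```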
…dant test
DefaultRecreateHandler: move applyLiveServingSettings + maybeForceMerge
inside the try/finally that unregisters the staged index. Without this, a
transient OS/ES failure during _settings update or _forcemerge propagated out
before the finally ran, leaving searchRepository.unregisterStagedIndex
permanently registered — so live writes kept routing to a staged index
nothing reads from. Same fix applied to finalizeReindex for consistency
(its window is shorter since it runs at end-of-job, but the leak shape is
identical). Per gitar-bot review.
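The resulting shape, sketched with stubbed collaborators. Only the try/finally placement reflects the fix; the signatures are illustrative.

```java
/** Sketch of the try/finally fix: the finally must run even if settings I/O throws. */
class UnregisterOnFailureSketch {
  interface SearchRepository { void unregisterStagedIndex(String entityType); }

  private final SearchRepository searchRepository;
  UnregisterOnFailureSketch(SearchRepository repo) { this.searchRepository = repo; }

  void promote(String entityType, String staged, String canonical) {
    try {
      applyLiveServingSettings(staged); // may throw on a transient OS/ES failure
      maybeForceMerge(staged);          // ditto
      swapAliases(canonical, staged);
    } finally {
      // Always release the registration; otherwise live writes keep routing
      // to a staged index that nothing reads from.
      searchRepository.unregisterStagedIndex(entityType);
    }
  }

  void applyLiveServingSettings(String index) {}
  void maybeForceMerge(String index) {}
  void swapAliases(String canonical, String staged) {}
}
```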
DefaultRecreateHandlerTest:
- testPromoteEntityIndexAppliesLiveServingSettingsBeforeSwap: replace
  independent verify() calls with InOrder so the test actually locks in
  the "settings before alias swap" ordering that its name and the PR
  description promise. A swap-then-revert refactor would have passed
  before this.
Per Copilot review.
- Drop testPromoteEntityIndexAppliesSettingsBeforeAliasSwap (the standalone
InOrder test added in the previous commit) — folded back into the test
above, which now covers both ordering and JSON content in one place.
- Add testPromoteEntityIndexUnregistersStagedIndexOnSettingsFailure —
regression test for the gitar-bot fix above. Verified to fail with
"IllegalState connection reset" when the calls are moved back outside
the try block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The why-it's-in-a-try and why-jobData-is-wired comment blocks read like commit messages, not code annotations. Tests and commit history carry the rationale; the code itself reads fine without them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ic IT
Six findings from copilot-pull-request-reviewer.
1+4. Track canonicalDeleted via positive evidence only (DefaultRecreateHandler).
ES/OS indexExists/getAliases/listIndicesByPrefix all swallow transport
errors and return false/empty. A "negative" probe result cannot be
distinguished from "probe failed". The previous shape blindly trusted
probe values and could misclassify a transient failure as data loss
when canonical was actually still serving (gate-says-no-but-actually-yes
case) or, conversely, as not-data-loss when canonical was actually
gone. Now: a `canonicalDeleted` boolean defaults to false and only
flips to true after both the delete call and a positive post-state
check confirm the index is gone. dataLoss classification uses this
flag — never claim data loss without positive evidence canonical was
deleted. Added regression test
`testPromoteEntityIndexAmbiguousGateProbeIsNotDataLoss` for the gate-
ambiguity case; a minimal sketch of the canonicalDeleted flag follows
this commit message.
2. Wire onJobFailed in DistributedIndexingStrategy.
Previously only SearchIndexExecutor emitted the listener callback for
FAILED. The distributed strategy returned the status but never invoked
listeners.onJobFailed, so jobData.failure stayed empty and the
AppRunRecord/WebSocket update had no failure context. Now mirrors the
single-server behavior with an IllegalStateException naming the
data-loss entities.
3. IT: assertEquals("request", durability) instead of assertNotEquals
("async"). The non-equals assertion would pass if a silent translog-
revert drop left durability at any non-async cluster default, missing
the regression. Pin the exact configured value.
5. IT: assert exactly one concrete index resolves the alias, not
at-least-one. A broken swap that leaves the alias attached to BOTH
the pre-existing live index AND the new *_rebuild_* index would
satisfy "any rebuild present" but duplicate search results in
production. Use assertEquals(1, settingsByIndex.size()).
6. IT hermeticity: snapshot/restore SearchIndexingApplication's
appConfiguration. /v1/apps/trigger/{name} merges payload into the
persisted config, so without restore later tests in the suite
inherit this test's bulkIndexSettings / liveIndexSettings /
useDistributedIndexing values — making suite ordering change what
they exercise. Both IT methods now do a try/finally with
snapshotAppConfig() + restoreAppConfig().
All 151 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
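A minimal sketch of the positive-evidence canonicalDeleted tracking from finding 1+4 above. The SearchClient interface is a stand-in; only the flag's default-false, flip-on-confirmation shape comes from the commit.

```java
/** Sketch of positive-evidence deletion tracking; SearchClient is a stand-in. */
class CanonicalDeletedSketch {
  interface SearchClient {
    void deleteIndex(String index);    // may throw on transport failure
    boolean indexExists(String index); // swallows transport errors, returns false
  }

  boolean deleteWithPositiveEvidence(SearchClient client, String canonical) {
    boolean canonicalDeleted = false; // default: make no claim of deletion
    try {
      client.deleteIndex(canonical); // step 1: the delete call itself succeeds
      if (!client.indexExists(canonical)) {
        canonicalDeleted = true; // step 2: the post-state check agrees it is gone
      }
    } catch (RuntimeException e) {
      // Transient failure: leave the flag false. Data loss is never reported
      // without positive evidence that the canonical index was deleted.
    }
    return canonicalDeleted;
  }
}
```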
restoreAppConfig POSTs to /v1/apps/trigger/{name}, which both merges the
body into the persisted config AND starts a new run. Returning without
waiting left SearchIndexingApplication running into the next test class
(AppsResourceIT.test_triggerApp_200), which then timed out for 2 minutes
on "already running".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The terminal-state check compared status against uppercase "SUCCESS",
"FAILED", "COMPLETED", but appRunRecord.json defines the status enum
in lowercase ("success", "failed", "completed", ...). The check never
matched and the 30s wait silently fell through to the catch block,
making it a no-op. test_triggerApp_200 then relied on its 2-minute
"already running" trigger retry, which timed out whenever a longer
reindex (e.g. SearchIndexingFieldsParityIT's "all entities" reindex)
was still in flight.
Switch the terminal check to "not running and not started"
case-insensitively, and raise the ceiling to 5 minutes so the wait
actually covers a long in-flight reindex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
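The corrected terminal check, sketched. The status strings are the lowercase enum values from appRunRecord.json noted above; the helper itself is illustrative.

```java
/** Sketch of the corrected wait condition; statuses come from appRunRecord.json. */
class AppRunStatusSketch {
  // The enum values are lowercase ("success", "failed", "completed", "running", ...),
  // so compare case-insensitively and key on "no longer running/started" rather
  // than an exact-match list of success states.
  static boolean isTerminal(String status) {
    return status != null
        && !"running".equalsIgnoreCase(status)
        && !"started".equalsIgnoreCase(status);
  }
}
```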
```diff
  if (canonicalIndex == null || stagedIndex == null) {
    LOG.error(
-       "Cannot finalize reindex for entity '{}'. canonicalIndex={}, stagedIndex={}",
+       "Cannot promote index for entity '{}'. canonicalIndex={}, stagedIndex={}",
        entityType,
        canonicalIndex,
```
…-promote

Conflicts:
- openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/distributed/DistributedSearchIndexExecutor.java
- openmetadata-service/src/main/java/org/openmetadata/service/search/DefaultRecreateHandler.java
…ata/OpenMetadata into harshach/search-alias-promote
The test triggers SearchIndexingApplication and waits for it to complete, but during the parallel-tests slot other classes can also trigger the same app concurrently. The resulting in-flight run then leaks into AppsResourceIT.test_triggerApp_200 (which runs in the sequential slot) and exhausts its 2-minute "already running" trigger Awaitility.

AppsResourceIT is already in the sequential bucket for the same reason. Mirror it for SearchIndexAliasPromotionIT across all seven failsafe profiles (mysql/postgres × elasticsearch/opensearch and the RDF profile) — include in sequential-tests, exclude from parallel-tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DefaultRecreateHandler.promote: when canonicalIndex is null but
stagedIndex is non-null, the early-return previously skipped the
finally clause that unregisters the staged index. Live writes could
stay routed to the staged index after we bail. Release the
registration explicitly before returning. Extend the existing
testPromoteWithNullCanonicalIndex unit test to assert the unregister
call.
SearchIndexAliasPromotionIT.snapshotAppConfig: distinguish "snapshot
failed" from "config absent". The previous Map.of() return on
exception caused restoreAppConfig to POST an empty body to
/v1/apps/trigger/{name}, which is a no-op merge that silently leaks
this test's bulk/live setting overrides into downstream tests and
starts a spurious app run. Return null on failure so the caller
short-circuits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
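The snapshot/restore contract described above, sketched with hypothetical helpers: fetchPersistedAppConfig, postTrigger, and waitForRunToFinish are illustrative names, not the test's real methods.

```java
import java.util.Map;

/** Sketch of the snapshot/restore contract; helper names are hypothetical. */
class AppConfigSnapshotSketch {
  /** Returns null when the snapshot itself failed, so restore short-circuits. */
  Map<String, Object> snapshotAppConfig() {
    try {
      return fetchPersistedAppConfig(); // hypothetical read of the persisted config
    } catch (Exception e) {
      return null; // "snapshot failed" — distinct from an empty-but-valid config
    }
  }

  void restoreAppConfig(Map<String, Object> snapshot) {
    if (snapshot == null) {
      return; // never POST an empty body: trigger merges the config AND starts a run
    }
    postTrigger(snapshot);  // hypothetical POST to /v1/apps/trigger/{name}
    waitForRunToFinish();   // avoid leaking an in-flight run into the next class
  }

  Map<String, Object> fetchPersistedAppConfig() { return Map.of(); }
  void postTrigger(Map<String, Object> body) {}
  void waitForRunToFinish() {}
}
```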
The IT triggered the bundled SearchIndexingApplication, which made it expensive (full reindex per run, ~1-2 minutes) and a source of bleed-through into AppsResourceIT.test_triggerApp_200 even after moving to the sequential failsafe bucket.

The same surface is already covered by unit tests:
- DefaultRecreateHandlerTest "Should restore live serving settings on staged index before alias swap" (and the per-step swap-order, data-loss-flagging, and post-state-check tests)
- DistributedSearchIndexExecutorTest verifies the withJobData(jobConfig) wiring on the per-entity promotion handler

Drop the IT and revert its pom.xml include/exclude entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DefaultRecreateHandler.promote: validate aliasesToAttach BEFORE applyLiveServingSettings/maybeForceMerge so the empty-aliases error path skips wasted I/O and segment churn on a staged index that will never be swapped in. Extend testPromoteWithEmptyContextAliases to verify updateIndexSettings and forceMerge are not invoked. Adjust testPromoteEntityIndexUnregistersStagedIndexOnSettingsFailure to populate canonicalAliases so it still reaches applyLiveServingSettings.

Refresh the checkAliasAttached Javadoc: with positive-evidence canonicalDeleted tracking, a null result is classified as data loss only when canonicalDeleted=true; otherwise it is degraded (retryable). The previous wording claimed every null was data loss, which no longer matches the call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On every reindex after the first, the canonical name (e.g.
openmetadata_table_search_index) is an alias on the previous staged
index — not a concrete index. The three-step swap then attempted
deleteIndexWithBackoff(canonicalAlias), which OS/ES rejects with
illegal_argument_exception ("matches an alias, specify the
corresponding concrete indices instead"). The 6-attempt exponential
backoff burned ~31s per entity. With 30+ default entity types the
SearchIndexingApplication ran past CI's 300s setup budget, blocking
every Playwright shard and python-integration setup at "Setup
OpenMetadata Test Environment".
Two-pronged fix in DefaultRecreateHandler.promote:
1. Detect canonicalIsAlias via getIndicesByAlias(canonicalIndex).
When true, take the new promoteByAtomicAliasSwap path: a single
swapAliases call atomically moves every alias (parents + canonical)
from old → new staged. No name collision, no per-entity
delete-by-alias-name, no degraded/data-loss windows.
2. listIndicesByPrefix returns the canonical alias name itself among
its results (alongside concrete *_rebuild_* indices). Filter that
out of oldIndicesToCleanup when canonicalIsAlias, so the cleanup
loop's deleteIndexWithBackoff doesn't replay the same 31s backoff.
Keep canonical in cleanup when it is concrete — that's where the
first-reindex flow drops the original concrete after the three-step
swap moves aliases off it.
Local repro: SearchIndexingApplication now completes in ~7s instead
of hanging past 300s. New unit test
testPromoteEntityIndexAtomicSwapWhenCanonicalIsAlias locks the new
shape (single swap, no delete-by-alias, old concrete cleaned up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local repro on the CI-equivalent stack (mysql + elasticsearch + sample data + ingestion) completes in ~30s with all 60 entities going through the atomic-swap path. CI on the same commit still hangs past 300s.

Add a structured log line at the top of every promote() so the next CI run shows which entities reach promotion and what shape (atomic vs three-step) was selected — pinpoints whether a specific entity gets stuck or whether reindex never reaches promote(). No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… main

The original "live settings not restored on staged after promote" regression is fixed by PR #27930 (commit e56abb8), which is already on main: applyLiveServingSettings + maybeForceMerge run before the alias swap, and DistributedSearchIndexExecutor wires job configuration into the per-entity DefaultRecreateHandler. That minimal fix is sufficient for the original symptom (newly created entities not searchable until a manual _refresh).

This commit reverts every file we refactored further on top of that minimal fix back to origin/main:
- DefaultRecreateHandler.java — drop the deferred-canonical-delete three-step swap, post-state checks, dataLoss tracking, the safeIndexExists/checkIndexExists/checkAliasAttached helpers, the promoteByAtomicAliasSwap path, and the ALIAS_PROMOTE_BEGIN diagnostic
- RecreateIndexHandler.java — drop the getFailedPromotions / getDataLossPromotions defaults
- DistributedIndexingStrategy.java — drop the FAILED-status emit and dataLoss aggregation
- SearchIndexExecutor.java — drop the FAILED-status emit
- DistributedSearchIndexExecutor.java — drop @Getter on recreateIndexHandler
- The matching unit tests in DefaultRecreateHandlerTest / DistributedIndexingStrategyTest / SearchIndexExecutorControlFlowTest / DistributedSearchIndexExecutorTest all revert to origin/main.

The branch's only remaining contribution is the AppsResourceIT case-mismatch fix in waitForAppJobCompletion (a pre-existing bug discovered while diagnosing).

Reason: the further refactor has consistently failed CI on this branch since the first commit. Local repro on the CI-equivalent stack (./docker/run_local_docker.sh -d mysql -m no-ui -s true -i true) showed the canonical-is-alias fix working end-to-end (~30s, 60 entities), but CI still timed out at 300s. Without server-side logs from CI we can't target the remaining gap. PR #27930 already addresses the user-visible regression, so the safest move is to drop the further refactor and ship just the case-mismatch fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the first reindex, the canonical name (e.g. table_search_index) is an alias on the previous staged index, not a concrete index. OpenSearch's listIndicesByPrefix returns the alias name as one of its result keys, which then drives a deleteIndexWithBackoff(canonicalIndex) attempt that fails with "[illegal_argument_exception] matches an alias, specify the corresponding concrete indices instead". The 6-attempt exponential backoff burns ~31s per entity (1+2+4+8+16s); on a 60-entity reindex that wastes ~30 minutes in cleanup with search degraded throughout.

Drop the alias name from oldIndicesToDelete when getIndicesByAlias proves canonical is an alias right now. The atomic swapAliases call moves the canonical alias from the old concrete index to the new staged one in one step; the underlying old concrete index is already in oldIndicesToDelete and gets cleaned up normally by the post-swap loop. No three-step swap or deferred-canonical-delete restructure needed.

Adjusts testFinalizeReindexPromotesPartialData to use a realistic canonical-concrete setup (no self-alias on its own name — that state cannot exist in real OS/ES) so the new guard does not misfire on the existing test fixture. New unit test testFinalizeReindexSkipsDeleteWhenCanonicalIsAlias locks in the new behavior: when canonical is an alias, deleteIndexWithBackoff is never called with the canonical name, and the old concrete rebuild index is cleaned up via the swap path.

Verified locally on the CI-equivalent stack (./docker/run_local_docker.sh -d mysql -m no-ui -s true -i true): both the first reindex (canonical-concrete) and the second reindex (canonical-alias) complete in ~8s with no "matches an alias" errors and clean cleanup logs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
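The cleanup guard, sketched. getIndicesByAlias and deleteIndexWithBackoff are the client calls named in this commit; the surrounding method and interface are illustrative.

```java
import java.util.Set;

/** Sketch of the cleanup guard; client methods are named as in the commit. */
class CanonicalAliasGuardSketch {
  interface SearchClient {
    Set<String> getIndicesByAlias(String alias); // empty when the name is not an alias
    void deleteIndexWithBackoff(String concreteIndex);
  }

  void cleanupOldIndices(
      SearchClient client, String canonicalIndex, Set<String> oldIndicesToDelete) {
    // listIndicesByPrefix can return the canonical ALIAS name among its keys;
    // deleting by an alias name fails on OS/ES and burns ~31s of backoff.
    if (!client.getIndicesByAlias(canonicalIndex).isEmpty()) {
      oldIndicesToDelete.remove(canonicalIndex); // alias, not a concrete index
    }
    for (String index : oldIndicesToDelete) {
      client.deleteIndexWithBackoff(index); // concrete *_rebuild_* indices only
    }
  }
}
```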
Code Review ✅ Approved · 4 resolved / 4 findings

Restores live settings on the per-entity promote path by correctly applying index settings and wiring job configurations, ensuring indices become searchable post-promotion. Addresses several issues including unhandled exceptions during promotion, missing cleanup of REST clients in tests, and incorrect alias swap logic.

✅ 4 resolved
✅ Edge Case: applyLiveServingSettings/maybeForceMerge exception skips unregisterStagedIndex
✅ Quality: Rest5Client created but never closed in integration test
✅ Edge Case: Post-state checks can throw unhandled, leaving promotion in limbo
✅ Bug: promoteByAtomicAliasSwap hardcodes reindexSuccess=true
🟡 Playwright Results — all passed (10 flaky)

✅ 4002 passed · ❌ 0 failed · 🟡 10 flaky · ⏭️ 86 skipped

🟡 10 flaky test(s) (passed on retry)

How to debug locally:

```sh
# Download the playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip  # view trace
```
Summary
Reindex promotion succeeds but the swapped-in index keeps the bulk-build settings (`refresh_interval=-1`, `replicas=0`, `translog=async`), so live writes after promotion are buffered indefinitely and never become searchable until a manual `_refresh`. This surfaces in production as "indexing is fast, but searches/UIs return nothing on freshly created entities."

Two coupled bugs in the per-entity distributed promote path:
1. `DefaultRecreateHandler.promoteEntityIndex` was missing the live-settings restore. `finalizeReindex` calls `applyLiveServingSettings` + `maybeForceMerge` before the alias swap; the per-entity sibling jumped straight from the doc-count check to the alias swap. Mirror the call sequence.
2. `DistributedSearchIndexExecutor` constructed a second `RecreateIndexHandler` without wiring `jobData`. With `jobData=null`, `applyLiveServingSettings` resolves no overrides, `buildRevertJson` returns `null`, and the call silently no-ops — so even if the previous fix were in place, the per-entity path would still miss settings restoration. Pass `currentJob.getJobConfiguration()` into the handler at construction, mirroring what `DistributedIndexingStrategy` already does.

Provenance
These two source changes are cherry-picks of `b272de85f9` and `68dad610ed`, which currently sit unmerged on the `context_center` feature branch (PR #27558) — a Knowledge → Context Center rename PR that is unrelated to search indexing. Splitting them out so they can ship independently.

A regression test (`testPromoteEntityIndexAppliesLiveServingSettingsBeforeSwap`) is added to lock the behavior in: it builds a handler with bulk overrides on `EventPublisherJob`, calls `promoteEntityIndex`, and asserts `searchClient.updateIndexSettings` is invoked with `refresh_interval=1s`, `number_of_replicas=1`, `translog.durability=request` before the alias swap. Verified locally: the test passes with both source commits applied; reverting either one fails the assertion with "Wanted but not invoked".

Test plan
- `mvn test -pl openmetadata-service -Dtest='DefaultRecreateHandlerTest,DistributedSearchIndexExecutorTest,DistributedIndexingStrategyTest,SearchIndexExecutorControlFlowTest'` → 136 tests, 0 failures
- The regression test fails when `applyLiveServingSettings`/`maybeForceMerge` are removed from `promoteEntityIndex` (Mockito reports "Wanted but not invoked")
- Run a reindex with the bulk overrides (`replicas=0`, `refresh=-1`, `translog=async`) and confirm post-promotion `_settings` show `refresh_interval=1s`, not `-1`

🤖 Generated with Claude Code
Summary by Gitar
- Adds a cleanup guard in `DefaultRecreateHandler` to prevent unnecessary delete-by-alias-name operations on canonical indices.
- Adds `testFinalizeReindexSkipsDeleteWhenCanonicalIsAlias` to verify the cleanup guard and ensure correct alias migration to the staged index.

This will update automatically on new commits.