fix: Move version metadata migration to 2.0.0 and remove crashing MySQL multi-valued index#27816
Conversation
…crashing MySQL multi-valued index
The Java checkstyle check failed. Please run the formatter and update the PR. You can install the project's pre-commit hooks to catch this locally before pushing.
Pull request overview
This PR fixes a MySQL migration crash caused by a multi-valued JSON index on entity_extension.changedFieldKeys by removing the problematic index, moving the schema change to the 2.0.0 migration, and replacing the previous single-shot SQL backfill with a batched Java backfill in the v200 migration utilities.
Changes:
- Remove the `1.12.7` MySQL/Postgres migration SQL files that added/backfilled version metadata (including the crashing MySQL multi-valued index).
- Add `versionNum`/`changedFieldKeys` columns and safe indexes to the `2.0.0` schema changes (keep Postgres GIN; remove MySQL multi-valued index).
- Implement and invoke a batched Java backfill (`backfillVersionMetadata`) during v200 data migrations; move/extend unit tests accordingly.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v200/MigrationUtil.java | Adds batched Java backfill logic plus helpers to compute versionNum and changedFieldKeys. |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/mysql/v200/Migration.java | Invokes the new backfill as part of v200 MySQL data migration (non-blocking via try/catch). |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/postgres/v200/Migration.java | Invokes the new backfill as part of v200 Postgres data migration (non-blocking via try/catch). |
| openmetadata-service/src/test/java/org/openmetadata/service/migration/utils/v200/MigrationUtilTest.java | Adds unit tests for extraction helpers + backfill behavior and SQL selection. |
| openmetadata-service/src/test/java/org/openmetadata/service/migration/utils/MigrationSqlStatementHashTest.java | Updates migration SQL hash assertions to validate the 2.0.0/mysql/schemaChanges.sql file. |
| bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql | Adds versionNum/changedFieldKeys columns and (id, versionNum) index (no multi-valued JSON index). |
| bootstrap/sql/migrations/native/2.0.0/postgres/schemaChanges.sql | Adds versionNum/changedFieldKeys columns and indexes (incl. GIN on changedFieldKeys). |
| bootstrap/sql/migrations/native/1.12.7/mysql/schemaChanges.sql | Deleted (removes the crashing MySQL multi-valued index migration). |
| bootstrap/sql/migrations/native/1.12.7/mysql/postDataMigrationSQLScript.sql | Deleted (removes single-shot SQL backfill). |
| bootstrap/sql/migrations/native/1.12.7/postgres/schemaChanges.sql | Deleted (DDL moved to 2.0.0). |
| bootstrap/sql/migrations/native/1.12.7/postgres/postDataMigrationSQLScript.sql | Deleted (backfill moved to Java batching). |
Comments suppressed due to low confidence (1)
openmetadata-service/src/test/java/org/openmetadata/service/migration/utils/v200/MigrationUtilTest.java:54
- There are two identical imports for org.mockito.ArgumentCaptor in this file. Please remove the duplicate to keep imports clean and avoid potential formatter/checkstyle failures.
```java
import org.mockito.ArgumentCaptor;
import org.mockito.MockedStatic;
import org.openmetadata.service.resources.databases.DatasourceConfig;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.mockito.ArgumentCaptor;
import org.openmetadata.schema.entity.activity.ActivityEvent;
```
```java
void backfillIsNoOpWhenQueryReturnsEmptyBatch() {
  Handle bHandle = mock(Handle.class, RETURNS_DEEP_STUBS);
  when(bHandle.createQuery(any(String.class)).bind(anyString(), anyInt()).mapToMap().list())
      .thenReturn(List.of());
```
The backfillVersionMetadata tests are stubbing createQuery(...).bind(anyString(), anyInt())..., but the implementation binds two string params (lastId, lastExt) and then an int (limit). As written, these stubs won’t match the actual call chain and list() can return Mockito’s default null, causing the migration to NPE when it does batch.size(). Update the stubbing to match the real bind overloads/arguments (or stub at the mapToMap().list() level without binding matchers).
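One way to make the stubbing overload-agnostic is to mock the JDBI `Query`/`ResultIterable` chain explicitly rather than relying on deep stubs (a sketch, assuming the test's existing Mockito/JUnit static imports):

```java
// Sketch only: stub the JDBI chain explicitly so it matches whichever bind(...)
// overloads backfillVersionMetadata actually calls.
Handle bHandle = mock(Handle.class);
Query query = mock(Query.class);
@SuppressWarnings("unchecked")
ResultIterable<Map<String, Object>> rows = mock(ResultIterable.class);

when(bHandle.createQuery(anyString())).thenReturn(query);
when(query.bind(anyString(), anyString())).thenReturn(query); // lastId, lastExt
when(query.bind(anyString(), anyInt())).thenReturn(query);    // limit
when(query.mapToMap()).thenReturn(rows);
when(rows.list()).thenReturn(List.of()); // empty batch -> backfill should be a no-op

assertDoesNotThrow(() -> MigrationUtil.backfillVersionMetadata(bHandle));
```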
```java
assertDoesNotThrow(() -> MigrationUtil.backfillVersionMetadata(bHandle));
```
```java
ArgumentCaptor<String> sqlCaptor = ArgumentCaptor.forClass(String.class);
verify(bHandle).createUpdate(sqlCaptor.capture());
String sql = sqlCaptor.getValue();
assertTrue(sql.contains("changedFieldKeys = :changedFieldKeys"));
assertFalse(sql.contains("::jsonb"));
```
These new backfill tests assert/verify Handle.createUpdate(...) is used to run the update SQL, but the implementation uses handle.prepareBatch(updateSql) and executes a PreparedBatch. As a result the assertions around captured SQL (and the times(3) verification) won’t validate the actual behavior. Adjust the tests to mock/verify prepareBatch(...) and PreparedBatch.execute(), and only expect createUpdate(...) when explicitly testing the fallback path.
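A sketch of what that could look like, assuming Mockito's `RETURNS_SELF` answer and the JDBI 3 `PreparedBatch` API (static imports from Mockito and JUnit assumed):

```java
// Sketch only: the happy path should be asserted against prepareBatch(...), not createUpdate(...).
PreparedBatch preparedBatch = mock(PreparedBatch.class, RETURNS_SELF);
when(bHandle.prepareBatch(anyString())).thenReturn(preparedBatch);

MigrationUtil.backfillVersionMetadata(bHandle);

ArgumentCaptor<String> sqlCaptor = ArgumentCaptor.forClass(String.class);
verify(bHandle).prepareBatch(sqlCaptor.capture());
assertTrue(sqlCaptor.getValue().contains("changedFieldKeys = :changedFieldKeys"));
verify(preparedBatch, atLeastOnce()).execute();
// createUpdate(...) should only show up in the test that exercises the fallback path.
verify(bHandle, never()).createUpdate(anyString());
```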
| + "WHERE (id, extension) > (:lastId, :lastExt) " | ||
| + "ORDER BY id, extension " | ||
| + "LIMIT :limit") |
backfillVersionMetadata currently selects batches from entity_extension without filtering (versionNum IS NULL / extension LIKE '%.version.%'). That means rerunning the migration (or running it on Postgres instances that already populated these columns) will scan the entire table and rewrite every version row, which can generate large write amplification/WAL and extend upgrade time unnecessarily. Consider adding WHERE versionNum IS NULL AND extension LIKE '%.version.%' (and keeping the keyset pagination) so the backfill is idempotent and only touches rows that still need population.
| + "WHERE (id, extension) > (:lastId, :lastExt) " | |
| + "ORDER BY id, extension " | |
| + "LIMIT :limit") | |
| + "WHERE versionNum IS NULL " | |
| + "AND extension LIKE :extensionPattern " | |
| + "AND (id, extension) > (:lastId, :lastExt) " | |
| + "ORDER BY id, extension " | |
| + "LIMIT :limit") | |
| .bind("extensionPattern", "%.version.%") |
```java
totalProcessed += batch.size();
if (!batch.isEmpty()) {
  LOG.info("Backfilled {} entity_extension rows so far", totalProcessed);
}
```
The log counters in backfillVersionMetadata use totalProcessed += batch.size() and then log "Backfilled" rows, but the code may skip rows (non-version extensions, null id/extension) and only add batchCount updates. This makes the progress logs inaccurate and can be misleading during upgrades. Track/log the number of rows actually updated (e.g., batchCount or the batch execute result) separately from rows scanned.
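For example (a sketch; the counter names are illustrative, not the PR's actual fields):

```java
// Count scanned rows and actually-updated rows separately so the progress log is accurate.
totalScanned += batch.size();
totalUpdated += batchCount; // rows that were actually added to the PreparedBatch
if (batchCount > 0) {
  LOG.info("Backfill progress: scanned {} rows, updated {} so far", totalScanned, totalUpdated);
}
```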
```java
private static final int BATCH_SIZE = 5000;

private static final String UPDATE_MYSQL =
    "UPDATE entity_extension SET versionNum = :versionNum, changedFieldKeys = :changedFieldKeys "
        + "WHERE id = :id AND extension = :extension";
```
BATCH_SIZE is set to 5000, which is much larger than batching used in other migration utils in this repo (commonly 100–500). A 5000-row select + JSON parse + batch update can increase transaction size and lock/WAL pressure during upgrades. Consider reducing this to a smaller value (and/or making it configurable) to keep migrations predictable on large datasets.
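One possible shape for a configurable value (the property name below is hypothetical, not an existing OpenMetadata setting):

```java
// Sketch: default to a smaller batch in line with other migration utilities and allow an override.
private static final int BATCH_SIZE =
    Integer.getInteger("openmetadata.migration.backfillBatchSize", 500);
```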
Force-pushed from 298a69c to b49fa11.
…data

Replace WHERE versionNum IS NULL full-table scan (EXPLAIN: type=ALL, 2.1M rows/batch) with PK cursor pagination (EXPLAIN: type=index, key=PRIMARY, 5000 rows/batch). Use PreparedBatch to reduce UPDATE round trips from 5000 to 1 per batch. Bump BATCH_SIZE 1000 -> 5000.
Force-pushed from b49fa11 to 322436a.
Code Review ✅ Approved (7 resolved / 7 findings)

Migration moved to 2.0.0 and incompatible MySQL indexes removed. Multiple critical stability issues addressed, including NullPointerExceptions, batch fallback data loss, and inefficient table scans.

✅ 7 resolved
✅ Bug: NullPointerException if …
```sql
ALTER TABLE entity_extension
  ADD COLUMN versionNum DOUBLE NULL,
  ADD COLUMN changedFieldKeys JSON NULL;

CREATE INDEX idx_entity_extension_version_order
  ON entity_extension (id, versionNum);
```
```java
} catch (Exception versionOnlyEx) {
  LOG.warn(
      "Skipping row id={} extension={} after both updates failed",
      id,
      extension,
      versionOnlyEx);
}
```
🟡 Playwright Results — all passed (16 flaky)

✅ 3966 passed · ❌ 0 failed · 🟡 16 flaky · ⏭️ 86 skipped

🟡 16 flaky test(s) (passed on retry)

How to debug locally:

```
# Download the playwright-test-results-<shard> artifact and unzip it
npx playwright show-trace path/to/trace.zip  # view trace
```



Problem
The 1.12.7 migration (`e4d3e423e1`) introduced two new columns on `entity_extension` — `versionNum` and `changedFieldKeys` — to support version timeline filtering. It populated those columns via a `postDataMigrationSQLScript.sql` and backed `changedFieldKeys` with a MySQL multi-valued index.

This caused an immediate crash in AUT.
Root cause — why the MySQL multi-valued index blows up
MySQL's multi-valued index on a JSON array reserves the full declared `CHAR(N)` storage per element, per row in the index. The index was declared as `CHAR(512)` with `utf8mb4`, which reserves 2048 bytes per element. MySQL imposes a hard per-row limit of ~8 KB for multi-valued indexes. That means any entity version with more than ~4 changed field names in a single version would exceed the limit and crash the `ALTER TABLE` or `UPDATE`.

`entity_extension` in production routinely has versions with dozens of changed fields (e.g. a bulk tag update can produce 50–300 `changedFieldKeys` entries in one version row). The index design made the migration structurally impossible to run on real data.

The Postgres GIN index on the same column does not have this problem — GIN stores posting lists per element without a per-row byte cap — so it is kept unchanged.
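A quick back-of-the-envelope check of that limit (values taken from the figures above):

```java
// Why ~4 elements: each indexed element reserves the full declared CHAR width.
int bytesPerElement = 512 * 4;                          // CHAR(512) in utf8mb4
int maxMultiValuedIndexRowBytes = 8 * 1024;             // ~8 KB per-row cap described above
int maxElements = maxMultiValuedIndexRowBytes / bytesPerElement; // = 4 changed field keys
```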
Why not partial indexing (index only the first N elements)?
MySQL's multi-valued index indexes all elements in the array or none. There is no way to index a subset. The only options were:
- Shrink the declared `CHAR(N)` — still fails for wide entities; just moves the threshold
- Remove the index entirely, which is the option taken here

Why is the index safe to remove?
Every query that uses `changedFieldKeys` always starts with `WHERE e.id = :id`, scoping results to a single entity (typically 100–500 version rows). `JSON_CONTAINS` on 100–500 rows is negligible — the index provided no meaningful speedup. A legacy code path (`getExtensionsWithFieldChangedLegacy`) already works without the index and is already live in production.

What changed
1. Removed the 1.12.7 migration entirely
All four SQL files created by the original PR are deleted. They did not exist before that commit, so reverting them leaves 1.12.7 clean.
2. Moved the DDL to 2.0.0 schemaChanges.sql
MySQL — appended to `2.0.0/mysql/schemaChanges.sql`:

No multi-valued index. The BTree index on `(id, versionNum)` is kept for efficient DESC ordering of versions per entity.

Postgres — appended to `2.0.0/postgres/schemaChanges.sql`:

GIN index retained — it works correctly on Postgres with no per-row size limit.
3. Replaced the SQL data migration with a batched Java migration
The original `postDataMigrationSQLScript.sql` was a single-shot CTE that ran as one SQL statement over the entire `entity_extension` table. On large instances this would hold a lock for minutes and risk a timeout.

The data migration is now a batched Java method in `v200/MigrationUtil.backfillVersionMetadata()`.

Cursor-based pagination
The SELECT uses cursor-based pagination on the PRIMARY KEY instead of `WHERE versionNum IS NULL`. Why this matters on large tables: the naive `WHERE versionNum IS NULL LIMIT N` approach forces a full table scan on every batch. Verified with `EXPLAIN` on a 2.1M-row table:

| Batch predicate | EXPLAIN `type` | `key` |
|---|---|---|
| `WHERE versionNum IS NULL LIMIT 1000` | `ALL` | (none) |
| `(id, extension) > (x, y)` | `index` | `PRIMARY` |

With the cursor, the PRIMARY KEY index drives the scan forward from the last position — each batch reads exactly `LIMIT` rows. The `LIKE` and `IS NULL` filters are applied as cheap post-filters within the cursor's range. This also makes re-runs safe: `versionNum IS NULL` skips already-processed rows so the migration is idempotent.

PreparedBatch for updates
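Putting the cursor query and the batched update together, a minimal sketch of the loop (assuming the JDBI 3 `Handle` API; the SQL, parameter names, and the helpers `extractVersionNum` / `extractChangedFieldKeys` are illustrative, not the exact code in `MigrationUtil`):

```java
// Sketch only; 'handle' is the JDBI Handle passed into the migration.
String lastId = "";
String lastExt = "";
while (true) {
  List<Map<String, Object>> batch =
      handle
          .createQuery(
              "SELECT id, extension, json FROM entity_extension "
                  + "WHERE versionNum IS NULL AND extension LIKE :extensionPattern "
                  + "AND (id, extension) > (:lastId, :lastExt) "
                  + "ORDER BY id, extension LIMIT :limit")
          .bind("extensionPattern", "%.version.%")
          .bind("lastId", lastId)
          .bind("lastExt", lastExt)
          .bind("limit", BATCH_SIZE)
          .mapToMap()
          .list();
  if (batch.isEmpty()) {
    break;
  }
  PreparedBatch updates = handle.prepareBatch(UPDATE_MYSQL);
  for (Map<String, Object> row : batch) {
    String id = (String) row.get("id");
    String extension = (String) row.get("extension");
    updates
        .bind("versionNum", extractVersionNum(extension))       // hypothetical helper
        .bind("changedFieldKeys", extractChangedFieldKeys(row))  // hypothetical helper, JSON string
        .bind("id", id)
        .bind("extension", extension)
        .add();
  }
  updates.execute(); // one round trip per batch; on failure, fall back to per-row updates
  Map<String, Object> last = batch.get(batch.size() - 1);
  lastId = (String) last.get("id");
  lastExt = (String) last.get("extension");
}
```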
Instead of issuing one `execute()` per row (N round trips per batch), all updates in a batch are sent as a single `PreparedBatch.execute()` call — one round trip regardless of batch size.

Fallback on batch failure
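In outline, the fallback described below looks roughly like this (a sketch; `updateVersionAndKeys` and `updateVersionOnly` are illustrative names, not the actual methods):

```java
// Two-tier per-row fallback when the whole PreparedBatch fails.
try {
  updates.execute();
} catch (Exception batchEx) {
  for (Map<String, Object> row : batch) {
    try {
      updateVersionAndKeys(handle, row); // full update: versionNum + changedFieldKeys
    } catch (Exception rowEx) {
      updateVersionOnly(handle, row);    // versionNum only, so the row is not retried
    }
  }
}
```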
If `PreparedBatch.execute()` fails (e.g. one row has a malformed JSON column), the code falls back to processing each row individually:

- first a full update (`versionNum` + `changedFieldKeys`)
- if that fails, a `versionNum`-only update (guarantees the row won't be retried)

This preserves `changedFieldKeys` for all healthy rows in the batch even when one row is problematic. The old per-row approach had the same two-tier fallback; the new approach restores that per-row safety net while still getting the throughput benefit of `PreparedBatch` on the happy path.

Performance summary (2.1M-row table)
4. Java vs SQL — behavioral equivalence
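For example, the Java-side `versionNum` extraction is roughly the following (a sketch; the helper name is illustrative), mirroring the SQL expression in the first row of the table below:

```java
// Equivalent to CAST(SUBSTRING_INDEX(ext, '.version.', -1) AS DOUBLE) in the original SQL.
static Double extractVersionNum(String extension) {
  int idx = extension.lastIndexOf(".version.");
  if (idx < 0) {
    return null; // not a version row
  }
  return Double.parseDouble(extension.substring(idx + ".version.".length()));
}
```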
| Behavior | SQL migration | Java backfill | Notes |
|---|---|---|---|
| `versionNum` source | `CAST(SUBSTRING_INDEX(ext, '.version.', -1) AS DOUBLE)` | `lastIndexOf(".version.")` + `Double.parseDouble` | identical semantics |
| Key de-duplication | `SELECT DISTINCT` | `LinkedHashSet` | same |
| Empty-field filtering | `WHERE field_name IS NOT NULL AND field_name <> ''` | `nameNode.isNull()` + `isEmpty()` | same |
| `changedFieldKeys` ordering | `JSON_ARRAYAGG` (arbitrary) | sorted | no functional impact; consumers use `JSON_CONTAINS` |

5. Tests
- `v200/MigrationUtilTest` (41 tests total): covers the extraction helpers and the backfill behavior, including the `PreparedBatch` path (`add()` called per row), batch-level fallback triggering per-row retry, and per-row full-update failure falling back to `versionNum`-only.
- `MigrationSqlStatementHashTest` updated to validate `2.0.0/mysql/schemaChanges.sql` (unique statement hashes, no `INFORMATION_SCHEMA` references).

Pros / Cons
Pros
- `changedFieldKeys` is preserved for all healthy rows even if one row is bad

Cons
- … (`AND versionNum IS NULL`) so it is safe to re-run
- `changedFieldKeys` array order differs from the original SQL (`JSON_ARRAYAGG` is unordered; Java sorts alphabetically). This has no functional impact since all consumers use `JSON_CONTAINS`, which is order-agnostic.
- `ADD COLUMN` has no `IF NOT EXISTS` guard (MySQL 8.0 limitation). If a MySQL environment somehow survived 1.12.7 with the columns already present, the 2.0.0 DDL would fail with "Duplicate column name". In practice this cannot happen since 1.12.7 crashed before the columns were created on MySQL.
entity_extensionrows getversionNumandchangedFieldKeyspopulatedMigrationSqlStatementHashTestpasses in CIv200/MigrationUtilTest(41 tests) passes in CI