fix(search): align dataAsset aggregation counts with index=tableColumn totals by mohityadav766 · Pull Request #27846 · open-metadata/OpenMetadata

mohityadav766 · 2026-04-30T11:01:12Z

Summary

Route the dataAsset/all alias through a new buildAllAssetsSearchBuilderV2, which builds a per-entity-type bool union: each clause is filter(entityType=<type>) must(<type's own query>). The tableColumn branch reuses the column builder; every other type goes through its dedicated asset config; a fallback should clause covers types in the dataAsset alias that have no config (e.g. glossary, apiCollection). Each entity-type bucket count therefore equals what the dedicated index returns for the same query, by construction.
Tighten buildColumnSearchBuilderV2 to Operator.And so multi-token queries like first_name require every analyzer sub-token to match somewhere. The previous Operator.Or variant (called with "0" as the helper's fuzziness argument) over-matched on any single sub-token.
Add name.ngram, name.compound, displayName.ngram, displayName.compound to ColumnSearchIndex.getFields() so prefix queries (e.g. fir) still match column docs from both index=tableColumn and the dataAsset bucket.
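The "by construction" argument can be checked with a small in-memory model. This is a hypothetical sketch, not code from the PR: `PerTypeUnionModel`, its `Doc` record, and the predicate "queries" stand in for real Elasticsearch/OpenSearch clauses, and matching is reduced to a boolean test per document.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiPredicate;

/** Hypothetical in-memory model of the per-entity-type union; not the PR's builder code. */
final class PerTypeUnionModel {

  /** A search document reduced to its entity type and name. */
  record Doc(String entityType, String name) {}

  /** True if the doc passes filter(entityType=type) must(typeQuery) for its type, or the fallback. */
  static boolean unionMatches(
      Doc doc,
      String query,
      Map<String, BiPredicate<Doc, String>> typeQueries,
      BiPredicate<Doc, String> defaultQuery) {
    for (Map.Entry<String, BiPredicate<Doc, String>> clause : typeQueries.entrySet()) {
      boolean typeFilter = doc.entityType().equals(clause.getKey());
      if (typeFilter && clause.getValue().test(doc, query)) {
        return true;
      }
    }
    // Fallback clause: default query, excluding every configured type.
    return !typeQueries.containsKey(doc.entityType()) && defaultQuery.test(doc, query);
  }

  /** Terms aggregation on entityType over the union's hits, i.e. the dataAsset buckets. */
  static Map<String, Long> buckets(
      List<Doc> docs,
      String query,
      Map<String, BiPredicate<Doc, String>> typeQueries,
      BiPredicate<Doc, String> defaultQuery) {
    Map<String, Long> counts = new TreeMap<>();
    for (Doc doc : docs) {
      if (unionMatches(doc, query, typeQueries, defaultQuery)) {
        counts.merge(doc.entityType(), 1L, Long::sum);
      }
    }
    return counts;
  }

  /** What a dedicated index returns: docs of one type matched by that type's own query. */
  static long dedicatedTotal(
      List<Doc> docs, String type, String query, BiPredicate<Doc, String> typeQuery) {
    return docs.stream()
        .filter(d -> d.entityType().equals(type) && typeQuery.test(d, query))
        .count();
  }
}
```

Because every union clause is filtered to a single entityType, a document counted in bucket T can only have been admitted by T's own query, which is exactly what the dedicated index evaluates. Types without a configuration fall through to the default query, which is why the fallback excludes every configured type.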

Issue: open-metadata/openmetadata-collate#3851

Reproduction / verification

scripts/reproduce_column_agg_mismatch.sh is a multi-query probe that exits non-zero on any divergence between the dataAsset aggregation bucket and the index=tableColumn total. After this PR, all probed queries return matching counts:

[OK] q='first_name'           agg.tableColumnBucket=85   tableColumn.total=85
[OK] q='last_name'            agg.tableColumnBucket=45   tableColumn.total=45
[OK] q='first name'           agg.tableColumnBucket=85   tableColumn.total=85
[OK] q='shipping address'     agg.tableColumnBucket=0    tableColumn.total=0
[OK] q='first name address'   agg.tableColumnBucket=0    tableColumn.total=0
[OK] q='fir'                  agg.tableColumnBucket=45   tableColumn.total=45
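The probe's pass/fail rule is small enough to state in a few lines. A hypothetical sketch: `ParityProbe` and the `MISMATCH` label are illustrative, the counts are supplied by the caller rather than fetched from OpenMetadata, and the report line only approximates the script output above (no column padding).

```java
import java.util.Map;

/** Hypothetical sketch of the probe's pass/fail rule; counts are supplied by the caller. */
final class ParityProbe {

  /** Bucket count from index=dataAsset and hits total from index=tableColumn for one query. */
  record Counts(long bucket, long total) {}

  static String report(String query, Counts c) {
    String status = c.bucket() == c.total() ? "OK" : "MISMATCH"; // "MISMATCH" label is illustrative
    return String.format(
        "[%s] q='%s' agg.tableColumnBucket=%d tableColumn.total=%d",
        status, query, c.bucket(), c.total());
  }

  /** Prints one line per query and returns the exit code: 0 if every query is in parity, else 1. */
  static int run(Map<String, Counts> results) {
    int exitCode = 0;
    for (Map.Entry<String, Counts> e : results.entrySet()) {
      System.out.println(report(e.getKey(), e.getValue()));
      if (e.getValue().bucket() != e.getValue().total()) {
        exitCode = 1;
      }
    }
    return exitCode;
  }
}
```

Returning the exit code instead of calling `System.exit` keeps the rule testable; a `main` method could pass the result on to `System.exit`.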

Tests covering bucket-parity for tableColumn/table and the sub-token over-match guard are coming in a follow-up commit.

Test plan

Restart OM, run ./scripts/reproduce_column_agg_mismatch.sh — all probed queries return OK.
Spot-check the explore search bar in the UI: type first_name, confirm the tableColumn tab badge equals the entity-type aggregation panel count.
Type fir — column results appear, and the bucket count matches the tableColumn tab.
Confirm specific-index queries (index=table, index=topic, etc.) still behave as before.

🤖 Generated with Claude Code

Summary by Gitar

Search Performance Optimization:
- Optimized buildUnconfiguredAssetFallbackV2 in both ElasticSearchSourceBuilderFactory and OpenSearchSourceBuilderFactory by replacing iterative mustNot term queries with a single, more efficient termsQuery.
- Added utility methods termsQuery to ElasticQueryBuilder and OpenSearchQueryBuilder to support batch filtering of entity types.


…n totals

Refactor the `dataAsset`/`all` alias query path so each entity-type bucket in the aggregation matches what its dedicated index returns for the same query. The composite asset config used to merge fields from every type, then apply phrase/ngram-fuzzy semantics to all docs; column docs got semantics different from `buildColumnSearchBuilderV2`, which is why the explore search bar's two calls disagreed on the tableColumn count.

The new `buildAllAssetsSearchBuilderV2` builds a per-entity-type bool union: each clause is `filter(entityType=<type>) must(<type's own query>)`. The column branch reuses `buildColumnMultiMatchV2`; every other type goes through `buildBaseQueryV2` with its dedicated config; an extra `should` covers asset types in the `dataAsset` alias that lack a config (e.g. `glossary`, `apiCollection`) using the default config.

Also tightens the column builder to `Operator.And` so multi-token queries like `first_name` require every sub-token to match somewhere. This fixes the `om_analyzer`-driven over-match where the lenient `Or` + `min_should_match=0` variant matched any column whose name contained just `first` or just `name`.

`ColumnSearchIndex.getFields()` gains `name.ngram`, `name.compound`, `displayName.ngram`, `displayName.compound` so prefix queries like `fir` still match column docs from both `index=tableColumn` and the dataAsset bucket.

Includes `scripts/reproduce_column_agg_mismatch.sh`: a multi-query probe that exits non-zero on any divergence between `index=dataAsset` (aggregation bucket) and `index=tableColumn` (total) for the same query.

Issue: open-metadata/openmetadata-collate#3851

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds integration tests under ColumnSearchIndexIT that pin the behavior of the fix in PR #27846:

- testDataAssetTableColumnAggregationMatchesTableColumnTotal: dataAsset bucket count for tableColumn equals index=tableColumn total for a multi-token query against seeded columns.
- testColumnQueryRequiresAllSubtokensToMatch: query "<tag>_first_name" must match the seeded "<tag>_first_name" column but NOT "<tag>_first_id", pinning the Operator.And fix that closed the om_analyzer sub-token over-match.
- testDataAssetTableBucketMatchesTableIndexTotal: same parity guarantee for the "table" entity-type bucket, exercising the per-type-union path for a non-column type.
- testPrefixQueryMatchesViaNgramOnBothPaths: short prefix queries (e.g. the first few chars of the seeded tag) must match seeded columns via name.ngram and stay in parity across both endpoints.
- testUnconfiguredAssetTypeFallbackMatchesViaDataAsset: a Glossary doc (an asset type without an explicit searchSettings.assetTypeConfigurations entry) must still surface via index=dataAsset, exercising the fallback should clause in buildPerTypeUnionQueryV2.

Issue: open-metadata/openmetadata-collate#3851

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR updates the search query-building logic for the all / dataAsset aliases so that entity-type aggregation bucket counts (notably tableColumn) align with the totals returned by the corresponding dedicated index queries, and tightens column query matching to avoid underscore sub-token overmatching.

Changes:

Route index=all and index=dataAsset through a new per-entity-type union query builder (ElasticSearch + OpenSearch) to align aggregation bucket counts with dedicated index totals.
Tighten tableColumn query behavior by reusing a shared column multi-match builder with Operator.And.
Expand column searchable fields to include name.ngram, name.compound, displayName.ngram, and displayName.compound to preserve prefix/partial matching behavior.
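To see why ngram subfields keep prefix queries such as `fir` working once whole-token matching gets stricter, compare a whole-token match with an n-gram match. This is a simplified, hypothetical model: the gram sizes (3 to 5), the underscore/whitespace tokenizer, and the class name are assumptions, not the actual `name.ngram` analyzer settings, and the query is compared as a single gram rather than being analyzed.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical model of whole-token vs n-gram matching; gram sizes are assumptions. */
final class NgramPrefixSketch {
  static final int MIN_GRAM = 3; // assumed, not the real name.ngram setting
  static final int MAX_GRAM = 5; // assumed, not the real name.ngram setting

  /** All substrings of length MIN_GRAM..MAX_GRAM of the lowercased value. */
  static Set<String> ngrams(String value) {
    String s = value.toLowerCase(Locale.ROOT);
    Set<String> grams = new LinkedHashSet<>();
    for (int start = 0; start < s.length(); start++) {
      for (int len = MIN_GRAM; len <= MAX_GRAM && start + len <= s.length(); len++) {
        grams.add(s.substring(start, start + len));
      }
    }
    return grams;
  }

  /** Whole-token match: the query must equal one underscore/whitespace-separated token. */
  static boolean tokenMatch(String value, String query) {
    return Arrays.asList(value.toLowerCase(Locale.ROOT).split("[_\\s]+"))
        .contains(query.toLowerCase(Locale.ROOT));
  }

  /** N-gram match: the (unanalyzed) query must equal one of the value's grams. */
  static boolean ngramMatch(String value, String query) {
    return ngrams(value).contains(query.toLowerCase(Locale.ROOT));
  }
}
```

In this model `fir` misses `first_name` on the token field but hits it on the gram field, while `last_name` produces no `fir` gram at all.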

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

File	Description
scripts/reproduce_column_agg_mismatch.sh	Adds a probe script to reproduce/verify the `dataAsset` aggregation vs `tableColumn` totals mismatch.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSourceBuilderFactory.java	Implements the per-entity-type union query for `all`/`dataAsset` and centralizes the stricter column multi-match.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSourceBuilderFactory.java	Mirrors the per-entity-type union query approach for ElasticSearch and centralizes the stricter column multi-match.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ColumnSearchIndex.java	Adds ngram/compound subfields to the column field boost map to support prefix/partial matching.

Copilot · 2026-04-30T11:07:12Z

+   * field. Without {@code And}, a query like {@code first_name} matches any column whose name
+   * contains just {@code first} or just {@code name}, which both inflates the column index hits
+   * and creates the dataAsset/tableColumn count mismatch tracked in github issue #3851.


The Javadoc references the old behavior as Operator.Or + min_should_match=0, but the helper used here takes fuzziness as the last parameter and only sets minimum_should_match when fuzziness is enabled and operator is OR. Please adjust the comment to match the real previous query shape to avoid confusion.

Suggested change

-   * field. Without {@code And}, a query like {@code first_name} matches any column whose name
-   * contains just {@code first} or just {@code name}, which both inflates the column index hits
-   * and creates the dataAsset/tableColumn count mismatch tracked in github issue #3851.
+   * field. Previously this query used the same helper with {@code Operator.Or} and a last
+   * argument of {@code "0"}; with that OR-style query, a search like {@code first_name} matched
+   * any column whose name contained just {@code first} or just {@code name}, which inflated the
+   * column index hits and created the dataAsset/tableColumn count mismatch tracked in github
+   * issue #3851.

Copilot · 2026-04-30T11:07:13Z

+set -euo pipefail
+
+HOST="${OM_HOST:-http://localhost:8585}"
+TOKEN="${OM_TOKEN:-eyJraWQiOiJHYjM4OWEtOWY3Ni1nZGpzLWE5MmotMDI0MmJrOTQzNTYiLCJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJhZG1pbiIsImlzQm90IjpmYWxzZSwiaXNzIjoib3Blbi1tZXRhZGF0YS5vcmciLCJpYXQiOjE2NjM5Mzg0NjIsImVtYWlsIjoiYWRtaW5Ab3Blbm1ldGFkYXRhLm9yZyJ9.tS8um_5DKu7HgzGBzS1VTA5uUjKWOCU0B_j08WXBiEC0mr0zNREkqVfwFDD-d24HlNEbrqioLsBuFRiwIWKc1m_ZlVQbG7P36RUxhuv2vbSp80FKyNM-Tj93FDzq91jsyNmsQhyNv_fNr3TXfzzSPjHt8Go0FMMP66weoKMgW2PbXlhVKwEuXUHyakLLzewm9UMeQaEiRzhiTMU3UkLXcKbYEJJvfNFcLwSl9W8JCO_l0Yj3ud-qt_nQYEZwqW6u5nfdQllN133iikV4fM5QZsMCnm8Rq1mvLR0y9bmJiD7fwM1tmJ791TUWqmKaTnP49U493VanKpUAfzIiOiIbhg}"


The script hard-codes a full JWT as the default OM_TOKEN value. Even if it’s intended for local dev, committing tokens is a security risk and encourages running with a privileged credential. Please remove the embedded token and instead require OM_TOKEN to be provided (or fail fast with a clear message / optionally prompt).

Suggested change

-TOKEN="${OM_TOKEN:-eyJraWQiOiJHYjM4OWEtOWY3Ni1nZGpzLWE5MmotMDI0MmJrOTQzNTYiLCJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJhZG1pbiIsImlzQm90IjpmYWxzZSwiaXNzIjoib3Blbi1tZXRhZGF0YS5vcmciLCJpYXQiOjE2NjM5Mzg0NjIsImVtYWlsIjoiYWRtaW5Ab3Blbm1ldGFkYXRhLm9yZyJ9.tS8um_5DKu7HgzGBzS1VTA5uUjKWOCU0B_j08WXBiEC0mr0zNREkqVfwFDD-d24HlNEbrqioLsBuFRiwIWKc1m_ZlVQbG7P36RUxhuv2vbSp80FKyNM-Tj93FDzq91jsyNmsQhyNv_fNr3TXfzzSPjHt8Go0FMMP66weoKMgW2PbXlhVKwEuXUHyakLLzewm9UMeQaEiRzhiTMU3UkLXcKbYEJJvfNFcLwSl9W8JCO_l0Yj3ud-qt_nQYEZwqW6u5nfdQllN133iikV4fM5QZsMCnm8Rq1mvLR0y9bmJiD7fwM1tmJ791TUWqmKaTnP49U493VanKpUAfzIiOiIbhg}"
+TOKEN="${OM_TOKEN:?ERROR: OM_TOKEN must be set to a valid OpenMetadata JWT before running this script.}"

Copilot · 2026-04-30T11:07:13Z

+  /**
+   * Build a search source for the {@code all} / {@code dataAsset} alias as a per-entity-type
+   * union: each asset type contributes a clause built with its own configuration (column docs go
+   * through {@link #buildColumnMultiMatchV2(String)}, every other type through {@link
+   * #buildBaseQueryV2(String, AssetTypeConfiguration)}), filtered by {@code entityType=<type>}.
+   * Each entity-type bucket in the aggregation therefore equals what the dedicated index returns
+   * for the same query, by construction. Avoids the composite-config divergence behind
+   * github.com/open-metadata/openmetadata-collate#3851.
+   */
+  public ElasticSearchRequestBuilder buildAllAssetsSearchBuilderV2(
+      String query, int from, int size, boolean explain, boolean includeAggregations) {
+    AssetTypeConfiguration compositeConfig = buildCompositeAssetConfig(searchSettings);
+    es.co.elastic.clients.elasticsearch._types.query_dsl.Query baseQuery =
+        buildPerTypeUnionQueryV2(query);
+    es.co.elastic.clients.elasticsearch._types.query_dsl.Query finalQuery =
+        applyFunctionScoringV2(baseQuery, compositeConfig);
+    es.co.elastic.clients.elasticsearch.core.search.Highlight highlightBuilder =
+        buildHighlightingIfNeededV2(query, compositeConfig);
+
+    ElasticSearchRequestBuilder searchRequestBuilder =
+        createSearchSourceBuilderV2(finalQuery, from, size);
+    if (highlightBuilder != null) {
+      searchRequestBuilder.highlighter(highlightBuilder);
+    }
+    if (includeAggregations) {
+      addConfiguredAggregationsV2(searchRequestBuilder, compositeConfig);
+    }
+    searchRequestBuilder.explain(explain);
+    return searchRequestBuilder;
+  }
+
+  private es.co.elastic.clients.elasticsearch._types.query_dsl.Query buildPerTypeUnionQueryV2(
+      String query) {
+    if (isMatchAllQuery(query)) {
+      return ElasticQueryBuilder.boolQuery().must(ElasticQueryBuilder.matchAllQuery()).build();
+    }
+    ElasticQueryBuilder.BoolQueryBuilder union = ElasticQueryBuilder.boolQuery();
+    Set<String> configuredTypes = new HashSet<>();
+    for (AssetTypeConfiguration typeConfig : searchSettings.getAssetTypeConfigurations()) {
+      String assetType = typeConfig.getAssetType();
+      if (assetType == null || assetType.equals(INDEX_ALL)) {
+        continue;
+      }
+      configuredTypes.add(assetType);
+      union.should(buildAssetTypeClauseV2(query, assetType, typeConfig));
+    }
+    union.should(buildUnconfiguredAssetFallbackV2(query, configuredTypes));
+    union.minimumShouldMatch(1);
+    return union.build();


This refactor changes the query semantics for the high-traffic all/dataAsset alias (per-entity-type bool union) and tightens column matching, but there are no automated tests added here to lock in the new bucket-parity and underscore sub-token behavior. Please add/extend tests (unit and/or integration) that assert: (1) index=dataAsset entityType bucket counts match the totals from the corresponding dedicated index for at least table and tableColumn, and (2) a query like first_name does not match columns that only contain first or only name.

Copilot · 2026-04-30T11:07:13Z

+  /**
+   * Build a search source for the {@code all} / {@code dataAsset} alias as a per-entity-type
+   * union: each asset type contributes a clause built with its own configuration (column docs go
+   * through {@link #buildColumnMultiMatchV2(String)}, every other type through {@link
+   * #buildBaseQueryV2(String, AssetTypeConfiguration)}), filtered by {@code entityType=<type>}.
+   * Each entity-type bucket in the aggregation therefore equals what the dedicated index returns
+   * for the same query, by construction. Avoids the composite-config divergence behind
+   * github.com/open-metadata/openmetadata-collate#3851.
+   */
+  public OpenSearchRequestBuilder buildAllAssetsSearchBuilderV2(
+      String query, int from, int size, boolean explain, boolean includeAggregations) {
+    AssetTypeConfiguration compositeConfig = getOrBuildCompositeConfig();
+    os.org.opensearch.client.opensearch._types.query_dsl.Query baseQuery =
+        buildPerTypeUnionQueryV2(query);
+    os.org.opensearch.client.opensearch._types.query_dsl.Query finalQuery =
+        applyFunctionScoringV2(baseQuery, compositeConfig);
+    os.org.opensearch.client.opensearch.core.search.Highlight highlightBuilder =
+        buildHighlightingIfNeededV2(query, compositeConfig);
+
+    OpenSearchRequestBuilder searchRequestBuilder =
+        createSearchSourceBuilderV2(finalQuery, from, size);
+    if (highlightBuilder != null) {
+      searchRequestBuilder.highlighter(highlightBuilder);
+    }
+    if (includeAggregations) {
+      addConfiguredAggregationsV2(searchRequestBuilder, compositeConfig);
+    }
+    searchRequestBuilder.explain(explain);
+    return searchRequestBuilder;
+  }
+
+  private os.org.opensearch.client.opensearch._types.query_dsl.Query buildPerTypeUnionQueryV2(
+      String query) {
+    if (isMatchAllQuery(query)) {
+      return OpenSearchQueryBuilder.boolQuery()
+          .must(OpenSearchQueryBuilder.matchAllQuery())
+          .build();
+    }
+    OpenSearchQueryBuilder.BoolQueryBuilder union = OpenSearchQueryBuilder.boolQuery();
+    Set<String> configuredTypes = new HashSet<>();
+    for (AssetTypeConfiguration typeConfig : searchSettings.getAssetTypeConfigurations()) {
+      String assetType = typeConfig.getAssetType();
+      if (assetType == null || assetType.equals(INDEX_ALL)) {
+        continue;
+      }
+      configuredTypes.add(assetType);
+      union.should(buildAssetTypeClauseV2(query, assetType, typeConfig));
+    }
+    union.should(buildUnconfiguredAssetFallbackV2(query, configuredTypes));
+    union.minimumShouldMatch(1);
+    return union.build();
+  }
+
+  private static boolean isMatchAllQuery(String query) {
+    return query == null || query.trim().isEmpty() || query.trim().equals("*");
+  }
+
+  private os.org.opensearch.client.opensearch._types.query_dsl.Query buildAssetTypeClauseV2(
+      String query, String assetType, AssetTypeConfiguration typeConfig) {
+    os.org.opensearch.client.opensearch._types.query_dsl.Query inner =
+        Entity.TABLE_COLUMN.equals(assetType)
+            ? buildColumnMultiMatchV2(query)
+            : buildBaseQueryV2(query, typeConfig);
+    return OpenSearchQueryBuilder.boolQuery()
+        .filter(OpenSearchQueryBuilder.termQuery(ENTITY_TYPE_FIELD, assetType))
+        .must(inner)
+        .build();
+  }
+
+  /**
+   * Catches asset types that are part of the {@code dataAsset} alias but lack a dedicated entry in
+   * {@code searchSettings.assetTypeConfigurations} (e.g. {@code glossary}, {@code apiCollection}).
+   * Without this, docs of those types would silently disappear from the dataAsset alias after the
+   * per-type-union refactor.
+   */
+  private os.org.opensearch.client.opensearch._types.query_dsl.Query
+      buildUnconfiguredAssetFallbackV2(String query, Set<String> configuredTypes) {
+    OpenSearchQueryBuilder.BoolQueryBuilder fallback =
+        OpenSearchQueryBuilder.boolQuery()
+            .must(buildBaseQueryV2(query, getOrCreateDefaultConfig()));
+    for (String configured : configuredTypes) {
+      fallback.mustNot(OpenSearchQueryBuilder.termQuery(ENTITY_TYPE_FIELD, configured));
+    }
+    return fallback.build();
+  }


The new all/dataAsset per-type-union behavior and the stricter column multi-match are not covered by tests in this PR. Please add/extend tests to ensure (a) dataAsset entityType bucket counts stay in sync with dedicated-index totals (especially tableColumn), and (b) underscore-split identifier queries (e.g. first_name) require all sub-tokens to match (no overmatching on just one token).

Copilot · 2026-04-30T11:07:13Z

+   * by {@code om_analyzer} must hit some field. Without {@code And}, a query like {@code
+   * first_name} matches any column whose name contains just {@code first} or just {@code name},
+   * which both inflates the column index hits and creates the dataAsset/tableColumn count
+   * mismatch tracked in github issue #3851.


The Javadoc mentions the previous column builder used min_should_match=0, but the underlying multiMatchQuery(..., tieBreaker, fuzziness) helper’s last argument is fuzziness (and no minimum_should_match is set when fuzziness is "0"). Please update the comment to reflect the actual previous behavior so it doesn’t mislead future debugging/tuning.

Suggested change

-   * mismatch tracked in github issue #3851.
+   * mismatch tracked in github issue #3851. The previous builder behavior here was equivalent to
+   * passing {@code fuzziness="0"} to {@code multiMatchQuery(..., tieBreaker, fuzziness)}, which
+   * disables fuzziness; this helper invocation does not set {@code minimum_should_match}.

gitar-bot · 2026-04-30T11:08:14Z

+  private Table createTableWithMultiTokenColumns(TestNamespace ns, String baseName, String tag) {
+    String shortId = ns.shortPrefix();
+
+    org.openmetadata.schema.services.connections.database.PostgresConnection conn =
+        DatabaseServices.postgresConnection().hostPort("localhost:5432").username("test").build();
+
+    DatabaseService dbService =
+        DatabaseServices.builder()
+            .name("agg_svc_" + shortId + "_" + baseName)
+            .connection(conn)
+            .description("Test service for dataAsset/tableColumn aggregation parity")
+            .create();
+
+    CreateDatabase dbReq = new CreateDatabase();
+    dbReq.setName("agg_db_" + shortId + "_" + baseName);


💡 Quality: createTableWithMultiTokenColumns duplicates boilerplate from other helpers

The new createTableWithMultiTokenColumns method (lines 764-809) repeats ~30 lines of service/database/schema creation that are identical to createTableWithColumns, createTableWithNestedColumns, and createTableWithDeeplyNestedColumns. Per the project guidelines (no duplication, extract shared logic), consider extracting the common infra setup into a shared helper that accepts a List<Column> and returns a Table.


gitar-bot · 2026-04-30T11:08:16Z

+    void testUnconfiguredAssetTypeFallbackMatchesViaDataAsset(TestNamespace ns) throws Exception {
+      OpenMetadataClient client = SdkClients.adminClient();
+      String name = ns.prefix("glossary_unconfigured_fallback");
+      CreateGlossary req = new CreateGlossary().withName(name).withDescription(name);
+      Glossary glossary = client.glossaries().create(req);
+      assertNotNull(glossary);
+
+      Awaitility.await()
+          .atMost(90, TimeUnit.SECONDS)
+          .pollInterval(500, TimeUnit.MILLISECONDS)
+          .until(() -> totalHitsForIndex(client, name, "dataAsset") >= 1);
+
+      long total = totalHitsForIndex(client, name, "dataAsset");
+      assertTrue(
+          total >= 1,


💡 Quality: Glossary test doesn't clean up created glossary entity

testUnconfiguredAssetTypeFallbackMatchesViaDataAsset creates a glossary via client.glossaries().create(req) but never deletes it. The other helpers use TestNamespace-scoped names that presumably get cleaned up, but this glossary is created ad-hoc. If the test framework doesn't auto-clean glossaries, this is a resource leak across test runs. Verify the cleanup strategy or add an @AfterEach / try-finally deletion.


…n totals

Refactor the `dataAsset`/`all` alias query path so each entity-type bucket in the aggregation matches what its dedicated index returns for the same query. The composite asset config used to merge fields from every type, then apply phrase/ngram-fuzzy semantics to all docs; column docs got semantics different from `buildColumnSearchBuilderV2`, which is why the explore search bar's two calls disagreed on the tableColumn count.

The new `buildAllAssetsSearchBuilderV2` builds a per-entity-type bool union: each clause is `filter(entityType=<type>) must(<type's own query>)`. The column branch reuses `buildColumnMultiMatchV2`; every other type goes through `buildBaseQueryV2` with its dedicated config; an extra `should` covers asset types in the `dataAsset` alias that lack a config (e.g. `glossary`, `apiCollection`) using the default config.

Also tightens the column builder to `Operator.And` so multi-token queries like `first_name` require every sub-token to match somewhere. This fixes the `om_analyzer`-driven over-match where the lenient `Or` + `min_should_match=0` variant matched any column whose name contained just `first` or just `name`.

`ColumnSearchIndex.getFields()` gains `name.ngram`, `name.compound`, `displayName.ngram`, `displayName.compound` so prefix queries like `fir` still match column docs from both `index=tableColumn` and the dataAsset bucket.

Issue: open-metadata/openmetadata-collate#3851

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


- testTopicBucketAndResultSetMatchIndexTopic: pins parity for a non-column entity type at both the count and the FQN-set level, exercising the per-type-union path for `topic`. The dataAsset alias must return the same set of topic FQNs that index=topic returns for the same query.
- testComplexSyntaxQueriesKeepParity: runs four representative complex-syntax shapes (quoted phrase, AND, OR, mixed parenthesised) through both endpoints and asserts the tableColumn bucket count equals index=tableColumn total.

Issue: open-metadata/openmetadata-collate#3851

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
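Comparing FQN sets is stricter than comparing counts: two result sets can have the same size and still disagree on which documents they contain. A minimal sketch of the set comparison, assuming both result sets were fetched in full (no pagination cut-off); `FqnParity` is a hypothetical helper name.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: FQNs returned by exactly one of the two endpoints. */
final class FqnParity {

  static Set<String> symmetricDifference(Set<String> fromDataAsset, Set<String> fromDedicated) {
    Set<String> onlyOne = new TreeSet<>(fromDataAsset);
    onlyOne.addAll(fromDedicated); // union
    Set<String> both = new TreeSet<>(fromDataAsset);
    both.retainAll(fromDedicated); // intersection
    onlyOne.removeAll(both); // union minus intersection
    return onlyOne;
  }
}
```

An empty result means parity; a non-empty result names the offending FQNs directly, which makes a failing assertion message far more useful than two unequal counts.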

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Copilot · 2026-04-30T11:21:14Z

+    for (String configured : configuredTypes) {
+      fallback.mustNot(ElasticQueryBuilder.termQuery(ENTITY_TYPE_FIELD, configured));


buildUnconfiguredAssetFallbackV2 currently emits one mustNot term(entityType=...) per configured type. With many asset types this can create a large bool query and add overhead. Use a single mustNot with a terms query over configuredTypes instead (ElasticQueryBuilder supports termsQuery).

Suggested change

-    for (String configured : configuredTypes) {
-      fallback.mustNot(ElasticQueryBuilder.termQuery(ENTITY_TYPE_FIELD, configured));
+    if (!configuredTypes.isEmpty()) {
+      fallback.mustNot(ElasticQueryBuilder.termsQuery(ENTITY_TYPE_FIELD, configuredTypes));
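The two fallback shapes can be compared as query-DSL-shaped maps. A hypothetical sketch: `FallbackClauseSketch` and the map representation are illustrative, not the builder classes from the PR. Both shapes reject a document whose `entityType` equals any configured type; the single `terms` form just keeps `must_not` at one clause however many types are configured.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the fallback clause as query-DSL-shaped maps (not the PR's builders). */
final class FallbackClauseSketch {

  /** Original shape: one must_not term clause per configured type. */
  static Map<String, Object> perTypeTerms(Map<String, Object> defaultQuery, Set<String> configured) {
    List<Object> mustNot = new ArrayList<>();
    for (String type : new TreeSet<>(configured)) {
      mustNot.add(Map.of("term", Map.of("entityType", type)));
    }
    return bool(defaultQuery, mustNot);
  }

  /** Suggested shape: a single must_not terms clause, skipped when nothing is configured. */
  static Map<String, Object> singleTerms(Map<String, Object> defaultQuery, Set<String> configured) {
    List<Object> mustNot = new ArrayList<>();
    if (!configured.isEmpty()) {
      mustNot.add(Map.of("terms", Map.of("entityType", List.copyOf(new TreeSet<>(configured)))));
    }
    return bool(defaultQuery, mustNot);
  }

  private static Map<String, Object> bool(Map<String, Object> defaultQuery, List<Object> mustNot) {
    Map<String, Object> body = new LinkedHashMap<>();
    body.put("must", List.of(defaultQuery));
    if (!mustNot.isEmpty()) {
      body.put("must_not", mustNot);
    }
    return Map.of("bool", body);
  }

  static int mustNotClauseCount(Map<String, Object> fallback) {
    Object mustNot = ((Map<?, ?>) fallback.get("bool")).get("must_not");
    return mustNot == null ? 0 : ((List<?>) mustNot).size();
  }
}
```

The `isEmpty()` guard mirrors the suggestion above and avoids emitting an empty `terms` clause.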

Copilot · 2026-04-30T11:21:15Z

+
+      String multiTokenQuery = tag + "_first " + tag + "_address";
+
+      Awaitility.await()
+          .atMost(90, TimeUnit.SECONDS)
+          .pollInterval(500, TimeUnit.MILLISECONDS)
+          .until(
+              () -> {
+                String r =
+                    client
+                        .search()
+                        .query(multiTokenQuery)
+                        .index("tableColumn")
+                        .size(0)
+                        .deleted(false)
+                        .execute();
+                JsonNode root = OBJECT_MAPPER.readTree(r);
+                long total = root.path("hits").path("total").path("value").asLong(-1);
+                return total >= 3;


The awaited query tag + "_first " + tag + "_address" is unlikely to ever reach total >= 3 now that the column builder uses a multi_match with Operator.And (each hit must contain all analyzed tokens, so no single column will match both "first" and "*_address"). This can make the Awaitility wait time out and fail the test. Use a query that is expected to match at least one seeded column under the new AND semantics (e.g., target a single column name) and wait for that expected minimum instead of >= 3.
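The effect of `Operator.And` on the awaited query can be reproduced with a toy analyzer. A hypothetical sketch: `OperatorSketch` models `om_analyzer` as lowercasing plus a split on underscores and whitespace (an assumption; the real analyzer may differ) and evaluates one field at a time, which mirrors per-field operator semantics assuming the default `best_fields` multi_match type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical toy model of Operator.Or vs Operator.And on a single analyzed field. */
final class OperatorSketch {

  /** Stand-in for om_analyzer (assumption): lowercase, then split on underscores and whitespace. */
  static List<String> analyze(String text) {
    return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[_\\s]+"))
        .filter(token -> !token.isEmpty())
        .toList();
  }

  /** OR: any query token present in the field is enough. */
  static boolean matchesOr(String fieldValue, String query) {
    Set<String> fieldTokens = new HashSet<>(analyze(fieldValue));
    return analyze(query).stream().anyMatch(fieldTokens::contains);
  }

  /** AND: every query token must be present in the same field. */
  static boolean matchesAnd(String fieldValue, String query) {
    return new HashSet<>(analyze(fieldValue)).containsAll(analyze(query));
  }

  static long hits(List<String> columnNames, String query, boolean useAnd) {
    return columnNames.stream()
        .filter(name -> useAnd ? matchesAnd(name, query) : matchesOr(name, query))
        .count();
  }
}
```

With hypothetical seeded names `t1_first_name`, `t1_first_id`, and `t1_shipping_address`, the query `t1_first_name` matches all three under OR but only one under AND, and `t1_first t1_address` matches none under AND, so a wait for `>= 3` hits would time out.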

Copilot · 2026-04-30T11:21:15Z

+      JsonNode aggBuckets =
+          OBJECT_MAPPER
+              .readTree(aggResponse)
+              .path("aggregations")
+              .path("sterms#entityType")
+              .path("buckets");


This code assumes the entity-type aggregation is always under aggregations.sterms#entityType. In this repo, other integration tests account for responses that use either entityType or sterms#entityType depending on backend/aggregation builder. To avoid backend-specific failures, locate the aggregation node by checking both keys (or factor this into a shared helper).

Suggested change

-      JsonNode aggBuckets =
-          OBJECT_MAPPER
-              .readTree(aggResponse)
-              .path("aggregations")
-              .path("sterms#entityType")
-              .path("buckets");
+      JsonNode aggregations = OBJECT_MAPPER.readTree(aggResponse).path("aggregations");
+      JsonNode entityTypeAggregation = aggregations.path("entityType");
+      if (entityTypeAggregation.isMissingNode()) {
+        entityTypeAggregation = aggregations.path("sterms#entityType");
+      }
+      JsonNode aggBuckets = entityTypeAggregation.path("buckets");
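The aggregation-key lookup recurs in several test helpers, so factoring it into one place keeps the call sites consistent. A minimal sketch over a parsed response modelled as plain `Map`/`List` values (an assumption made to stay dependency-free; the tests themselves use Jackson `JsonNode`). `EntityTypeAggLookup` is a hypothetical name. Unlike a lookup that falls back to 0, it fails loudly when neither key is present, so a serialization difference cannot masquerade as an empty bucket.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical shared helper for locating the entityType terms aggregation in a parsed response. */
final class EntityTypeAggLookup {

  // Key names differ by backend/serialization; both are checked, typed-keys form first.
  private static final List<String> KEYS = List.of("sterms#entityType", "entityType");

  static List<?> buckets(Map<String, ?> aggregations) {
    for (String key : KEYS) {
      if (aggregations.get(key) instanceof Map<?, ?> agg
          && agg.get("buckets") instanceof List<?> list) {
        return list;
      }
    }
    // Fail loudly instead of returning 0, so a key mismatch cannot pass as an empty bucket.
    throw new IllegalStateException("no entityType aggregation under any of " + KEYS);
  }

  /** doc_count of the bucket whose key equals entityType; 0 if that bucket is absent. */
  static long bucketCount(Map<String, ?> aggregations, String entityType) {
    for (Object o : buckets(aggregations)) {
      if (o instanceof Map<?, ?> bucket
          && entityType.equals(bucket.get("key"))
          && bucket.get("doc_count") instanceof Number count) {
        return count.longValue();
      }
    }
    return 0L;
  }
}
```

A missing bucket under a present aggregation still returns 0, since a terms aggregation omits buckets with no matching documents by default.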

Copilot · 2026-04-30T11:21:15Z

+      Awaitility.await()
+          .atMost(90, TimeUnit.SECONDS)
+          .pollInterval(500, TimeUnit.MILLISECONDS)
+          .until(() -> totalHitsForIndex(client, prefixQuery, "tableColumn") >= 5);


The Awaitility condition totalHitsForIndex(...) >= 5 looks too high for the data created in this test (the helper creates 3 columns). If the random tag is unique (likely), the prefix query may only ever match those seeded columns, causing an unnecessary timeout/flaky failure. Consider waiting for the expected minimum (e.g., >= 3) or simply > 0.

Suggested change

-          .until(() -> totalHitsForIndex(client, prefixQuery, "tableColumn") >= 5);
+          .until(() -> totalHitsForIndex(client, prefixQuery, "tableColumn") >= 3);

Copilot · 2026-04-30T11:21:16Z

+      long aggTopicBucket = bucketCountFromDataAsset(client, tag, Entity.TOPIC);
+      assertEquals(
+          topicTotal,
+          aggTopicBucket,
+          "dataAsset topic bucket must equal index=topic total for query " + tag);
+


Helper bucketCountFromDataAsset hard-codes aggregations.sterms#entityType. Elsewhere (e.g., SearchResourceIT) the code handles both entityType and sterms#entityType. To prevent backend-dependent failures, update this helper to select whichever aggregation key is present.

Copilot · 2026-04-30T11:21:16Z

+    for (String configured : configuredTypes) {
+      fallback.mustNot(OpenSearchQueryBuilder.termQuery(ENTITY_TYPE_FIELD, configured));


buildUnconfiguredAssetFallbackV2 adds one mustNot term(entityType=...) clause per configured type. With many configured asset types this can bloat the query and slow execution. Prefer a single mustNot with a terms query over the configuredTypes set (the query builder already supports termsQuery).

Suggested change

-    for (String configured : configuredTypes) {
-      fallback.mustNot(OpenSearchQueryBuilder.termQuery(ENTITY_TYPE_FIELD, configured));
+    if (!configuredTypes.isEmpty()) {
+      fallback.mustNot(
+          OpenSearchQueryBuilder.termsQuery(
+              ENTITY_TYPE_FIELD,
+              configuredTypes.stream().map(FieldValue::of).toList()));

…n totals

Refactor the `dataAsset`/`all` alias query path so each entity-type bucket in the aggregation matches what its dedicated index returns for the same query. The composite asset config used to merge fields from every type and then apply phrase/ngram-fuzzy semantics to all docs, so column docs got semantics different from `buildColumnSearchBuilderV2`; that is why the explore search bar's two calls disagreed on the tableColumn count.

The new `buildAllAssetsSearchBuilderV2` builds a per-entity-type bool union: each clause is `filter(entityType=<type>) must(<type's own query>)`. The column branch reuses `buildColumnMultiMatchV2`; every other type goes through `buildBaseQueryV2` with its dedicated config; an extra `should` covers asset types in the `dataAsset` alias that lack a config (e.g. `glossary`, `apiCollection`) using the default config.

Also tightens the column builder to `Operator.And` so multi-token queries like `first_name` require every sub-token to match somewhere. This fixes the `om_analyzer`-driven over-match where the lenient `Or` + `min_should_match=0` variant matched any column whose name contained just `first` or just `name`.

`ColumnSearchIndex.getFields()` gains `name.ngram`, `name.compound`, `displayName.ngram`, `displayName.compound` so prefix queries like `fir` still match column docs from both `index=tableColumn` and the dataAsset bucket.

Adds integration tests in ColumnSearchIndexIT covering:
- multi-token query parity for the tableColumn bucket vs index=tableColumn
- sub-token over-match guard (query `<tag>_first_name` must not match a seeded `<tag>_first_id` column)
- table bucket parity (count) and topic parity (count + FQN-set match)
- prefix queries via `name.ngram` keeping parity across both endpoints
- complex-syntax queries (quoted phrase, AND/OR, mixed parenthesised) keeping the tableColumn bucket parity
- unconfigured asset type fallback (Glossary docs reachable via dataAsset)

Issue: open-metadata/openmetadata-collate#3851

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
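Why the per-type union gives bucket parity "by construction" can be modelled in memory. This is a minimal sketch, not the factory code: `Doc`, `PerTypeUnion` and the predicates are hypothetical, each `Predicate` stands in for one type's own query (`buildColumnMultiMatchV2`, `buildBaseQueryV2`, or the default-config fallback), and every doc is assumed to belong to the `dataAsset` alias.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for an indexed document.
record Doc(String entityType, String name) {}

final class PerTypeUnion {
  // Models the union: a doc can only match the clause for its own entityType, i.e.
  // filter(entityType=<type>) must(<type's own query>). Types without a dedicated query
  // fall through to the fallback clause (the glossary/apiCollection case).
  static List<Doc> searchAll(
      List<Doc> docs, Map<String, Predicate<Doc>> perTypeQuery, Predicate<Doc> fallbackQuery) {
    return docs.stream()
        .filter(d -> perTypeQuery.getOrDefault(d.entityType(), fallbackQuery).test(d))
        .collect(Collectors.toList());
  }

  // Models a dedicated index: only docs of one type, evaluated with that type's query.
  static long searchIndex(List<Doc> docs, String entityType, Predicate<Doc> query) {
    return docs.stream().filter(d -> d.entityType().equals(entityType) && query.test(d)).count();
  }

  // Models the entityType terms aggregation over the union's hits.
  static Map<String, Long> bucketCounts(List<Doc> hits) {
    return hits.stream()
        .collect(Collectors.groupingBy(Doc::entityType, TreeMap::new, Collectors.counting()));
  }
}
```

Because the union routes each doc to exactly the query its dedicated index would use, `bucketCounts(searchAll(...)).get(type)` equals `searchIndex(docs, type, query)` for every configured type.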

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Copilot · 2026-04-30T11:32:29Z

+      JsonNode buckets =
+          OBJECT_MAPPER
+              .readTree(response)
+              .path("aggregations")
+              .path("sterms#entityType")
+              .path("buckets");


bucketCountFromDataAsset assumes the entity-type aggregation lives under aggregations.sterms#entityType, but other integration tests in this repo already need to support aggregations.entityType as well. This helper should check for both keys; otherwise it will silently return 0 and break the parity assertions depending on search backend/serialization.

Suggested change

-      JsonNode buckets =
-          OBJECT_MAPPER
-              .readTree(response)
-              .path("aggregations")
-              .path("sterms#entityType")
-              .path("buckets");
+      JsonNode root = OBJECT_MAPPER.readTree(response);
+      JsonNode aggregations = root.path("aggregations");
+      JsonNode buckets = aggregations.path("sterms#entityType").path("buckets");
+      if (buckets.isMissingNode() || !buckets.isArray()) {
+        buckets = aggregations.path("entityType").path("buckets");
+      }
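The same two-key lookup can be sketched without Jackson, assuming the response has already been parsed into nested `Map`/`List` values. `EntityTypeBuckets` is a hypothetical helper; the `key`/`doc_count` bucket fields follow the standard terms-aggregation response, and the `sterms#` prefix is the `typed_keys` naming that some clients request.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: the search response is assumed to be parsed into nested Maps/Lists.
final class EntityTypeBuckets {
  // Try the typed-keys label first ("sterms#entityType"), then the plain aggregation name.
  static long bucketCount(Map<String, ?> response, String entityType) {
    if (!(response.get("aggregations") instanceof Map<?, ?> aggregations)) {
      return 0L;
    }
    List<?> buckets = bucketsUnder(aggregations, "sterms#entityType");
    if (buckets == null) {
      buckets = bucketsUnder(aggregations, "entityType");
    }
    if (buckets == null) {
      return 0L; // neither shape present
    }
    for (Object b : buckets) {
      if (b instanceof Map<?, ?> bucket
          && entityType.equals(bucket.get("key"))
          && bucket.get("doc_count") instanceof Number count) {
        return count.longValue();
      }
    }
    return 0L; // no bucket for this entity type
  }

  // Returns the "buckets" list under the named aggregation, or null if it is missing or not a list.
  private static List<?> bucketsUnder(Map<?, ?> aggregations, String name) {
    if (aggregations.get(name) instanceof Map<?, ?> agg
        && agg.get("buckets") instanceof List<?> list) {
      return list;
    }
    return null;
  }
}
```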

Copilot · 2026-04-30T11:32:30Z

+      String multiTokenQuery = tag + "_first " + tag + "_address";
+
+      Awaitility.await()
+          .atMost(90, TimeUnit.SECONDS)
+          .pollInterval(500, TimeUnit.MILLISECONDS)
+          .until(
+              () -> {
+                String r =
+                    client
+                        .search()
+                        .query(multiTokenQuery)
+                        .index("tableColumn")
+                        .size(0)
+                        .deleted(false)
+                        .execute();
+                JsonNode root = OBJECT_MAPPER.readTree(r);
+                long total = root.path("hits").path("total").path("value").asLong(-1);
+                return total >= 3;
+              });


multiTokenQuery is constructed as two different column-name prefixes separated by a space (<tag>_first <tag>_address). With the updated column multi-match using Operator.And, this query is unlikely to match any single column document (no column contains both “first” and “address”), which would cause the Awaitility block to time out and make the test fail. Use a query that is expected to match at least one seeded column doc (e.g., the actual <tag>_first_name term or another query where all required tokens are present in the same column).
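A toy single-field model shows both effects of the `Operator.And` switch: the two-prefix query matches none of the three seeded columns, and `<tag>_first_name` no longer matches `<tag>_first_id`. `OperatorSemantics` and its tokenizer are hypothetical; the real `om_analyzer` and multi-field matching are more involved, so this only models the split of `first_name` into `[first, name]`.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the analyzer: lower-case, then split on '_' and whitespace.
final class OperatorSemantics {
  static Set<String> tokens(String text) {
    Set<String> out = new LinkedHashSet<>();
    for (String t : text.toLowerCase(Locale.ROOT).split("[_\\s]+")) {
      if (!t.isEmpty()) {
        out.add(t); // drop the empty token a leading separator would produce
      }
    }
    return out;
  }

  // Operator.And: every query token must appear in the field.
  static boolean matchesAnd(String query, String field) {
    return tokens(field).containsAll(tokens(query));
  }

  // Operator.Or without minimum_should_match: any single query token is enough.
  static boolean matchesOr(String query, String field) {
    Set<String> fieldTokens = tokens(field);
    return tokens(query).stream().anyMatch(fieldTokens::contains);
  }
}
```

With a tag of `t1`, `matchesAnd("t1_first t1_address", c)` is false for `t1_first_name`, `t1_last_name` and `t1_address`, so an await on `>= 3` hits can never succeed under `And`.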

Copilot · 2026-04-30T11:32:30Z

+     * into {@code [first, name]}; with the old {@code Operator.Or} + {@code min_should_match=0}
+     * column builder, a column called {@code <tag>_first_id} matched a query of {@code
+     * <tag>_first_name} because the single token {@code first} was enough. The fix moves the
+     * column builder to {@code Operator.And}, so every sub-token must match.


The comment says the old column builder used Operator.Or + min_should_match=0, but the implementation was passing the string "0" as the fuzziness parameter (see ElasticQueryBuilder/OpenSearchQueryBuilder.multiMatchQuery(..., fuzziness)), and minimum_should_match was not set at all when fuzziness was "0". Please update the comment to reflect the actual behavior so future debugging isn’t misled.

Suggested change

-     * into {@code [first, name]}; with the old {@code Operator.Or} + {@code min_should_match=0}
-     * column builder, a column called {@code <tag>_first_id} matched a query of {@code
-     * <tag>_first_name} because the single token {@code first} was enough. The fix moves the
-     * column builder to {@code Operator.And}, so every sub-token must match.
+     * into {@code [first, name]}; with the old {@code Operator.Or} column builder, the string
+     * {@code "0"} was passed as the fuzziness value rather than setting {@code
+     * minimum_should_match}, so a column called {@code <tag>_first_id} matched a query of
+     * {@code <tag>_first_name} because the single token {@code first} was enough. The fix moves
+     * the column builder to {@code Operator.And}, so every sub-token must match.

Copilot · 2026-04-30T11:32:30Z

+      Awaitility.await()
+          .atMost(90, TimeUnit.SECONDS)
+          .pollInterval(500, TimeUnit.MILLISECONDS)
+          .until(() -> totalHitsForIndex(client, tag, "tableColumn") >= 5);


Similar to the prefix test, this Awaitility condition requires totalHitsForIndex(client, tag, "tableColumn") >= 5 even though the table fixture seeds only 3 columns. In a properly isolated namespace the count is unlikely ever to reach 5, which makes the test flaky or failing. Lower the threshold to what the fixture guarantees (e.g., >= 1 or >= 3) or wait for a specific known seeded column to appear.

Suggested change

-          .until(() -> totalHitsForIndex(client, tag, "tableColumn") >= 5);
+          .until(() -> totalHitsForIndex(client, tag, "tableColumn") >= 3);
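The threshold problem can be reproduced with a stdlib stand-in for the Awaitility poll: when the count is pinned at the seeded value, a threshold above it can only time out. `AwaitThreshold.awaitAtLeast` is hypothetical; it returns `false` on timeout, whereas Awaitility itself throws a `ConditionTimeoutException`.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

// Hypothetical stdlib stand-in for Awaitility's atMost/pollInterval/until loop.
final class AwaitThreshold {
  // Returns true once count >= threshold, or false if the timeout elapses first.
  static boolean awaitAtLeast(LongSupplier count, long threshold, Duration timeout, Duration poll) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (count.getAsLong() >= threshold) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) { // overflow-safe nanoTime comparison
        return false;
      }
      try {
        Thread.sleep(poll.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag and give up
        return false;
      }
    }
  }
}
```

With a supplier that always returns 3 (the seeded column count), `awaitAtLeast(seeded, 3, ...)` succeeds immediately while `awaitAtLeast(seeded, 5, ...)` always runs to the timeout.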

Bug fixes the reviewer flagged:
- buildPerTypeUnionQueryV2 (both factories) now guards against searchSettings.getAssetTypeConfigurations() returning null/empty by falling back to the default config, so dataAsset/all queries can't NPE if the search settings haven't been initialized.
- testDataAssetTableColumnAggregationMatchesTableColumnTotal previously awaited a multi-token query that no single seeded column could satisfy under the new Operator.And semantics, leaving the parity assertion trivially 0 == 0; it now uses a query that actually matches multiple seeded columns.
- testColumnQueryRequiresAllSubtokensToMatch relied on a column name (`<tag>_first_id`) that was never created, so the negative assertion was trivially true. createTableWithMultiTokenColumns now seeds first_id (and the alpha/bravo columns used by the complex-syntax and prefix tests), so all the parity assertions exercise real indexed documents.
- bucketCountFromDataAsset accepts both `entityType` and `sterms#entityType` aggregation key shapes, so the helper doesn't break on backends that label the bucket differently.
- Javadoc on buildColumnMultiMatchV2 (both factories) corrected to describe the actual previous shape (`Operator.Or` + `fuzziness="0"`, which left minimum_should_match unset) instead of the inaccurate `min_should_match=0`.

Issue: open-metadata/openmetadata-collate#3851

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
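The null/empty guard in the first bullet amounts to "iterate the configured list, or a one-element default list". A minimal sketch, where `AssetTypeConfig` and `AssetConfigGuard` are hypothetical stand-ins for the search-settings types:

```java
import java.util.List;

// Hypothetical stand-in for a per-asset-type search configuration.
record AssetTypeConfig(String assetType) {}

final class AssetConfigGuard {
  // Returns the configured list, or a single default config when search settings have not
  // been initialized (null) or are empty, so callers can use an enhanced for-loop safely.
  static List<AssetTypeConfig> configsOrDefault(
      List<AssetTypeConfig> configured, AssetTypeConfig defaultConfig) {
    return (configured == null || configured.isEmpty()) ? List.of(defaultConfig) : configured;
  }
}
```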

gitar-bot · 2026-04-30T11:38:05Z

+    return SdkClients.adminClient().tables().create(tableRequest);
+  }
+
+  private static final int MULTI_TOKEN_SEED_COUNT = 7;


💡 Quality: MULTI_TOKEN_SEED_COUNT constant is defined but never used

The constant MULTI_TOKEN_SEED_COUNT = 7 was added at line 916 but is never referenced anywhere in the file. The magic numbers 5 in the await conditions (e.g., >= 5 at lines 505 and 605) and 2 at line 396 appear to be the intended use sites but still use raw literals. Either use the constant or remove it.

Suggested fix:

Either replace the magic numbers with the constant:
  .until(() -> totalHitsForIndex(client, tag, "tableColumn") >= MULTI_TOKEN_SEED_COUNT - 2);
or remove the unused constant if it was only added for documentation purposes.


Adds a `termsQuery(String, Collection<String>)` helper to both OpenSearchQueryBuilder and ElasticQueryBuilder, then uses it in buildUnconfiguredAssetFallbackV2 to replace the per-type `mustNot term(entityType=...)` chain with a single `mustNot terms(entityType=[...])`. This keeps the bool query small regardless of how many configured asset types exist; the previous shape grew one clause per configured type.

Issue: open-metadata/openmetadata-collate#3851

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-04-30T12:01:00Z

Code Review 👍 Approved with suggestions 4 resolved / 7 findings

Aligns dataAsset aggregation counts with index totals while resolving several test and null-pointer edge cases. Please refactor the redundant boilerplate in createTableWithMultiTokenColumns, remove the unused MULTI_TOKEN_SEED_COUNT constant, and ensure the glossary entity is properly cleaned up in tests.

💡 Quality: createTableWithMultiTokenColumns duplicates boilerplate from other helpers

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java:764-778 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java:759-762

The new createTableWithMultiTokenColumns method (lines 764-809) repeats ~30 lines of service/database/schema creation that are identical to createTableWithColumns, createTableWithNestedColumns, and createTableWithDeeplyNestedColumns. Per the project guidelines (no duplication, extract shared logic), consider extracting the common infra setup into a shared helper that accepts a List<Column> and returns a Table.

💡 Quality: Glossary test doesn't clean up created glossary entity

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java:567-581

testUnconfiguredAssetTypeFallbackMatchesViaDataAsset creates a glossary via client.glossaries().create(req) but never deletes it. The other helpers use TestNamespace-scoped names that presumably get cleaned up, but this glossary is created ad-hoc. If the test framework doesn't auto-clean glossaries, this is a resource leak across test runs. Verify the cleanup strategy or add an @AfterEach / try-finally deletion.

💡 Quality: MULTI_TOKEN_SEED_COUNT constant is defined but never used

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java:916

The constant MULTI_TOKEN_SEED_COUNT = 7 was added at line 916 but is never referenced anywhere in the file. The magic numbers 5 in the await conditions (e.g., >= 5 at lines 505 and 605) and 2 at line 396 appear to be the intended use sites but still use raw literals. Either use the constant or remove it.

Suggested fix

Either replace the magic numbers with the constant:
  .until(() -> totalHitsForIndex(client, tag, "tableColumn") >= MULTI_TOKEN_SEED_COUNT - 2);
or remove the unused constant if it was only added for documentation purposes.

✅ 4 resolved

✅ Bug: Sub-token over-match test is trivially true (missing first_id column)

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java:461-475 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java:790-804
The test testColumnQueryRequiresAllSubtokensToMatch asserts that querying <tag>_first_name must NOT return a hit for <tag>_first_id. However, createTableWithMultiTokenColumns never creates a column named <tag>_first_id — it only seeds first_name, last_name, and address. The assertFalse(hits.contains(firstIdColumn)) therefore passes trivially regardless of whether the Operator.And fix is in place, making this test unable to catch a regression.

Add a <tag>_first_id column to createTableWithMultiTokenColumns so the negative assertion is actually exercised against a real indexed document.

✅ Bug: NPE if getAssetTypeConfigurations() returns null

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSourceBuilderFactory.java:493 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSourceBuilderFactory.java:510
Both buildPerTypeUnionQueryV2 and buildCompositeAssetConfig (called from buildAllAssetsSearchBuilderV2) iterate searchSettings.getAssetTypeConfigurations() via an enhanced for-loop without a null guard. Other call sites in the codebase (e.g. SearchSettingsHandler:26) explicitly check for null before iterating. If the search settings are misconfigured or not yet initialized, this will throw a NullPointerException on every dataAsset/all query — a high-traffic path.

✅ Bug: Complex-syntax test waits for ≥5 columns but only 3 are created

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java:639-642 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java:916-930
In testComplexSyntaxQueriesKeepParity, the Awaitility poll waits for totalHitsForIndex(client, tag, "tableColumn") >= 5, but the setup calls createTableWithMultiTokenColumns which creates exactly 3 columns (tag_first_name, tag_last_name, tag_address). Since each test gets its own TestNamespace, no other test contributes columns with the same tag. The await condition can never be satisfied, so the test will always time out after 90 seconds and fail.

✅ Edge Case: Complex-syntax quoted-phrase query may match zero hits trivially

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java:644-649 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java:916-930
The first complex query is "tag alpha" (quoted phrase). None of the columns created by createTableWithMultiTokenColumns contain the word alpha — column names are tag_first_name, tag_last_name, tag_address. This means totalHitsForIndex and bucketCountFromDataAsset will both return 0, making the parity assertion trivially true. Similarly, tag AND alpha and (tag AND alpha) OR (tag AND bravo) will match 0 columns. Only tag OR alpha might produce a meaningful count. Consider adding a column whose name includes alpha (e.g. tag_alpha_metric) so these complex-syntax shapes exercise real matching.


Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

sonarqubecloud · 2026-04-30T13:04:24Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


github-actions · 2026-04-30T18:15:52Z

🔴 Playwright Results — 156 failure(s), 16 flaky

✅ 1494 passed · ❌ 156 failed · 🟡 16 flaky · ⏭️ 152 skipped

| Shard | Passed | Failed | Flaky | Skipped |
|---|---|---|---|---|
| 🔴 Shard 1 | 210 | 26 | 1 | 66 |
| 🔴 Shard 2 | 704 | 39 | 8 | 11 |
| 🔴 Shard 3 | 580 | 91 | 7 | 75 |
Genuine Failures (failed on all attempts)

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the ApiEndpoint Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the Store Procedure Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the Dashboard Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the Pipeline Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the MlModel Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the DashboardDataModel Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the Chart Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the Directory Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the File Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the Spreadsheet Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DataAssetRulesEnabled.spec.ts › Verify the Worksheet Entity Action items after rules is Enabled (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/CustomizeDetailPage.spec.ts › Dashboard - customization should work (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/CustomizeDetailPage.spec.ts › Ml Model - customization should work (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/CustomizeDetailPage.spec.ts › Pipeline - customization should work (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/CustomizeDetailPage.spec.ts › Dashboard Data Model - customization should work (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/CustomizeDetailPage.spec.ts › Stored Procedure - customization should work (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/CustomizeDetailPage.spec.ts › API Endpoint - customization should work (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/Dashboards.spec.ts › should be able to toggle between deleted and non-deleted charts (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/DescriptionSuggestion.spec.ts › should add and accept a requested topic schema field description (shard 1)

Test timeout of 180000ms exceeded.

❌ Features/TagsSuggestion.spec.ts › should add and accept requested tags for a topic schema field (shard 1)

Test timeout of 180000ms exceeded.

❌ Flow/Metric.spec.ts › Metric creation flow should work (shard 1)

Test timeout of 180000ms exceeded while running "beforeEach" hook.

❌ Flow/Metric.spec.ts › Verify Metric Type Update (shard 1)

Test timeout of 180000ms exceeded while running "beforeEach" hook.

❌ Flow/Metric.spec.ts › Verify Unit of Measurement Update (shard 1)

Test timeout of 180000ms exceeded while running "beforeEach" hook.

❌ Flow/Metric.spec.ts › Verify Granularity Update (shard 1)

Test timeout of 180000ms exceeded while running "beforeEach" hook.

❌ Flow/Metric.spec.ts › verify metric expression update (shard 1)

Test timeout of 180000ms exceeded while running "beforeEach" hook.

❌ Flow/Metric.spec.ts › Verify Related Metrics Update (shard 1)

Test timeout of 180000ms exceeded while running "beforeEach" hook.

❌ Features/ChangeSummaryBadge.spec.ts › AI badge should appear on entity description with Suggested source (shard 2)

Error: expect(received).toBe(expected) // Object.is equality

Expected: 200
Received: 500

❌ Features/ChangeSummaryBadge.spec.ts › AI badge should appear on column description with Suggested source (shard 2)

Test timeout of 60000ms exceeded.

❌ Features/Container.spec.ts › expand / collapse should not appear after updating nested fields for container (shard 2)

Test timeout of 180000ms exceeded.

❌ Features/Container.spec.ts › Copy column link button should copy the column URL to clipboard (shard 2)

Test timeout of 180000ms exceeded.

... and 126 more failures

🟡 16 flaky test(s) (passed on retry)

Features/DataAssetRulesEnabled.spec.ts › Verify the Metric Entity Action items after rules is Enabled (shard 1, 2 retries)
Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
Features/ActivityAPI.spec.ts › Activity event is created when owner is added (shard 2, 1 retry)
Features/DataQuality/TestCaseImportExportE2eFlow.spec.ts › Admin: Complete export-import-validate flow (shard 2, 1 retry)
Features/DataQuality/TestCaseResultPermissions.spec.ts › User with only VIEW cannot PATCH results (shard 2, 1 retry)
Features/ExploreQuickFilters.spec.ts › should search for multiple values along with null filters (shard 2, 1 retry)
Features/ExploreQuickFilters.spec.ts › tier with assigned asset appears in dropdown, tier without asset does not (shard 2, 1 retry)
Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 1 retry)
Features/IncidentManager.spec.ts › Complete Incident lifecycle with table owner (shard 2, 1 retry)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Flow/NestedChildrenUpdates.spec.ts › should update nested column description immediately without page refresh (shard 3, 2 retries)
Flow/PlatformLineage.spec.ts › Verify Platform Lineage View (shard 3, 2 retries)
Flow/ServiceDocPanel.spec.ts › should render headings not raw markdown (shard 3, 2 retries)
Flow/ServiceDocPanel.spec.ts › should render image in Mssql doc panel (shard 3, 1 retry)
Flow/ServiceDocPanel.spec.ts › should only ever have one section highlighted at a time (shard 3, 2 retries)
Flow/ServiceDocPanel.spec.ts › should load the correct doc file for the selected service type (shard 3, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot AI review requested due to automatic review settings April 30, 2026 11:01

github-actions Bot added the backend and safe to test labels Apr 30, 2026

Copilot started reviewing on behalf of mohityadav766 April 30, 2026 11:01

gitar-bot Bot reviewed Apr 30, 2026

Comment thread ...in/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSourceBuilderFactory.java Outdated

Copilot AI reviewed Apr 30, 2026

gitar-bot Bot reviewed Apr 30, 2026

Comment thread openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java

gitar-bot Bot reviewed Apr 30, 2026

mohityadav766 and others added 4 commits April 30, 2026 16:39

Merge remote-tracking branch 'origin/fix-column-agg' into fix-column-agg

201bc7f

remove local script

caf8580

Copilot AI review requested due to automatic review settings April 30, 2026 11:13

Copilot started reviewing on behalf of mohityadav766 April 30, 2026 11:13

gitar-bot Bot reviewed Apr 30, 2026

Comment thread openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java

gitar-bot Bot reviewed Apr 30, 2026

Comment thread openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ColumnSearchIndexIT.java

Copilot AI reviewed Apr 30, 2026

mohityadav766 and others added 3 commits April 30, 2026 16:52

Merge remote-tracking branch 'origin/fix-column-agg' into fix-column-agg

b3cff35

Merge branch 'main' into fix-column-agg

d81ecc3

Copilot AI review requested due to automatic review settings April 30, 2026 11:24

Copilot started reviewing on behalf of mohityadav766 April 30, 2026 11:25

Copilot AI reviewed Apr 30, 2026

gitar-bot Bot reviewed Apr 30, 2026

mohityadav766 had a problem deploying to test April 30, 2026 11:45 — with GitHub Actions Error

Copilot AI review requested due to automatic review settings April 30, 2026 11:59

Copilot started reviewing on behalf of mohityadav766 April 30, 2026 11:59

Copilot AI reviewed Apr 30, 2026

mohityadav766 had a problem deploying to test April 30, 2026 12:10 — with GitHub Actions Error

mohityadav766 had a problem deploying to test April 30, 2026 12:10 — with GitHub Actions Failure

mohityadav766 had a problem deploying to test April 30, 2026 12:10 — with GitHub Actions Error

mohityadav766 had a problem deploying to test April 30, 2026 12:10 — with GitHub Actions Failure

mohityadav766 had a problem deploying to test April 30, 2026 12:10 — with GitHub Actions Error

-   * field. Without {@code And}, a query like {@code first_name} matches any column whose name
-   * contains just {@code first} or just {@code name}, which both inflates the column index hits
-   * and creates the dataAsset/tableColumn count mismatch tracked in github issue #3851.
+   * field. Previously this query used the same helper with {@code Operator.Or} and a last
+   * argument of {@code "0"}; with that OR-style query, a search like {@code first_name} matched
+   * any column whose name contained just {@code first} or just {@code name}, which inflated the
+   * column index hits and created the dataAsset/tableColumn count mismatch tracked in github
+   * issue #3851.

-   * mismatch tracked in github issue #3851.
+   * mismatch tracked in github issue #3851. The previous builder behavior here was equivalent to
+   * passing {@code fuzziness="0"} to {@code multiMatchQuery(..., tieBreaker, fuzziness)}, which
+   * disables fuzziness; this helper invocation does not set {@code minimum_should_match}.

		for (String configured : configuredTypes) {
		fallback.mustNot(ElasticQueryBuilder.termQuery(ENTITY_TYPE_FIELD, configured));

-          .until(() -> totalHitsForIndex(client, prefixQuery, "tableColumn") >= 5);
+          .until(() -> totalHitsForIndex(client, prefixQuery, "tableColumn") >= 3);

		for (String configured : configuredTypes) {
		fallback.mustNot(OpenSearchQueryBuilder.termQuery(ENTITY_TYPE_FIELD, configured));

-     * into {@code [first, name]}; with the old {@code Operator.Or} + {@code min_should_match=0}
-     * column builder, a column called {@code <tag>_first_id} matched a query of {@code
-     * <tag>_first_name} because the single token {@code first} was enough. The fix moves the
-     * column builder to {@code Operator.And}, so every sub-token must match.
+     * into {@code [first, name]}; with the old {@code Operator.Or} column builder, the string
+     * {@code "0"} was passed as the fuzziness value rather than setting {@code
+     * minimum_should_match}, so a column called {@code <tag>_first_id} matched a query of
+     * {@code <tag>_first_name} because the single token {@code first} was enough. The fix moves
+     * the column builder to {@code Operator.And}, so every sub-token must match.

-          .until(() -> totalHitsForIndex(client, tag, "tableColumn") >= 5);
+          .until(() -> totalHitsForIndex(client, tag, "tableColumn") >= 3);

mohityadav766 commented Apr 30, 2026 • edited by gitar-bot Bot

gitar-bot Bot commented Apr 30, 2026 • edited