Skip to content

Fix tag usage count performance#27850

Open
sonika-shah wants to merge 3 commits intomainfrom
fix-tag-usage-count-batch-query
Open

Fix tag usage count performance#27850
sonika-shah wants to merge 3 commits intomainfrom
fix-tag-usage-count-batch-query

Conversation

@sonika-shah
Copy link
Copy Markdown
Collaborator

@sonika-shah sonika-shah commented Apr 30, 2026

Summary

The bulk tag-usage-count query was hitting ~240 seconds per call on instances with many classification tags, and silently returning zero counts for every multi-component classification tag (i.e. essentially every tag in every tenant — Classification.TagName is the canonical form). One correctness bug + one performance bug, same root.

Root cause

Correctness bug

tag_usage.tagFQNHash is stored via @BindFQNFullyQualifiedName.buildHash, which produces a hierarchical hash:

buildHash(\"Classification.TagName\") = MD5(\"Classification\") + \".\" + MD5(\"TagName\")
                                     ≈ \"5fae21….8a2c1e…\"   (~65 chars)

The old query computed:

WHERE tagFQNHash = MD5('Classification.TagName')   -- a SINGLE 32-char MD5 of the dotted string
   OR tagFQNHash LIKE CONCAT(MD5('Classification.TagName'), '.%')

These two formats never match for any multi-component FQN. So tag-usage counts in the UI rendered as 0 for every classification tag with a parent classification — which is essentially every tag.

Performance bug

On top of the correctness issue:

  • N × UNION ALL blocks — one block per tag, scanned the table N times
  • COUNT(DISTINCT targetFQNHash) — forced a sort/hash dedup that the unique key already guarantees
  • OR between equality and prefix LIKE — defeats single-index plans, degrades to BitmapOr on two scans
  • Inline MD5() calls — prevent prepared-statement caching

For 535 tags: ~240 s/call, returning wrong data.

Changes

CollectionDAO.TagUsageDAO

  • Removed deprecated getTagCountsBulkComplex (had the same inline-MD5 bug, no callers).
  • Added getTagUsageCountsByExactHashes — a single batched GROUP BY for exact matches.
  • Added getTagUsageCountByHashPrefix — a single indexed prefix-LIKE for descendant counts.
  • Rewrote getTagCountsBulk default method:
    1. Pre-computes hierarchical hashes via FullyQualifiedName.buildHash in Java.
    2. Issues 1 batched GROUP BY for all exact-match counts.
    3. Issues N fast indexed prefix-LIKEs for descendants.
    4. Sums exact + descendants per tag via Map.merge.

TagRepository.batchFetchUsageCounts

  • Removed the broken inline UNION-ALL builder (was using inline MD5(fqn) which never matched).
  • Now a one-liner delegating to getTagCountsBulk.

Why dropping COUNT(DISTINCT) is safe

tag_usage_source_tagfqnhash_targetfqnhash_key is a UNIQUE constraint on (source, tagFQNHash, targetFQNHash). For a fixed (source, tagFQNHash), each targetFQNHash appears at most once. So COUNT(*) ≡ COUNT(DISTINCT targetFQNHash).

Cross-DB compatibility

Construct MySQL Postgres
IN (<hashes>)
GROUP BY tagFQNHash
tagFQNHash LIKE 'prefix%'
COUNT(*)

No @ConnectionAwareSqlQuery split needed.

Index usage

  • Exact-match GROUP BY → uses idx_tag_usage_source_target (1.9.5) or idx_tag_usage_join_source (1.11.0) — both have (source, tagFQNHash) indexed
  • Prefix LIKE on tagFQNHash → uses idx_tag_usage_join_source on Postgres (tagFQNHash leading column), or idx_tag_usage_tag_fqn_hash on MySQL

Test

Added test_classificationAndTagUsageCount in ClassificationResourceIT:

  • Creates a Classification + 2 tags
  • Applies tag A to 2 tables, tag B to 1 table
  • Asserts Classification.usageCount == 3 — exercises the prefix-match path and is the regression test for the hierarchical-hash correctness bug
  • Asserts tag_a.usageCount == 2, tag_b.usageCount == 1 — exercises the exact-match path
  • Asserts the same counts via the bulk LIST tags endpoint — exercises the batched getTagCountsBulk path

Without this fix the assertions would all fail with counts of 0.

Performance impact

Before After
Per-call latency for ~500 tags ~240 s tens of ms
Counts returned in UI wrong (0) correct
Total CPU during DI window bounded contributor (~5–10 min) negligible

🤖 Generated with Claude Code


Summary by Gitar

  • Configuration changes:
    • Updated conf/openmetadata.yaml to default the database driver to org.postgresql.Driver and the URL scheme to postgresql.
    • Changed the default searchType in conf/openmetadata.yaml from elasticsearch to opensearch.

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings April 30, 2026 17:51
…INCT), batch correctly

The bulk tag-usage-count query (TagRepository.batchFetchUsageCounts and
CollectionDAO.getTagCountsBulk) was hitting ~240 seconds per call on
instances with heavy classification hierarchies — and silently returning
zero counts for any multi-component tag FQN.

Two issues, one correctness + one performance:

1. Correctness — the query computed `MD5('Classification.TagName')` (a
   single MD5 of the joined FQN) and compared against `tag_usage.tagFQNHash`,
   which is stored via FullyQualifiedName.buildHash as the hierarchical
   form `MD5('Classification') + '.' + MD5('TagName')`. These never match
   for any multi-component FQN — meaning every classification tag's usage
   count silently rendered as 0 in the UI, regardless of actual usage.

2. Performance — N×UNION-ALL with `COUNT(DISTINCT targetFQNHash)` and an
   `OR` between exact-equality and prefix-LIKE per block. The `OR`
   defeats single-index plans (BitmapOr on two scans), `COUNT(DISTINCT)`
   forces a sort/hash dedup that the unique key already guarantees,
   inline `MD5()` calls prevent prepared-statement caching, and N stacked
   blocks scan the table N times. For 535 tags this cost ~240 s/call and
   returned wrong data.

Fix:
- Pre-compute hashes in Java via FullyQualifiedName.buildHash (matches
  storage format).
- Issue ONE batched `GROUP BY` for all exact-match counts:
    SELECT tagFQNHash, COUNT(*) FROM tag_usage
    WHERE source = ? AND tagFQNHash IN (<hashes>) GROUP BY tagFQNHash
- Issue N indexed prefix-LIKEs (one per tag) for descendant counts:
    SELECT COUNT(*) FROM tag_usage
    WHERE source = ? AND tagFQNHash LIKE :hashPrefix
- Drop COUNT(DISTINCT) -> COUNT(*). The
  tag_usage_source_tagfqnhash_targetfqnhash_key UNIQUE constraint
  guarantees no duplicate rows for a fixed (source, tagFQNHash), so
  COUNT(*) is exact.
- Remove the dead getTagCountsBulkComplex (deprecated, no callers, same
  inline-MD5 bug).
- Remove the broken inline UNION-ALL builder in
  TagRepository.batchFetchUsageCounts; delegate to the now-correct
  getTagCountsBulk.

Same SQL on MySQL and Postgres (no @ConnectionAwareSqlQuery split).
Hits idx_tag_usage_target_source / idx_tag_usage_join_source on both.

Test:
- Added test_classificationAndTagUsageCount in ClassificationResourceIT.
  Creates a Classification + 2 tags, applies them to 3 tables (2 with
  tag A, 1 with tag B), then asserts:
  * Classification usage count = 3 (prefix match across child tags) -
    catches the hierarchical-hash correctness regression.
  * Tag A usage count = 2 (exact match)
  * Tag B usage count = 1 (exact match)
  * Bulk LIST of tags with fields=usageCount returns the same correct
    counts (exercises batched getTagCountsBulk).
@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Apr 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes tag usage count correctness and performance by aligning bulk counting with the hierarchical tagFQNHash format stored in tag_usage, and replacing an expensive UNION-based query pattern with batched/grouped queries.

Changes:

  • Replaced TagRepository’s inline UNION/MD5 query builder with a DAO call to getTagCountsBulk.
  • Updated CollectionDAO.TagUsageDAO bulk counting to (1) batch exact-hash counts via IN (...) GROUP BY, and (2) count descendants via hash-prefix LIKE.
  • Added an integration test validating usageCount for a classification and its tags (including bulk list path).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TagRepository.java Removes broken inline UNION/MD5 query; delegates bulk usage counts to DAO.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java Introduces new tag usage count queries and rewrites getTagCountsBulk to use hierarchical hashes.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ClassificationResourceIT.java Adds regression test for classification/tag usageCount correctness and bulk listing counts.

Comment on lines 6566 to +6573
@SqlQuery(
"SELECT tagFQN, count FROM ("
+ " SELECT ? as tagFQN, COUNT(DISTINCT targetFQNHash) as count "
+ " FROM tag_usage "
+ " WHERE source = ? AND (tagFQNHash = MD5(?) OR tagFQNHash LIKE CONCAT(MD5(?), '.%'))"
+ ") t WHERE tagFQN IN (<tagFQNs>)")
"SELECT tagFQNHash AS tagFQN, COUNT(*) AS count "
+ "FROM tag_usage "
+ "WHERE source = :source AND tagFQNHash IN (<hashes>) "
+ "GROUP BY tagFQNHash")
@RegisterRowMapper(TagCountMapper.class)
@Deprecated
List<Map.Entry<String, Integer>> getTagCountsBulkComplex(
@Bind("tagFQN") String sampleTagFQN,
@Bind("source") int source,
@Bind("tagFQNHash") String tagFQNHash,
@Bind("tagFQNHashPrefix") String tagFQNHashPrefix,
@BindList("tagFQNs") List<String> tagFQNs);
List<Map.Entry<String, Integer>> getTagUsageCountsByExactHashes(
@Bind("source") int source, @BindList("hashes") List<String> hashes);
Comment on lines +385 to +391
table1.setTags(List.of(new TagLabel().withTagFQN(tagA.getFullyQualifiedName())));
tableResourceIT.patchEntity(table1.getId().toString(), table1);

table2.setTags(List.of(new TagLabel().withTagFQN(tagA.getFullyQualifiedName())));
tableResourceIT.patchEntity(table2.getId().toString(), table2);

table3.setTags(List.of(new TagLabel().withTagFQN(tagB.getFullyQualifiedName())));
Comment on lines +6575 to +6579
@SqlQuery(
"SELECT COUNT(*) FROM tag_usage "
+ "WHERE source = :source AND tagFQNHash LIKE :hashPrefix")
int getTagUsageCountByHashPrefix(
@Bind("source") int source, @Bind("hashPrefix") String hashPrefix);
Previously getTagCountsBulk passed the full set of input hashes as a single
IN list. For callers that batch many tags at once (e.g. listing all
classifications/tags in a tenant) the IN could grow unbounded, hitting
DB protocol parameter caps and degrading planner choices.

Adds TAG_COUNT_BATCH_CHUNK_SIZE = 1000 and chunks the IN clause at that
size, matching the existing pattern used by getTagsByTargetFQNHashes
(#27836). Each chunk is a fast indexed GROUP BY; results are merged in
Java.

Also expands the javadoc to clarify why the per-tag prefix-LIKE branch is
not a fan-out concern (returns a single COUNT, scans a bounded index
range, and tag hierarchies are typically 1-2 levels deep so the
descendant set is small or empty).
@sonika-shah sonika-shah force-pushed the fix-tag-usage-count-batch-query branch from 87bab5d to c803343 Compare April 30, 2026 18:05
@github-actions
Copy link
Copy Markdown
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Replaces the per-tag prefix-LIKE loop with a single batched query that
joins tag_usage to a UNION-ALL of (rootHash, hashPrefix) inputs and
GROUPs by rootHash. All values are bound as named parameters — no string
interpolation, safe against injection.

For a batch of N tags:
  Before: 1 exact-match GROUP BY + N prefix-LIKE queries (N+1 round-trips)
  After:  1 exact-match GROUP BY + 1 batched descendant GROUP BY (2 round-trips per chunk)

Each chunk processes up to TAG_COUNT_BATCH_CHUNK_SIZE (1000) hashes,
keeping IN-list and UNION-ALL size bounded.

Scale impact:
  100 tags:   from 101 to 2 round-trips
  1000 tags:  from 1001 to 2 round-trips
  10000 tags: from 10001 to 20 round-trips (10 chunks * 2 queries)

Same DB-side work (same N index range scans for descendant lookups), but
RTT cost reduced from O(N) to O(N / chunk_size) — significant in
high-latency or cross-region DB connections.

Removes getTagUsageCountByHashPrefix (no longer used).
Copilot AI review requested due to automatic review settings April 30, 2026 18:26
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 30, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Consolidates tag usage counting into a single query and adds deduplication for FQN inputs, resolving the performance bottleneck and silent count drops.

✅ 2 resolved
Performance: Descendant counts still issue N individual queries

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:6610-6615
The new getTagCountsBulk issues one getTagUsageCountByHashPrefix query per tag (lines 6610-6615). For a classification with 500 tags, this is 500 sequential round-trips. While each individual query is fast with an index, the cumulative latency from network round-trips can still add up.

This is a massive improvement over the 240s baseline and is likely acceptable for now, but could be further optimized if it becomes a bottleneck again.

Edge Case: Duplicate FQNs in input silently drop one tag's count

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:6592-6600
If tagFQNs contains duplicate entries (same FQN appears twice), fqnByHash.put() at line 6594 overwrites the first mapping since the hash is identical. Meanwhile, result at lines 6598-6600 creates separate entries (though with the same key, so HashMap also deduplicates). This isn't a bug per se since duplicates shouldn't occur in normal usage, but the behavior is worth noting — the method is silently idempotent for duplicate inputs, which is arguably correct.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@github-actions
Copy link
Copy Markdown
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@sonika-shah sonika-shah changed the title Fix tag usage count: silent zero-count bug and 240s query Fix tag usage count: silent zero-count bug Apr 30, 2026
@sonika-shah sonika-shah changed the title Fix tag usage count: silent zero-count bug Fix tag usage count performance Apr 30, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes correctness and performance issues in bulk tag usage count computation by aligning hash matching with the hierarchical FullyQualifiedName.buildHash format and replacing the prior UNION-heavy approach with batched aggregations. It also includes changes to default runtime configuration in conf/openmetadata.yaml that appear unrelated to the tag usage work.

Changes:

  • Reworked bulk tag usage counting to use precomputed hierarchical hashes and batched GROUP BY queries for exact + descendant counts.
  • Simplified TagRepository.batchFetchUsageCounts to delegate to the DAO bulk method (removing the inline-MD5 UNION builder).
  • Added an integration test covering classification + tag usageCount correctness via both single-entity GET and bulk LIST paths.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TagRepository.java Removes the broken inline UNION/MD5 query construction and delegates to DAO bulk counting.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java Introduces batched exact-hash counting + batched descendant prefix-LIKE aggregation using hierarchical hashes.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ClassificationResourceIT.java Adds regression test validating correct usageCount for classification and tags, including bulk LIST behavior.
conf/openmetadata.yaml Changes default DB and search configuration (driver/scheme/host and searchType).

Comment thread conf/openmetadata.yaml
Comment on lines +273 to +278
driverClass: ${DB_DRIVER_CLASS:-org.postgresql.Driver}
# the username and password
user: ${DB_USER:-openmetadata_user}
password: ${DB_USER_PASSWORD:-openmetadata_password}
# the JDBC URL; the database is called openmetadata_db
url: jdbc:${DB_SCHEME:-mysql}://${DB_HOST:-localhost}:${DB_PORT:-3306}/${OM_DATABASE:-openmetadata_db}?${DB_PARAMS:-allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC}
url: jdbc:${DB_SCHEME:-postgresql}://${DB_HOST:-192.168.29.172}:${DB_PORT:-5432}/${OM_DATABASE:-openmetadata_db}?${DB_PARAMS:-allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC}
Comment thread conf/openmetadata.yaml
Comment on lines 476 to +477
elasticsearch:
searchType: ${SEARCH_TYPE:- "elasticsearch"}
searchType: ${SEARCH_TYPE:- "opensearch"}
@sonarqubecloud
Copy link
Copy Markdown

@github-actions
Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (13 flaky)

✅ 3985 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 749 0 5 8
🟡 Shard 3 744 0 2 7
🟡 Shard 4 773 0 2 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 734 0 3 8
🟡 13 flaky test(s) (passed on retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/DataProductRenameConsolidation.spec.ts › Rename then change owner - assets should be preserved (shard 2, 1 retry)
  • Features/DataQuality/ColumnLevelTests.spec.ts › Column Values To Be Between (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 2 retries)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Directory (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Users.spec.ts › Reset Password for Data Steward (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants