Fix tag usage count performance#27850
Conversation
…INCT), batch correctly
The bulk tag-usage-count query (TagRepository.batchFetchUsageCounts and
CollectionDAO.getTagCountsBulk) was hitting ~240 seconds per call on
instances with heavy classification hierarchies — and silently returning
zero counts for any multi-component tag FQN.
Two issues, one correctness + one performance:
1. Correctness — the query computed `MD5('Classification.TagName')` (a
single MD5 of the joined FQN) and compared against `tag_usage.tagFQNHash`,
which is stored via FullyQualifiedName.buildHash as the hierarchical
form `MD5('Classification') + '.' + MD5('TagName')`. These never match
for any multi-component FQN — meaning every classification tag's usage
count silently rendered as 0 in the UI, regardless of actual usage.
2. Performance — N×UNION-ALL with `COUNT(DISTINCT targetFQNHash)` and an
`OR` between exact-equality and prefix-LIKE per block. The `OR`
defeats single-index plans (BitmapOr on two scans), `COUNT(DISTINCT)`
forces a sort/hash dedup that the unique key already guarantees,
inline `MD5()` calls prevent prepared-statement caching, and N stacked
blocks scan the table N times. For 535 tags this cost ~240 s/call and
returned wrong data.
Fix:
- Pre-compute hashes in Java via FullyQualifiedName.buildHash (matches
storage format).
- Issue ONE batched `GROUP BY` for all exact-match counts:
SELECT tagFQNHash, COUNT(*) FROM tag_usage
WHERE source = ? AND tagFQNHash IN (<hashes>) GROUP BY tagFQNHash
- Issue N indexed prefix-LIKEs (one per tag) for descendant counts:
SELECT COUNT(*) FROM tag_usage
WHERE source = ? AND tagFQNHash LIKE :hashPrefix
- Drop COUNT(DISTINCT) -> COUNT(*). The
tag_usage_source_tagfqnhash_targetfqnhash_key UNIQUE constraint
guarantees no duplicate rows for a fixed (source, tagFQNHash), so
COUNT(*) is exact.
- Remove the dead getTagCountsBulkComplex (deprecated, no callers, same
inline-MD5 bug).
- Remove the broken inline UNION-ALL builder in
TagRepository.batchFetchUsageCounts; delegate to the now-correct
getTagCountsBulk.
Same SQL on MySQL and Postgres (no @ConnectionAwareSqlQuery split).
Hits idx_tag_usage_target_source / idx_tag_usage_join_source on both.
Test:
- Added test_classificationAndTagUsageCount in ClassificationResourceIT.
Creates a Classification + 2 tags, applies them to 3 tables (2 with
tag A, 1 with tag B), then asserts:
* Classification usage count = 3 (prefix match across child tags) -
catches the hierarchical-hash correctness regression.
* Tag A usage count = 2 (exact match)
* Tag B usage count = 1 (exact match)
* Bulk LIST of tags with fields=usageCount returns the same correct
counts (exercises batched getTagCountsBulk).
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
There was a problem hiding this comment.
Pull request overview
Fixes tag usage count correctness and performance by aligning bulk counting with the hierarchical tagFQNHash format stored in tag_usage, and replacing an expensive UNION-based query pattern with batched/grouped queries.
Changes:
- Replaced TagRepository’s inline UNION/MD5 query builder with a DAO call to
getTagCountsBulk. - Updated
CollectionDAO.TagUsageDAObulk counting to (1) batch exact-hash counts viaIN (...) GROUP BY, and (2) count descendants via hash-prefixLIKE. - Added an integration test validating usageCount for a classification and its tags (including bulk list path).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TagRepository.java | Removes broken inline UNION/MD5 query; delegates bulk usage counts to DAO. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java | Introduces new tag usage count queries and rewrites getTagCountsBulk to use hierarchical hashes. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ClassificationResourceIT.java | Adds regression test for classification/tag usageCount correctness and bulk listing counts. |
| @SqlQuery( | ||
| "SELECT tagFQN, count FROM (" | ||
| + " SELECT ? as tagFQN, COUNT(DISTINCT targetFQNHash) as count " | ||
| + " FROM tag_usage " | ||
| + " WHERE source = ? AND (tagFQNHash = MD5(?) OR tagFQNHash LIKE CONCAT(MD5(?), '.%'))" | ||
| + ") t WHERE tagFQN IN (<tagFQNs>)") | ||
| "SELECT tagFQNHash AS tagFQN, COUNT(*) AS count " | ||
| + "FROM tag_usage " | ||
| + "WHERE source = :source AND tagFQNHash IN (<hashes>) " | ||
| + "GROUP BY tagFQNHash") | ||
| @RegisterRowMapper(TagCountMapper.class) | ||
| @Deprecated | ||
| List<Map.Entry<String, Integer>> getTagCountsBulkComplex( | ||
| @Bind("tagFQN") String sampleTagFQN, | ||
| @Bind("source") int source, | ||
| @Bind("tagFQNHash") String tagFQNHash, | ||
| @Bind("tagFQNHashPrefix") String tagFQNHashPrefix, | ||
| @BindList("tagFQNs") List<String> tagFQNs); | ||
| List<Map.Entry<String, Integer>> getTagUsageCountsByExactHashes( | ||
| @Bind("source") int source, @BindList("hashes") List<String> hashes); |
| table1.setTags(List.of(new TagLabel().withTagFQN(tagA.getFullyQualifiedName()))); | ||
| tableResourceIT.patchEntity(table1.getId().toString(), table1); | ||
|
|
||
| table2.setTags(List.of(new TagLabel().withTagFQN(tagA.getFullyQualifiedName()))); | ||
| tableResourceIT.patchEntity(table2.getId().toString(), table2); | ||
|
|
||
| table3.setTags(List.of(new TagLabel().withTagFQN(tagB.getFullyQualifiedName()))); |
| @SqlQuery( | ||
| "SELECT COUNT(*) FROM tag_usage " | ||
| + "WHERE source = :source AND tagFQNHash LIKE :hashPrefix") | ||
| int getTagUsageCountByHashPrefix( | ||
| @Bind("source") int source, @Bind("hashPrefix") String hashPrefix); |
Previously getTagCountsBulk passed the full set of input hashes as a single IN list. For callers that batch many tags at once (e.g. listing all classifications/tags in a tenant) the IN could grow unbounded, hitting DB protocol parameter caps and degrading planner choices. Adds TAG_COUNT_BATCH_CHUNK_SIZE = 1000 and chunks the IN clause at that size, matching the existing pattern used by getTagsByTargetFQNHashes (#27836). Each chunk is a fast indexed GROUP BY; results are merged in Java. Also expands the javadoc to clarify why the per-tag prefix-LIKE branch is not a fan-out concern (returns a single COUNT, scans a bounded index range, and tag hierarchies are typically 1-2 levels deep so the descendant set is small or empty).
87bab5d to
c803343
Compare
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
Replaces the per-tag prefix-LIKE loop with a single batched query that joins tag_usage to a UNION-ALL of (rootHash, hashPrefix) inputs and GROUPs by rootHash. All values are bound as named parameters — no string interpolation, safe against injection. For a batch of N tags: Before: 1 exact-match GROUP BY + N prefix-LIKE queries (N+1 round-trips) After: 1 exact-match GROUP BY + 1 batched descendant GROUP BY (2 round-trips per chunk) Each chunk processes up to TAG_COUNT_BATCH_CHUNK_SIZE (1000) hashes, keeping IN-list and UNION-ALL size bounded. Scale impact: 100 tags: from 101 to 2 round-trips 1000 tags: from 1001 to 2 round-trips 10000 tags: from 10001 to 20 round-trips (10 chunks * 2 queries) Same DB-side work (same N index range scans for descendant lookups), but RTT cost reduced from O(N) to O(N / chunk_size) — significant in high-latency or cross-region DB connections. Removes getTagUsageCountByHashPrefix (no longer used).
Code Review ✅ Approved 2 resolved / 2 findingsConsolidates tag usage counting into a single query and adds deduplication for FQN inputs, resolving the performance bottleneck and silent count drops. ✅ 2 resolved✅ Performance: Descendant counts still issue N individual queries
✅ Edge Case: Duplicate FQNs in input silently drop one tag's count
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
There was a problem hiding this comment.
Pull request overview
This PR fixes correctness and performance issues in bulk tag usage count computation by aligning hash matching with the hierarchical FullyQualifiedName.buildHash format and replacing the prior UNION-heavy approach with batched aggregations. It also includes changes to default runtime configuration in conf/openmetadata.yaml that appear unrelated to the tag usage work.
Changes:
- Reworked bulk tag usage counting to use precomputed hierarchical hashes and batched
GROUP BYqueries for exact + descendant counts. - Simplified
TagRepository.batchFetchUsageCountsto delegate to the DAO bulk method (removing the inline-MD5 UNION builder). - Added an integration test covering classification + tag usageCount correctness via both single-entity GET and bulk LIST paths.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TagRepository.java | Removes the broken inline UNION/MD5 query construction and delegates to DAO bulk counting. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java | Introduces batched exact-hash counting + batched descendant prefix-LIKE aggregation using hierarchical hashes. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ClassificationResourceIT.java | Adds regression test validating correct usageCount for classification and tags, including bulk LIST behavior. |
| conf/openmetadata.yaml | Changes default DB and search configuration (driver/scheme/host and searchType). |
| driverClass: ${DB_DRIVER_CLASS:-org.postgresql.Driver} | ||
| # the username and password | ||
| user: ${DB_USER:-openmetadata_user} | ||
| password: ${DB_USER_PASSWORD:-openmetadata_password} | ||
| # the JDBC URL; the database is called openmetadata_db | ||
| url: jdbc:${DB_SCHEME:-mysql}://${DB_HOST:-localhost}:${DB_PORT:-3306}/${OM_DATABASE:-openmetadata_db}?${DB_PARAMS:-allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC} | ||
| url: jdbc:${DB_SCHEME:-postgresql}://${DB_HOST:-192.168.29.172}:${DB_PORT:-5432}/${OM_DATABASE:-openmetadata_db}?${DB_PARAMS:-allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC} |
| elasticsearch: | ||
| searchType: ${SEARCH_TYPE:- "elasticsearch"} | ||
| searchType: ${SEARCH_TYPE:- "opensearch"} |
|
🟡 Playwright Results — all passed (13 flaky)✅ 3985 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 86 skipped
🟡 13 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |



Summary
The bulk tag-usage-count query was hitting ~240 seconds per call on instances with many classification tags, and silently returning zero counts for every multi-component classification tag (i.e. essentially every tag in every tenant —
Classification.TagNameis the canonical form). One correctness bug + one performance bug, same root.Root cause
Correctness bug
tag_usage.tagFQNHashis stored via@BindFQN→FullyQualifiedName.buildHash, which produces a hierarchical hash:The old query computed:
These two formats never match for any multi-component FQN. So tag-usage counts in the UI rendered as 0 for every classification tag with a parent classification — which is essentially every tag.
Performance bug
On top of the correctness issue:
COUNT(DISTINCT targetFQNHash)— forced a sort/hash dedup that the unique key already guaranteesORbetween equality and prefix LIKE — defeats single-index plans, degrades toBitmapOron two scansMD5()calls — prevent prepared-statement cachingFor 535 tags: ~240 s/call, returning wrong data.
Changes
CollectionDAO.TagUsageDAOgetTagCountsBulkComplex(had the same inline-MD5 bug, no callers).getTagUsageCountsByExactHashes— a single batchedGROUP BYfor exact matches.getTagUsageCountByHashPrefix— a single indexed prefix-LIKE for descendant counts.getTagCountsBulkdefault method:FullyQualifiedName.buildHashin Java.GROUP BYfor all exact-match counts.Map.merge.TagRepository.batchFetchUsageCountsMD5(fqn)which never matched).getTagCountsBulk.Why dropping
COUNT(DISTINCT)is safetag_usage_source_tagfqnhash_targetfqnhash_keyis aUNIQUEconstraint on(source, tagFQNHash, targetFQNHash). For a fixed(source, tagFQNHash), eachtargetFQNHashappears at most once. SoCOUNT(*) ≡ COUNT(DISTINCT targetFQNHash).Cross-DB compatibility
IN (<hashes>)GROUP BY tagFQNHashtagFQNHash LIKE 'prefix%'COUNT(*)No
@ConnectionAwareSqlQuerysplit needed.Index usage
GROUP BY→ usesidx_tag_usage_source_target(1.9.5) oridx_tag_usage_join_source(1.11.0) — both have(source, tagFQNHash)indexedtagFQNHash→ usesidx_tag_usage_join_sourceon Postgres (tagFQNHashleading column), oridx_tag_usage_tag_fqn_hashon MySQLTest
Added
test_classificationAndTagUsageCountinClassificationResourceIT:Classification.usageCount == 3— exercises the prefix-match path and is the regression test for the hierarchical-hash correctness bugtag_a.usageCount == 2,tag_b.usageCount == 1— exercises the exact-match pathLIST tagsendpoint — exercises the batchedgetTagCountsBulkpathWithout this fix the assertions would all fail with counts of 0.
Performance impact
🤖 Generated with Claude Code
Summary by Gitar
conf/openmetadata.yamlto default the database driver toorg.postgresql.Driverand the URL scheme topostgresql.searchTypeinconf/openmetadata.yamlfromelasticsearchtoopensearch.This will update automatically on new commits.