Add db-tune ops subcommand + production RDS runbook by harshach · Pull Request #27890 · open-metadata/OpenMetadata

harshach · 2026-05-04T18:55:51Z

Summary

New `./openmetadata-ops.sh db-tune` subcommand: read-only tuning report by default, with `--apply` to execute and `--analyze` to refresh planner stats. Postgres autovacuum reloptions; MySQL InnoDB persistent-stats reloptions. Pure heuristic + formatter is unit-tested; live-DB flow is covered by `DbTuneIT` against the IT bootstrap's Postgres / MySQL Testcontainer.

Sample input / output

Report mode (default — no DB writes)

```
$ ./openmetadata-ops.sh db-tune

Database engine: PostgreSQL 17.2

These cannot be applied by this tool — change them in your DB parameter group / RDS console.

Next steps:
./openmetadata-ops.sh db-tune --apply --analyze # apply + refresh planner stats
./openmetadata-ops.sh db-tune --apply # apply only; run analyze-tables later
```

Apply mode (interactive prompt)

```
$ ./openmetadata-ops.sh db-tune --apply

[ ... same report as above ... ]

About to apply 5 ALTER statements:
ALTER TABLE "entity_relationship" SET (autovacuum_analyze_scale_factor = 0.005, autovacuum_vacuum_cost_limit = 4000, autovacuum_vacuum_scale_factor = 0.01);
ALTER TABLE "tag_usage" SET (autovacuum_analyze_scale_factor = 0.005, autovacuum_vacuum_cost_delay = 0, autovacuum_vacuum_cost_limit = 4000, autovacuum_vacuum_scale_factor = 0.01);
ALTER TABLE "storage_container_entity" SET (autovacuum_analyze_scale_factor = 0.01, autovacuum_vacuum_scale_factor = 0.02);
ALTER TABLE "table_entity" SET (autovacuum_analyze_scale_factor = 0.01, autovacuum_vacuum_scale_factor = 0.02);
ALTER TABLE "change_event" SET (autovacuum_analyze_scale_factor = 0.1, autovacuum_vacuum_scale_factor = 0.2);

Apply now? [y/N]: y

Apply + refresh planner stats

```
$ ./openmetadata-ops.sh db-tune --apply --yes --analyze

[ ... report ... ]

+--------------------------+---------+--------+--------------------+
| Table | Action | Status | Details |
+--------------------------+---------+--------+--------------------+
| entity_relationship | APPLY | OK | Applied + analyzed |
| tag_usage | TIGHTEN | OK | Applied + analyzed |
| storage_container_entity | APPLY | OK | Applied + analyzed |
| ... | ... | ... | ... |
+--------------------------+---------+--------+--------------------+
```

MySQL output

```
$ ./openmetadata-ops.sh db-tune

Database engine: MySQL 8.0.36

=== Server-level parameter compliance ===
+--------------------------------------+----------+-----------------------+-------------+
| Parameter | Current | Recommended | Status |
+--------------------------------------+----------+-----------------------+-------------+
| innodb_buffer_pool_size | 1.0 GB | 40-60% of RAM | UNTUNED |
| innodb_io_capacity | 200 | 2000 | UNDERSIZED |
| innodb_io_capacity_max | 2000 | 4000 | UNDERSIZED |
| innodb_stats_persistent_sample_pages | 20 | 64 | UNDERSIZED |
| ... | ... | ... | ... |
+--------------------------------------+----------+-----------------------+-------------+

=== Per-table recommendations (25 tables) ===
+--------------------------+-----------+--------+-----------+--------------------------+--------+----------------------------------+
| Table | Rows | Size | Current | Recommended | Action | Reason |
+--------------------------+-----------+--------+-----------+--------------------------+--------+----------------------------------+
| storage_container_entity | 580,123 | 1.5 GB | (default) | STATS_AUTO_RECALC=1, | APPLY | Large entity table; bump InnoDB |
| | | | | STATS_PERSISTENT=1, | | stats sampling |
| | | | | STATS_SAMPLE_PAGES=64 | | |
| ... | ... | ... | ... | ... | ... | ... |
+--------------------------+-----------+--------+-----------+--------------------------+--------+----------------------------------+
```

The `--apply` SQL on MySQL uses `ALTER TABLE \`name\` STATS_PERSISTENT=1, STATS_AUTO_RECALC=1, STATS_SAMPLE_PAGES=64` (no parens, no equals-spaces — MySQL syntax).

Architecture

```
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/
├── Action.java # APPLY / TIGHTEN / RELAX / OK / SKIP
├── TableStats.java # input record (rows, bytes, current settings)
├── TableRecommendation.java # output record (action, current, recommended, reason)
├── ServerParamCheck.java # per-parameter compliance row
├── DbTuneResult.java # bundle: engine, version, params, recommendations
├── AutoTuner.java # interface: analyze / recommend / apply / analyzeOne
├── PostgresTuningCatalog.java # tier map: HOT_RELATIONSHIP, HOT_TAG_USAGE, ENTITY_LARGE, ENTITY_SERVICE, APPEND_ONLY
├── PostgresAutoTuner.java # autovacuum reloptions + ANALYZE
├── MysqlTuningCatalog.java # tier map: HOT, ENTITY_LARGE, ENTITY_SERVICE
├── MysqlAutoTuner.java # InnoDB STATS_* + ANALYZE TABLE
└── DbTuneReport.java # ASCII renderer + ALTER-statement preview
```

`recommend(stats)` is a pure function — input is observed table state, output is the action. Tested without a database. `apply()` and `analyzeOne()` are the I/O methods, tested in `DbTuneIT` against a live Testcontainer.

Heuristic

Tier	Tables	Recommended (Postgres)	Recommended (MySQL)	Row threshold
HOT	entity_relationship, tag_usage	scale=0.005/0.01, cost_limit=4000 (+cost_delay=0 for tag_usage)	STATS_SAMPLE_PAGES=100	0
ENTITY_LARGE	22 entity tables	scale=0.01/0.02	STATS_SAMPLE_PAGES=64	10,000
ENTITY_SERVICE	database_entity, database_schema_entity	scale=0.02/0.05	STATS_SAMPLE_PAGES=32	5,000
APPEND_ONLY	change_event	scale=0.1/0.2 (RELAX)	(n/a — InnoDB has no relax knob)	50,000

Tests

38 unit tests, all green: `PostgresAutoTunerTest` (20), `MysqlAutoTunerTest` (11), `DbTuneReportTest` (7).
`DbTuneIT` runs against the IT bootstrap's Testcontainer (Postgres by default; `-DdatabaseType=mysql` for MySQL): asserts `analyze()` returns recommendations, `apply()` writes reloptions and is idempotent, `analyzeOne()` runs without error, dry-run analyze does not mutate settings. Resets the modified table options in `finally`.

Test plan

CI: unit tests pass (`mvn test -Dtest='PostgresAutoTunerTest,MysqlAutoTunerTest,DbTuneReportTest'`)
CI: `DbTuneIT` passes against the Postgres profile
CI: `DbTuneIT` passes against the MySQL profile (`-DdatabaseType=mysql`)
Local: `./openmetadata-ops.sh db-tune` against a populated dev DB — verify report shape + sane recommendations
Local: `./openmetadata-ops.sh db-tune --apply --yes` followed by a second `db-tune` shows everything as OK (idempotency)
Local: `./openmetadata-ops.sh db-tune --apply --yes --analyze` completes without error and refreshes planner stats

🤖 Generated with Claude Code

Summary by Gitar

Added diagnostic ops:
- Added --diagnose flag to db-tune for read-only DBA reports (unused indexes, bloat, slow queries).
- Implemented PostgresDiagnostic and MysqlDiagnostic using sys, performance_schema, and pg_stat_* views.
Diagnostic reporting architecture:
- Introduced DbTuneDiagnosis and DiagnosticCategory to support structured reporting and predictable table layouts.
- Added DiagnosticReportTest for logic coverage of findings grouping and report rendering.
Enhanced IT coverage:
- Added diagnoseCompletesWithoutError to DbTuneIT to verify end-to-end execution against live Testcontainers.

_{This will update automatically on new commits.}

Generate a per-table autovacuum (Postgres) / InnoDB stats (MySQL) tuning report from the live database; --apply executes the ALTER statements after operator confirmation; --analyze refreshes planner stats on the changed tables. Heuristic skips small dev tables (row-count gated) and recognizes already-tightened settings as OK. Catalog values come from the 600k-container production audit. The companion runbook in docs/rds-postgres-production-runbook.md captures the parameter-group settings, extensions, and the indexes that must be applied by hand because the migration runner can't dedup cross-version. Tests: 38 unit tests on the heuristic + report formatter; DbTuneIT covers analyze / apply / analyzeOne / dry-run against the IT bootstrap's live Postgres or MySQL Testcontainer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Adds an operator-facing db-tune ops subcommand plus a production RDS Postgres runbook, backed by a new dbtune Java utility package that can analyze table stats, produce tuning recommendations, and (optionally) apply per-table settings and refresh planner stats for Postgres/MySQL.

Changes:

Introduces org.openmetadata.service.util.dbtune (catalogs, engine-specific tuners, report renderer, result models).
Wires a new db-tune subcommand into OpenMetadataOperations with --apply/--yes/--analyze.
Adds unit tests for heuristic/report formatting and an integration test (DbTuneIT), plus an RDS Postgres production runbook.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
openmetadata-service/src/main/java/org/openmetadata/service/util/OpenMetadataOperations.java	Adds `db-tune` CLI command, prompts, and apply/analyze execution flow
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/Action.java	Defines actionable vs non-actionable recommendation actions
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/AutoTuner.java	Defines the engine-specific tuner contract (analyze/recommend/apply/analyzeOne/sql builder)
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/DbTuneReport.java	Renders ASCII report output and ALTER statement previews
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/DbTuneResult.java	Bundles engine/version + server param checks + per-table recommendations
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/MysqlAutoTuner.java	Implements MySQL stats/options inspection, recommendations, and ALTER/ANALYZE generation
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/MysqlTuningCatalog.java	Maps MySQL tables to stats sampling “profiles” and thresholds
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/PostgresAutoTuner.java	Implements Postgres reloptions inspection, recommendations, and ALTER/ANALYZE generation
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/PostgresTuningCatalog.java	Maps Postgres tables to autovacuum “profiles” and thresholds
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/ServerParamCheck.java	Models server-level parameter compliance rows
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/TableRecommendation.java	Models per-table recommendation outputs
openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/TableStats.java	Models per-table observed stats/settings inputs
openmetadata-service/src/test/java/org/openmetadata/service/util/dbtune/DbTuneReportTest.java	Unit tests for report formatting helpers and sections
openmetadata-service/src/test/java/org/openmetadata/service/util/dbtune/MysqlAutoTunerTest.java	Unit tests for MySQL heuristic decisions, parsing, and SQL formatting
openmetadata-service/src/test/java/org/openmetadata/service/util/dbtune/PostgresAutoTunerTest.java	Unit tests for Postgres heuristic decisions, parsing, quoting, and SQL formatting
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/DbTuneIT.java	Integration test exercising analyze/apply/analyze-one against Testcontainers DB
docs/rds-postgres-production-runbook.md	Consolidated ops runbook for RDS Postgres parameter groups, extensions, table overrides, and manual indexes

harshach · 2026-05-04T19:19:52Z

+/**
+ * End-to-end tests for {@link AutoTuner} against the live Testcontainers database. The bootstrap
+ * runs every migration up to the current version, so the tracked entity tables (e.g.
+ * {@code storage_container_entity}) exist; we exercise the analyze → apply → analyze-one path
+ * against the real schema and reset the modified reloptions / table options at the end.
+ *
+ * <p>Sequential because each test mutates table-level reloptions on shared production tables;
+ * parallel execution would race between read-stats and apply.
+ */


Fixed in a719d47: added @execution(ExecutionMode.SAME_THREAD) on the class so the tests don't race on shared reloptions when the suite runs in parallel.

github-actions · 2026-05-04T21:31:09Z

🔴 Playwright Results — 2 failure(s), 11 flaky

✅ 3991 passed · ❌ 2 failed · 🟡 11 flaky · ⏭️ 86 skipped

Shard	Passed	Failed	Flaky	Skipped
🟡 Shard 1	298	0	1	4
🔴 Shard 2	744	2	4	8
🟡 Shard 3	753	0	2	7
🟡 Shard 4	774	0	1	18
🟡 Shard 5	686	0	1	41
🟡 Shard 6	736	0	2	8

Genuine Failures (failed on all attempts)

❌ Features/Container.spec.ts › Container page children pagination (shard 2)

�[31mTest timeout of 180000ms exceeded.�[39m

❌ Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2)

Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoHaveText�[2m(�[22m�[32mexpected�[39m�[2m)�[22m failed

Locator:  locator('[data-row-key*="DraftTerm1778032067151"]').locator('.status-badge')
Expected: �[32m"Draft"�[39m
Received: �[31m"In Review"�[39m
Timeout:  15000ms

Call log:
�[2m  - Expect "toHaveText" with timeout 15000ms�[22m
�[2m  - waiting for locator('[data-row-key*="DraftTerm1778032067151"]').locator('.status-badge')�[22m
�[2m    18 × locator resolved to <div class="status-badge inReview" data-testid=""PW%'a551e0a7.Merrya9c086c9".DraftTerm1778032067151-status">…</div>�[22m
�[2m       - unexpected value "In Review"�[22m

🟡 11 flaky test(s) (passed on retry)

Pages/Bots.spec.ts › Bots Page should work properly (shard 1, 1 retry)
Features/ActivityAPI.spec.ts › Activity event is created when owner is added (shard 2, 1 retry)
Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
Features/DataProductPersonaCustomization.spec.ts › Data Product - customize tab label should only render if it's customized by user (shard 2, 1 retry)
Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 1 retry)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Features/UserProfileOnlineStatus.spec.ts › Should show "Active recently" for users active within last hour (shard 3, 1 retry)
Pages/CustomProperties.spec.ts › Set & Update all CP types on apiCollection (shard 4, 1 retry)
Pages/EntityDataSteward.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/Teams.spec.ts › Teams Page Flow (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

…/match disambiguation, script path - ServerParamCheck.STATUS_UNDERSIZED -> STATUS_MISMATCH. Some recommended values are deliberately lower than engine defaults (random_page_cost 1.1 vs 4.0, autovacuum scale factors). Labelling those as "UNDERSIZED" was wrong. Added a regression test that pins random_page_cost as MISMATCH when the current value is higher than recommended. - dbTune --apply: distinguish "no tracked tables on this DB" from "all tables already match" in the Nothing-to-apply log line. Mirrors the same disambiguation already in DbTuneReport.appendNextSteps. - DbTuneReport next-steps now points at ./bootstrap/openmetadata-ops.sh (the actual location of the wrapper script) instead of the bare ./openmetadata-ops.sh which would be file-not-found from the repo root. - DbTuneIT analyzeReturnsRecommendationsForKnownTables: assertion message now references TEST_TABLE so it stays correct after the switch from storage_container_entity to entity_relationship. Tests: 40 unit tests passing (+1 for the higher-than-recommended MISMATCH case). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

harshach · 2026-05-05T17:38:56Z

Round 2 review (4230257768) — addressed all four findings in f5a945a.

Finding	Fix
`--apply` empty == "already matches" is wrong when 0 tracked tables exist	Branched: logs "no tracked tables exist on this database" vs "already matches" so the message matches what `DbTuneReport.appendNextSteps` already does.
Next-step snippet uses `./openmetadata-ops.sh` (wrong path from repo root)	Changed to `./bootstrap/openmetadata-ops.sh` to match the actual script location.
Stale assertion message in `DbTuneIT` after switching `TEST_TABLE`	Now interpolates `TEST_TABLE` so the message stays accurate if the constant moves again.
`STATUS_UNDERSIZED` is misleading when recommended is lower than default (e.g. `random_page_cost = 1.1` vs default `4.0`)	Renamed to `STATUS_MISMATCH` (direction-agnostic) with a Javadoc note explaining why. New regression test `buildServerCheck_currentHigherThanRecommendedIsAlsoMismatch` pins the `random_page_cost(current=4.0, recommended=1.1) → MISMATCH` case.

Unit tests: 40 passing (was 39, +1 for the new direction-agnostic mismatch case).

🤖 Generated with Claude Code

The earlier IT had applyChangesReloptionsAndIsIdempotent + analyzeOne running ALTER TABLE / ANALYZE TABLE against entity_relationship — a real catalog table that the rest of the suite writes to. On MySQL this broke ~17 downstream ITs (GlossaryResourceIT, GlossaryTermRelationsIT, DomainResourceIT, LineageBrokenReferenceIT, OpenLineageLineageResolutionIT) with "Cannot create a JSON value from a string with CHARACTER SET 'binary'" on inserts into entity_relationship.json. Root cause: ALTER TABLE bumps MySQL's per-table metadata version, which invalidates JDBC prepared-statement caches across the whole shared Testcontainer. After re-prepare, the JSON column metadata sometimes comes back as VARBINARY in Connector/J's cache, breaking subsequent JSON inserts. Confirmed via CI log timing: DbTuneIT finished at 17:58:12, GlossaryTermRelationsIT failed starting 17:58:53. The recommendations themselves are correct — STATS_PERSISTENT=1, STATS_AUTO_RECALC=1, STATS_SAMPLE_PAGES=100 are safe and modern InnoDB defaults except for SAMPLE_PAGES. The test just cannot afford the side effect on a shared DB. Fix: introduce dbtune_it_isolated_table created/dropped per test in @BeforeEach/@AfterEach, build a TableRecommendation for it directly (bypassing the static catalog), and run apply + analyzeOne against it. The read-only tests (analyze, dry-run) continue to use the real catalog since they don't mutate anything. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-05-05T23:21:18Z

+    jdbi.useHandle(handle -> tuner.apply(handle, rec));
+
+    String built = tuner.buildAlterStatement(rec);
+    assertTrue(built.contains(ISOLATED_TABLE), "ALTER target table mismatch: " + built);
+
+    // Apply twice — second invocation must complete without throwing.
+    jdbi.useHandle(handle -> tuner.apply(handle, rec));


💡 Quality: applyExecutesAndIsIdempotent no longer verifies DB state

The old test used assertSettingsMatch to confirm settings were actually persisted in the database after apply(). The new test only verifies that buildAlterStatement returns a string containing the table name and that apply() doesn't throw. While the rationale for using an isolated table is solid (avoiding MySQL metadata invalidation), the test could still read back the settings from the isolated table to confirm they took effect — there's no risk of side-effects on shared tables since ISOLATED_TABLE is private to this test.

This would restore the end-to-end confidence that apply() actually mutates the DB, rather than just not crashing.

Suggested fix:

// After the first apply, verify the settings were actually written: Map<String, String> after = currentSettingsFor(tuner, jdbi, ISOLATED_TABLE); assertSettingsMatch(rec.recommendedSettings(), after); // Apply twice — must be idempotent jdbi.useHandle(handle -> tuner.apply(handle, rec)); Map<String, String> afterSecond = currentSettingsFor(tuner, jdbi, ISOLATED_TABLE); assertEquals(after, afterSecond, "Apply should be idempotent");

_{Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion}

Copilot

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

+
+    jdbi.useHandle(handle -> tuner.apply(handle, rec));
+
+    String built = tuner.buildAlterStatement(rec);
+    assertTrue(built.contains(ISOLATED_TABLE), "ALTER target table mismatch: " + built);
+
+    // Apply twice — second invocation must complete without throwing.
+    jdbi.useHandle(handle -> tuner.apply(handle, rec));
+  }
+


+ *   <li>Read observed table stats and current parameter-group settings from the database.
+ *   <li>Compute a recommended table-level reloption set per table (pure logic — see
+ *       {@link #recommend(TableStats)}).
+ *   <li>Apply the recommendations via {@code ALTER TABLE ... SET (...)} when the operator opts in.


+ * the planner picks the right index against multi-GB JSONB heaps. {@code STATS_PERSISTENT=1} +
+ * {@code STATS_AUTO_RECALC=1} are the modern InnoDB defaults; we assert them explicitly so a
+ * tenant with stale my.cnf overrides converges.


+  @Command(
+      name = "db-tune",
+      description =
+          "Generate a per-table autovacuum / InnoDB stats tuning report and optionally apply it. "
+              + "Default mode is read-only — pass --apply to execute the ALTER TABLE statements "
+              + "and --analyze to refresh planner stats on changed tables.")
+  public Integer dbTune(


The tuning report we already had is a static playbook (table → recipe + row-count gate). Operators rightly asked whether db-tune actually inspects the live database for *its own* signals — unused indexes, bloat, slow queries, cache hit ratios — and surfaces recommendations from those measurements. Until now: no. This adds a parallel read-only path: ./bootstrap/openmetadata-ops.sh db-tune --diagnose Postgres categories (each isolated in its own try block; missing extension or permissions surfaces in DbTuneDiagnosis.notes rather than failing the run): - UNUSED_INDEX — pg_stat_user_indexes idx_scan=0, size > 10 MB, non-unique non-pkey - HIGH_DEAD_TUPLES — n_dead_tup/n_live_tup > 0.2 with n_live_tup > 10k (autovacuum falling behind) - LOW_CACHE_HIT — heap_blks_read > 1000 AND hit ratio < 90% - STALE_STATS — last_autoanalyze NULL or > 14 days, n_live_tup > 1000 - SEQ_SCAN_HEAVY — seq_scan/idx_scan > 10 with > 1000 seq scans (suggests missing index) - SLOW_QUERY — pg_stat_statements top 10 by mean_exec_time, calls > 100 (gracefully skipped if extension absent) MySQL categories: - UNUSED_INDEX — sys.schema_unused_indexes filtered to current schema - LOW_BUFFER_POOL_HIT — Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests < 99% - SLOW_QUERY — performance_schema.events_statements_summary_by_digest top 10 by avg_timer_wait - FULL_TABLE_SCAN — sys.statements_with_full_table_scans Concept stays separate from AutoTuner: Diagnostic is read-only and never participates in --apply. Categories with zero findings are suppressed in the report; notes capture what couldn't be checked. Each finding is structured as (category, severity, attributes) with the attribute keys driven by DiagnosticCategory.columns() so the renderer dispatches a category-specific layout. Tests: 50 unit tests passing (40 → 50, +10 in DiagnosticReportTest covering EnumMap grouping order, empty/non-empty rendering, suppression, notes appending, query truncation, and column-list immutability). DbTuneIT gains diagnoseCompletesWithoutErrorAndReturnsStructuredResult that exercises the full end-to-end against the live Testcontainer (read-only, safe). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

harshach · 2026-05-05T23:38:41Z

Added `--diagnose` in 2d9f047. The tuning report stays static (curated playbook of known-hot tables); the new diagnostic path is the data-driven half — it reads the live database for unused indexes, bloat, slow queries, cache hit, etc., and surfaces what it finds.

Usage

```
./bootstrap/openmetadata-ops.sh db-tune # tuning report only (static)
./bootstrap/openmetadata-ops.sh db-tune --diagnose # tuning + read-only diagnostic
./bootstrap/openmetadata-ops.sh db-tune --apply --analyze --diagnose # all of the above
```

`--diagnose` is purely read-only. It never participates in `--apply` — diagnostic findings (e.g. "this index has 0 scans and is 1.2 GB") require human review before any DROP, so we deliberately don't auto-act on them.

Postgres categories

Category	Threshold	Source
UNUSED_INDEX	`idx_scan = 0` AND size > 10 MB, non-unique non-pkey	`pg_stat_user_indexes`
HIGH_DEAD_TUPLES	`n_dead_tup/n_live_tup > 0.2` with `n_live_tup > 10k`	`pg_stat_user_tables`
LOW_CACHE_HIT	reads > 1000 AND hit ratio < 90%	`pg_statio_user_tables`
STALE_STATS	last_autoanalyze NULL or > 14 days	`pg_stat_user_tables`
SEQ_SCAN_HEAVY	`seq_scan/idx_scan > 10` with > 1000 seq scans	`pg_stat_user_tables`
SLOW_QUERY	top 10 by `mean_exec_time`, calls > 100	`pg_stat_statements` (note if absent)

MySQL categories

Category	Threshold	Source
UNUSED_INDEX	scans=0 in current schema	`sys.schema_unused_indexes`
LOW_BUFFER_POOL_HIT	hit ratio < 99%	`performance_schema.global_status`
SLOW_QUERY	top 10 by `avg_timer_wait` in current schema	`performance_schema.events_statements_summary_by_digest`
FULL_TABLE_SCAN	top 10 by `exec_count`	`sys.statements_with_full_table_scans`

Sample output (Postgres)

```
=== Diagnostic findings ===

Unused indexes (3 found):
Indexes with zero scans since last stats reset; candidates for DROP after a usage review.
+--------------------------+-----------------------------------+--------+--------+
| table | index | size | scans |
+--------------------------+-----------------------------------+--------+--------+
| tag_usage | idx_tag_usage_target_source | 3.2 GB | 0 |
| tag_usage | idx_tag_usage_target_exact | 3.3 GB | 0 |
| tag_usage | idx_tag_usage_join_source | 3.6 GB | 0 |
+--------------------------+-----------------------------------+--------+--------+

Tables with low cache hit ratio (1 found):
Heap reads exceed 1000 with hit ratio < 90%; suggests undersized buffers or hot seq scans.
+-----------+-------------+-------------+---------+
| table | heap_reads | heap_hits | hit_pct |
+-----------+-------------+-------------+---------+
| tag_usage | 292,973,695 | 950,528,269 | 76.44% |
+-----------+-------------+-------------+---------+

Top slowest queries (10 found):
From pg_stat_statements / events_statements_summary_by_digest. Truncated to 100 chars.
+----------------------------------------------------------------------------------------------------+--------+----------+
| query | calls | mean_ms |
+----------------------------------------------------------------------------------------------------+--------+----------+
| SELECT ... FROM container_entity ce LEFT JOIN entity_relationship er ON ... WHERE er.toId IS NULL | 4823 | 2845.7 |
| ... | ... | ... |
+----------------------------------------------------------------------------------------------------+--------+----------+

Notes:

(none — all checks succeeded)
```

The first three rows mirror exactly the slack-thread audit: 9 of 12 `tag_usage` indexes had 0 scans and consumed ~24 GB combined. `db-tune --diagnose` would have flagged that automatically.

Architecture

```
dbtune/
├── Diagnostic.java # interface: DbTuneDiagnosis diagnose(Handle)
├── DbTuneDiagnosis.java # record(findings, notes); EnumMap-grouped accessor
├── DiagnosticCategory.java # enum with title, description, column layout per category
├── Finding.java # record(category, severity, attributes Map)
├── Severity.java # INFO, WARN
├── PostgresDiagnostic.java # 6 categories, each in its own try block for graceful degrade
└── MysqlDiagnostic.java # 4 categories
```

`Diagnostic` is intentionally separate from `AutoTuner` — diagnostics are read-only by definition and shouldn't be conflated with the apply path. Categories with zero findings are suppressed in the report; `notes` lists what couldn't be checked (missing extension, view permissions, etc.).

Tests

50 unit tests passing (40 → 50, +10 in `DiagnosticReportTest`): EnumMap grouping order, empty-vs-populated render, category section suppression, notes appending, query truncation, and column-list immutability.
DbTuneIT gains `diagnoseCompletesWithoutErrorAndReturnsStructuredResult` — exercises the full diagnose path end-to-end against the live Testcontainer. Read-only, no shared-table side effects.

🤖 Generated with Claude Code

gitar-bot · 2026-05-05T23:39:59Z

+  List<Finding> seqScanHeavy(final Handle handle) {
+    return handle
+        .createQuery(
+            "SELECT relname AS table_name, seq_scan, idx_scan "
+                + "FROM pg_stat_user_tables "
+                + "WHERE seq_scan > :min_seq "
+                + "  AND seq_scan::numeric / NULLIF(idx_scan, 0) > :ratio "
+                + "ORDER BY seq_scan DESC "
+                + "LIMIT 25")
+        .bind("min_seq", SEQ_SCAN_MIN)
+        .bind("ratio", SEQ_SCAN_RATIO)
+        .map(
+            (rs, ctx) ->
+                new Finding(
+                    DiagnosticCategory.SEQ_SCAN_HEAVY,


💡 Edge Case: seqScanHeavy excludes worst-case tables (idx_scan=0)

The SQL WHERE clause uses seq_scan::numeric / NULLIF(idx_scan, 0) > :ratio. When idx_scan = 0, NULLIF returns NULL making the whole comparison NULL (falsy), so those rows are excluded from results. Ironically, tables with many sequential scans and zero index scans are the most concerning candidates for a missing index, yet they'll never appear in findings.

The Java mapper at line 204 has a dead-code branch handling idx_scan == 0 (returns "∞") that can never execute given the query filter.

Suggested fix:

Change the WHERE clause to handle idx_scan=0 explicitly: "WHERE seq_scan > :min_seq " + " AND (idx_scan = 0 OR seq_scan::numeric / idx_scan > :ratio) "

_{Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion}

gitar-bot · 2026-05-05T23:40:05Z

Code Review 👍 Approved with suggestions 1 resolved / 3 findings

Adds a db-tune ops subcommand for database performance diagnostics and tuning, resolving an NPE in the catch block. Verify the DB state persistence in applyExecutesAndIsIdempotent and consider including tables with zero index scans in the seqScanHeavy analysis.

💡 Quality: applyExecutesAndIsIdempotent no longer verifies DB state

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/DbTuneIT.java:117-123

The old test used assertSettingsMatch to confirm settings were actually persisted in the database after apply(). The new test only verifies that buildAlterStatement returns a string containing the table name and that apply() doesn't throw. While the rationale for using an isolated table is solid (avoiding MySQL metadata invalidation), the test could still read back the settings from the isolated table to confirm they took effect — there's no risk of side-effects on shared tables since ISOLATED_TABLE is private to this test.

This would restore the end-to-end confidence that apply() actually mutates the DB, rather than just not crashing.

Suggested fix

// After the first apply, verify the settings were actually written:
Map<String, String> after = currentSettingsFor(tuner, jdbi, ISOLATED_TABLE);
assertSettingsMatch(rec.recommendedSettings(), after);

// Apply twice — must be idempotent
jdbi.useHandle(handle -> tuner.apply(handle, rec));
Map<String, String> afterSecond = currentSettingsFor(tuner, jdbi, ISOLATED_TABLE);
assertEquals(after, afterSecond, "Apply should be idempotent");

💡 Edge Case: seqScanHeavy excludes worst-case tables (idx_scan=0)

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/PostgresDiagnostic.java:183-197

The SQL WHERE clause uses seq_scan::numeric / NULLIF(idx_scan, 0) > :ratio. When idx_scan = 0, NULLIF returns NULL making the whole comparison NULL (falsy), so those rows are excluded from results. Ironically, tables with many sequential scans and zero index scans are the most concerning candidates for a missing index, yet they'll never appear in findings.

The Java mapper at line 204 has a dead-code branch handling idx_scan == 0 (returns "∞") that can never execute given the query filter.

Suggested fix

Change the WHERE clause to handle idx_scan=0 explicitly:

"WHERE seq_scan > :min_seq "
+ "  AND (idx_scan = 0 OR seq_scan::numeric / idx_scan > :ratio) "

✅ 1 resolved

✅ Edge Case: NPE in catch block when exception message is null

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/OpenMetadataOperations.java:2570
List.of() does not accept null elements. If the caught exception has a null message (e.g., a raw NullPointerException or IllegalStateException without a message string), e.getMessage() returns null and List.of(...) throws a secondary NullPointerException, swallowing the original error and producing an unhelpful stack trace.

🤖 Prompt for agents

Code Review: Adds a db-tune ops subcommand for database performance diagnostics and tuning, resolving an NPE in the catch block. Verify the DB state persistence in `applyExecutesAndIsIdempotent` and consider including tables with zero index scans in the `seqScanHeavy` analysis.

1. 💡 Quality: applyExecutesAndIsIdempotent no longer verifies DB state
   Files: openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/DbTuneIT.java:117-123

   The old test used `assertSettingsMatch` to confirm settings were actually persisted in the database after `apply()`. The new test only verifies that `buildAlterStatement` returns a string containing the table name and that `apply()` doesn't throw. While the rationale for using an isolated table is solid (avoiding MySQL metadata invalidation), the test could still read back the settings from the isolated table to confirm they took effect — there's no risk of side-effects on shared tables since `ISOLATED_TABLE` is private to this test.
   
   This would restore the end-to-end confidence that `apply()` actually mutates the DB, rather than just not crashing.

   Suggested fix:
   // After the first apply, verify the settings were actually written:
   Map<String, String> after = currentSettingsFor(tuner, jdbi, ISOLATED_TABLE);
   assertSettingsMatch(rec.recommendedSettings(), after);
   
   // Apply twice — must be idempotent
   jdbi.useHandle(handle -> tuner.apply(handle, rec));
   Map<String, String> afterSecond = currentSettingsFor(tuner, jdbi, ISOLATED_TABLE);
   assertEquals(after, afterSecond, "Apply should be idempotent");

2. 💡 Edge Case: seqScanHeavy excludes worst-case tables (idx_scan=0)
   Files: openmetadata-service/src/main/java/org/openmetadata/service/util/dbtune/PostgresDiagnostic.java:183-197

   The SQL WHERE clause uses `seq_scan::numeric / NULLIF(idx_scan, 0) > :ratio`. When `idx_scan = 0`, `NULLIF` returns NULL making the whole comparison NULL (falsy), so those rows are excluded from results. Ironically, tables with many sequential scans and *zero* index scans are the most concerning candidates for a missing index, yet they'll never appear in findings.
   
   The Java mapper at line 204 has a dead-code branch handling `idx_scan == 0` (returns "∞") that can never execute given the query filter.

   Suggested fix:
   Change the WHERE clause to handle idx_scan=0 explicitly:
   
   "WHERE seq_scan > :min_seq "
   + "  AND (idx_scan = 0 OR seq_scan::numeric / idx_scan > :ratio) "

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

sonarqubecloud · 2026-05-06T00:42:19Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Copilot AI review requested due to automatic review settings May 4, 2026 18:55

github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels May 4, 2026

Copilot started reviewing on behalf of harshach May 4, 2026 18:56 View session

gitar-bot Bot reviewed May 4, 2026

View reviewed changes

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/util/OpenMetadataOperations.java Outdated

Copilot AI reviewed May 4, 2026

View reviewed changes

harshach had a problem deploying to test May 4, 2026 19:06 — with GitHub Actions Error

Remove file

a719d47

harshach temporarily deployed to test May 4, 2026 19:18 — with GitHub Actions Inactive

Merge branch 'main' into harshach/rds-perf-doc

8b3c3e2

Copilot AI review requested due to automatic review settings May 5, 2026 17:22

Copilot started reviewing on behalf of harshach May 5, 2026 17:23 View session

Copilot AI reviewed May 5, 2026

View reviewed changes

harshach had a problem deploying to test May 5, 2026 17:34 — with GitHub Actions Error

harshach temporarily deployed to test May 5, 2026 17:49 — with GitHub Actions Inactive

harshach had a problem deploying to test May 5, 2026 17:49 — with GitHub Actions Failure

harshach temporarily deployed to test May 5, 2026 17:49 — with GitHub Actions Inactive

Copilot AI review requested due to automatic review settings May 5, 2026 23:19

Copilot started reviewing on behalf of harshach May 5, 2026 23:20 View session

gitar-bot Bot reviewed May 5, 2026

View reviewed changes

Copilot AI reviewed May 5, 2026

View reviewed changes

harshach had a problem deploying to test May 5, 2026 23:31 — with GitHub Actions Error

gitar-bot Bot reviewed May 5, 2026

View reviewed changes

harshach temporarily deployed to test May 5, 2026 23:49 — with GitHub Actions Inactive

harshach had a problem deploying to test May 5, 2026 23:49 — with GitHub Actions Failure

harshach temporarily deployed to test May 5, 2026 23:49 — with GitHub Actions Inactive

Conversation

harshach commented May 4, 2026 • edited by gitar-bot Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Sample input / output

Report mode (default — no DB writes)

Apply mode (interactive prompt)

Apply + refresh planner stats

MySQL output

Architecture

Heuristic

Tests

Test plan

Summary by Gitar

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

harshach May 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔴 Playwright Results — 2 failure(s), 11 flaky

Genuine Failures (failed on all attempts)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

harshach commented May 5, 2026

Uh oh!

gitar-bot Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

harshach commented May 5, 2026

Usage

Postgres categories

MySQL categories

Sample output (Postgres)

Architecture

Tests

Uh oh!

gitar-bot Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

gitar-bot Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 6, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

harshach commented May 4, 2026 •

edited by gitar-bot Bot

Loading

github-actions Bot commented May 4, 2026 •

edited

Loading

gitar-bot Bot commented May 5, 2026 •

edited

Loading