OBSL-1013: ship table discovery in soda-core (soda data-source discover) #2773

Draft
mivds wants to merge 12 commits into main from obsl-1013-discovery-ship-tablecolumn-discovery-in-soda-core

mivds commented Jun 30, 2026

Contributor

Summary

First-party table discovery in soda-core (OBSL-1013, part of the v3→v4 observability port OBSL-1003). Adds a soda data-source discover command that posts a DQN-only sodaCoreInsertScanResults payload (version: "4") for the BE's v4 DiscoveryScanHandlerModule.

  • DatasetIdentifier.from_object(...) — builds a dialect-correct DQN from a discovered FullyQualifiedObjectName using the dialect's get_database_prefix_index()/get_schema_prefix_index() hooks (verified across all dialects: postgres [db,schema], duckdb [schema] (catalog dropped), BigQuery [project,dataset], oracle/db2/sparkdf [schema]). No hand-rolled prefixes.
  • DiscoveryRun.execute(...) — discovers within a scope, filters __soda_temp + include/exclude (pushed down to MetadataTablesQuery's server-side SQL LIKE filters), maps each object to its DQN.
  • discovery_payload — DQN-only v4 envelope, posted via SodaCloud._execute_command.
  • soda data-source discover CLI command + handle_discover_data_source handler (ExitCode semantics).
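
For reviewers who want a mental model of the filter-then-map flow described above, here is a minimal standalone sketch. It is not the code in this PR: `DiscoveredObject`, `build_prefixes`, `build_dqn`, `keep` and `discover_dqns` are hypothetical names, the `/`-joined DQN layout is an assumption, and the two index arguments only imitate what `get_database_prefix_index()` / `get_schema_prefix_index()` might return. In the PR the include/exclude filters run server-side as SQL LIKE; the sketch emulates LIKE in Python purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

TEMP_PREFIX = "__soda_temp"  # temp-table prefix named in the PR description


@dataclass(frozen=True)
class DiscoveredObject:
    """Hypothetical stand-in for FullyQualifiedObjectName."""
    database: Optional[str]
    schema: Optional[str]
    name: str


def like_to_regex(pattern: str) -> "re.Pattern":
    """Translate a SQL LIKE pattern ('%' and '_' wildcards) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def keep(name: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """Drop temp tables, then apply include (if any) and exclude patterns."""
    if name.startswith(TEMP_PREFIX):
        return False
    include = list(include)
    if include and not any(like_to_regex(p).fullmatch(name) for p in include):
        return False
    return not any(like_to_regex(p).fullmatch(name) for p in exclude)


def build_prefixes(obj: DiscoveredObject,
                   database_index: Optional[int],
                   schema_index: Optional[int]) -> List[str]:
    """Place database/schema at the positions the dialect asks for.

    An index of None means the dialect drops that part (assumed to model
    duckdb dropping the catalog).
    """
    slots: Dict[int, str] = {}
    if database_index is not None and obj.database is not None:
        slots[database_index] = obj.database
    if schema_index is not None and obj.schema is not None:
        slots[schema_index] = obj.schema
    return [slots[i] for i in sorted(slots)]


def build_dqn(data_source: str, obj: DiscoveredObject,
              database_index: Optional[int],
              schema_index: Optional[int]) -> str:
    # Assumed DQN layout: data source, prefixes, object name joined by '/'.
    prefixes = build_prefixes(obj, database_index, schema_index)
    return "/".join([data_source, *prefixes, obj.name])


def discover_dqns(data_source: str, objects: Iterable[DiscoveredObject],
                  database_index: Optional[int], schema_index: Optional[int],
                  include: Iterable[str] = (),
                  exclude: Iterable[str] = ()) -> List[str]:
    include, exclude = list(include), list(exclude)
    return [build_dqn(data_source, o, database_index, schema_index)
            for o in objects if keep(o.name, include, exclude)]


objects = [
    DiscoveredObject("soda", "public", "orders"),
    DiscoveredObject("soda", "public", "customers"),
    DiscoveredObject("soda", "public", "__soda_temp_x"),
]
# postgres-like shape [db, schema]
print(discover_dqns("pg", objects, 0, 1, include=["%s"], exclude=["cust%"]))
# → ['pg/soda/public/orders']
```

Passing `database_index=None, schema_index=0` gives the duckdb-like `[schema]` shape, with the catalog left out of the DQN.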

Design/plan: docs/plans/2026-06-30-obsl-1013-discovery-implementation-plan.md.

Notes for reviewers

  • DQN-only is intentional and matches the v3-against-v4 behaviour (discover_v4_data_source) and the OBSL-1026 golden.
  • Query scope is currently prefixes=[] (discover all visible); per-dialect scope refinement (e.g. excluding postgres system schemas, BigQuery project/dataset scoping) is a documented follow-up.
  • Envelope completeness vs the real BE DTO is confirmed at parity time (OBSL-1021), not in this PR.
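
To make "DQN-only" concrete, the sketch below builds an envelope that carries nothing but one DQN per discovered dataset. Only the `sodaCoreInsertScanResults` name and `version: "4"` come from this PR description; the `type`, `metadata` and `datasetQualifiedName` keys and the `build_discovery_payload` helper are assumptions for illustration, and the real shape is whatever the BE DTO requires (to be confirmed in OBSL-1021).

```python
import json
from typing import Any, Dict, List


def build_discovery_payload(dqns: List[str]) -> Dict[str, Any]:
    """Build a hypothetical DQN-only v4 discovery envelope."""
    return {
        "type": "sodaCoreInsertScanResults",  # command name from the PR
        "version": "4",                        # string, not int, per the PR
        # Assumed key names: one entry per dataset, carrying only its DQN.
        "metadata": [{"datasetQualifiedName": dqn} for dqn in dqns],
    }


payload = build_discovery_payload(["pg/soda/public/orders"])
print(json.dumps(payload, indent=2))
```

A payload-shape unit test can then assert that every metadata entry has exactly one key, which catches accidental leakage of column or profiling data into the discovery envelope.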

Test Plan

  • Unit: from_object across dialect shapes; DiscoveryRun filtering/mapping; payload shape; CLI arg mapping (7 tests)
  • Integration: soda-tests/tests/integration/test_discovery.py against postgres via MockSodaCloud — DQN-only metadata, version == "4", test table's DQN present
  • Regression: existing test_cli.py tests (21) pass; pre-commit (isort/black/autoflake) clean
  • Reviewer: confirm payload accepted by BE v4 DiscoveryScanHandlerModule (parity-time)

🤖 Generated with Claude Code

mivds and others added 10 commits June 30, 2026 15:44
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ect DQN via dialect hooks

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… DQNs

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nmatch

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mivds and others added 2 commits July 1, 2026 23:30
…handler-level test (OBSL-1013)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The handler discovers with unscoped prefixes, which doesn't surface the
test table on databricks/sparkdf shared metastores. The test verifies
the handler's connection open/close lifecycle, which is
dialect-independent, so one dialect suffices.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
sonarqubecloud Bot commented Jul 1, 2026
