Skip to content

fix(ingestion): skip mark-deleted for schemas whose table listing failed#27831

Open
vaishnavidixit12 wants to merge 2 commits into
open-metadata:mainfrom
vaishnavidixit12:fix/18187-mark-deleted-tables-skip-on-listing-failure
Open

fix(ingestion): skip mark-deleted for schemas whose table listing failed#27831
vaishnavidixit12 wants to merge 2 commits into
open-metadata:mainfrom
vaishnavidixit12:fix/18187-mark-deleted-tables-skip-on-listing-failure

Conversation

@vaishnavidixit12
Copy link
Copy Markdown

Why

Fixes #18187

When get_tables_name_and_type() raises an exception for a schema (e.g. a transient connectivity failure, permission error, or driver bug), the current code silently swallows the error and leaves database_source_state empty for that schema. On the next call to mark_tables_as_deleted(), OpenMetadata interprets the empty state as "all tables were deleted" and soft-deletes every table it knew about for that schema — silent data loss on every partial connectivity failure.

What changed

common_db_source.pyget_tables_name_and_type() exception handler
When an exception is caught, the method now builds the schema FQN and records it in self.schemas_with_table_listing_errors (inherited from DatabaseServiceSource).

database_service.pyDatabaseServiceSource

  • Added schemas_with_table_listing_errors: Set = set() class-level field.
  • mark_tables_as_deleted() now checks this set before calling delete_entity_from_source; schemas that failed listing are skipped with a WARNING log.

The fix is in the shared base classes, so all database connectors (BigQuery, Snowflake, Redshift, …) are protected automatically.

Test plan

  • test_mark_deleted_tables_safety.py — 8 new unit tests covering:
    • Failed schema is skipped while healthy schemas are still processed
    • All-failed case: delete_entity_from_source is never called
    • markDeletedTables=False still short-circuits everything
    • Listing error adds the schema FQN to the error set
    • Successful listing leaves the error set empty
    • None FQN (schema not found in OM) is not added to the error set

🤖 Generated with Claude Code

When get_tables_name_and_type() raises an exception (e.g. transient
connectivity failure), the schema's FQN is now recorded in
`schemas_with_table_listing_errors`. mark_tables_as_deleted() skips
those schemas instead of wiping all their OM tables — which was a
silent data-loss bug when partial connectivity caused an empty
database_source_state for that schema.

Fixes open-metadata#18187

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vaishnavidixit12 vaishnavidixit12 requested a review from a team as a code owner April 29, 2026 17:13
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/src/metadata/ingestion/source/database/common_db_source.py Outdated
…tracking

Two issues raised in code review of open-metadata#27831:

1. schemas_with_table_listing_errors not instance-scoped: added
   self.schemas_with_table_listing_errors = set() in
   CommonDbSourceService.__init__ for consistency with
   database_source_state.

2. Connectors overriding get_tables_name_and_type bypassed the guard:
   extracted _record_schema_listing_error() helper into
   DatabaseServiceSource and called it from UnityCatalog, Deltalake,
   Glue, and Salesforce overrides so the safety guard is universal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@Vaishnavi-Cailum
Copy link
Copy Markdown

Finding 1 — shared mutable class state: Added self.schemas_with_table_listing_errors = set() in CommonDbSourceService.init, making it instance-scoped like database_source_state.

Finding 2 — connectors bypassing the guard: Extracted _record_schema_listing_error(schema_name) helper into DatabaseServiceSource (builds the FQN, adds to the set), then wired it into:

  • common_db_source.py — simplified the existing except block
  • unitycatalog/metadata.py — wraps client.tables.list() call
  • deltalake/metadata.py — wraps client.get_table_info() call
  • glue/metadata.py — wraps _get_glue_tables() call
  • salesforce/metadata.py — adds the helper call to the existing except block

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 29, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Skips mark-deleted operations for schemas with failed table listings to prevent data loss. Addresses the missing safety guard for overridden table listings and the inconsistent state initialization.

✅ 2 resolved
Bug: Connectors overriding get_tables_name_and_type bypass the safety guard

📄 ingestion/src/metadata/ingestion/source/database/common_db_source.py:412-423
The PR description states "The fix is in the shared base classes, so all database connectors (BigQuery, Snowflake, Redshift, …) are protected automatically." However, several connectors override get_tables_name_and_type with their own exception handling and never populate schemas_with_table_listing_errors. These include at least UnityCatalog, Deltalake, Salesforce, and Glue. If table listing fails in any of these connectors, the original bug (all tables marked as deleted) will still occur.

For example, in UnityCatalog's override, if self.client.tables.list() throws at line 294 (before the per-table try/except), the exception propagates without recording the failed schema.

Consider either:

  1. Moving the error-tracking logic into the topology runner (closer to where get_tables_name_and_type is called), so it catches failures regardless of which subclass is running, or
  2. Adding the same schemas_with_table_listing_errors.add(...) pattern to each overriding connector's exception handler.
Quality: schemas_with_table_listing_errors not reset in __init__ unlike database_source_state

📄 ingestion/src/metadata/ingestion/source/database/database_service.py:221 📄 ingestion/src/metadata/ingestion/source/database/common_db_source.py:136
database_source_state is explicitly re-initialized as an instance attribute in CommonDbSourceService.__init__ (line 136), but the new schemas_with_table_listing_errors is not. It relies on the class-level mutable set() default, which means all instances of the same class share the same set object. In practice this is unlikely to cause issues since typically only one source instance exists per ingestion run, and the existing sets (stored_procedure_source_state, etc.) follow the same pattern. Still, for consistency with database_source_state, consider adding self.schemas_with_table_listing_errors = set() in __init__.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: intermittent issues with connection to database service cause unwanted table deletion

2 participants