feat: clickhouse staging-optimized #3927

Open
filipesilva wants to merge 4 commits into dlt-hub:devel from filipesilva:ch-staging-optimized

@filipesilva

Description

Adds the staging-optimized replace strategy to the Clickhouse destination, using the EXCHANGE TABLES statement for atomic swaps.
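As a rough illustration of the swap described above, the sketch below builds one EXCHANGE TABLES statement per table. It is not the code from this PR: the helper name `build_exchange_statements`, the dataset names, and the backtick quoting are assumptions made for the example.

```python
from typing import List, Sequence


def build_exchange_statements(
    table_names: Sequence[str], dataset: str, staging_dataset: str
) -> List[str]:
    """Return one EXCHANGE TABLES statement per table (hypothetical helper)."""
    statements = []
    for name in table_names:
        # Qualify each table with its dataset; the quoting style is an assumption.
        staging = f"`{staging_dataset}`.`{name}`"
        final = f"`{dataset}`.`{name}`"
        # EXCHANGE TABLES swaps the two tables atomically in ClickHouse.
        statements.append(f"EXCHANGE TABLES {staging} AND {final}")
    return statements


# Example with made-up dataset and table names.
statements = build_exchange_statements(
    ["items", "items__children"], "analytics", "analytics_staging"
)
```

After the swap, the destination table holds the freshly loaded rows and the staging table holds the previous contents.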

Related Issues

Additional Context

Python is not my primary language, and I have used LLM agent assistance to produce this PR.

I have tested it locally with a local clickhouse, but wasn't able to test the *-staging-s3-* and *-staging-az-* tests because those seem to need CI credentials.

@zilto added the "destination" label (Issue with a specific destination) May 8, 2026
@filipesilva
Author

Heya @zilto is there anything I can do to help move this forward?

@rudolfix self-assigned this Jun 3, 2026
Collaborator

@rudolfix left a comment

@filipesilva this is pretty cool, I didn't notice you can swap tables in clickhouse. I think we should not truncate the table in the replace job. see my review. thanks!

staging_table_name = sql_client.make_qualified_table_name(table["name"])
table_name = sql_client.make_qualified_table_name(table["name"])
sql.append(f"EXCHANGE TABLES {staging_table_name} AND {table_name}")
sql.append(f"TRUNCATE TABLE {staging_table_name}")
Collaborator

do not truncate tables here. this makes the job non-idempotent: if truncation fails, the job will be retried, and you'll exchange again and truncate the table that holds the data. dlt truncates the staging dataset before the load anyway, and the user can also do that with the sql client

Author

Roger, sounds good to me.
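The retry hazard raised in this thread can be sketched with a toy model. This is not dlt or ClickHouse code: tables are plain Python lists in a dict, and `exchange`, `truncate`, `replace_job`, and `run_with_one_retry` are hypothetical names used only to show why EXCHANGE followed by TRUNCATE is unsafe to retry.

```python
from typing import Dict, List

Tables = Dict[str, List[int]]


def exchange(db: Tables, a: str, b: str) -> None:
    # Stand-in for EXCHANGE TABLES: swap the contents of two tables.
    db[a], db[b] = db[b], db[a]


def truncate(db: Tables, name: str) -> None:
    # Stand-in for TRUNCATE TABLE.
    db[name] = []


def replace_job(db: Tables, truncate_fails: bool) -> None:
    # The job as first proposed: swap, then truncate staging.
    exchange(db, "staging", "final")
    if truncate_fails:
        raise RuntimeError("simulated TRUNCATE failure")
    truncate(db, "staging")


def run_with_one_retry(db: Tables) -> Tables:
    # First attempt fails after the swap; the retry swaps back and truncates.
    try:
        replace_job(db, truncate_fails=True)
    except RuntimeError:
        replace_job(db, truncate_fails=False)
    return db


# Staging holds the new rows, final holds the old rows.
result = run_with_one_retry({"staging": [100, 101, 102], "final": [1, 2, 3]})
```

In this model the retry swaps the new rows back into staging and then truncates them, so `final` ends up with the old rows and the new load is lost.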

monkeypatch: pytest.MonkeyPatch,
) -> None:
"""Test ClickHouse atomic swap via EXCHANGE TABLES with sequential loads, nested tables, and empty resource."""
from dlt.destinations.sql_jobs import SqlStagingFollowupJob, SqlStagingReplaceFollowupJob
Collaborator

import at the top

assert {int(r["id"]) for r in table_dicts["items"]} == {100, 101, 102}

# third load: schema evolution adds a new column, EXCHANGE must work after ALTER
@dlt.resource(name="items", write_disposition="replace", primary_key="id")
Collaborator

is this the core thing that this test is checking? otherwise, we already have a test that checks optimized replace for all destinations that enable it.

if so, this is a clickhouse-specific test (we check if EXCHANGE works after ALTER, so this tests the clickhouse engine, not dlt). you can still keep it, but please move it to load/pipeline/test_clickhouse.py

Author

Yes it's not testing anything clickhouse specific. Will remove. Thanks for the pointer!

@filipesilva
Author

@rudolfix thanks for taking the time to review, the comments should be addressed now.

Labels

destination Issue with a specific destination

Development

Successfully merging this pull request may close these issues.

staging-optimized strategy for Clickhouse

3 participants