Implement SQLAlchemy reflection for the cc_sqlalchemy dialect by coby-astsec · Pull Request #766 · ClickHouse/clickhouse-connect

coby-astsec · 2026-05-27T21:19:35Z

Summary

cc_sqlalchemy doesn't implement SQLAlchemy reflection: MetaData.reflect() and Inspector.get_multi_columns() raise NotImplementedError on any CH database. This breaks sqlacodegen and other tools that introspect schemas.

Mechanism: SQLAlchemy's reflection path is Inspector -> Dialect.get_multi_columns -> Dialect.get_columns. The dialect only ever defined Inspector.get_columns (on ChInspector), which MetaData.reflect() never calls on its own. Reflection falls through to the DefaultDialect base, which raises.
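To make the mechanism concrete, here is a toy model in plain Python with no SQLAlchemy import. `BaseDialect`, `ToyInspector`, and `InspectorOnlyChInspector` are hypothetical stand-ins and the dispatch is simplified, but it shows the shape of the bug: multi-table reflection only ever calls the dialect's `get_columns`, so a `get_columns` defined solely on the inspector is never reached and the base method's `NotImplementedError` surfaces.

```python
class BaseDialect:
    """Hypothetical stand-in for the base dialect: get_columns is not implemented."""

    def get_columns(self, connection, table_name, schema=None, **kw):
        raise NotImplementedError("get_columns is not implemented by this dialect")

    def get_multi_columns(self, connection, table_names, schema=None, **kw):
        # Multi-table reflection delegates to the per-table *dialect* method.
        return {
            (schema, name): self.get_columns(connection, name, schema=schema, **kw)
            for name in table_names
        }


class ToyInspector:
    """Hypothetical inspector: multi-table reflection is forwarded to the dialect."""

    def __init__(self, dialect):
        self.dialect = dialect

    def get_multi_columns(self, connection, table_names, schema=None):
        return self.dialect.get_multi_columns(connection, table_names, schema=schema)


class InspectorOnlyChInspector(ToyInspector):
    """Mirrors the pre-PR layout: get_columns lives only on the inspector."""

    def get_columns(self, connection, table_name, schema=None):
        return [{"name": "id", "type": "UInt64"}]


inspector = InspectorOnlyChInspector(BaseDialect())
print(inspector.get_columns(None, "events"))  # works when called directly
try:
    inspector.get_multi_columns(None, ["events"])
except NotImplementedError as exc:
    print("multi-table reflection failed:", exc)
```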

This PR fills in the missing dialect method. No runtime client paths change.

Changes

dialect.py — add ClickHouseDialect.get_columns().
inspector.py — promote get_columns to a module-level function shared between dialect and inspector.
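A sketch of the refactor pattern behind these two changes, again as a toy: one module-level `get_columns` holds the lookup, and both the dialect and the inspector are thin wrappers around it. `FAKE_SYSTEM_COLUMNS`, `ToyClickHouseDialect`, and `ToyChInspector` are hypothetical; the dict stands in for whatever column metadata a real implementation reads from the server, and the signatures are illustrative rather than the PR's exact code.

```python
# Hypothetical stand-in for column metadata a real implementation would
# query from the server, keyed by (database, table).
FAKE_SYSTEM_COLUMNS = {
    ("default", "events"): [
        ("tenant_id", "UInt64"),
        ("event_id", "UInt64"),
        ("payload", "String"),
    ],
}


def get_columns(connection, table_name, schema=None, **kw):
    """Module-level column lookup shared by the dialect and the inspector."""
    rows = connection[(schema or "default", table_name)]
    return [{"name": name, "type": ch_type} for name, ch_type in rows]


class ToyClickHouseDialect:
    def get_columns(self, connection, table_name, schema=None, **kw):
        # Dialect entry point: the path MetaData.reflect() goes through.
        return get_columns(connection, table_name, schema=schema, **kw)


class ToyChInspector:
    def __init__(self, connection):
        self.connection = connection

    def get_columns(self, table_name, schema=None, **kw):
        # Inspector entry point: same logic, no duplicated lookup code.
        return get_columns(self.connection, table_name, schema=schema, **kw)


from_dialect = ToyClickHouseDialect().get_columns(FAKE_SYSTEM_COLUMNS, "events")
from_inspector = ToyChInspector(FAKE_SYSTEM_COLUMNS).get_columns("events")
print(from_dialect == from_inspector)  # True
```

Keeping the logic in one function means the dialect path and the inspector path cannot drift apart.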

datatypes/sqltypes.py — concrete python_type values. SQLAlchemy's TypeEngine.python_type contract is "return a class or raise NotImplementedError." Returning None (the current behavior on every UDT-based type) makes python_type.__module__ / .__name__ raise AttributeError, which is what breaks sqlacodegen. The new values are:

| Type | `python_type` |
| --- | --- |
| `UUID` | `uuid.UUID` |
| `IPv4` / `IPv6` | `ipaddress.IPv4Address` / `ipaddress.IPv6Address` |
| `Nothing` | `type(None)` |
| `Point` | `tuple` |
| `Ring` / `Polygon` / `MultiPolygon` / `LineString` / `MultiLineString` | `list` |
| `JSON` | `dict` |
| `Nested` | `list` |
| `(Simple)AggregateFunction` | `str` |
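The mapping in the table can be checked with the standard library alone. In this sketch the `PYTHON_TYPES` dict and the `qualified_name` helper are hypothetical, not code from the PR: the helper resolves each type to `module.Name`, the way an annotation generator typically would, and the old `None` value fails with the `AttributeError` described above.

```python
import ipaddress
import uuid

# The mapping from the table above, keyed by ClickHouse type name.
PYTHON_TYPES = {
    "UUID": uuid.UUID,
    "IPv4": ipaddress.IPv4Address,
    "IPv6": ipaddress.IPv6Address,
    "Nothing": type(None),
    "Point": tuple,
    "Ring": list,
    "Polygon": list,
    "MultiPolygon": list,
    "LineString": list,
    "MultiLineString": list,
    "JSON": dict,
    "Nested": list,
    "AggregateFunction": str,
    "SimpleAggregateFunction": str,
}


def qualified_name(python_type):
    """Return 'module.Name' for a class; this is where None used to blow up."""
    return f"{python_type.__module__}.{python_type.__name__}"


for ch_name, py_type in PYTHON_TYPES.items():
    print(ch_name, "->", qualified_name(py_type))

try:
    qualified_name(None)  # the old `python_type = None` behavior
except AttributeError as exc:
    print("None breaks it:", exc)
```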

datatypes/sqltypes.py — Array now subclasses sqlalchemy.types.ARRAY alongside ChSqlaType, exposes item_type as a plain instance attribute (so sqlacodegen's fix_column_types adaptation pass can reassign it), and sets dimensions = 1. ARRAY.__init__ is not called cooperatively because it rejects nested ARRAY item_types, which CH supports natively (Array(Array(T))).
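A toy sketch of why `dimensions = 1` plus a mutable `item_type` is enough for ClickHouse arrays: nesting is expressed by an `Array` whose `item_type` is another `Array`, each level one-dimensional. `ToyArray`, `parse_ch_type`, and `annotation` are hypothetical helpers (not the PR's `Array` class and not sqlacodegen's renderer), and the parser only understands `Array(...)` wrappers around a plain type name.

```python
import re


class ToyArray:
    """Hypothetical stand-in for the reworked Array type."""

    dimensions = 1  # every ClickHouse Array is one level deep

    def __init__(self, item_type):
        # Plain instance attribute, so an adaptation pass can reassign it.
        self.item_type = item_type


def parse_ch_type(spec):
    """Parse 'Array(...)' recursively; any other type name is returned as a string."""
    match = re.fullmatch(r"Array\((.*)\)", spec.strip())
    if match:
        return ToyArray(parse_ch_type(match.group(1)))
    return spec.strip()


def annotation(col_type):
    """Render a list[...] annotation by walking item_type, one level per Array."""
    if isinstance(col_type, ToyArray):
        return f"list[{annotation(col_type.item_type)}]"
    return {"String": "str", "UInt64": "int"}.get(col_type, "Any")


nested = parse_ch_type("Array(Array(String))")
print(nested.dimensions, nested.item_type.dimensions)
print(annotation(nested))
nested.item_type.item_type = "UInt64"  # like an adaptation pass swapping item_type
print(annotation(nested))
```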

Tests

tests/integration_tests/test_sqlalchemy/test_reflect.py covers the dialect path (MetaData.reflect()) and a direct Table(autoload_with=...) call against a MergeTree table with ORDER BY (org_id, id), plus a user-declared composite primary key surviving reflection.

Local: SQLAlchemy tests pass.

Checklist

Unit and integration tests covering the common scenarios were added
CHANGELOG entry included
Docs - n/a

CLAassistant · 2026-05-27T21:19:51Z

All committers have signed the CLA.

joe-clickhouse

Hi @coby-astsec thanks for this! Appreciate the work. A few things:

Dialect-level get_columns and the shared module-level extraction are a good fix for MetaData.reflect() and get_multi_columns which would otherwise fail.
I like the concrete python_type compatibility fix as well.
Rebasing Array on sqlalchemy.types.ARRAY is useful too, but I think we need to set self.as_tuple = False in the definition. I put a comment in the code.

Then as far as the primary key stuff goes, I'm going to request that we cut it and revert to

```python
    def get_primary_keys(self, connection, table_name, schema=None, **kw):
        return []

    def get_pk_constraint(self, connection, table_name, schema=None, **kw):
        return {"constrained_columns": [], "name": None}
```

The reason is that PRIMARY KEY in ClickHouse doesn't guarantee uniqueness even when specified, and when it isn't specified it defaults to the ORDER BY key. The point is that I think the identity assertion should come from application code, not from default dialect reflection.

So MetaData.reflect() and SQLAlchemy Core will work fine without a PK once get_columns() is implemented. The SQLAlchemy ORM, however, does need a logical identity key, and we're in luck: SQLAlchemy explicitly lets users provide one even if the database does not declare or enforce it, e.g.:

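```python
from sqlalchemy import Column, MetaData, Table
from clickhouse_connect.cc_sqlalchemy.datatypes.sqltypes import UInt64

metadata = MetaData()
events = Table(
    "events",
    metadata,
    Column("tenant_id", UInt64, primary_key=True),
    Column("event_id", UInt64, primary_key=True),
    autoload_with=engine,  # an existing clickhousedb engine
)
```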

or with ORM mapping:

```python
from sqlalchemy.orm import DeclarativeBase

class Base(DeclarativeBase):
    pass

class Event(Base):
    __table__ = events
    __mapper_args__ = {
        "primary_key": [events.c.tenant_id, events.c.event_id]
    }
```

Again, the point is that I don't think the dialect should present a sparse primary index as a safe ORM identity, because downstream consumers are going to assume it is one. But the app developer can explicitly say that it is if that's the case.
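A toy illustration of that concern, in plain Python with made-up rows: a MergeTree table sorted by `(org_id, id)` accepts two rows with the same key, while anything that treats that key as an identity (here, a hypothetical dict-based identity map that keeps the first row per key) collapses them into one object. This models the failure mode only; it is not SQLAlchemy's actual loading code.

```python
# Rows as a MergeTree table ordered by (org_id, id) might hold them:
# sorted by the key, but duplicate keys are accepted, not rejected.
rows = sorted([
    (1, 10, "first insert"),
    (1, 10, "second insert"),  # same (org_id, id) as the row above
    (2, 5, "other tenant"),
])


def load_into_identity_map(rows):
    """Key each row by (org_id, id), as an identity map keyed on a PK would."""
    identity_map = {}
    for org_id, row_id, payload in rows:
        # Only the first row per key is kept; the duplicate silently disappears.
        identity_map.setdefault((org_id, row_id), payload)
    return identity_map


identity_map = load_into_identity_map(rows)
print(len(rows), "rows stored,", len(identity_map), "objects loaded")
```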

For the record, many other clients for databases without a PK concept follow this pattern as well, e.g. pydruid and pinotdb.

Happy to discuss further or help out if needed!

Per review on ClickHouse#766, the dialect no longer reflects a primary key. ClickHouse PRIMARY KEY / ORDER BY is a sparse index, not a uniqueness guarantee, so get_primary_keys / get_pk_constraint return empty results and the identity key is left for application code to declare. Removes the is_in_primary_key query and the PK-application path in reflect_table. Also set Array.as_tuple = False. Array bypasses ARRAY.__init__ to allow nested arrays, but as_tuple has no class-level default and ARRAY.hashable reads it, so select(arr).unique() raised AttributeError before.
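The `as_tuple` failure can be reproduced without SQLAlchemy. In this sketch `ToyARRAY` is a hypothetical stand-in modeled on the behavior described above: it sets `as_tuple` only in `__init__`, rejects nested item types, and reads `as_tuple` from `hashable`. `BrokenArray` skips `__init__` the way the reworked `Array` does, and `FixedArray` adds a class-level default, which is one way to apply the fix.

```python
class ToyARRAY:
    """Hypothetical stand-in for sqlalchemy.types.ARRAY (not the real class)."""

    def __init__(self, item_type, as_tuple=False, dimensions=None):
        if isinstance(item_type, ToyARRAY):
            raise ValueError("nested ARRAY item types are rejected")
        self.item_type = item_type
        self.as_tuple = as_tuple  # only ever set here
        self.dimensions = dimensions

    @property
    def hashable(self):
        return self.as_tuple


class BrokenArray(ToyARRAY):
    def __init__(self, item_type):
        # Bypasses ToyARRAY.__init__ so nested arrays are allowed...
        self.item_type = item_type
        self.dimensions = 1
        # ...but as_tuple is never set on the instance.


class FixedArray(BrokenArray):
    as_tuple = False  # class-level default that hashable can fall back on


outer = FixedArray(FixedArray("String"))  # nesting works: ToyARRAY.__init__ is skipped
print(outer.hashable)

try:
    BrokenArray("String").hashable
except AttributeError as exc:
    print("BrokenArray:", exc)
```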

coby-astsec · 2026-05-29T20:00:31Z

Thanks for the review,
Regarding the primary keys, I agree and have removed those changes. As for as_tuple, I implemented what you asked, although the approach you suggested wouldn't have worked either way; I added more detail about this in a reply to your comment.

Let me know if there's anything else!

joe-clickhouse · 2026-06-01T23:26:13Z

Thanks @coby-astsec! Looking good. Only things left I'd request:

The PR summary is now stale so let's please get that updated to reflect the actual changes
Optional, but I think it's worth adding a test to make sure a user-defined primary key survives reflection:

```python
import sqlalchemy as db
from sqlalchemy import text
from sqlalchemy.engine import Engine

from clickhouse_connect import common
from clickhouse_connect.cc_sqlalchemy.datatypes.sqltypes import UInt32


# test_engine and test_db are pytest fixtures from the integration test suite
def test_user_declared_primary_key(test_engine: Engine, test_db: str):
    """A user-declared primary key on a pre-declared column survives reflection."""
    common.set_setting("invalid_setting_action", "drop")
    with test_engine.begin() as conn:
        conn.execute(text(f"DROP TABLE IF EXISTS {test_db}.reflect_pk_test"))
        conn.execute(
            text(
                f"CREATE TABLE {test_db}.reflect_pk_test (org_id UInt32, id UInt64, payload String) "
                "ENGINE MergeTree ORDER BY (org_id, id)"
            )
        )

    table = db.Table(
        "reflect_pk_test",
        db.MetaData(schema=test_db),
        db.Column("org_id", UInt32, primary_key=True),
        db.Column("id", db.BigInteger, primary_key=True),
        autoload_with=test_engine,
    )
    assert [c.name for c in table.primary_key.columns] == ["org_id", "id"]
    assert {c.name for c in table.columns} == {"org_id", "id", "payload"}
```

Rebase

Thanks!

The cc_sqlalchemy dialect did not support SQLAlchemy reflection (MetaData.reflect / Inspector multi-table reflection), which broke sqlacodegen and any tool that calls `dialect.get_columns()` directly. This change fills in the missing dialect methods so reflection works end-to-end against a ClickHouse server.

Changes:

1. dialect.py: add `ClickHouseDialect.get_columns()`. Previously only `ChInspector.get_columns()` existed, but SQLAlchemy's reflection path goes through `Dialect.get_multi_columns` -> `Dialect.get_columns` and never touches `Inspector.get_columns` on its own. Without a dialect implementation, `MetaData.reflect()` raised `NotImplementedError` from the SQLAlchemy base class. `get_pk_constraint()` / `get_primary_keys()` now return the actual primary key columns derived from `system.columns.is_in_primary_key` (which mirrors MergeTree's ORDER BY / PRIMARY KEY) instead of empty lists. This lets sqlacodegen generate declarative classes instead of bare `Table(...)` definitions for any MergeTree table.
2. inspector.py: promote `get_columns` and `get_pk_constraint` to module-level functions so the dialect can call the same logic. `ChInspector.reflect_table()` now applies the PK constraint to reflected columns (it was building columns with no PK info, so even direct `Table('asset', md, autoload_with=engine)` reflection lost the primary key).
3. datatypes/sqltypes.py: replace `python_type = None` on UDT-based types with concrete Python types. SQLAlchemy's contract for `TypeEngine.python_type` is that it either returns a class or raises `NotImplementedError`; returning `None` makes any consumer that does `python_type.__module__` / `__name__` crash with `AttributeError: 'NoneType' object has no attribute '__module__'` (sqlacodegen, and anything else that walks python_type for annotations or metadata).
   - UUID -> uuid.UUID
   - IPv4 / IPv6 -> ipaddress.IPv4Address / IPv6Address
   - Nothing -> type(None)
   - Point -> tuple
   - Ring / Polygon -> list
   - LineString etc. -> list
   - JSON -> dict
   - Nested -> list
   - (Simple)AggregateFunction -> str
4. datatypes/sqltypes.py: `Array` now subclasses `sqlalchemy.types.ARRAY` (alongside `ChSqlaType`) and exposes `item_type` as a regular instance attribute plus `dimensions = 1`. Two effects:
   - `isinstance(col.type, sqlalchemy.types.ARRAY)` now matches CH arrays, which lets sqlacodegen render `Mapped[list[T]]` annotations for single-dim arrays without special-casing.
   - `item_type` is mutable, so sqlacodegen's `fix_column_types` adaptation pass (which reassigns `new_coltype.item_type`) works.

   `dimensions = 1` reflects CH's type system: every Array is one-dimensional and nested arrays (`Array(Array(String))`) are represented via the inner item type, not via a dimension count.

Tests:

- tests/integration_tests/test_sqlalchemy/test_reflect.py: `test_metadata_reflect_and_primary_keys` exercises the `Dialect.get_columns` reflection path via `MetaData.reflect()` and asserts composite primary key reflection from a MergeTree ORDER BY clause, both via `MetaData.reflect()` and via direct `Table(autoload_with=...)`.

End-to-end effect: `MetaData.reflect()` and `sqlacodegen <clickhousedb+connect://...>` now produce a complete, importable Python module with declarative ORM classes, composite primary keys, and typed `Mapped[...]` annotations against a real ClickHouse schema. No changes to the runtime client paths.


coby-astsec · 2026-06-04T09:58:06Z

@joe-clickhouse all done, sorry for the delay 😄

joe-clickhouse · 2026-06-05T17:00:09Z

@copilot resolve the merge conflicts in this pull request

Signed-off-by: Joe Spadola <joe.spadola@clickhouse.com>

joe-clickhouse

Looks good! Thanks for the contribution @coby-astsec

coby-astsec requested review from joe-clickhouse and peter-leonov-ch as code owners May 27, 2026 21:19

joe-clickhouse reviewed May 28, 2026


coby-astsec force-pushed the fix/sqlalchemy-reflection branch from 2864d0b to e72554a May 29, 2026 19:43

coby-astsec force-pushed the fix/sqlalchemy-reflection branch from e72554a to a6cf515 May 29, 2026 19:46

coby-astsec added 2 commits June 4, 2026 12:54

coby-astsec force-pushed the fix/sqlalchemy-reflection branch from a6cf515 to 3d83412 June 4, 2026 09:55

Add test for user-declared primary key surviving reflection

78fd6aa

Merge branch 'main' into fix/sqlalchemy-reflection

eede904


joe-clickhouse approved these changes Jun 5, 2026


joe-clickhouse merged commit e8c5284 into ClickHouse:main Jun 5, 2026
28 checks passed

joe-clickhouse merged 4 commits into ClickHouse:main from coby-astsec:fix/sqlalchemy-reflection
