fix(clickhouse): extend MV TO-clause regex to cover POPULATE, UUID, and three-part names (#26265)#27832
Conversation
… clause (open-metadata#26265) Signed-off-by: Abhay Singh <abhaysingh0293@gmail.com>
Signed-off-by: Abhay Singh <abhaysingh0293@gmail.com>
…nd three-part names Fixes gaps in MATERIALIZED_VIEW_TO_PATTERN missed by PR open-metadata#27553: - POPULATE keyword before TO (valid ClickHouse DDL form) - UUID clause emitted by system.tables DDL output - Three-part qualified names (catalog.schema.table) - FROM keyword after target (bare FROM without AS keyword) Adds four regression tests for each new case. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… annotation
Tuple from typing was shadowed by Tuple = create_sqlalchemy_type("Tuple")
on line 81. Using the built-in tuple[str, str] (Python 3.10+) removes the
ambiguity entirely and is safe under from __future__ import annotations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
| def _strip_quotes(identifier: str) -> str: | ||
| return ".".join(part.strip('`"') for part in identifier.split(".")) |
There was a problem hiding this comment.
💡 Edge Case: _strip_quotes mishandles quoted identifiers containing dots
The _strip_quotes function splits on . then strips quotes from each part. If a backtick-quoted identifier contains a literal dot (e.g., `my.db`.mv), split('.') will split inside the quoted portion, producing ['my', 'db', 'mv'] instead of ['my.db', 'mv']. The resulting synthetic query INSERT INTO my.db.mv … would be parsed as a three-part name rather than two-part, potentially breaking lineage resolution.
This is an uncommon edge case (dots inside quoted identifiers are rare in ClickHouse), but worth noting for correctness.
Suggested fix:
def _strip_quotes(identifier: str) -> str:
import re
parts = re.findall(r'`[^`]+`|"[^"]+"|[^.]+', identifier)
return '.'.join(p.strip('`"') for p in parts if p != '.')
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
| return ".".join(part.strip('`"') for part in identifier.split(".")) | ||
|
|
||
|
|
||
| def get_mv_to_target_table(query: str) -> Optional[tuple[str, str]]: |
There was a problem hiding this comment.
💡 Quality: Type annotation allows None at runtime but not statically
get_mv_to_target_table(query: str) is annotated as accepting str, but the guard if not query on line 70 and the test on line 224 (get_mv_to_target_table(None)) show it intentionally accepts None. Static type checkers (mypy, pyright) would flag the test as an error.
Suggested fix:
def get_mv_to_target_table(query: Optional[str]) -> Optional[tuple[str, str]]:
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
Code Review 👍 Approved with suggestions 0 resolved / 2 findingsExpands the ClickHouse regex to correctly parse complex TO-clause structures. Please update the _strip_quotes identifier handling for dotted strings and align the 💡 Edge Case:
|
| Compact |
|
Was this helpful? React with 👍 / 👎 | Gitar
Why
Fixes #26265
ClickHouse
CREATE MATERIALIZED VIEW ... TO <target>DDL has several optional clauses that the existing regex inget_mv_to_target_table()did not handle, causing downstream lineage edges to be silently dropped.What changed
Extended
MATERIALIZED_VIEW_TO_PATTERNinclickhouse/utils.pyto handle:POPULATEkeyword beforeTOCREATE MATERIALIZED VIEW mv POPULATE TO target AS …UUIDclause (system.tables DDL)CREATE MATERIALIZED VIEW db.mv UUID '550e8…' TO db.target AS …CREATE MATERIALIZED VIEW cat.schema.mv TO cat2.schema2.target AS …FROMas valid post-target tokenCREATE MATERIALIZED VIEW mv TO target FROM sourceAlso fixed a
typing.Tupleshadowing issue — removedTuplefrom thetypingimport and used the built-intuple[str, str]annotation (Python 3.10+).Test plan
test_clickhouse_utils.py:test_populate_keyword_before_totest_on_cluster_and_populate_before_totest_uuid_clausetest_three_part_qualified_namesTestGetMvToTargetTabletests continue to pass🤖 Generated with Claude Code