Antalya 26.1 - Add ALTER TABLE MODIFY SETTING support for Hybrid watermarks by mkmkme · Pull Request #1659 · Altinity/ClickHouse

mkmkme · 2026-04-17T12:38:09Z

Introduce hybridParam('name', 'type') pseudo-function for Hybrid engine
predicates, allowing segment boundaries to be changed at runtime via
ALTER TABLE ... MODIFY SETTING without recreating the table.

Key design decisions:

hybridParam() takes exactly two arguments (name, type); all values
must be provided via ENGINE SETTINGS, keeping the API surface minimal
and eliminating default-value complexity.
Watermark names must start with 'hybrid_watermark_' and exactly match
a declared hybridParam() in the predicates. Typos in both CREATE SETTINGS and ALTER are rejected.
Values are validated against the declared type at CREATE and ALTER
time, so invalid values never reach the runtime substitution path.
Only hybrid_watermark_* settings are allowed on Hybrid tables;
regular DistributedSettings and RESET SETTING are rejected.
Runtime watermark state uses MultiVersionstd::map for lock-free
reads and deterministic serialization order in SHOW CREATE TABLE.
In StorageDistributed::alter(), the watermark snapshot is published
before in-memory metadata to prevent concurrent readers from
observing new metadata with stale watermark values.
Predicate validation at CREATE time substitutes the effective
SETTINGS value (not the type default) so value-sensitive expressions
are checked against realistic data.

Example:

CREATE TABLE t 
ENGINE = Hybrid(
    cluster('cluster1', currentDatabase(), 'hot'),
        ts > hybridParam('hybrid_watermark_day')::Date,
    cluster('cluster2', currentDatabase(), 'cold'),
        ts <= hybridParam('hybrid_watermark_day')::Date
)
SETTINGS hybrid_watermark_day = '2026-04-01';

ALTER TABLE t
MODIFY SETTING hybrid_watermark_day = '2026-04-02';

Syntax sugar:

CREATE TABLE t 
ENGINE = Hybrid(
    cluster('cluster1', currentDatabase(), 'hot'),
        ts > hybrid_watermark_day::Date,
    cluster('cluster2', currentDatabase(), 'cold'),
        ts <= hybrid_watermark_day::Date
)
SETTINGS hybrid_watermark_day = '2026-04-01';

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added support for moving Hybrid table watermarks

CI/CD Options

Exclude tests:

Regression jobs to run:

Introduce hybridParam('name', 'type') pseudo-function for Hybrid engine predicates, allowing segment boundaries to be changed at runtime via ALTER TABLE ... MODIFY SETTING without recreating the table. Key design decisions: - hybridParam() takes exactly two arguments (name, type); all values must be provided via ENGINE SETTINGS, keeping the API surface minimal and eliminating default-value complexity. - Watermark names must start with 'hybrid_watermark_' and exactly match a declared hybridParam() in the predicates. Typos in both CREATE SETTINGS and ALTER are rejected. - Values are validated against the declared type at CREATE and ALTER time, so invalid values never reach the runtime substitution path. - Only hybrid_watermark_* settings are allowed on Hybrid tables; regular DistributedSettings and RESET SETTING are rejected. - Runtime watermark state uses MultiVersion<std::map> for lock-free reads and deterministic serialization order in SHOW CREATE TABLE. - In StorageDistributed::alter(), the watermark snapshot is published before in-memory metadata to prevent concurrent readers from observing new metadata with stale watermark values. - Predicate validation at CREATE time substitutes the effective SETTINGS value (not the type default) so value-sensitive expressions are checked against realistic data. Tests: - Stateless tests covering CREATE, ALTER, DETACH/ATTACH persistence, multi-watermark, type conflict, invalid values, typo rejection, RESET SETTING rejection, and DistributedSettings rejection. - Integration tests covering single-node, cross-node, and cluster() table function flows, each self-contained with own setup/teardown. Documentation updated with hybridParam() syntax, ALTER examples, multi-watermark usage, and restriction notes. Made-with: Cursor

github-actions · 2026-04-17T12:39:22Z

Workflow [PR], commit [a3fa8a7]

mkmkme · 2026-04-17T12:39:27Z

AI audit note: This review comment was generated by AI (gpt-5.4).

Audit update for PR #1659 (ALTER TABLE MODIFY SETTING support for Hybrid watermarks):

Confirmed defects:

Medium: ALTER MODIFY SETTING skips full predicate revalidation
Impact: a watermark change can be accepted and persisted even though the resulting Hybrid predicate becomes invalid for query rewrite/execution, leaving the table unreadable until the watermark is changed again.
Anchor: src/Storages/StorageDistributed.cpp / StorageDistributed::checkAlterIsPossible() vs Hybrid CREATE validation path
Trigger: change a watermark used in a value-sensitive predicate expression to a same-typed value that is semantically invalid for that expression.
Why defect: CREATE substitutes effective watermark values and revalidates every predicate with ExpressionAnalyzer, but ALTER only checks that the new setting parses as the declared type and never reruns predicate validation, so ALTER can admit states that CREATE would reject.
Fix direction (short): build the post-ALTER effective watermark map and reuse the same substituted-predicate validation used during CREATE.
Regression test direction (short): add a Hybrid predicate where only some same-typed watermark values are valid, and assert both CREATE and ALTER reject the bad value.

Example:

CREATE TABLE t
ENGINE = Hybrid(
    remote('localhost:9000', currentDatabase(), 'hot'),
        dateDiff(hybridParam('hybrid_watermark_unit', 'String'), ts, now()) < 30,
    remote('localhost:9000', currentDatabase(), 'cold'),
        dateDiff(hybridParam('hybrid_watermark_unit', 'String'), ts, now()) >= 30
)
SETTINGS hybrid_watermark_unit = 'day'
AS hot;

This CREATE should pass because 'day' is a valid String and also a semantically valid first argument for dateDiff().

But this ALTER is currently accepted too:

ALTER TABLE t MODIFY SETTING hybrid_watermark_unit = 'banana';

ALTER only validates that 'banana' parses as the declared type String. It does not revalidate the full predicate with substituted effective values. The next read then rewrites the predicate to:

dateDiff('banana', ts, now()) < 30

which is semantically invalid for dateDiff(). So CREATE and ALTER are not validation-equivalent for hybridParam() used in nontrivial expressions.

Coverage summary:

Scope reviewed: last commit in this PR, focusing on StorageDistributed Hybrid create/alter/read paths plus the added stateless/integration tests and docs
Categories failed: alter-time validation parity with create-time predicate validation
Categories passed: watermark name/type checks, missing-setting rejection, metadata persistence across detach/attach, multi-watermark update merge, concurrent read snapshot publication
Assumptions/limits: static audit only; I did not run the new tests or a live repro against ClickHouse

mkmkme · 2026-04-17T12:41:01Z

The above comment is a very good point by /audit-review. I think we should fix it. TBH I didn't think of such a use case when implementing the feature.

FWIW the previous /audit-report run before the PR went smoothly without any discovered defects.

UnamedRus · 2026-04-20T09:15:19Z

Kinda work with ALIAS column, but it's not smart enough to correctly recognize (_table = xxxx AND ....) OR (_table = yyy AND ....)

https://fiddle.clickhouse.com/c667b318-1b36-45e7-ad14-b6db1fb55ff7

alsugiliazova · 2026-04-20T13:00:05Z

Usability Audit: ALTER TABLE MODIFY SETTING for Hybrid watermarks (PR #1659)

Scope: Altinity/ClickHouse PR #1659 @ 5c36b262f71 — adds hybridParam('name','type') pseudo-function and runtime watermark modification via ALTER TABLE ... MODIFY SETTING on Hybrid engine tables.
Persona: Operator / DBA running tiered-storage or cluster-migration Hybrid tables in production.
Environment: Distributed cluster with at least one Hybrid table whose predicate uses hybridParam(); readers run concurrently with the DBA.
Source checked: branch mkmkme/hybrid/alter-watermark fetched into /Volumes/workspace/altinity-clickhouse/ClickHouse as pr-1659; full diff (git diff pr-1659~2..pr-1659) of 11 changed files; full PR conversation thread (body + 2 issue comments + 4 inline review comments from ianton-ru).

Summary of dedup vs PR comments

Before filing, every issue below was checked against the full PR thread (gh pr view 1659 --comments, gh api repos/Altinity/ClickHouse/pulls/1659/comments, gh api repos/Altinity/ClickHouse/issues/1659/comments).

#	Issue	Status
2.1	ALTER accepts a value that makes the predicate semantically invalid	Already raised — AI `/audit-review` comment by `mkmkme` @ 2026-04-17; author agreed in follow-up
2.2	Failure after bad ALTER surfaces as cryptic function-level error, no pointer to the responsible watermark	New in this audit
2.3	Once set, a watermark cannot be removed except by recreating the table	New in this audit
1.1	Read path indexes `hybridParam()` arguments without bounds-checking	Already raised — `ianton-ru` inline on `src/Storages/StorageDistributed.cpp:1246`
1.2	`collectHybridParamTypes()` silently skips malformed `hybridParam` during ALTER validation	Already raised — `ianton-ru` inline on `src/Storages/StorageDistributed.cpp:1762`
1.3	Segment pruning doesn't recognise `(_table = 'A' AND …) OR (_table = 'B' AND …)` disjunctions	Already raised — `UnamedRus` issue comment @ 2026-04-20, fiddle
X.1	No first-class visibility of current effective watermark values (only via `SHOW CREATE TABLE`)	New in this audit
X.2	Watermark type is locked into predicates; type change requires recreating the table	New in this audit

Other PR-thread items noted but not listed above:

ianton-ru inline on cpp:1769: question about whether collectHybridParamTypes visits all hybridParam nodes or only the first. After tracing the recursion in the diff, the logic is correct — the return; inside the hybridParam branch only prevents descending into the param's literal children; sibling hybridParams elsewhere in the tree are still reached via the parent's for (child : children) walk(child). Noted as a code-clarity concern, not a defect.
ianton-ru inline on cpp:1785: NIT to use switch/case instead of multiple ifs. Style only, not a usability issue.

Primary flows (ranked by expected usage)

Create a tiered Hybrid table with a watermark and SELECT against it, e.g. ts > hybridParam('hybrid_watermark_hot','DateTime') hot / cold split. (most common — the feature's advertised use case in docs/en/engines/table-engines/special/hybrid.md)
Move the boundary at runtime via ALTER TABLE t MODIFY SETTING hybrid_watermark_hot = '...'. (the whole point of the PR)
Update multiple watermarks in one ALTER (documented in the diff).
DETACH / ATTACH / server restart with watermark metadata persisted (explicit integration test test_hybrid_watermarks/test.py).
Readers running concurrently with ALTER (explicit in PR body: "snapshot published before metadata").

Issues

Flow 2: Runtime watermark modification

2.1 ALTER accepts a watermark value that makes the predicate semantically invalid; next read fails

Status: already raised (AI /audit-review; author agreed).

Symptom: ALTER TABLE t MODIFY SETTING hybrid_watermark_unit = 'banana' succeeds silently; the next SELECT fails because the predicate rewrites to dateDiff('banana', ts, now()) < 30, which is not a valid first argument for dateDiff. The table becomes unreadable until the operator ALTERs again to a valid value. (src/Storages/StorageDistributed.cpp StorageDistributed::checkAlterIsPossible() vs CREATE path in registerStorageHybrid @ 5c36b26.)
Trigger: A watermark used in a value-sensitive expression (dateDiff, toStartOfInterval, parseDateTimeBestEffort, etc.) is ALTERed to a same-typed but semantically invalid string.
Signal quality: Delayed — ALTER returns success; failure appears later on reads only.
Severity: High.
Recovery: Engineer / DBA must identify the offending watermark and ALTER it again. No automatic rollback.

2.2 Failure from a bad ALTER surfaces as a cryptic function-level error on read

Status: new in this audit. Not in PR comments.

Symptom: After 2.1, users and ops dashboards see Illegal argument for function dateDiff (or similar) coming from the rewritten predicate; no mention of which watermark caused it, nor of the last ALTER.
Trigger: Any read that reaches 2.1.
Signal quality: Confusing — the error is about a function, not about settings.
Severity: Medium.
Recovery: Operator must correlate the read failure with the most recent ALTER MODIFY SETTING manually. substitute_watermarks in read() has the name/value in scope and could wrap the exception.

2.3 Once set, a watermark cannot be removed except by recreating the table

Status: new in this audit. Not in PR comments.

Symptom: ALTER TABLE t RESET SETTING hybrid_watermark_hot is explicitly rejected by checkAlterIsPossible() (src/Storages/StorageDistributed.cpp, AlterCommand::Type::RESET_SETTING branch). There is no "unset" path. The only way to drop a watermark is to recreate the table.
Trigger: Migration roll-back or cleanup after a staged rollout.
Signal quality: Clear (error message is explicit).
Severity: Low.
Recovery: Recreate the table.

Flow 1: Default Hybrid with watermark (create + SELECT)

1.1 Read path indexes `hybridParam()` arguments without bounds-checking

Status: already raised (ianton-ru inline on src/Storages/StorageDistributed.cpp:1246, unresolved in the branch).

Symptom: substitute_watermarks lambda in StorageDistributed::read() accesses arg_list->children[0]->as<ASTLiteral>()->value.safeGet<String>() and children[1] unconditionally. If an ASTFunction named hybridParam reaches this path with fewer than two children (e.g. corrupted metadata on reload, restored from a corrupted replica, or a future code path producing a malformed AST), the server crashes instead of throwing a clean error.
Trigger: Malformed hybridParam call reaching read(). Unreachable via CREATE today, but the symmetry with CREATE-time validation is missing.
Signal quality: Silent (crash).
Severity: Medium (unconfirmed at runtime; inferred from the diff and reviewer comment).
Recovery: Server restart; metadata surgery if repeatable.

1.2 Multi-watermark predicate: `collectHybridParamTypes()` silently skips malformed `hybridParam` calls

Status: already raised (ianton-ru inline on src/Storages/StorageDistributed.cpp:1762, author reacted 👍 but no fix in the branch).

Symptom: If a hybridParam node has children.size() != 2, collectHybridParamTypes() skips it silently (if (arg_list && arg_list->children.size() >= 2)). That name then does not appear in declared_types, and a matching ALTER is rejected with "does not match any declared hybridParam(); check for typos in the watermark name" — which is misleading because the real cause is a structural problem.
Trigger: Partially corrupt or hand-edited metadata, or a future refactor that emits a malformed node.
Signal quality: Confusing (error message points at typo, not structure).
Severity: Low.
Recovery: DBA has to read the DDL to realise the "typo" message is misleading.

1.3 Planner does not recognise `(_table = 'A' AND …) OR (_table = 'B' AND …)` for segment pruning

Status: already raised (UnamedRus issue comment @ 2026-04-20).

Symptom: When using a _table-based alias/filter scheme over Hybrid, the planner cannot restrict segment reads for disjunctions of _table = …. Hybrid reads more segments than logically required.
Trigger: Analytic queries that combine per-segment filters with OR.
Signal quality: Silent (wrong-performance, not wrong-results).
Severity: Medium. (Strictly out of scope of this PR, but directly affects the same persona on the same primary flow — tiered / migration reads.)
Recovery: Rewrite the query as UNION ALL of per-segment subqueries.

Cross-flow

X.1 No first-class visibility of current effective watermark values

Status: new in this audit. Not in PR comments.

Symptom: Operator wants to know "what is the current value of hybrid_watermark_hot on this table right now?". The only way is SHOW CREATE TABLE (which registerStorageHybrid rebuilds from the runtime snapshot) or parsing system.tables.create_table_query. No row in system.settings, no dedicated system.* view for Hybrid engine settings.
Trigger: Routine monitoring, alerting, or runbook that needs to confirm the live watermark.
Signal quality: Awkward (requires parsing DDL text).
Severity: Low.
Recovery: Monitoring integrates a DDL parser.

X.2 Declared watermark type is locked into predicates; changing type requires recreating the table

Status: new in this audit. Not in PR comments.

Symptom: Type appears in every hybridParam(name, type) call in the predicate ASTs. MODIFY SETTING changes the value only; there is no path to change the type. The same watermark name with two different types across predicates is rejected at CREATE (hybridParam() type conflict).
Trigger: Schema drift; migration from Date to DateTime at cut-over; need to widen a numeric watermark type.
Signal quality: Clear at CREATE; silent at migration time (no tooling).
Severity: Low.
Recovery: Recreate the table.

Readiness verdict

Flow 1 (create + SELECT) in isolation: ready with caveats — exposed to Update README.md #1.1 under corrupted-metadata failure modes and to Update README.md #1.3 for _table-disjunction query patterns.
Flow 2 (runtime watermark modification) in isolation: not ready, because of Update README.md #2.1 (ALTER can leave the table unreadable until another ALTER lands) and Update README.md #2.2 (error message does not point at the responsible watermark).
Flow 3 (multi-watermark ALTER in one statement) in isolation: ready; no flow-specific blockers found. The snapshot is replaced atomically by loadHybridWatermarkParams.
Flow 4 (DETACH / ATTACH / restart persistence) in isolation: ready; covered by the integration test test_hybrid_watermarks/test.py and by the SETTINGS-AST rebuild in registerStorageHybrid.
Flow 5 (concurrent readers during ALTER) in isolation: ready; alter() publishes the watermark snapshot via MultiVersion::set before setInMemoryMetadata, so readers cannot observe new metadata with stale values.
Overall in production (typos in runbooks, network blips, concurrent admin ops, scale): not ready; not yet safe for broad production use without guardrails, because of Update README.md #2.1 (availability risk on valid-type but semantically-bad ALTER) and Update README.md #1.1 (unchecked AST child access on the read path). Both are correctness / availability risks surfaced to the operator without any indication at the point of failure.

Operator guidance (until fixes land)

Avoid hybridParam() in value-sensitive positions (dateDiff(hybridParam(...), ...), parseDateTimeBestEffort(hybridParam(...)), string-enum arguments, etc.) until Update README.md #2.1 is fixed. Prefer watermarks in pure comparison positions (ts > hybridParam(...), date <= hybridParam(...)) where any same-typed value is semantically valid.
Dry-run before ALTER. Before issuing ALTER TABLE ... MODIFY SETTING hybrid_watermark_*, run a representative read against a staging copy of the table with the candidate value substituted manually. ALTER itself is not a sufficient validation for value-sensitive predicates (Update README.md #2.1).
Smoke-check after ALTER. Immediately run SELECT ... LIMIT 0 on the Hybrid table after every watermark ALTER; a function-level error on the smoke query is a bad-value indicator (Update README.md #2.2) — roll the watermark back.
Use a maintenance window for watermark ALTERs even though concurrent reads are snapshot-safe. Update README.md #2.1 + Update README.md #1.1 are availability risks, not visibility risks.
Avoid _table = 'A' OR _table = 'B' on Hybrid tables; rewrite as UNION ALL for correct segment pruning (Update README.md #1.3).
Capture SHOW CREATE TABLE periodically into monitoring so the effective watermark is always known to ops, regardless of who ALTERed it last (X.1).

Recommended fixes

Update README.md #2.1 (blocker for production readiness): in StorageDistributed::checkAlterIsPossible() for MODIFY_SETTING on Hybrid, build the post-ALTER effective watermark map (existing values + incoming changes), substitute into each predicate AST, and run the same TreeRewriter + ExpressionAnalyzer path as registerStorageHybrid's validate_predicate. This makes CREATE and ALTER validation-equivalent. Fix direction already agreed by the PR author.
Update README.md #2.2: when the substituted predicate throws on read, wrap the exception with context like "while evaluating Hybrid predicate with watermark '<name>' = '<value>'". substitute_watermarks has the name/value in scope.
Update README.md #1.1: add an arity/type check in substitute_watermarks symmetric to the one already in registerStorageHybrid::collect_hybrid_params — throw BAD_ARGUMENTS on malformed hybridParam.
Update README.md #1.2: throw instead of skip in collectHybridParamTypes(); the reviewer's inline comment at cpp:1762 already suggests this.
X.1: expose current effective watermark values via system.tables' settings column or a dedicated system.hybrid_watermarks view (name, value, declared_type, table).
Update README.md #1.3 (out of this PR): file a separate issue for planner-level _table-disjunction pruning in Hybrid; not a blocker for this PR's feature surface but affects the same operator journey.

…/alter-watermark

mkmkme · 2026-04-22T09:09:14Z

I have an update on 'banana' example. It seems to be quite a bit more complicated than I thought. The explanation is quite large unfortunately, so I will copy-paste the whole AI summary of it as I can't fold it much.

The premise

CREATE substitutes effective watermark values and revalidates every predicate with ExpressionAnalyzer, but ALTER only checks that the new setting parses as the declared type and never reruns predicate validation, so ALTER can admit states that CREATE would reject.

The first half is correct: CREATE does run TreeRewriter::analyze + ExpressionAnalyzer::getActions(true) on each substituted predicate (registerStorageHybrid, around validate_predicate). The second half — "states that CREATE would reject" — is narrower than it looks, and for the dateDiff('banana', ts, now()) example specifically, CREATE doesn't reject it either. Here's why.

Why `'banana'` isn't rejected at CREATE

ClickHouse splits function validation into two phases:

getReturnTypeImpl(arguments) — given argument types, return the result type or throw. Runs during ActionsDAG build, i.e. inside ExpressionAnalyzer::getActions(true).
executeImpl(arguments, ...) — runs at query execution over actual column values.

For dateDiff, getReturnTypeImpl (in src/Functions/dateDiff.cpp) only asserts that the unit argument is of type String via validateFunctionArguments:

{"unit", static_cast<FunctionArgumentDescriptor::TypeValidator>(&isString), nullptr, "String"},

The actual "is this a known unit name?" check lives in executeImpl and throws BAD_ARGUMENTS via "Function {} does not support '{}' unit". That throw only runs when executeImpl runs. During DAG build, ActionsDAG will const-fold a function call (triggering executeImpl) only when all inputs are constant. In dateDiff('banana', ts, now()), ts is a column input, so dateDiff is never eagerly evaluated, and the unit check never fires.

You can verify this with a minimal repro on the current branch: CREATE TABLE ... SETTINGS hybrid_watermark_unit = 'banana' with the predicate from the example is accepted today by CREATE. The bug the issue describes isn't specifically an ALTER bug — it's a pre-existing CREATE bug that ALTER mirrors.

Why running the analyzer on ALTER wouldn't help

A second thing I want to point out: for any watermark value that passes the existing per-change type parse, running the analyzer again on ALTER produces the identical result as it did at CREATE, because:

The substitute step produces an ASTLiteral whose Field is built by DataTypeFactory::instance().get(type_name)->createColumn() + deserializeWholeText(...).
The Field's type is determined by the declared watermark type, not by the value.
TreeRewriter/ExpressionAnalyzer see that type, not the value.

So the analyzer pass is invariant under a same-typed ALTER. The only way the analyzer would reject one value but accept another of the same declared type is via narrow FieldToDataType quirks (e.g., very small integers narrowing to Int8), which don't arise in typical watermark predicates.

Net: ALTER today already enforces the per-change type parse, which is effectively the same guarantee the CREATE analyzer pass provides for same-typed value changes. Adding the analyzer pass to ALTER would be defense-in-depth at best and a no-op in practice for all realistic predicates.

What it would take to actually catch `'banana'`

Two plausible paths, both with real downsides:

Force a constant-fold evaluation. Substitute each physical column with a default-typed ColumnConst so the predicate becomes all-constant, then call ExpressionActions::execute on a 1-row block. That would invoke dateDiff::executeImpl and throw on 'banana'. The cost is false positives on value-dependent predicates — e.g., intDiv(x, ts) with default ts = 0 would fail at validation with DIVISION_BY_ZERO, rejecting a perfectly legitimate configuration. Some functions also have side effects during execution (dictionary lookups, now(), random) that shouldn't run at ALTER time.
Allowlist-style per-function pre-checks. For a known set of sinks (dateDiff unit, formatDateTime format, toTimeZone timezone, regex patterns), add a syntactic check on the substituted constant literal before DAG build. That catches specific real-world issues but doesn't generalize and is fragile as new functions land.

Neither is a small change, and both should apply to CREATE as well as ALTER — otherwise we'd have the inverse of the original complaint, i.e. ALTER rejecting what CREATE accepts.

Recommendation

I don't think the originally proposed fix (reuse CREATE's analyzer pass on ALTER) would change the behavior for the 'banana' example, and it wouldn't materially strengthen ALTER for any typical Hybrid predicate. I'd suggest one of:

Close this item as "works as intended, limitation documented": both CREATE and ALTER admit value-semantic errors in executeImpl; add a note to the Hybrid docs that watermark values are validated for type-parse, not for semantic validity inside user predicates.
Open a separate ticket for forced constant-fold validation at both CREATE and ALTER, scoped as a follow-up. That's where the real user-facing improvement is, but it needs a design for the false-positive problem before we ship it.

I'm happy to go either way — but I'd rather not land a ~130-line parity change whose regression test (ALTER ... 'banana' -- { serverError BAD_ARGUMENTS }) wouldn't actually pass.

mkmkme · 2026-04-22T09:50:14Z

@alsugiliazova I'll try to briefly address all of the found bulletpoints in your report. As sources, I used discussions with Misha and summaries of planning in Cursor.

2.1. ALTER accepts a value that makes the predicate semantically invalid

See the explanation above. At this point it's going to probably stay as a known limitation.

2.2. Failure after bad ALTER surfaces as cryptic function-level error, no pointer to the responsible watermark

This should probably also stay as a known limitation for now due to CH implementation limitations. StorageDistributed::read() doesn't execute the query. Instead, it builds a QueryPlan / Pipe and returns. Runtime errors from the functions happen later, during pipeline execution, which almost always means they would be raised on remote shards. The shard would send an Exception packet over TCP, RemoteQueryExecutor would re-raise it and walk it through the pipeline executor, already after read() returned.
The only thing we can do at this stage is enrich the planning-time errors with the watermark name/value context. For examples like dateDiff('banana', ...) this would not help since the error surfaces on shards.

2.3. Once set, a watermark cannot be removed except by recreating the table

This should be fine as this ultimately changes the table itself. In that case it should be re-created from scratch.

1.1. Read path indexes hybridParam() arguments without bounds-checking

Fixed

1.2. collectHybridParamTypes() silently skips malformed hybridParam during ALTER validation

Fixed

1.3. Segment pruning doesn't recognise (_table = 'A' AND …) OR (_table = 'B' AND …) disjunctions

This is quite a significant change outside the scope of this PR, I would leave it as-is for now.

X.1. No first-class visibility of current effective watermark values (only via SHOW CREATE TABLE)

It's quite simple to implement, but the change would be within several hundreds lines of new code. I would personally defer it to a follow-up PR for the next refresh. If strongly needed, I can implement it here. But I doubt it considering the Low priority of the issue in the report.

X.2. Watermark type is locked into predicates; type change requires recreating the table

This is also fine since it requires changing a predicate and hence the whole table.

ianton-ru

LGTM

mkmkme · 2026-04-22T12:41:22Z

X.1

#1674

alsugiliazova · 2026-04-23T14:01:36Z

Regression jobs:

alsugiliazova · 2026-04-23T14:36:18Z

Verification report: Altinity/ClickHouse PR #1659

PR #1659 CI triage

Summary

Item	Assessment
`test_storage_delta/test_cdf.py::test_cdf[]`	Likely pre-existing flaky — many PRs show single failures; `pull_request_number = 0` had 9 fails in 60d; 1659 has 1. Still “new vs base” on HTML report → worth rerun and log check before merging.
`test_export_replicated_mt_partition_to_object_storage/test.py::test_concurrent_exports_to_different_targets`	Likely pre-existing flaky — PR 0: 4 fails; 1390, 1517: multiple; 1659: 1. PR touches `StorageDistributed` → keep weak regression suspicion until log proves timing-only flake.
BuzzHouse (amd_debug)	Unknown from DB alone — `Unknown error` placeholder. Pull BuzzHouse artifacts / stack.
Stateless TSAN parallel 2/2	Infrastructure / timeout — job-level `error`, not a specific `FAIL` test row. Rerun shard.

Flakiness and “related to PR #1659?” (`gh-data`.checks + `play`)

Per-failing-item stats and “related to PR?”

Failing item	Window / scope	Stats	Related to PR #1659?
`test_storage_delta/test_cdf.py::test_cdf[]` (Integration amd_asan family: `check_name` LIKE `'Integration tests (amd_asan%'`)	`gh-data`.checks, last 60d	17 `FAIL`, 250 `OK` → ~6.4% of outcomes FAIL; failures spread across 69 distinct `pull_request_number` values; PR `0`: 9 FAILs; PR 1659: 1 FAIL	No — not PR-specific. Same test fails routinely on branch/PR `0` and other PRs.
`test_storage_delta/test_cdf.py::test_cdf[]`	`play.clickhouse.com` `default.checks`, last 30d	28 `FAIL`, 24609 `OK`, 4679 `SKIPPED`	No — upstream history is low-rate but non-zero FAIL, consistent with flake.
`test_export_replicated_mt_partition_to_object_storage/test.py::test_concurrent_exports_to_different_targets`	`gh-data`.checks, last 60d	14 `FAIL`, 880 `OK` → ~1.6% FAIL rate; PR `0`: 4 FAILs; other PRs (e.g. 1390, 1517) multiple FAILs; PR 1659: 1 FAIL	Unlikely — not PR-specific (same pattern on `0` and unrelated PRs). Optional: read log once because PR touches `StorageDistributed` (overlap only; history does not implicate 1659).
`BuzzHouse (amd_debug)`	`gh-data`.checks, last 60d	58 rows with `test_status = 'FAIL'` vs 132 total rows in quick aggregate; PR `0`: 9 FAILs (by-PR breakdown); PR 1659: 1 FAIL (`test_name` = `Unknown error` in DB)	No — typical fuzzer / debug noise shared across many PRs; not evidence Hybrid caused it without the artifact error string.
`Stateless tests (amd_tsan, s3 storage, parallel, 2/2)` (job row: empty `test_name`)	`gh-data`.checks, last 60d	`check_status = error`: 14 vs `success`: 52 on job-level rows → ~21% job error rate	No — infrastructure / timeout class, not a deterministic “this PR broke one test” signal.

Plain-language summary: Failing jobs are not shown to be caused by PR #1659 when measured against months of CI database history; they align with pre-existing flakiness, BuzzHouse noise, and infra timeouts. A rerun remains reasonable because the HTML report can still mark tests “new vs base” for a single run.

mkmkme added 2 commits April 17, 2026 13:40

Merge branch 'antalya-26.1' into mkmkme/hybrid/alter-watermark

5c36b26

mkmkme added antalya port-antalya PRs to be ported to all new Antalya releases hybrid antalya-26.1 labels Apr 17, 2026

ianton-ru reviewed Apr 17, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp Outdated

Comment thread src/Storages/StorageDistributed.cpp Outdated

Comment thread src/Storages/StorageDistributed.cpp

Comment thread src/Storages/StorageDistributed.cpp

alex-zaitsev mentioned this pull request Apr 19, 2026

Project Antalya Roadmap 2026 #1359

Open

32 tasks

Merge remote-tracking branch 'origin/antalya-26.1' into mkmkme/hybrid…

3989a2a

…/alter-watermark

StorageDistributed: check number of hybridParam arguments

a3fa8a7

ianton-ru approved these changes Apr 22, 2026

View reviewed changes

Comment thread src/Storages/StorageDistributed.cpp

alsugiliazova added the verified Approved for release label Apr 23, 2026

zvonand merged commit d1f8e37 into antalya-26.1 Apr 23, 2026
300 of 309 checks passed

svb-alt added the antalya-26.1.11.20001 label Apr 23, 2026

Conversation

mkmkme commented Apr 17, 2026 • edited by alex-zaitsev Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

CI/CD Options

Exclude tests:

Regression jobs to run:

Uh oh!

github-actions Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mkmkme commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Confirmed defects:

Uh oh!

mkmkme commented Apr 17, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

UnamedRus commented Apr 20, 2026

Uh oh!

alsugiliazova commented Apr 20, 2026

Usability Audit: ALTER TABLE MODIFY SETTING for Hybrid watermarks (PR #1659)

Summary of dedup vs PR comments

Primary flows (ranked by expected usage)

Issues

Flow 2: Runtime watermark modification

2.1 ALTER accepts a watermark value that makes the predicate semantically invalid; next read fails

2.2 Failure from a bad ALTER surfaces as a cryptic function-level error on read

2.3 Once set, a watermark cannot be removed except by recreating the table

Flow 1: Default Hybrid with watermark (create + SELECT)

1.1 Read path indexes hybridParam() arguments without bounds-checking

1.2 Multi-watermark predicate: collectHybridParamTypes() silently skips malformed hybridParam calls

1.3 Planner does not recognise (_table = 'A' AND …) OR (_table = 'B' AND …) for segment pruning

Cross-flow

X.1 No first-class visibility of current effective watermark values

X.2 Declared watermark type is locked into predicates; changing type requires recreating the table

Readiness verdict

Operator guidance (until fixes land)

Recommended fixes

Uh oh!

mkmkme commented Apr 22, 2026

The premise

Why 'banana' isn't rejected at CREATE

Why running the analyzer on ALTER wouldn't help

What it would take to actually catch 'banana'

Recommendation

Uh oh!

mkmkme commented Apr 22, 2026

Uh oh!

ianton-ru left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mkmkme commented Apr 22, 2026

Uh oh!

alsugiliazova commented Apr 23, 2026

Uh oh!

alsugiliazova commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Verification report: Altinity/ClickHouse PR #1659

PR #1659 CI triage

Summary

Flakiness and “related to PR #1659?” (gh-data.checks + play)

Per-failing-item stats and “related to PR?”

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

mkmkme commented Apr 17, 2026 •

edited by alex-zaitsev

Loading

github-actions Bot commented Apr 17, 2026 •

edited

Loading

mkmkme commented Apr 17, 2026 •

edited

Loading

1.1 Read path indexes `hybridParam()` arguments without bounds-checking

1.2 Multi-watermark predicate: `collectHybridParamTypes()` silently skips malformed `hybridParam` calls

1.3 Planner does not recognise `(_table = 'A' AND …) OR (_table = 'B' AND …)` for segment pruning

Why `'banana'` isn't rejected at CREATE

What it would take to actually catch `'banana'`

alsugiliazova commented Apr 23, 2026 •

edited

Loading

Flakiness and “related to PR #1659?” (`gh-data`.checks + `play`)