fix(clickhouse sink, aws_s3 sink): use dedicated batch_encoding types #25340
Merged
pront merged 13 commits into vectordotdev:master on May 1, 2026
Conversation
…sion So adding a future S3BatchEncoding variant is a compile error rather than silently defaulting to "parquet". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
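The compile-error-over-silent-default point in the commit above can be sketched with a match that has no catch-all arm. This is an illustrative sketch, not the source: the enum shape and `encoder_name` helper are assumptions that mirror the PR text.

```rust
// Sketch: a single-variant wrapper enum for the aws_s3 sink's
// batch_encoding field, assuming the shape described in the PR.
#[derive(Debug)]
enum S3BatchEncoding {
    Parquet,
    // Adding a future variant here forces every `match` below to be
    // updated, because there is no `_` arm.
}

// Hypothetical helper: no `_ => "parquet"` fallback, so a new variant
// is a compile error instead of silently defaulting to "parquet".
fn encoder_name(e: &S3BatchEncoding) -> &'static str {
    match e {
        S3BatchEncoding::Parquet => "parquet",
    }
}

fn main() {
    assert_eq!(encoder_name(&S3BatchEncoding::Parquet), "parquet");
    println!("ok");
}
```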
Avoid rebinding parquet_config to a borrow of itself from config.batch_encoding; use a short p binding instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
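The rebinding fix in the commit above can be sketched as follows; `Config`, `BatchEncoding`, and `ParquetConfig` are hypothetical stand-ins for the real types, shown only to illustrate the borrow-naming pattern.

```rust
// Hypothetical stand-ins for the real sink config types.
struct ParquetConfig {
    compression: String,
}

enum BatchEncoding {
    Parquet(ParquetConfig),
}

struct Config {
    batch_encoding: BatchEncoding,
}

fn compression(config: &Config) -> &str {
    // Rather than shadowing an owned `parquet_config` binding with a
    // borrow of its own field, bind the borrow to a short name `p`.
    // (Irrefutable `let` works because the enum has one variant.)
    let BatchEncoding::Parquet(p) = &config.batch_encoding;
    &p.compression
}

fn main() {
    let config = Config {
        batch_encoding: BatchEncoding::Parquet(ParquetConfig {
            compression: "zstd".to_string(),
        }),
    };
    assert_eq!(compression(&config), "zstd");
}
```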
Drop internal type names and dev jargon; lead with the user-visible behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pront approved these changes May 1, 2026
Member
pront left a comment
Made some commits to fix a few issues. LGTM now!
Contributor
Author
Thanks, appreciate it!
… codec at parse time

Add per-sink deserialization-failure tests that pin the schema-tightening behavior introduced by the dedicated wrapper enums:

- `aws_s3` rejects `codec: arrow_stream`
- `clickhouse` rejects `codec: parquet`

Previously both codecs were accepted by serde and rejected later at sink-build time; the new wrapper enums move rejection up to parse time, and these tests prevent silent regression of that contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
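The parse-time-versus-build-time contract pinned by these tests can be sketched without the real serde types. In the actual code this rejection comes from serde deserializing the wrapper enums; the hand-rolled parsers below are an assumption-laden stand-in that just demonstrates the contract.

```rust
// Sketch: each sink's codec parser accepts exactly one codec and
// rejects everything else up front, at parse time.
fn parse_clickhouse_codec(s: &str) -> Result<&'static str, String> {
    match s {
        "arrow_stream" => Ok("arrow_stream"),
        other => Err(format!("unknown codec `{other}` for clickhouse sink")),
    }
}

fn parse_s3_codec(s: &str) -> Result<&'static str, String> {
    match s {
        "parquet" => Ok("parquet"),
        other => Err(format!("unknown codec `{other}` for aws_s3 sink")),
    }
}

fn main() {
    // The cross-rejections mirror the tests described above.
    assert!(parse_clickhouse_codec("arrow_stream").is_ok());
    assert!(parse_clickhouse_codec("parquet").is_err());
    assert!(parse_s3_codec("parquet").is_ok());
    assert!(parse_s3_codec("arrow_stream").is_err());
    println!("ok");
}
```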
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 07e05e44ac
Programmatic users constructing `S3SinkConfig` directly need to be able to set the `batch_encoding` field. With `config` kept private in `src/sinks/aws_s3/mod.rs`, `S3BatchEncoding` was unnameable outside the crate, regressing callers that previously used `BatchSerializerConfig::Parquet(...)` at the same field. Gated by `codecs-parquet` to mirror the type and field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
The `clickhouse` and `aws_s3` sinks previously shared the `BatchSerializerConfig` schema for their `batch_encoding` field. That schema advertised every batch codec (`arrow_stream`, `parquet`, `proto_batch`) even though each sink only supports one; unsupported variants were rejected at config-build time, but still showed up in generated docs and YAML schemas.

This PR introduces dedicated, per-sink batch encoding types:

- `clickhouse` sink: `ClickhouseBatchEncoding`, which only accepts `arrow_stream`.
- `aws_s3` sink: `S3BatchEncoding`, which only accepts `parquet`.

The generated component docs and config schemas now reflect what each sink actually accepts.
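The before/after schema surface can be sketched as plain enums. The names mirror the PR text, but the fields, derives, and serde attributes are elided, so this is an illustration rather than the actual source.

```rust
// Before: one schema shared across both sinks, so generated docs and
// YAML schemas listed every codec for each sink.
#[allow(dead_code)]
enum BatchSerializerConfig {
    ArrowStream,
    Parquet,
    ProtoBatch,
}

// After: each sink's batch_encoding field uses a dedicated wrapper
// enum that names only the codec that sink supports.
#[allow(dead_code)]
enum ClickhouseBatchEncoding {
    ArrowStream,
}

#[allow(dead_code)]
enum S3BatchEncoding {
    Parquet,
}

fn main() {
    // Each per-sink schema now exposes exactly one codec variant.
    assert!(matches!(
        ClickhouseBatchEncoding::ArrowStream,
        ClickhouseBatchEncoding::ArrowStream
    ));
    assert!(matches!(S3BatchEncoding::Parquet, S3BatchEncoding::Parquet));
    println!("ok");
}
```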
Vector configuration
How did you test this PR?
- `make check-clippy` (clean)
- `make check-generated-docs`

Change Type
Is this a breaking change?
(Configs that already used `codec: arrow_stream` / `codec: parquet` continue to work; only the schema surface changes.)

Does this PR include user facing changes?
`changelog.d/clickhouse_aws_s3_dedicated_batch_encoding.fix.md`.

References
Closes #25323