Add chdb in-process backend via interface="chdb" #753
@wudidapaopao thanks! This is something I've been wanting to do for a while. Before we merge, I want to do some more research on both sides. In an ideal world, […] If that is not viable, I think we should consider this as a backend refactor on our side rather than adding a separate client subclass. Currently, I'm thinking of one public client API with pluggable execution backends, so chDB support is implemented behind the existing client instead of as a parallel client family. (Separate concern, but this would also leave room for future TCP native support.) Either way, I'd like to spend some time on this before merging. Thanks again for putting this together and for the very thorough test coverage. I'll post back here as I get through the research.
Thanks @joe-clickhouse for the thoughtful response, and for taking the time to think through the architectural fit on both sides; really appreciate it. I strongly agree with the overall direction.

If I may add a couple of thoughts from the chDB side: one thing we've consistently tried hard to preserve in chDB is minimizing serialization/deserialization overhead, which is arguably one of the main reasons users reach for an embedded engine in the first place. A loopback-only HTTP endpoint inside […]

Relatedly, chDB already supports zero-copy read/write for pandas DataFrames. Keeping the in-process path (i.e. not going through a server boundary) preserves that property end-to-end, and I think it also opens up nicer downstream integrations, both deeper interop with […]

So just my two cents (please take it as just a suggestion): if we can let users switch the execution engine by changing a single place, without touching any of their existing code, I think that would offer the best developer experience. Your pluggable-backend idea actually sounds very aligned with this: existing […]

Happy to dig deeper on either direction with you, whatever helps the research move forward. Thanks again.
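To make the "one public client API with pluggable execution backends" idea from the discussion above a bit more concrete, here is a minimal Python sketch. It is not taken from this PR or from clickhouse-connect: `ExecutionBackend`, `FakeHttpBackend`, `FakeInProcessBackend`, `BACKENDS`, `Client`, and `make_client` are all hypothetical names, and the two backends are stand-ins that neither open a connection nor load an engine. The only point is that caller code changes in a single place (the `interface` argument).

```python
from typing import Any, Protocol


class ExecutionBackend(Protocol):
    """Hypothetical interface: anything that can run SQL and return rows."""

    def execute(self, sql: str) -> list[tuple[Any, ...]]: ...


class FakeHttpBackend:
    """Stand-in for a server-backed transport (no network calls are made)."""

    def execute(self, sql: str) -> list[tuple[Any, ...]]:
        return [("http", sql)]


class FakeInProcessBackend:
    """Stand-in for an embedded engine such as chDB (no engine is loaded)."""

    def execute(self, sql: str) -> list[tuple[Any, ...]]:
        return [("chdb", sql)]


# Hypothetical registry mapping the `interface` argument to a backend class.
BACKENDS: dict[str, type] = {
    "http": FakeHttpBackend,
    "chdb": FakeInProcessBackend,
}


class Client:
    """Single public client; the backend is an implementation detail."""

    def __init__(self, backend: ExecutionBackend) -> None:
        self._backend = backend

    def query(self, sql: str) -> list[tuple[Any, ...]]:
        # Delegate execution; everything above this call is backend-agnostic.
        return self._backend.execute(sql)


def make_client(interface: str = "http") -> Client:
    """Pick a backend by name; caller code is otherwise unchanged."""
    try:
        backend_cls = BACKENDS[interface]
    except KeyError:
        raise ValueError(f"unknown interface: {interface!r}") from None
    return Client(backend_cls())
```

With this shape, `make_client("chdb").query(...)` and `make_client("http").query(...)` go through the same `Client` class, and a further backend (for example a native TCP one) would only need another registry entry.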
Summary
Adds an in-process backend that uses the embedded `chdb` engine instead of HTTP. Selected via `clickhouse_connect.get_client(interface="chdb")`. No ClickHouse server required.

The same `NativeTransform` byte parser the HTTP client uses is reused verbatim, so all existing type / dtype / streaming / DB-API / SQLAlchemy code paths work unchanged.

Usage examples
In-memory (default):
Persistent file path:
Engine startup options as a dict:
Or inline in the path itself:
ClickHouse server settings applied for the lifetime of the client (issued via `SET k=v` at construction):

Async usage is symmetric:
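As a rough illustration of the `SET k=v` behaviour mentioned above (settings issued once at construction), the sketch below renders a settings dict into `SET` statements. `build_set_statements` is a hypothetical helper, not code from this PR, and the quoting rules are an assumption about what a client would need to do.

```python
def build_set_statements(settings: dict[str, object]) -> list[str]:
    """Render one `SET name = value` statement per setting (hypothetical helper)."""
    statements = []
    for name, value in settings.items():
        # Accept only identifier-like names so nothing else can be injected.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"invalid setting name: {name!r}")
        if isinstance(value, bool):
            # Check bool before int, because bool is a subclass of int.
            rendered = "1" if value else "0"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Quote strings, escaping backslashes and single quotes.
            escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
            rendered = f"'{escaped}'"
        statements.append(f"SET {name} = {rendered}")
    return statements


statements = build_set_statements(
    {
        "max_threads": 4,
        "join_use_nulls": True,
        "date_time_input_format": "best_effort",
    }
)
```

A client following this approach would run each returned statement once on its session right after the engine starts, so the settings hold for every later query.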