Summary
Databricks geometry columns arrive as EWKT strings (SRID=4326;POINT(55.4 25.2)) in Arrow string fields. The Arrow field metadata already labels them (Spark:DataType:SqlName: GEOMETRY(4326)), but the driver doesn't convert them to a standard geospatial Arrow format.
Request
Have the driver emit geoarrow.wkb Arrow extension metadata on geometry columns, converting EWKT→WKB in the IPC reader. This would allow consumers like DuckDB's adbc_scanner to map geometry to native GEOMETRY automatically — no ST_AsBinary() on the Databricks side or ST_GeomFromWKB() on the client side.
The Redshift ADBC driver already does this — its geometry columns arrive with ARROW:extension:name: geoarrow.wkb metadata, and DuckDB maps them to native GEOMETRY with zero conversion needed.
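A minimal sketch of the metadata side of this request, using only the Go standard library. It derives the extension metadata from the existing Spark:DataType:SqlName value. The function name geoArrowMetadata is hypothetical, and writing the CRS as an "EPSG:<srid>" string in ARROW:extension:metadata is an assumption about how the SRID should be mapped, not something the driver does today.

```go
package main

import (
	"fmt"
	"regexp"
)

// sqlNameKey is the field-metadata key quoted in this issue.
const sqlNameKey = "Spark:DataType:SqlName"

// geometryType matches values such as "GEOMETRY(4326)" and captures the SRID.
// Other spellings are deliberately not matched in this sketch.
var geometryType = regexp.MustCompile(`^GEOMETRY\((\d+)\)$`)

// geoArrowMetadata (hypothetical helper) returns the extension metadata to
// attach to a field whose existing metadata marks it as a geometry column.
// The boolean is false for every other column.
func geoArrowMetadata(fieldMeta map[string]string) (map[string]string, bool) {
	m := geometryType.FindStringSubmatch(fieldMeta[sqlNameKey])
	if m == nil {
		return nil, false
	}
	return map[string]string{
		"ARROW:extension:name": "geoarrow.wkb",
		// Assumption: the SRID is an EPSG code and is passed as an
		// authority-code string in the JSON extension metadata.
		"ARROW:extension:metadata": fmt.Sprintf(`{"crs":"EPSG:%s"}`, m[1]),
	}, true
}

func main() {
	meta, ok := geoArrowMetadata(map[string]string{sqlNameKey: "GEOMETRY(4326)"})
	fmt.Println(ok, meta)
}
```

In a real patch these key/value pairs would go into the Arrow field's metadata alongside the switch from a String to a Binary storage type.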
Proof of concept
I built a patch in ipc_reader_adapter.go that:
- Detects geometry columns via Spark:DataType:SqlName metadata
- Converts EWKT→WKB per row using go-geom (WKT parse + WKB marshal)
- Replaces String arrays with Binary arrays + ARROW:extension:name: geoarrow.wkb
It works: DuckDB sees native GEOMETRY, and GeoParquet output includes geo metadata with WKB encoding, bbox, and geometry_types.
However, per-row WKT parsing in Go is ~25% slower for points, and much slower for complex polygons, than calling ST_AsBinary() server-side. The ideal solution would be for the driver (or databricks-sql-go) to emit WKB directly from the server, avoiding WKT string serialization entirely.
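To make the per-row cost concrete, here is a rough sketch of the EWKT→WKB step for the simplest case. It is not the patch itself: the patch uses go-geom, whereas this uses only the standard library and handles only 2D POINT values. The name ewktPointToWKB is hypothetical. The SRID is returned separately because WKB as used by geoarrow.wkb does not embed it; the CRS belongs in the extension metadata.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// ewktPointToWKB (hypothetical helper) converts an EWKT point such as
// "SRID=4326;POINT(55.4 25.2)" into little-endian WKB. It returns the SRID
// separately (0 when no SRID prefix is present).
func ewktPointToWKB(ewkt string) ([]byte, int, error) {
	srid, body := 0, strings.TrimSpace(ewkt)
	if strings.HasPrefix(body, "SRID=") {
		sridText, rest, found := strings.Cut(strings.TrimPrefix(body, "SRID="), ";")
		if !found {
			return nil, 0, fmt.Errorf("missing ';' after SRID in %q", ewkt)
		}
		n, err := strconv.Atoi(sridText)
		if err != nil {
			return nil, 0, fmt.Errorf("bad SRID in %q: %w", ewkt, err)
		}
		srid, body = n, strings.TrimSpace(rest)
	}

	// Only "POINT(x y)" is accepted; everything else is rejected.
	open := strings.Index(body, "(")
	if open < 0 || !strings.HasSuffix(body, ")") || strings.TrimSpace(body[:open]) != "POINT" {
		return nil, 0, fmt.Errorf("this sketch only handles 2D POINT, got %q", ewkt)
	}
	coords := strings.Fields(body[open+1 : len(body)-1])
	if len(coords) != 2 {
		return nil, 0, fmt.Errorf("expected 2 coordinates in %q", ewkt)
	}
	x, errX := strconv.ParseFloat(coords[0], 64)
	y, errY := strconv.ParseFloat(coords[1], 64)
	if errX != nil || errY != nil {
		return nil, 0, fmt.Errorf("bad coordinates in %q", ewkt)
	}

	// WKB point layout: 1 byte order marker, 4 byte type, two 8 byte doubles.
	wkb := make([]byte, 21)
	wkb[0] = 1                                 // 1 = little-endian
	binary.LittleEndian.PutUint32(wkb[1:5], 1) // geometry type 1 = Point
	binary.LittleEndian.PutUint64(wkb[5:13], math.Float64bits(x))
	binary.LittleEndian.PutUint64(wkb[13:21], math.Float64bits(y))
	return wkb, srid, nil
}

func main() {
	wkb, srid, err := ewktPointToWKB("SRID=4326;POINT(55.4 25.2)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("srid=%d wkb=%x\n", srid, wkb)
}
```

Even in this trivial case every row pays for string splitting and two float parses, which is the work that server-side WKB output would avoid.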
Current workaround
-- Databricks side: explicit binary conversion
SELECT *, ST_AsBinary(geom) as geom_wkb FROM table
-- DuckDB side: explicit geometry conversion
ST_GeomFromWKB(geom_wkb) as geom
Desired behavior
-- Just SELECT * — geometry arrives as native GEOMETRY via geoarrow.wkb
SELECT * FROM adbc_scan(conn, 'SELECT * FROM table')
References
- adbc_scanner extension: maps geoarrow.wkb to native GEOMETRY