vgi

Add your own functions and tables to DuckDB — written in Rust, shipped as one binary.
No C++ extension to compile, no linking against DuckDB, no version coupling.


A VGI worker is a small Rust program that DuckDB talks to over Apache Arrow IPC. It can expose scalar / table / aggregate functions and whole catalogs (schemas, tables, views) that behave like native DuckDB objects. DuckDB launches your worker for you when a query needs it — you never run a server by hand.

vgi is the Rust SDK for building those workers. It is byte-for-byte wire-compatible with the canonical Python SDK, so a Rust worker drops in behind the same ATTACH ... (TYPE vgi). It is built on vgi-rpc and stock arrow-rs 58.x; the MSRV is Rust 1.86.

Why a worker instead of a C++ extension?

| Traditional DuckDB extension | VGI worker |
| --- | --- |
| Written in C/C++, compiled and linked against DuckDB | Written in Rust, one standalone binary |
| Must be rebuilt for each DuckDB version | Version independent |
| Complex build / signing / release cycle | cargo build, ship the binary |
| Runs in-process | Process isolation |

Reach for it when you want to: call REST APIs from SQL, run ML inference, expose an external database / API / filesystem as a queryable catalog, or ship domain-specific functions to your team as a single binary.

Your first worker

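1. Create a project (the commands below assume the crate is named my-worker, so the release binary lands at target/release/my-worker) and add the dependencies, which are exactly what the example below needs: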

# Cargo.toml
[dependencies]
vgi = "0.1"
vgi-rpc = "0.2"
arrow-array = "58"
arrow-schema = "58"

2. Write a function and serve it:

// src/main.rs
use std::sync::Arc;

use arrow_array::{cast::AsArray, ArrayRef, RecordBatch, StringArray};
use arrow_schema::DataType;
use vgi::{ArgSpec, FunctionMetadata, ProcessParams, ScalarFunction, Worker};
use vgi_rpc::{Result, RpcError};

/// `upper_case(s)` — uppercase a string column.
struct UpperCase;

impl ScalarFunction for UpperCase {
    fn name(&self) -> &str {
        "upper_case"
    }

    fn metadata(&self) -> FunctionMetadata {
        FunctionMetadata {
            description: "Convert string values to uppercase".into(),
            return_type: Some(DataType::Utf8),
            ..Default::default()
        }
    }

    fn argument_specs(&self) -> Vec<ArgSpec> {
        vec![ArgSpec::column("value", 0, "varchar", "String to uppercase")]
    }

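    fn process(&self, params: &ProcessParams, batch: &RecordBatch) -> Result<RecordBatch> {
        // Column 0 is the "value" argument. `as_string` panics on a non-Utf8 column,
        // which bind-time validation of `argument_specs` is expected to rule out.
        let col = batch.column(0).as_string::<i32>();
        // Map each slot 1:1; nulls pass through unchanged.
        let upper: StringArray = col.iter().map(|v| v.map(str::to_uppercase)).collect();
        let out: ArrayRef = Arc::new(upper);
        // Report any Arrow schema or length mismatch to DuckDB as a runtime error.
        RecordBatch::try_new(params.output_schema.clone(), vec![out])
            .map_err(|e| RpcError::runtime_error(e.to_string()))
    }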
}

fn main() {
    let mut worker = Worker::new();
    worker.register_scalar(UpperCase);
    worker.run(); // serves stdio (default), --unix <path>, or --http
}

3. Build it:

cargo build --release

4. Call it from a DuckDB engine that has the vgi extension. The vgi extension currently ships with Query Farm's Haybarn DuckDB distribution, which you can start without installing anything by running uvx haybarn-cli. From your project directory:

-- Haybarn ships the `vgi` extension. DuckDB LAUNCHES the worker for you;
-- LOCATION is the command it runs, and the alias 'demo' is what you
-- qualify functions with in SQL.
ATTACH 'demo' (TYPE vgi, LOCATION './target/release/my-worker');

SELECT demo.main.upper_case(name) FROM (VALUES ('alice'), ('bob')) t(name);
-- ALICE
-- BOB

-- Or drop the prefix:
USE demo;
SELECT main.upper_case('hello');   -- HELLO

LOCATION gotcha: the path is resolved relative to the DuckDB process's working directory, not your project. If the worker isn't found, use an absolute path (e.g. LOCATION '/abs/path/to/target/release/my-worker').

That's it — a native-speed SQL function, shipped as one static binary, with no extension to compile.
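The core of process above is a 1:1, null-preserving map over the input column. As a minimal sketch of just that logic, the snippet below uses only the standard library, with Option<&str> standing in for a nullable Arrow string slot. The helper name upper_all is hypothetical and is not part of vgi or arrow-rs.

```rust
/// Uppercase every non-null value; nulls (`None`) pass through unchanged.
/// `upper_all` is a hypothetical helper, not part of vgi or arrow-rs.
fn upper_all(values: &[Option<&str>]) -> Vec<Option<String>> {
    values.iter().map(|v| v.map(|s| s.to_uppercase())).collect()
}

fn main() {
    let input = [Some("alice"), None, Some("bob")];
    let output = upper_all(&input);
    // A scalar function must emit exactly one output slot per input slot.
    assert_eq!(output.len(), input.len());
    println!("{:?}", output); // [Some("ALICE"), None, Some("BOB")]
}
```

Keeping the null in place matters because a scalar function's output rows are paired with the input rows by position.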

Iterating

Change your Rust, rebuild, and re-attach. DuckDB pools the worker process per attachment, so the reliable way to pick up a new build is to re-ATTACH (or start a fresh session):

cargo build --release
DETACH demo;
ATTACH 'demo' (TYPE vgi, LOCATION './target/release/my-worker');

Troubleshooting

  • ATTACH can't find the worker: LOCATION is resolved relative to DuckDB's working directory, not your project. Use an absolute path.
  • Catalog Error: ... upper_case does not exist — qualify with the attach alias (demo.main.upper_case) or run USE demo; first.
  • A runtime error in your function — anything you return as RpcError (or any panic) surfaces in DuckDB's error message; return descriptive errors from process to make debugging easy.
  • Type mismatch at the call site: argument_specs is validated at bind time, so a wrong-typed column fails fast with a clear message before any rows flow.
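For the LOCATION problem in the first bullet, one way to avoid guessing is to print the ATTACH statement with an absolute path from your project directory. The sketch below uses only the standard library; the helpers absolute_worker_path and attach_statement are hypothetical names for illustration, and the binary name my-worker is taken from the example above.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Resolve `worker` against `base` unless it is already absolute.
/// Hypothetical helper for illustration; not part of vgi or DuckDB.
fn absolute_worker_path(base: &Path, worker: &str) -> PathBuf {
    let p = Path::new(worker);
    if p.is_absolute() {
        p.to_path_buf()
    } else {
        base.join(p)
    }
}

/// Build the ATTACH statement for an alias and a worker path.
/// Paths containing a single quote would need SQL escaping; omitted here.
fn attach_statement(alias: &str, worker: &Path) -> String {
    format!(
        "ATTACH '{}' (TYPE vgi, LOCATION '{}');",
        alias,
        worker.display()
    )
}

fn main() -> std::io::Result<()> {
    // Run this from the project directory so `base` is the project root.
    let base = env::current_dir()?;
    let worker = absolute_worker_path(&base, "target/release/my-worker");
    println!("{}", attach_statement("demo", &worker));
    Ok(())
}
```

Paste the printed statement into the DuckDB session; it no longer depends on where that session was started.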

Function types

Register any mix of these via the typed traits in vgi:

| Type | Trait | SQL pattern | Use case |
| --- | --- | --- | --- |
| Scalar | ScalarFunction | SELECT f(col) FROM t | Per-row transforms (1:1) |
| Table | TableFunction | SELECT * FROM f(args) | Generate / scan data |
| Table-In-Out | TableInOutFunction | SELECT * FROM f((SELECT …)) | Streaming transforms |
| Table-Buffering | TableBufferingFunction | SELECT * FROM f((SELECT …)) | Aggregate-then-emit (sink → combine → source) |
| Aggregate | AggregateFunction | SELECT f(col) … GROUP BY … | Grouped / window / streaming aggregates |

Each trait is small: name, metadata, argument_specs, an on_bind to resolve the output schema, and process (or the buffering / aggregate lifecycle methods). Projection & filter pushdown, ORDER BY / TABLESAMPLE hints, settings, secrets (two-phase bind), bearer auth, and a cross-process state store are handled for you.
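To make the sink → combine → source lifecycle of a buffering function concrete, here is a standard-library-only sketch of a partial state that sums its input. The struct SumState and its method signatures are hypothetical: they mirror the three phase names from the table but are not vgi's trait definitions, and a plain slice of i64 stands in for an Arrow batch.

```rust
/// Hypothetical partial state for a buffering "sum and row count" function.
/// Method names mirror the sink -> combine -> source phases; they are
/// illustrative and are not vgi's trait signatures.
#[derive(Debug, Default, Clone, PartialEq)]
struct SumState {
    sum: i64,
    rows: u64,
}

impl SumState {
    /// sink: fold one input batch into this partial state.
    fn sink(&mut self, batch: &[i64]) {
        self.sum += batch.iter().sum::<i64>();
        self.rows += batch.len() as u64;
    }

    /// combine: merge another partial state, e.g. from a parallel stream.
    fn combine(&mut self, other: &SumState) {
        self.sum += other.sum;
        self.rows += other.rows;
    }

    /// source: emit the final result once all input has been consumed.
    fn source(&self) -> (i64, u64) {
        (self.sum, self.rows)
    }
}

fn main() {
    let mut a = SumState::default();
    let mut b = SumState::default();
    a.sink(&[1, 2, 3]);
    b.sink(&[10, 20]);
    a.combine(&b);
    println!("{:?}", a.source()); // (36, 5)
}
```

The design point is that sink and combine only accumulate; nothing is emitted until source runs, which is what distinguishes this shape from a streaming Table-In-Out function.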

Beyond functions: full catalogs

Worker::set_catalog exposes a complete catalog — schemas, function-backed tables, views, and macros — with constraints, column statistics, time travel (AT), and secondary catalogs attachable by name:

ATTACH 'external_db' (TYPE vgi, LOCATION './my-catalog-worker');

SELECT * FROM external_db.main.users;            -- a function-backed table
SELECT * FROM external_db.analytics.daily_view;  -- a view
SELECT external_db.main.transform(col) FROM t;   -- a function

A worker can act as a bridge — databases, APIs, filesystems — presented to DuckDB as native catalogs.

Transports

Worker::run picks the transport from argv:

  • stdio (default) — DuckDB spawns the worker per query. Nothing to configure.
  • Unix socket (--unix <path>) — one long-lived worker (the launcher contract).
  • HTTP (--http) — Arrow-IPC over HTTP with AEAD-sealed stateless stream tokens and optional bearer auth.
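The argv-based selection described above can be pictured with a small parser. This is a hypothetical illustration using only the standard library, not vgi's implementation: the Transport enum, the parse_transport function, and the choice to ignore unrelated arguments are all assumptions. Only the flag names --unix <path> and --http, and stdio as the default, come from the list above.

```rust
use std::path::PathBuf;

/// The three transports listed above. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum Transport {
    Stdio,
    Unix(PathBuf),
    Http,
}

/// Pick a transport from command-line arguments (without argv[0]).
/// Falls back to stdio when no transport flag is present.
fn parse_transport<I: IntoIterator<Item = String>>(args: I) -> Result<Transport, String> {
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--unix" => {
                // The socket path is the argument that follows the flag.
                let path = args.next().ok_or("--unix requires a socket path")?;
                return Ok(Transport::Unix(PathBuf::from(path)));
            }
            "--http" => return Ok(Transport::Http),
            _ => {} // ignore arguments unrelated to transport selection
        }
    }
    Ok(Transport::Stdio)
}

fn main() {
    // Skip argv[0], the program name.
    match parse_transport(std::env::args().skip(1)) {
        Ok(t) => println!("selected transport: {:?}", t),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Run with no arguments, this prints the stdio default, which matches the case where DuckDB launches the worker itself.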

Where to go next

License

Query Farm Source-Available License v1.0 — see LICENSE. Copyright © 2025, 2026 Query Farm LLC.