Name	Name	Last commit message	Last commit date
parent directory ..
src	src
Cargo.toml	Cargo.toml
README.md	README.md

vgi

Add your own functions and tables to DuckDB — written in Rust, shipped as one binary.
No C++ extension to compile, no linking against DuckDB, no version coupling.

A VGI worker is a small Rust program that DuckDB talks to over Apache Arrow IPC. It can expose scalar / table / aggregate functions and whole catalogs (schemas, tables, views) that behave like native DuckDB objects. DuckDB launches your worker for you when a query needs it — you never run a server by hand.

vgi is the Rust SDK for building those workers. It is byte-for-byte wire-compatible with the canonical Python SDK, so a Rust worker drops in behind the same ATTACH ... (TYPE vgi). Built on vgi-rpc; stock arrow-rs 58.x, MSRV 1.86.

Why a worker instead of a C++ extension?

Traditional DuckDB extension	VGI worker
Written in C/C++, compiled and linked against DuckDB	Written in Rust, one standalone binary
Must be rebuilt for each DuckDB version	Version independent
Complex build / signing / release cycle	`cargo build`, ship the binary
Runs in-process	Process isolation

Reach for it when you want to: call REST APIs from SQL, run ML inference, expose an external database / API / filesystem as a queryable catalog, or ship domain-specific functions to your team as a single binary.

Your first worker

1. Create a project and add the dependencies (these are exactly what the example below needs):

# Cargo.toml
[dependencies]
vgi = "0.1"
vgi-rpc = "0.2"
arrow-array = "58"
arrow-schema = "58"

2. Write a function and serve it:

// src/main.rs
use std::sync::Arc;

use arrow_array::{cast::AsArray, ArrayRef, RecordBatch, StringArray};
use arrow_schema::DataType;
use vgi::{ArgSpec, FunctionMetadata, ProcessParams, ScalarFunction, Worker};
use vgi_rpc::{Result, RpcError};

/// `upper_case(s)` — uppercase a string column.
struct UpperCase;

impl ScalarFunction for UpperCase {
    fn name(&self) -> &str {
        "upper_case"
    }

    fn metadata(&self) -> FunctionMetadata {
        FunctionMetadata {
            description: "Convert string values to uppercase".into(),
            return_type: Some(DataType::Utf8),
            ..Default::default()
        }
    }

    fn argument_specs(&self) -> Vec<ArgSpec> {
        vec![ArgSpec::column("value", 0, "varchar", "String to uppercase")]
    }

    fn process(&self, params: &ProcessParams, batch: &RecordBatch) -> Result<RecordBatch> {
        let col = batch.column(0).as_string::<i32>();
        let upper: StringArray = col.iter().map(|v| v.map(str::to_uppercase)).collect();
        let out: ArrayRef = Arc::new(upper);
        RecordBatch::try_new(params.output_schema.clone(), vec![out])
            .map_err(|e| RpcError::runtime_error(e.to_string()))
    }
}

fn main() {
    let mut worker = Worker::new();
    worker.register_scalar(UpperCase);
    worker.run(); // serves stdio (default), --unix <path>, or --http
}

3. Build it:

cargo build --release

4. Call it from a DuckDB engine that has the vgi extension. The vgi extension currently ships with Query Farm's Haybarn DuckDB distribution, which starts with no install via uvx haybarn-cli. From your project directory:

-- Haybarn ships the `vgi` extension. DuckDB LAUNCHES the worker for you;
-- LOCATION is the command it runs, and the alias 'demo' is what you
-- qualify functions with in SQL.
ATTACH 'demo' (TYPE vgi, LOCATION './target/release/my-worker');

SELECT demo.main.upper_case(name) FROM (VALUES ('alice'), ('bob')) t(name);
-- ALICE
-- BOB

-- Or drop the prefix:
USE demo;
SELECT main.upper_case('hello');   -- HELLO

LOCATION gotcha: the path is resolved relative to the DuckDB process's working directory, not your project. If the worker isn't found, use an absolute path (e.g. LOCATION '/abs/path/to/target/release/my-worker').

That's it — a native-speed SQL function, shipped as one static binary, with no extension to compile.

Iterating

Change your Rust, rebuild, and re-attach. DuckDB pools the worker process per attachment, so the reliable way to pick up a new build is to re-ATTACH (or start a fresh session):

cargo build --release

DETACH demo;
ATTACH 'demo' (TYPE vgi, LOCATION './target/release/my-worker');

Troubleshooting

ATTACH can't find the worker — LOCATION is resolved relative to DuckDB's working directory, not your project. Use an absolute path.
Catalog Error: ... upper_case does not exist — qualify with the attach alias (demo.main.upper_case) or run USE demo; first.
A runtime error in your function — anything you return as RpcError (or any panic) surfaces in DuckDB's error message; return descriptive errors from process to make debugging easy.
Type mismatch at the call site — argument_specs is validated at bind time, so a wrong-typed column fails fast with a clear message before any rows flow.

Function types

Type	Trait	SQL pattern	Use case
Scalar	`ScalarFunction`	`SELECT f(col) FROM t`	Per-row transforms (1:1)
Table	`TableFunction`	`SELECT * FROM f(args)`	Generate / scan data
Table-In-Out	`TableInOutFunction`	`SELECT * FROM f((SELECT …))`	Streaming transforms
Table-Buffering	`TableBufferingFunction`	`SELECT * FROM f((SELECT …))`	Aggregate-then-emit (sink → combine → source)
Aggregate	`AggregateFunction`	`SELECT f(col) … GROUP BY …`	Grouped / window / streaming aggregates

Each trait is small: name, metadata, argument_specs, an on_bind to resolve the output schema, and process (or the buffering / aggregate lifecycle methods). Projection & filter pushdown, ORDER BY / TABLESAMPLE hints, settings, secrets (two-phase bind), bearer auth, and a cross-process state store are handled for you.

Beyond functions: full catalogs

Worker::set_catalog exposes a complete catalog — schemas, function-backed tables, views, and macros — with constraints, column statistics, time travel (AT), and secondary catalogs attachable by name:

ATTACH 'external_db' (TYPE vgi, LOCATION './my-catalog-worker');

SELECT * FROM external_db.main.users;            -- a function-backed table
SELECT * FROM external_db.analytics.daily_view;  -- a view
SELECT external_db.main.transform(col) FROM t;   -- a function

A worker can act as a bridge — databases, APIs, filesystems — presented to DuckDB as native catalogs.

Transports

Worker::run picks the transport from argv:

stdio (default) — DuckDB spawns the worker per query. Nothing to configure.
Unix socket (--unix <path>) — one long-lived worker (the launcher contract).
HTTP (--http) — Arrow-IPC over HTTP with AEAD-sealed stateless stream tokens and optional bearer auth.

Where to go next

API docs (docs.rs) — every trait and type.
Example worker — a fixture worker exercising every function kind and full catalogs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

vgi

Why a worker instead of a C++ extension?

Your first worker

Iterating

Troubleshooting

Function types

Beyond functions: full catalogs

Transports

Where to go next

License

FilesExpand file tree

vgi

Directory actions

More options

Directory actions

More options

Latest commit

History

vgi

Folders and files

parent directory

README.md

vgi

Why a worker instead of a C++ extension?

Your first worker

Iterating

Troubleshooting

Function types

Beyond functions: full catalogs

Transports

Where to go next

License