Merged
Changes from 8 commits
97 changes: 97 additions & 0 deletions .github/workflows/wasm-ci.yml
@@ -0,0 +1,97 @@
name: WASM SDK
permissions:
  contents: read

on:
  push:
    branches:
      - main
    paths:
      - 'wren-core-wasm/**'
      - 'wren-core/core/**'
      - 'wren-core-base/**'
  pull_request:
    paths:
      - 'wren-core-wasm/**'
      - 'wren-core/core/**'
      - 'wren-core-base/**'

concurrency:
  group: ${{ github.workflow }}-${{ github.event.number || github.sha }}
  cancel-in-progress: true

jobs:
  build:
    name: Build WASM + TypeScript SDK
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust toolchain
        uses: dtolnay/rust-toolchain@stable
        with:
          targets: wasm32-unknown-unknown

      - name: Cache Cargo
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/bin/
            ~/.cargo/registry/index/
            ~/.cargo/registry/cache/
            ~/.cargo/git/db/
            wren-core-wasm/target/
          key: wasm-cargo-${{ hashFiles('wren-core-wasm/Cargo.toml') }}
          restore-keys: wasm-cargo-

      - name: Install wasm-pack
        run: curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install npm dependencies
        working-directory: wren-core-wasm
        run: npm install

      - name: Build WASM (wasm-pack)
        working-directory: wren-core-wasm
        run: wasm-pack build --target web --release

      - name: Build dist (TypeScript + copy)
        working-directory: wren-core-wasm
        run: npm run build:dist

      - name: TypeScript typecheck
        working-directory: wren-core-wasm
        run: npm run typecheck

      - name: WASM binary size check
        working-directory: wren-core-wasm
        run: |
          raw_size=$(stat -c%s dist/wren_core_wasm_bg.wasm)
          gzip_size=$(gzip -c dist/wren_core_wasm_bg.wasm | wc -c)
          raw_mb=$(echo "scale=1; $raw_size / 1048576" | bc)
          gzip_mb=$(echo "scale=1; $gzip_size / 1048576" | bc)
          echo "WASM binary: ${raw_mb} MB raw, ${gzip_mb} MB gzip"

          # Fail if gzip > 15 MB
          max_gzip=$((15 * 1048576))
          if [ "$gzip_size" -gt "$max_gzip" ]; then
            echo "::error::WASM binary gzip size (${gzip_mb} MB) exceeds 15 MB limit"
            exit 1
          fi

      - name: Run integration tests
        working-directory: wren-core-wasm
        run: npm test

      - name: Upload dist artifact
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        uses: actions/upload-artifact@v4
        with:
          name: wasm-sdk-dist
          path: wren-core-wasm/dist/
          retention-days: 30
92 changes: 92 additions & 0 deletions wren-core-wasm/.claude/CLAUDE.md
@@ -0,0 +1,92 @@
# wren-core-wasm

Wren Engine compiled to WebAssembly for browser-native analytics. Runs SQL queries on Parquet/CSV/JSON data through the MDL semantic layer entirely client-side, powered by DataFusion.

## Why a Separate Crate?

Uses **upstream DataFusion** (crates.io v53), not the Canner fork. The WASM version executes queries directly via DataFusion — no SQL unparser or dialect transpilation needed. Kept outside the `wren-core/` workspace to avoid dependency conflicts.

Shared code: `wren-core-base` (manifest types, no DataFusion dependency) and `wren-core` semantic layer (MDL analysis rules).

## Architecture

```
Browser JS
  ├── registerParquet(name, bytes)  → Arrow RecordBatch → MemTable
  ├── registerJson(name, json)      → Arrow JSON reader → MemTable
  ├── loadMDL(mdl_json, source)     → AnalyzedWrenMDL + table resolution
  │     URL mode:   source="https://..." → ListingTable via HttpStore
  │     Local mode: source="./..."       → expects pre-registered tables
  │     Fallback:   source=""            → auto-detect from tableReference
  └── query(sql)                    → MDL rewrite → DataFusion execute → JSON result
```
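
The three `source` modes in the diagram can be pictured as a small dispatch on the string. The sketch below is only an illustration: `classifySource` is a hypothetical helper, and the matching rules (an `http(s)://` prefix for URL mode, empty for fallback, anything else local) are assumptions read off the diagram, not the actual logic in `src/lib.rs`.

```javascript
// Hypothetical helper; the real mode detection lives in src/lib.rs and may differ.
function classifySource(source) {
  // Empty (or missing) source: fallback, tables resolved from tableReference.
  if (source === undefined || source === null || source === '') return 'fallback';
  // Assumption: URL mode is any http:// or https:// prefix.
  if (/^https?:\/\//i.test(source)) return 'url';
  // Assumption: every other non-empty value is treated as local mode.
  return 'local';
}

console.log(classifySource('https://your-cdn.com/data/'));
console.log(classifySource('./data/'));
console.log(classifySource(''));
```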

## Key Source Files

- `src/lib.rs` (758 lines) — Single-file crate with `WrenEngine` struct:
  - `new()` → SessionContext with single-thread config
  - `register_json()` → JSON array → NDJSON → Arrow RecordBatch → MemTable
  - `register_parquet()` → Parquet bytes → Arrow RecordBatch → MemTable
  - `load_mdl()` → Parse manifest, analyze with wren-core, register tables by mode
  - `query()` → Apply MDL analyzer rules → DataFusion execute → JSON string
- `sdk/src/index.ts` — TypeScript wrapper (`WrenEngine` class) for npm package
- `sdk/src/wren_core_wasm.d.ts` — Hand-maintained type stubs for wasm-bindgen output
- `sdk/tests/index.test.mjs` — Node.js integration tests
- `scripts/build.mjs` — Build script: copy pkg/ artifacts + compile TS → dist/
- `examples/` — Browser HTML examples and headless Node.js tests
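
The "JSON array → NDJSON" step listed for `register_json()` is simple to picture. The crate does this in Rust; the sketch below shows the same transformation in JavaScript purely as an illustration, and `toNdjson` is a hypothetical name rather than part of the SDK.

```javascript
// Illustration only: convert an array of row objects to newline-delimited JSON,
// the line-oriented input format that Arrow's JSON reader consumes.
function toNdjson(rows) {
  if (!Array.isArray(rows)) {
    throw new TypeError('expected an array of row objects');
  }
  // JSON.stringify escapes embedded newlines, so each row stays on one line.
  return rows.map((row) => JSON.stringify(row)).join('\n');
}

console.log(toNdjson([
  { id: 1, customer: 'Alice' },
  { id: 2, customer: 'Bob' },
]));
```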

## Dev Commands

```bash
just build # Full build: WASM (release) + TypeScript SDK → dist/
just build-wasm # WASM only (wasm-pack → pkg/), macOS LLVM auto-detected
just build-wasm-dev # WASM debug build (faster, no --release)
just build-dist # Assemble dist/ from pkg/ + TS (requires pkg/ to exist)
just test # SDK integration tests (requires dist/)
just typecheck # TypeScript type check only
just serve # HTTP server on localhost:8787 for browser examples
just test-examples # Headless Node.js example tests
just size # Report WASM binary size (raw + gzip)
just clean # Remove pkg/, dist/, target/
```

## Dependencies

- **DataFusion v53** (upstream, crates.io) — query engine, `default-features = false` + selected features
- **Arrow v58.1** — `json` feature for JSON reader
- **Parquet v58.1** — `snap` + `lz4` only (no zstd — requires C library, can't compile to WASM)
- **object_store v0.13.1** — `aws` + `http` features for URL mode (HttpStore)
- **wren-core** (path: `../wren-core/core`) — semantic layer, `default-features = false`
- **wren-core-base** (path: `../wren-core-base`) — shared manifest types
- **wasm-bindgen / js-sys / web-sys** — WASM ↔ JS bindings
- **tokio** — `rt` + `macros` only (no multi-thread for WASM)
- **chrono** — `wasmbind` feature (uses `js_sys::Date` instead of `SystemTime`)
- **getrandom** v0.2/0.3/0.4 — all need `js`/`wasm_js` feature for `wasm32` target

## WASM-Specific Constraints

- **Single-threaded**: `SessionConfig::with_target_partitions(1)`, tokio `rt` only (no `rt-multi-thread`)
- **No zstd**: `zstd-sys` (C library) can't compile to WASM. Parquet uses snappy + lz4 (pure Rust)
- **No SystemTime**: chrono `wasmbind` feature required
- **getrandom**: All three major versions (0.2, 0.3, 0.4) in the dep tree need explicit WASM JS backend
- **macOS build**: Needs LLVM (`brew install llvm`) for C deps — justfile handles env vars automatically
- **Binary size**: ~68 MB raw / ~14 MB gzip (target: < 15 MB gzip)
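
The size budget can also be checked from JavaScript, for example in a local script. This is a minimal sketch of the same comparison the CI step makes (fail when the gzip size exceeds 15 × 1048576 bytes); `checkWasmSize` is a hypothetical helper, and the byte counts are assumed to come from elsewhere (such as `fs.statSync` and `zlib.gzipSync` in Node.js).

```javascript
const MIB = 1048576; // same divisor the CI step uses

// Hypothetical helper mirroring the CI size check.
function checkWasmSize(rawBytes, gzipBytes, maxGzipMib = 15) {
  return {
    // toFixed rounds, whereas `bc` with scale=1 truncates, so the last
    // digit can differ slightly from the CI log.
    rawMib: (rawBytes / MIB).toFixed(1),
    gzipMib: (gzipBytes / MIB).toFixed(1),
    // CI fails only when gzip size is strictly greater than the limit.
    withinBudget: gzipBytes <= maxGzipMib * MIB,
  };
}

console.log(checkWasmSize(68 * MIB, 14 * MIB));
```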

## npm Package (wren-core-sdk)

`package.json` defines the npm package. TypeScript SDK wraps the raw wasm-bindgen API:
- `WrenEngine.init(options?)` — load WASM binary, create engine
- `engine.loadMDL(mdl, profile)` — load MDL manifest with `{ source }` profile
- `engine.registerParquet(name, data)` / `engine.registerJson(name, data)` — pre-register tables
- `engine.query(sql)` → `Record<string, unknown>[]` (parsed, not raw JSON string)
- `engine.free()` — release WASM memory

Build output goes to `dist/` (ESM + types + `.wasm` binary). Published to npm, usable via CDN (unpkg/jsDelivr).

## Conventions

- Rust formatted with `cargo fmt`, linted with `clippy -D warnings`
- TypeScript uses strict mode, ES2020 target
- `#[wasm_bindgen(js_name = camelCase)]` for JS-facing API names
- Errors propagate as `JsError` (visible in browser console with stack traces)
- Tests: `wasm-bindgen-test` for Rust WASM tests, `node:test` for SDK integration tests
3 changes: 2 additions & 1 deletion wren-core-wasm/.gitignore
@@ -2,4 +2,5 @@
/pkg/
/dist/
Cargo.lock
/examples/data/
node_modules/
package-lock.json
176 changes: 176 additions & 0 deletions wren-core-wasm/AGENT_GUIDE.md
@@ -0,0 +1,176 @@
# WASM Dashboard Generation Guide

Reference for AI agents generating browser-based HTML dashboard artifacts using `wren-core-sdk`.

## How to Import

```html
<script type="module">
import { WrenEngine } from 'https://unpkg.com/wren-core-sdk@0.1.0/dist/index.js';
</script>
```

**Use unpkg, not jsDelivr.** jsDelivr's free CDN has a 50 MB per-file limit and
the WASM binary is ~68 MB raw, so jsDelivr returns 403 on the `.wasm` fetch.

## Two Data Loading Modes

### URL Mode (recommended for large data)

Data lives on a CORS-enabled HTTP server. DataFusion reads Parquet via range requests.

```javascript
const engine = await WrenEngine.init();
await engine.loadMDL(mdlJson, { source: 'https://your-cdn.com/data/' });
const rows = await engine.query('SELECT * FROM "Orders" LIMIT 100');
```

Requirements:
- Server must support CORS and HTTP range requests
- MDL `tableReference.table` uses bare names (e.g., `"orders"`) — the engine prepends `source` as URL prefix
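
Before pointing a dashboard at a server, it can help to sanity-check the two requirements above from a header dump (for example the output of `curl -I` with an `Origin` header set). The sketch below is a heuristic only: `checkUrlModeHeaders` is a hypothetical helper, it takes a plain object of response headers, and some servers honor range requests without advertising `Accept-Ranges`.

```javascript
// Hypothetical helper: returns a list of problems found in response headers.
function checkUrlModeHeaders(headers) {
  // Normalize header names, since HTTP header names are case-insensitive.
  const h = {};
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = String(value);
  }
  const problems = [];
  if (!('access-control-allow-origin' in h)) {
    problems.push('missing Access-Control-Allow-Origin (CORS)');
  }
  if ((h['accept-ranges'] || '').toLowerCase() !== 'bytes') {
    problems.push('Accept-Ranges is not "bytes" (range requests)');
  }
  return problems;
}

console.log(checkUrlModeHeaders({
  'Access-Control-Allow-Origin': '*',
  'Accept-Ranges': 'bytes',
}));
```

An empty array means both headers look fine; it does not guarantee that every range request will succeed.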

### Inline Mode (for small data, < 50 MB)

Data is embedded directly in the HTML file.

```javascript
const engine = await WrenEngine.init();

// Option A: from JSON
await engine.registerJson('orders', [
  { id: 1, customer: 'Alice', amount: 100 },
  { id: 2, customer: 'Bob', amount: 200 },
]);

// Option B: from Parquet (base64-decoded ArrayBuffer); use instead of registerJson above
const parquetBytes = Uint8Array.from(atob(PARQUET_BASE64), c => c.charCodeAt(0));
await engine.registerParquet('orders', parquetBytes.buffer);

await engine.loadMDL(mdlJson, { source: '' });
const rows = await engine.query('SELECT * FROM "Orders" LIMIT 100');
```

## MDL Structure

Every dashboard needs an MDL manifest. Minimal example:

```javascript
const mdl = {
  catalog: 'wren',
  schema: 'public',
  models: [
    {
      name: 'Orders',                      // query as: SELECT ... FROM "Orders"
      tableReference: { table: 'orders' }, // physical table name (bare, no URL)
      columns: [
        { name: 'id', type: 'INTEGER' },
        { name: 'customer', type: 'VARCHAR' },
        { name: 'amount', type: 'DOUBLE' },
        { name: 'order_date', type: 'DATE' },
      ],
      primaryKey: 'id',
    },
  ],
  relationships: [],
  metrics: [],
  views: [],
};
```
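
When the data is inline JSON, the `columns` array can be drafted from sample rows instead of written by hand. The sketch below is one possible approach, not part of the SDK: `inferColumns` is a hypothetical helper, and it only emits the `INTEGER`, `DOUBLE`, and `VARCHAR` type names used above. Date strings come out as `VARCHAR`, so review the result before using it.

```javascript
// Hypothetical helper: draft MDL column definitions from sample row objects.
function inferColumns(rows) {
  const isNumeric = (t) => t === 'INTEGER' || t === 'DOUBLE';
  const types = new Map(); // column name -> inferred type, in first-seen order
  for (const row of rows) {
    for (const [name, value] of Object.entries(row)) {
      if (value === null || value === undefined) continue; // nulls carry no type
      const type = typeof value === 'number'
        ? (Number.isInteger(value) ? 'INTEGER' : 'DOUBLE')
        : 'VARCHAR';
      const prev = types.get(name);
      if (prev === undefined) {
        types.set(name, type);
      } else if (prev !== type) {
        // Mixed INTEGER/DOUBLE widens to DOUBLE; any other mix falls back to VARCHAR.
        types.set(name, isNumeric(prev) && isNumeric(type) ? 'DOUBLE' : 'VARCHAR');
      }
    }
  }
  return [...types].map(([name, type]) => ({ name, type }));
}

console.log(inferColumns([
  { id: 1, customer: 'Alice', amount: 100 },
  { id: 2, customer: 'Bob', amount: 200.5 },
]));
```

A column that happens to contain only whole numbers is inferred as `INTEGER`, so a value column such as `amount` may still need to be changed to `DOUBLE` by hand.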

## Query Results

`engine.query(sql)` returns `Record<string, unknown>[]` — an array of plain objects. This is directly usable with:

- **Chart.js**: `data.datasets[0].data = rows.map(r => r.amount)`
- **D3.js**: `d3.select('svg').selectAll('rect').data(rows)`
- **console.table**: `console.table(rows)`
- **HTML table**: iterate `rows` to build `<tr>/<td>` elements
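
For the HTML table case, a small string builder is usually enough. The following is a minimal sketch, assuming every row has the same keys as the first one; `escapeHtml` and `rowsToHtmlTable` are hypothetical helper names, not SDK functions.

```javascript
// Escape characters that would otherwise be interpreted as markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a <table> string from query result rows (array of plain objects).
function rowsToHtmlTable(rows) {
  if (rows.length === 0) return '<table></table>';
  const columns = Object.keys(rows[0]); // assumption: all rows share these keys
  const head = '<tr>' + columns.map((c) => `<th>${escapeHtml(c)}</th>`).join('') + '</tr>';
  const body = rows
    .map((row) => '<tr>' + columns.map((c) => `<td>${escapeHtml(row[c] ?? '')}</td>`).join('') + '</tr>')
    .join('');
  return `<table>${head}${body}</table>`;
}

console.log(rowsToHtmlTable([{ month: '2024-01', revenue: 350 }]));
```

In a page, the returned string can be assigned to a container's `innerHTML`; the escaping step matters because query results may contain user-supplied text.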

## Complete HTML Template (Inline Mode)

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Dashboard</title>
  <script src="https://cdn.jsdelivr.net/npm/chart.js@4"></script>
</head>
<body>
  <h1>Revenue Dashboard</h1>
  <canvas id="chart" width="600" height="400"></canvas>
  <div id="status">Loading engine...</div>

  <script type="module">
    import { WrenEngine } from 'https://unpkg.com/wren-core-sdk@0.1.0/dist/index.js';

    const status = document.getElementById('status');

    try {
      // 1. Initialize engine
      const engine = await WrenEngine.init();
      status.textContent = 'Engine ready. Loading data...';

      // 2. Register inline data
      await engine.registerJson('orders', [
        { id: 1, customer: 'Alice', amount: 150, month: '2024-01' },
        { id: 2, customer: 'Bob', amount: 200, month: '2024-01' },
        { id: 3, customer: 'Alice', amount: 300, month: '2024-02' },
        { id: 4, customer: 'Bob', amount: 100, month: '2024-02' },
      ]);

      // 3. Load MDL
      const mdl = {
        catalog: 'wren', schema: 'public',
        models: [{
          name: 'Orders',
          tableReference: { table: 'orders' },
          columns: [
            { name: 'id', type: 'INTEGER' },
            { name: 'customer', type: 'VARCHAR' },
            { name: 'amount', type: 'DOUBLE' },
            { name: 'month', type: 'VARCHAR' },
          ],
          primaryKey: 'id',
        }],
        relationships: [], metrics: [], views: [],
      };
      await engine.loadMDL(mdl, { source: '' });

      // 4. Query
      const rows = await engine.query(
        'SELECT month, sum(amount) AS revenue FROM "Orders" GROUP BY month ORDER BY month'
      );
      status.textContent = `Loaded ${rows.length} data points.`;

      // 5. Render chart
      new Chart(document.getElementById('chart'), {
        type: 'bar',
        data: {
          labels: rows.map(r => r.month),
          datasets: [{
            label: 'Revenue',
            data: rows.map(r => r.revenue),
            backgroundColor: 'rgba(54, 162, 235, 0.6)',
          }],
        },
      });

      engine.free();
    } catch (err) {
      status.textContent = `Error: ${err.message}`;
      console.error(err);
    }
  </script>
</body>
</html>
```

## Common Pitfalls

1. **Model names are case-sensitive** — use double quotes: `FROM "Orders"`, not `FROM Orders`
2. **`loadMDL` must be called after `registerJson`/`registerParquet`** in inline mode
3. **WASM binary is ~68 MB** — show a loading indicator during `WrenEngine.init()`
4. **`source: ''`** means "use pre-registered tables only" — don't pass `''` if you expect URL mode
5. **CORS required** for URL mode — `file://` protocol won't work for fetching remote Parquet
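
To avoid the first pitfall when SQL is assembled programmatically, model names can be passed through an identifier-quoting helper. This is a minimal sketch using standard SQL quoting (wrap in double quotes, double any embedded quote); `quoteIdent` is a hypothetical helper, not part of `wren-core-sdk`.

```javascript
// Hypothetical helper: quote a model or column name as a SQL identifier.
function quoteIdent(name) {
  // Embedded double quotes are escaped by doubling them.
  return '"' + String(name).replace(/"/g, '""') + '"';
}

const sql = `SELECT * FROM ${quoteIdent('Orders')} LIMIT 100`;
console.log(sql);
```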