Merged
Changes from 3 commits
1 change: 1 addition & 0 deletions docs/guide/modeling/model.md
@@ -84,6 +84,7 @@ properties: {}
| `columns` | Yes | List of columns to expose (see [Column Fields](#column-fields)) |
| `primary_key` | No | Column name that uniquely identifies a row; required for relationships |
| `cached` | No | Whether query results for this model should be cached; `false` by default |
| `dialect` | No | SQL dialect of the model's `ref_sql` (e.g. `bigquery`, `postgres`). Overrides the project-level `data_source` for this model. Requires `schema_version: 3`. See [Dialect Override](./wren_project.md#dialect-override). |
| `properties` | No | Arbitrary key-value metadata (description, tags, etc.) |

## Data Source: Two Ways to Point at Data
1 change: 1 addition & 0 deletions docs/guide/modeling/view.md
@@ -53,6 +53,7 @@ statement: >
|-------|----------|-------------|
| `name` | Yes | Unique identifier used in SQL queries |
| `statement` | Yes | A complete SQL SELECT statement; may reference other models or views |
| `dialect` | No | SQL dialect of the view's `statement` (e.g. `bigquery`, `postgres`). Currently metadata only — the engine always parses view statements with its generic SQL parser. Requires `schema_version: 3`. See [Dialect Override](./wren_project.md#dialect-override). |
| `properties` | No | Arbitrary key-value metadata (use `properties.description` for a human-readable description) |

## Model vs View
141 changes: 138 additions & 3 deletions docs/guide/modeling/wren_project.md
@@ -85,7 +85,7 @@ default_project: ~/projects/sales
### `wren_project.yml`

```yaml
-schema_version: 2
+schema_version: 3
name: my_project
version: "1.0"
catalog: wren
@@ -95,7 +95,7 @@ data_source: postgres

| Field | Description |
|-------|-------------|
-| `schema_version` | Directory layout version. `2` = folder-per-entity (current). Owned by the CLI — do not bump manually. |
+| `schema_version` | Directory layout version. `2` = folder-per-entity, `3` = adds `dialect` field support (current). Owned by the CLI — do not bump manually. |
| `name` | Project name |
| `version` | User's own project version (free-form, no effect on parsing) |
| `catalog` | **Wren Engine namespace** — NOT your database catalog. Identifies this MDL project within the engine. Default: `wren`. |
@@ -146,6 +146,19 @@ cached: false
properties: {}
```

**`dialect`** — optional field declaring which SQL dialect the model's `ref_sql` is written in. When omitted, the project-level `data_source` is used. This lets a single project contain models whose SQL targets different databases:

```yaml
name: revenue
ref_sql: "SELECT * FROM `project.dataset.table`"
dialect: bigquery
columns:
  - name: amount
    type: DECIMAL
```

Requires `schema_version: 3`. See [Dialect Override](#dialect-override) for details.

**ref_sql** — defines the model via a SQL query. SQL can be inline in `metadata.yml` or in a separate `ref_sql.sql` file (the `.sql` file takes precedence if both exist):

```yaml
@@ -186,6 +199,16 @@ properties:
  description: "Top customers by lifetime value"
```

Like models, views support an optional **`dialect`** field (requires `schema_version: 3`):

```yaml
name: monthly_summary
statement: "SELECT date_trunc('month', created_at) FROM orders"
dialect: postgres
```

When set, the dialect is stored as metadata for downstream consumers. It does not currently affect how the engine parses the view's statement — view statements are always normalized into a logical plan via DataFusion's generic SQL parser. See [Dialect Override](#dialect-override) for details.

### `relationships.yml`

```yaml
@@ -226,6 +249,7 @@ wren context init → scaffold project in current directory
(edit models/, relationships.yml, instructions.md)
wren context validate → check YAML structure (no DB needed)
wren context build → compile to target/mdl.json
wren context upgrade → upgrade project to latest schema_version
wren profile add my-pg ... → save connection to ~/.wren/profiles.yml
wren memory index → index schema + instructions into .wren/memory/
wren --sql "SELECT 1" → verify connection
@@ -280,6 +304,65 @@ wren context init --from-mdl mdl.json --path my_project --force
```

> **When to use this:** You have an existing `mdl.json` that was authored by hand or generated by an older workflow (e.g. the MCP server's `mdl_save_project` tool), and you want to adopt the YAML project format for version control and CLI-driven workflows.
>
> The import is `layoutVersion`-aware: manifests with `layoutVersion: 2` produce a `schema_version: 3` project with `dialect` fields preserved. Manifests without `layoutVersion` (or `layoutVersion: 1`) produce a `schema_version: 2` project.
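
The version mapping applied on import can be sketched in a few lines of Python. This is an illustration of the rule stated above, not the CLI's actual code: the function name `schema_version_for_import` is hypothetical, and rejecting unknown `layoutVersion` values is an assumption, since the text only covers a missing value, `1`, and `2`.

```python
def schema_version_for_import(manifest: dict) -> int:
    """Pick the schema_version of the project generated from an mdl.json manifest."""
    # A manifest without layoutVersion is treated like layoutVersion: 1.
    layout_version = manifest.get("layoutVersion", 1)
    if layout_version == 1:
        return 2
    if layout_version == 2:
        return 3
    # Assumption: anything else is rejected; the documentation does not say.
    raise ValueError(f"unsupported layoutVersion: {layout_version}")


print(schema_version_for_import({"catalog": "wren", "schema": "public"}))
print(schema_version_for_import({"layoutVersion": 2, "catalog": "wren"}))
```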

---

## Upgrading an Existing Project

When new features are added to the project format (e.g. the `dialect` field in `schema_version: 3`), use `wren context upgrade` to bring your project up to date:

```bash
wren context upgrade --path my_project
```

This upgrades to the latest `schema_version`. The command handles all intermediate steps automatically — for example, upgrading from v1 to v3 applies v1→v2 (restructure flat files into directories) then v2→v3 (enable dialect support).
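
The chaining of intermediate steps can be pictured with a small Python sketch. It only models the planning logic described above; the names `plan_upgrade`, `UPGRADE_STEPS`, and `LATEST_SCHEMA_VERSION` are hypothetical, and refusing downgrades is an assumption rather than documented behavior.

```python
LATEST_SCHEMA_VERSION = 3

# One entry per single-version step, keyed by the version being upgraded from.
UPGRADE_STEPS = {
    1: "restructure flat files into directories",
    2: "enable dialect support",
}


def plan_upgrade(current: int, target: int = LATEST_SCHEMA_VERSION) -> list:
    """Return the ordered (from, to, description) steps between two versions."""
    for version in (current, target):
        if not 1 <= version <= LATEST_SCHEMA_VERSION:
            raise ValueError(f"unknown schema_version: {version}")
    if target < current:
        # Assumption: downgrades are out of scope for this sketch.
        raise ValueError("target is older than the current schema_version")
    return [(v, v + 1, UPGRADE_STEPS[v]) for v in range(current, target)]


for step in plan_upgrade(1):
    print("v{} -> v{}: {}".format(*step))
```

An empty plan corresponds to a project that is already at the requested version.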

### What each upgrade does

| Upgrade | File changes |
|---------|-------------|
| v1 → v2 | `models/*.yml` flat files → `models/<name>/metadata.yml` directories; `ref_sql` extracted to `ref_sql.sql`; `views.yml` → `views/<name>/metadata.yml` directories; old files deleted |
| v2 → v3 | No file layout changes — only bumps `schema_version` in `wren_project.yml` to enable `dialect` field support |

### Options

| Flag | Description |
|------|-------------|
| `--to N` | Upgrade to a specific schema_version instead of the latest |
| `--dry-run` | Preview what files would be created, deleted, or modified — without writing anything |

### Preview before upgrading

```bash
wren context upgrade --path my_project --dry-run
```

```text
Dry run — no files will be changed.

Would create:
models/orders/metadata.yml
models/orders/ref_sql.sql
views/summary/metadata.yml

Would delete:
models/orders.yml
views.yml

Would modify:
wren_project.yml (schema_version 1 -> 3)
```

### After upgrading

```bash
wren context validate --path my_project
wren context build --path my_project
```

> **When to use this:** Your project was created with an older CLI version and you want to use new features (like per-model `dialect`). If your project is already at the latest schema_version, the command exits with a "nothing to do" message.

---

@@ -297,8 +380,60 @@ The `build` step converts all YAML keys from snake_case to camelCase:
| `primary_key` | `primaryKey` |
| `join_type` | `joinType` |
| `data_source` | `dataSource` |
| `layout_version` | `layoutVersion` |
| `refresh_time` | `refreshTime` |
| `base_object` | `baseObject` |

Generic rule: split on `_`, capitalize each word after the first, join. All other fields (`name`, `type`, `catalog`, `schema`, `table`, `condition`, `models`, `columns`, `cached`, `dialect`, `properties`) are identical in both formats.
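
As a minimal sketch, the generic rule can be written in Python as follows. The helper name `snake_to_camel` is hypothetical and this is not the `build` step's real implementation; it only restates the split, capitalize, join rule.

```python
def snake_to_camel(key: str) -> str:
    """Convert a snake_case key to camelCase: split on "_", capitalize, join."""
    first, *rest = key.split("_")
    # The first word is kept unchanged; each later word gets an initial capital.
    return first + "".join(word.capitalize() for word in rest)


for key in ("primary_key", "layout_version", "dialect"):
    print(key, "->", snake_to_camel(key))
```

Keys without an underscore, such as `dialect`, come back unchanged, which is why they are identical in both formats.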

The `layoutVersion` field is stamped automatically by `wren context build` based on the project's `schema_version`. You do not set it manually in YAML.

---

## Dialect Override

Models and views support an optional `dialect` field that declares which SQL dialect their embedded SQL is written in. This requires `schema_version: 3`.

### Semantics

- **`dialect` omitted (or `null`)** — falls back to the project-level `data_source`. This is the default and matches the behavior of all existing projects.
- **`dialect` set** — the embedded SQL is written in the specified dialect, which may differ from the project's `data_source`.
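
The fallback described in the two bullets above amounts to a one-line lookup. The Python sketch below assumes models, views, and the project are available as plain dictionaries loaded from YAML; `effective_dialect` is a hypothetical helper, not an engine API.

```python
from typing import Optional


def effective_dialect(entity: dict, project: dict) -> Optional[str]:
    """Return the entity's own dialect, or the project data_source if unset."""
    dialect = entity.get("dialect")
    # An omitted key and an explicit null both fall back to the project level.
    return dialect if dialect is not None else project.get("data_source")


project = {"data_source": "postgres"}
print(effective_dialect({"name": "revenue", "dialect": "bigquery"}, project))
print(effective_dialect({"name": "orders"}, project))
```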

### Model dialect

When a model has `dialect: bigquery` but the project's `data_source` is `postgres`, the engine knows the model's `ref_sql` contains BigQuery-flavored SQL (e.g. backtick-quoted identifiers, BigQuery functions). The engine uses this to select the correct SQL parser for the ref_sql.

```yaml
# models/revenue/metadata.yml
name: revenue
ref_sql: "SELECT * FROM `my-project.dataset.table`"
dialect: bigquery
columns:
  - name: amount
    type: DECIMAL
```

### View dialect

For views, the `dialect` field is currently **metadata only**. The engine normalizes view statements into a logical plan using DataFusion's generic SQL parser regardless of the dialect setting. The field is still valuable because:

- It documents the author's intent (which dialect the SQL was written in).
- Downstream consumers (ibis-server, MCP clients) can use it for dialect-aware processing.
- When dialect-aware view parsing is added in the future, the field will already be in place.

### Valid dialect values

`athena`, `bigquery`, `canner`, `clickhouse`, `databricks`, `datafusion`, `doris`, `duckdb`, `gcs_file`, `local_file`, `minio_file`, `mssql`, `mysql`, `oracle`, `postgres`, `redshift`, `s3_file`, `snowflake`, `spark`, `trino`
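
A project-level check against this list could look like the Python sketch below. The set is copied from the values above; the helper `invalid_dialects` is hypothetical, and treating the comparison as case-sensitive is an assumption.

```python
VALID_DIALECTS = frozenset({
    "athena", "bigquery", "canner", "clickhouse", "databricks",
    "datafusion", "doris", "duckdb", "gcs_file", "local_file",
    "minio_file", "mssql", "mysql", "oracle", "postgres",
    "redshift", "s3_file", "snowflake", "spark", "trino",
})


def invalid_dialects(entities: list) -> list:
    """Return (name, dialect) pairs whose dialect is not a documented value."""
    bad = []
    for entity in entities:
        dialect = entity.get("dialect")
        # Entities without a dialect are fine; they use the project data_source.
        if dialect is not None and dialect not in VALID_DIALECTS:
            bad.append((entity["name"], dialect))
    return bad


models = [
    {"name": "revenue", "dialect": "bigquery"},
    {"name": "orders"},
    {"name": "events", "dialect": "postgresql"},
]
print(invalid_dialects(models))
```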

### Version requirements

The `dialect` field requires `schema_version: 3` in `wren_project.yml`. Using `dialect` in a `schema_version: 2` project produces a validation warning. The `schema_version` also controls the `layoutVersion` stamped in the compiled `target/mdl.json`:

-Generic rule: split on `_`, capitalize each word after the first, join. All other fields (`name`, `type`, `catalog`, `schema`, `table`, `condition`, `models`, `columns`, `cached`, `properties`) are identical in both formats.
| `schema_version` | `layoutVersion` | Capabilities |
|-------------------|-----------------|--------------|
| 1 | 1 | Legacy flat-file project format |
| 2 | 1 | Folder-per-entity project format |
| 3 | 2 | `dialect` field on models and views |
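
The table and the warning rule can be combined into a short Python sketch. The mapping mirrors the table; the function names and the warning text are hypothetical and are not taken from the CLI.

```python
# schema_version -> layoutVersion, as listed in the table above.
LAYOUT_VERSION_BY_SCHEMA_VERSION = {1: 1, 2: 1, 3: 2}


def layout_version_for(schema_version: int) -> int:
    """Return the layoutVersion stamped into target/mdl.json."""
    if schema_version not in LAYOUT_VERSION_BY_SCHEMA_VERSION:
        raise ValueError(f"unknown schema_version: {schema_version}")
    return LAYOUT_VERSION_BY_SCHEMA_VERSION[schema_version]


def dialect_warnings(schema_version: int, entities: list) -> list:
    """List one warning per entity that sets dialect below schema_version 3."""
    if schema_version >= 3:
        return []
    return [
        f"{entity['name']}: `dialect` requires schema_version: 3"
        for entity in entities
        if entity.get("dialect") is not None
    ]


print(layout_version_for(3))
print(dialect_warnings(2, [{"name": "revenue", "dialect": "bigquery"}]))
```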

---

11 changes: 11 additions & 0 deletions wren-core-base/manifest-macro/src/lib.rs
@@ -38,6 +38,8 @@ pub fn manifest(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStr
        #[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash, Clone)]
        #[serde(rename_all = "camelCase")]
        pub struct Manifest {
            #[serde(default = "default_layout_version")]
            pub layout_version: u32,
            pub catalog: String,
            pub schema: String,
            #[serde(default)]
@@ -51,6 +53,10 @@ pub fn manifest(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStr
            #[serde(default)]
            pub data_source: Option<DataSource>,
        }

        fn default_layout_version() -> u32 {
            1
        }
    };
    proc_macro::TokenStream::from(expanded)
}
@@ -154,6 +160,8 @@ pub fn model(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStream
            pub refresh_time: Option<String>,
            #[serde(default)]
            pub row_level_access_controls: Vec<Arc<RowLevelAccessControl>>,
            #[serde(default)]
            pub dialect: Option<DataSource>,
        }
    };
    proc_macro::TokenStream::from(expanded)
@@ -363,9 +371,12 @@ pub fn view(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStream
    let expanded = quote! {
        #python_binding
        #[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash)]
        #[serde(rename_all = "camelCase")]
        pub struct View {
            pub name: String,
            pub statement: String,
            #[serde(default)]
            pub dialect: Option<DataSource>,
        }
    };
    proc_macro::TokenStream::from(expanded)