Merged
1 change: 1 addition & 0 deletions docs/guide/modeling/model.md
@@ -84,6 +84,7 @@ properties: {}
| `columns` | Yes | List of columns to expose (see [Column Fields](#column-fields)) |
| `primary_key` | No | Column name that uniquely identifies a row; required for relationships |
| `cached` | No | Whether query results for this model should be cached; `false` by default |
| `dialect` | No | SQL dialect of the model's `ref_sql` (e.g. `bigquery`, `postgres`). Overrides the project-level `data_source` for this model. Requires `schema_version: 3`. See [Dialect Override](./wren_project.md#dialect-override). |
| `properties` | No | Arbitrary key-value metadata (description, tags, etc.) |

## Data Source: Two Ways to Point at Data
1 change: 1 addition & 0 deletions docs/guide/modeling/view.md
@@ -53,6 +53,7 @@ statement: >
|-------|----------|-------------|
| `name` | Yes | Unique identifier used in SQL queries |
| `statement` | Yes | A complete SQL SELECT statement; may reference other models or views |
| `dialect` | No | SQL dialect of the view's `statement` (e.g. `bigquery`, `postgres`). Currently metadata only — the engine always parses view statements with its generic SQL parser. Requires `schema_version: 3`. See [Dialect Override](./wren_project.md#dialect-override). |
| `properties` | No | Arbitrary key-value metadata (use `properties.description` for a human-readable description) |

## Model vs View
141 changes: 138 additions & 3 deletions docs/guide/modeling/wren_project.md
@@ -85,7 +85,7 @@ default_project: ~/projects/sales
### `wren_project.yml`

```yaml
-schema_version: 2
+schema_version: 3
name: my_project
version: "1.0"
catalog: wren
@@ -95,7 +95,7 @@ data_source: postgres

| Field | Description |
|-------|-------------|
-| `schema_version` | Directory layout version. `2` = folder-per-entity (current). Owned by the CLI; do not bump manually. |
+| `schema_version` | Directory layout version. `2` = folder-per-entity, `3` = adds `dialect` field support (current). Owned by the CLI; do not bump manually. |
| `name` | Project name |
| `version` | User's own project version (free-form, no effect on parsing) |
| `catalog` | **Wren Engine namespace** — NOT your database catalog. Identifies this MDL project within the engine. Default: `wren`. |
@@ -146,6 +146,19 @@ cached: false
properties: {}
```

**`dialect`** — optional field declaring which SQL dialect the model's `ref_sql` is written in. When omitted, the project-level `data_source` is used. This lets a single project contain models whose SQL targets different databases:

```yaml
name: revenue
ref_sql: "SELECT * FROM `project.dataset.table`"
dialect: bigquery
columns:
- name: amount
type: DECIMAL
```

Requires `schema_version: 3`. See [Dialect Override](#dialect-override) for details.

**ref_sql** — defines the model via a SQL query. SQL can be inline in `metadata.yml` or in a separate `ref_sql.sql` file (the `.sql` file takes precedence if both exist):

```yaml
@@ -186,6 +199,16 @@ properties:
description: "Top customers by lifetime value"
```

Like models, views support an optional **`dialect`** field (requires `schema_version: 3`):

```yaml
name: monthly_summary
statement: "SELECT date_trunc('month', created_at) FROM orders"
dialect: postgres
```

When set, the dialect is stored as metadata for downstream consumers. It does not currently affect how the engine parses the view's statement — view statements are always normalized into a logical plan via DataFusion's generic SQL parser. See [Dialect Override](#dialect-override) for details.

### `relationships.yml`

```yaml
@@ -226,6 +249,7 @@ wren context init → scaffold project in current directory
(edit models/, relationships.yml, instructions.md)
wren context validate → check YAML structure (no DB needed)
wren context build → compile to target/mdl.json
wren context upgrade → upgrade project to latest schema_version
wren profile add my-pg ... → save connection to ~/.wren/profiles.yml
wren memory index → index schema + instructions into .wren/memory/
wren --sql "SELECT 1" → verify connection
@@ -280,6 +304,65 @@ wren context init --from-mdl mdl.json --path my_project --force
```

> **When to use this:** You have an existing `mdl.json` that was authored by hand or generated by an older workflow (e.g. the MCP server's `mdl_save_project` tool), and you want to adopt the YAML project format for version control and CLI-driven workflows.
>
> The import is `layoutVersion`-aware: manifests with `layoutVersion: 2` produce a `schema_version: 3` project with `dialect` fields preserved. Manifests without `layoutVersion` (or `layoutVersion: 1`) produce a `schema_version: 2` project.
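
The version mapping described in the note above can be sketched in Rust as follows. This is a minimal illustration, not the CLI's implementation: the function name `schema_version_for_import` is hypothetical, and returning `None` for any other `layoutVersion` is an assumption, since the note only covers an absent value, `1`, and `2`.

```rust
/// Pick the `schema_version` of the YAML project produced by an import,
/// given the manifest's optional `layoutVersion`.
/// Hypothetical helper for illustration only.
fn schema_version_for_import(layout_version: Option<u32>) -> Option<u32> {
    match layout_version {
        // Missing or `1`: the manifest predates `dialect`, so target v2.
        None | Some(1) => Some(2),
        // `2`: `dialect` fields are present and preserved, so target v3.
        Some(2) => Some(3),
        // Assumption: anything else is unknown to this sketch.
        Some(_) => None,
    }
}
```

For example, `schema_version_for_import(None)` yields `Some(2)`, while `schema_version_for_import(Some(2))` yields `Some(3)`.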

---

## Upgrading an Existing Project

When new features are added to the project format (e.g. the `dialect` field in `schema_version: 3`), use `wren context upgrade` to bring your project up to date:

```bash
wren context upgrade --path my_project
```

This upgrades to the latest `schema_version`. The command handles all intermediate steps automatically — for example, upgrading from v1 to v3 applies v1→v2 (restructure flat files into directories) then v2→v3 (enable dialect support).
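
The chaining of intermediate steps can be pictured with a short Rust sketch. It assumes upgrades always advance one `schema_version` at a time, as in the v1→v2→v3 example; the function name `upgrade_steps` is hypothetical and this is not the CLI's actual code.

```rust
/// List the single-version upgrade steps needed to go from `from` to `to`.
/// An empty list means there is nothing to do.
/// Hypothetical helper for illustration only.
fn upgrade_steps(from: u32, to: u32) -> Vec<(u32, u32)> {
    // Each step moves exactly one version forward: (1, 2), then (2, 3), ...
    // When `from >= to` the range is empty, so no steps are produced.
    (from..to).map(|v| (v, v + 1)).collect()
}
```

Under these assumptions, `upgrade_steps(1, 3)` produces the two steps `(1, 2)` and `(2, 3)`, and a project already at the target version produces no steps.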

### What each upgrade does

| Upgrade | File changes |
|---------|-------------|
| v1 → v2 | `models/*.yml` flat files → `models/<name>/metadata.yml` directories; `ref_sql` extracted to `ref_sql.sql`; `views.yml` → `views/<name>/metadata.yml` directories; old files deleted |
| v2 → v3 | No file layout changes — only bumps `schema_version` in `wren_project.yml` to enable `dialect` field support |

### Options

| Flag | Description |
|------|-------------|
| `--to N` | Upgrade to a specific schema_version instead of the latest |
| `--dry-run` | Preview what files would be created, deleted, or modified — without writing anything |

### Preview before upgrading

```bash
wren context upgrade --path my_project --dry-run
```

```text
Dry run — no files will be changed.

Would create:
models/orders/metadata.yml
models/orders/ref_sql.sql
views/summary/metadata.yml

Would delete:
models/orders.yml
views.yml

Would modify:
wren_project.yml (schema_version 1 -> 3)
```

### After upgrading

```bash
wren context validate --path my_project
wren context build --path my_project
```

> **When to use this:** Your project was created with an older CLI version and you want to use new features (like per-model `dialect`). If your project is already at the latest schema_version, the command exits with a "nothing to do" message.

---

@@ -297,8 +380,60 @@ The `build` step converts all YAML keys from snake_case to camelCase:
| `primary_key` | `primaryKey` |
| `join_type` | `joinType` |
| `data_source` | `dataSource` |
| `layout_version` | `layoutVersion` |
| `refresh_time` | `refreshTime` |
| `base_object` | `baseObject` |

Generic rule: split on `_`, capitalize each word after the first, join. All other fields (`name`, `type`, `catalog`, `schema`, `table`, `condition`, `models`, `columns`, `cached`, `dialect`, `properties`) are identical in both formats.
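
The generic rule can be written out as a small Rust function. This is a sketch of the rule as stated, not the code `wren context build` actually runs; the name `snake_to_camel` is hypothetical, and the handling of empty segments (for example from a doubled `_`) is an assumption.

```rust
/// Convert a snake_case key to camelCase: split on `_`, capitalize the first
/// letter of every word after the first, then join.
/// Hypothetical helper for illustration only.
fn snake_to_camel(key: &str) -> String {
    let mut out = String::with_capacity(key.len());
    for (i, word) in key.split('_').enumerate() {
        let mut chars = word.chars();
        match chars.next() {
            // Words after the first get an upper-cased initial letter.
            Some(first) if i > 0 => {
                out.extend(first.to_uppercase());
                out.push_str(chars.as_str());
            }
            // The first word is copied through unchanged.
            Some(first) => {
                out.push(first);
                out.push_str(chars.as_str());
            }
            // Assumption: empty segments contribute nothing.
            None => {}
        }
    }
    out
}
```

Keys without an underscore, such as `name` or `dialect`, pass through unchanged, which is why they are identical in both formats.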

The `layoutVersion` field is stamped automatically by `wren context build` based on the project's `schema_version`. You do not set it manually in YAML.

---

## Dialect Override

Models and views support an optional `dialect` field that declares which SQL dialect their embedded SQL is written in. This requires `schema_version: 3`.

### Semantics

- **`dialect` omitted (or `null`)** — falls back to the project-level `data_source`. This is the default and matches the behavior of all existing projects.
- **`dialect` set** — the embedded SQL is written in the specified dialect, which may differ from the project's `data_source`.
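
The fallback rule above amounts to a single `Option` lookup. The following Rust sketch uses plain strings instead of the engine's `DataSource` type, and the function name `effective_dialect` is hypothetical.

```rust
/// Resolve the dialect that applies to a model's or view's embedded SQL.
/// `dialect` is the optional per-entity field; `data_source` is the
/// project-level setting. Hypothetical helper for illustration only.
fn effective_dialect<'a>(dialect: Option<&'a str>, data_source: &'a str) -> &'a str {
    // An omitted (or null) dialect falls back to the project data source.
    dialect.unwrap_or(data_source)
}
```

With a project `data_source` of `postgres`, a model that sets `dialect: bigquery` resolves to `bigquery`, and a model that omits the field resolves to `postgres`.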

### Model dialect

When a model has `dialect: bigquery` but the project's `data_source` is `postgres`, the engine treats the model's `ref_sql` as BigQuery-flavored SQL (e.g. backtick-quoted identifiers, BigQuery functions) and uses the declared dialect to select the SQL parser for that `ref_sql`.

```yaml
# models/revenue/metadata.yml
name: revenue
ref_sql: "SELECT * FROM `my-project.dataset.table`"
dialect: bigquery
columns:
- name: amount
type: DECIMAL
```

### View dialect

For views, the `dialect` field is currently **metadata only**. The engine normalizes view statements into a logical plan using DataFusion's generic SQL parser regardless of the dialect setting. The field is still valuable because:

- It documents the author's intent (which dialect the SQL was written in).
- Downstream consumers (ibis-server, MCP clients) can use it for dialect-aware processing.
- When dialect-aware view parsing is added in the future, the field will already be in place.

### Valid dialect values

`athena`, `bigquery`, `canner`, `clickhouse`, `databricks`, `datafusion`, `doris`, `duckdb`, `gcs_file`, `local_file`, `minio_file`, `mssql`, `mysql`, `oracle`, `postgres`, `redshift`, `s3_file`, `snowflake`, `spark`, `trino`
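
A membership check against this list might look like the Rust sketch below. The names `VALID_DIALECTS` and `is_valid_dialect` are hypothetical, and exact, case-sensitive matching is an assumption; the document does not say how the validator compares values.

```rust
/// The dialect values listed in this section, copied verbatim.
const VALID_DIALECTS: &[&str] = &[
    "athena", "bigquery", "canner", "clickhouse", "databricks",
    "datafusion", "doris", "duckdb", "gcs_file", "local_file",
    "minio_file", "mssql", "mysql", "oracle", "postgres",
    "redshift", "s3_file", "snowflake", "spark", "trino",
];

/// Return true when `value` is one of the listed dialects.
/// Assumption: comparison is exact and case-sensitive.
fn is_valid_dialect(value: &str) -> bool {
    VALID_DIALECTS.iter().any(|d| *d == value)
}
```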

### Version requirements

The `dialect` field requires `schema_version: 3` in `wren_project.yml`. Using `dialect` in a `schema_version: 2` project produces a validation warning. The `schema_version` also controls the `layoutVersion` stamped in the compiled `target/mdl.json`:

-Generic rule: split on `_`, capitalize each word after the first, join. All other fields (`name`, `type`, `catalog`, `schema`, `table`, `condition`, `models`, `columns`, `cached`, `properties`) are identical in both formats.
| `schema_version` | `layoutVersion` | Capabilities |
|-------------------|-----------------|--------------|
| 1 | 1 | Legacy flat-file project format |
| 2 | 1 | Folder-per-entity project format |
| 3 | 2 | `dialect` field on models and views |
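
The table above can be expressed as a lookup in Rust. This is a sketch under stated assumptions: the function names are hypothetical, unknown versions map to `None`, and treating any `layoutVersion` of 2 or higher as dialect-capable is a guess about future versions rather than something the document states.

```rust
/// Map a project's `schema_version` to the `layoutVersion` stamped into
/// `target/mdl.json`, following the table above.
/// Hypothetical helper for illustration only.
fn layout_version_for(schema_version: u32) -> Option<u32> {
    match schema_version {
        // Both the flat-file and folder-per-entity formats compile to layout 1.
        1 | 2 => Some(1),
        // Schema version 3 adds `dialect` and compiles to layout 2.
        3 => Some(2),
        // Assumption: versions outside the table are unknown to this sketch.
        _ => None,
    }
}

/// Whether `dialect` may be used in a project with this `schema_version`.
fn supports_dialect(schema_version: u32) -> bool {
    layout_version_for(schema_version).map_or(false, |lv| lv >= 2)
}
```

A validator built on this could warn when `supports_dialect` is false but a model or view sets `dialect`, matching the warning described for `schema_version: 2` projects.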

---

11 changes: 11 additions & 0 deletions wren-core-base/manifest-macro/src/lib.rs
@@ -38,6 +38,8 @@ pub fn manifest(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStr
#[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash, Clone)]
#[serde(rename_all = "camelCase")]
pub struct Manifest {
#[serde(default = "default_layout_version")]
pub layout_version: u32,
pub catalog: String,
pub schema: String,
#[serde(default)]
@@ -51,6 +53,10 @@ pub fn manifest(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStr
#[serde(default)]
pub data_source: Option<DataSource>,
}

fn default_layout_version() -> u32 {
1
}
};
proc_macro::TokenStream::from(expanded)
}
@@ -154,6 +160,8 @@ pub fn model(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStream
pub refresh_time: Option<String>,
#[serde(default)]
pub row_level_access_controls: Vec<Arc<RowLevelAccessControl>>,
#[serde(default)]
pub dialect: Option<DataSource>,
}
};
proc_macro::TokenStream::from(expanded)
@@ -363,9 +371,12 @@ pub fn view(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStream
let expanded = quote! {
#python_binding
#[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash)]
#[serde(rename_all = "camelCase")]
pub struct View {
pub name: String,
pub statement: String,
#[serde(default)]
pub dialect: Option<DataSource>,
}
};
proc_macro::TokenStream::from(expanded)