Merged
1 change: 1 addition & 0 deletions docs/guide/modeling/model.md
@@ -84,6 +84,7 @@ properties: {}
| `columns` | Yes | List of columns to expose (see [Column Fields](#column-fields)) |
| `primary_key` | No | Column name that uniquely identifies a row; required for relationships |
| `cached` | No | Whether query results for this model should be cached; `false` by default |
| `dialect` | No | SQL dialect of the model's `ref_sql` (e.g. `bigquery`, `postgres`). Overrides the project-level `data_source` for this model. Requires `schema_version: 3`. See [Dialect Override](./wren_project.md#dialect-override). |
| `properties` | No | Arbitrary key-value metadata (description, tags, etc.) |

## Data Source: Two Ways to Point at Data
1 change: 1 addition & 0 deletions docs/guide/modeling/view.md
@@ -53,6 +53,7 @@ statement: >
|-------|----------|-------------|
| `name` | Yes | Unique identifier used in SQL queries |
| `statement` | Yes | A complete SQL SELECT statement; may reference other models or views |
| `dialect` | No | SQL dialect of the view's `statement` (e.g. `bigquery`, `postgres`). Currently metadata only — the engine always parses view statements with its generic SQL parser. Requires `schema_version: 3`. See [Dialect Override](./wren_project.md#dialect-override). |
| `properties` | No | Arbitrary key-value metadata (use `properties.description` for a human-readable description) |

## Model vs View
81 changes: 78 additions & 3 deletions docs/guide/modeling/wren_project.md
@@ -85,7 +85,7 @@ default_project: ~/projects/sales
### `wren_project.yml`

```yaml
-schema_version: 2
+schema_version: 3
name: my_project
version: "1.0"
catalog: wren
@@ -95,7 +95,7 @@ data_source: postgres

| Field | Description |
|-------|-------------|
-| `schema_version` | Directory layout version. `2` = folder-per-entity (current). Owned by the CLI — do not bump manually. |
+| `schema_version` | Directory layout version. `2` = folder-per-entity, `3` = adds `dialect` field support (current). Owned by the CLI — do not bump manually. |
| `name` | Project name |
| `version` | User's own project version (free-form, no effect on parsing) |
| `catalog` | **Wren Engine namespace** — NOT your database catalog. Identifies this MDL project within the engine. Default: `wren`. |
@@ -146,6 +146,19 @@ cached: false
properties: {}
```

**`dialect`** — optional field declaring which SQL dialect the model's `ref_sql` is written in. When omitted, the project-level `data_source` is used. This lets a single project contain models whose SQL targets different databases:

```yaml
name: revenue
ref_sql: "SELECT * FROM `project.dataset.table`"
dialect: bigquery
columns:
  - name: amount
    type: DECIMAL
```

Requires `schema_version: 3`. See [Dialect Override](#dialect-override) for details.
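
The fallback described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: the two-variant `DataSource` enum and the `effective_dialect` helper are hypothetical stand-ins, not the engine's API. Only the precedence (model-level `dialect` first, then project-level `data_source`) comes from the text.

```rust
/// Hypothetical two-variant stand-in for the engine's `DataSource` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataSource {
    BigQuery,
    Postgres,
}

/// Hypothetical helper: picks the dialect a model's `ref_sql` is written in.
/// The model-level `dialect` wins; if it is `None`, the project-level
/// `data_source` is used (which may itself be unset).
pub fn effective_dialect(
    model_dialect: Option<DataSource>,
    project_data_source: Option<DataSource>,
) -> Option<DataSource> {
    model_dialect.or(project_data_source)
}
```

With a `postgres` project, a model declaring `dialect: bigquery` would resolve to BigQuery under this sketch, while a model with no `dialect` would resolve to Postgres.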

**ref_sql** — defines the model via a SQL query. SQL can be inline in `metadata.yml` or in a separate `ref_sql.sql` file (the `.sql` file takes precedence if both exist):

```yaml
@@ -186,6 +199,16 @@ properties:
  description: "Top customers by lifetime value"
```

Like models, views support an optional **`dialect`** field (requires `schema_version: 3`):

```yaml
name: monthly_summary
statement: "SELECT date_trunc('month', created_at) FROM orders"
dialect: postgres
```

When set, the dialect is stored as metadata for downstream consumers. It does not currently affect how the engine parses the view's statement — view statements are always normalized into a logical plan via DataFusion's generic SQL parser. See [Dialect Override](#dialect-override) for details.

### `relationships.yml`

```yaml
@@ -297,8 +320,60 @@ The `build` step converts all YAML keys from snake_case to camelCase:
| `primary_key` | `primaryKey` |
| `join_type` | `joinType` |
| `data_source` | `dataSource` |
| `layout_version` | `layoutVersion` |
| `refresh_time` | `refreshTime` |
| `base_object` | `baseObject` |

Generic rule: split on `_`, capitalize each word after the first, join. All other fields (`name`, `type`, `catalog`, `schema`, `table`, `condition`, `models`, `columns`, `cached`, `dialect`, `properties`) are identical in both formats.

The `layoutVersion` field is stamped automatically by `wren context build` based on the project's `schema_version`. You do not set it manually in YAML.
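
The generic key-conversion rule is small enough to sketch directly. The function below is a hypothetical illustration of "split on `_`, capitalize each word after the first, join"; the name `snake_to_camel` is an assumption and this is not the `build` step's actual code.

```rust
/// Converts a snake_case key to camelCase: split on `_`, keep the first
/// word as-is, capitalize the first character of every later word, join.
pub fn snake_to_camel(key: &str) -> String {
    let mut out = String::with_capacity(key.len());
    for (i, word) in key.split('_').enumerate() {
        if i == 0 {
            out.push_str(word);
            continue;
        }
        let mut chars = word.chars();
        if let Some(first) = chars.next() {
            // `to_uppercase` yields an iterator because some characters
            // expand to more than one character when uppercased.
            out.extend(first.to_uppercase());
            out.push_str(chars.as_str());
        }
    }
    out
}
```

Keys without an underscore, such as `dialect`, pass through unchanged, which is why they are identical in both formats.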

---

## Dialect Override

Models and views support an optional `dialect` field that declares which SQL dialect their embedded SQL is written in. This requires `schema_version: 3`.

### Semantics

- **`dialect` omitted (or `null`)** — falls back to the project-level `data_source`. This is the default and matches the behavior of all existing projects.
- **`dialect` set** — the embedded SQL is written in the specified dialect, which may differ from the project's `data_source`.

### Model dialect

When a model has `dialect: bigquery` but the project's `data_source` is `postgres`, the engine knows the model's `ref_sql` contains BigQuery-flavored SQL (e.g. backtick-quoted identifiers, BigQuery functions). The engine uses this to select the correct SQL parser for the ref_sql.

```yaml
# models/revenue/metadata.yml
name: revenue
ref_sql: "SELECT * FROM `my-project.dataset.table`"
dialect: bigquery
columns:
  - name: amount
    type: DECIMAL
```

### View dialect

For views, the `dialect` field is currently **metadata only**. The engine normalizes view statements into a logical plan using DataFusion's generic SQL parser regardless of the dialect setting. The field is still valuable because:

- It documents the author's intent (which dialect the SQL was written in).
- Downstream consumers (ibis-server, MCP clients) can use it for dialect-aware processing.
- When dialect-aware view parsing is added in the future, the field will already be in place.

### Valid dialect values

`athena`, `bigquery`, `canner`, `clickhouse`, `databricks`, `datafusion`, `doris`, `duckdb`, `gcs_file`, `local_file`, `minio_file`, `mssql`, `mysql`, `oracle`, `postgres`, `redshift`, `s3_file`, `snowflake`, `spark`, `trino`

### Version requirements

The `dialect` field requires `schema_version: 3` in `wren_project.yml`. Using `dialect` in a `schema_version: 2` project produces a validation warning. The `schema_version` also controls the `layoutVersion` stamped in the compiled `target/mdl.json`:

-Generic rule: split on `_`, capitalize each word after the first, join. All other fields (`name`, `type`, `catalog`, `schema`, `table`, `condition`, `models`, `columns`, `cached`, `properties`) are identical in both formats.

| `schema_version` | `layoutVersion` | Capabilities |
|-------------------|-----------------|--------------|
| 1 | 1 | Legacy flat-file project format |
| 2 | 1 | Folder-per-entity project format |
| 3 | 2 | `dialect` field on models and views |
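
The version table can be read as a lookup followed by an upper-bound check. The sketch below is hypothetical: the helper names and the ceiling of `2` are assumptions chosen to match the table, not the engine's actual implementation or constant value.

```rust
/// Assumed ceiling for this sketch; the engine's real limit may differ.
pub const MAX_SUPPORTED_LAYOUT_VERSION: u32 = 2;

/// Hypothetical helper mirroring the table: maps a project's
/// `schema_version` to the `layoutVersion` stamped into `mdl.json`.
pub fn layout_version_for(schema_version: u32) -> Option<u32> {
    match schema_version {
        1 | 2 => Some(1),
        3 => Some(2),
        _ => None, // not listed in the table
    }
}

/// Hypothetical check: rejects manifests newer than this build understands.
pub fn validate_layout_version(layout_version: u32) -> Result<(), String> {
    if layout_version > MAX_SUPPORTED_LAYOUT_VERSION {
        return Err(format!(
            "layout version {} is not supported; this build only supports up to {}",
            layout_version, MAX_SUPPORTED_LAYOUT_VERSION
        ));
    }
    Ok(())
}
```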

---

11 changes: 11 additions & 0 deletions wren-core-base/manifest-macro/src/lib.rs
@@ -38,6 +38,8 @@ pub fn manifest(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStr
        #[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash, Clone)]
        #[serde(rename_all = "camelCase")]
        pub struct Manifest {
            #[serde(default = "default_layout_version")]
            pub layout_version: u32,
            pub catalog: String,
            pub schema: String,
            #[serde(default)]
@@ -51,6 +53,10 @@ pub fn manifest(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStr
            #[serde(default)]
            pub data_source: Option<DataSource>,
        }

        fn default_layout_version() -> u32 {
            1
        }
    };
    proc_macro::TokenStream::from(expanded)
}
@@ -154,6 +160,8 @@ pub fn model(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStream
            pub refresh_time: Option<String>,
            #[serde(default)]
            pub row_level_access_controls: Vec<Arc<RowLevelAccessControl>>,
            #[serde(default)]
            pub dialect: Option<DataSource>,
        }
    };
    proc_macro::TokenStream::from(expanded)
@@ -363,9 +371,12 @@ pub fn view(python_binding: proc_macro::TokenStream) -> proc_macro::TokenStream
    let expanded = quote! {
        #python_binding
        #[derive(Serialize, Deserialize, Debug, PartialEq, Eq, Hash)]
        #[serde(rename_all = "camelCase")]
        pub struct View {
            pub name: String,
            pub statement: String,
            #[serde(default)]
            pub dialect: Option<DataSource>,
        }
    };
    proc_macro::TokenStream::from(expanded)
143 changes: 143 additions & 0 deletions wren-core-base/src/mdl/builder.rs
@@ -46,6 +46,7 @@ impl ManifestBuilder {
    pub fn new() -> Self {
        Self {
            manifest: Manifest {
                layout_version: 1,
                catalog: "wrenai".to_string(),
                schema: "public".to_string(),
                models: vec![],
@@ -57,6 +58,11 @@ impl ManifestBuilder {
        }
    }

    pub fn layout_version(mut self, version: u32) -> Self {
        self.manifest.layout_version = version;
        self
    }

    pub fn catalog(mut self, catalog: &str) -> Self {
        self.manifest.catalog = catalog.to_string();
        self
@@ -114,6 +120,7 @@ impl ModelBuilder {
                cached: false,
                refresh_time: None,
                row_level_access_controls: vec![],
                dialect: None,
            },
        }
    }
@@ -168,6 +175,11 @@ impl ModelBuilder {
        self
    }

    pub fn dialect(mut self, dialect: DataSource) -> Self {
        self.model.dialect = Some(dialect);
        self
    }

    pub fn build(self) -> Arc<Model> {
        Arc::new(self.model)
    }
@@ -406,6 +418,7 @@ impl ViewBuilder {
            view: View {
                name: name.to_string(),
                statement: "".to_string(),
                dialect: None,
            },
        }
    }
@@ -415,6 +428,11 @@ impl ViewBuilder {
        self
    }

    pub fn dialect(mut self, dialect: DataSource) -> Self {
        self.view.dialect = Some(dialect);
        self
    }

    pub fn build(self) -> Arc<View> {
        Arc::new(self.view)
    }
@@ -848,4 +866,129 @@ mod test {
        assert_eq!(actual.normalized_name(), actual.name.to_lowercase());
        assert_eq!(actual, expected)
    }

    #[test]
    fn test_manifest_layout_version_default() {
        let json = r#"{"catalog":"wren","schema":"public"}"#;
        let manifest: Manifest = serde_json::from_str(json).unwrap();
        assert_eq!(manifest.layout_version, 1);
    }

    #[test]
    fn test_manifest_layout_version_explicit() {
        let json = r#"{"layoutVersion":2,"catalog":"wren","schema":"public"}"#;
        let manifest: Manifest = serde_json::from_str(json).unwrap();
        assert_eq!(manifest.layout_version, 2);
    }

    #[test]
    fn test_manifest_layout_version_roundtrip() {
        let expected = ManifestBuilder::new().layout_version(2).build();
        let json_str = serde_json::to_string(&expected).unwrap();
        assert!(json_str.contains(r#""layoutVersion":2"#));
        let actual: Manifest = serde_json::from_str(&json_str).unwrap();
        assert_eq!(actual.layout_version, 2);
        assert_eq!(actual, expected);
    }

    #[test]
    fn test_manifest_version_validation_ok() {
        use crate::mdl::manifest::MAX_SUPPORTED_LAYOUT_VERSION;
        let manifest = ManifestBuilder::new()
            .layout_version(MAX_SUPPORTED_LAYOUT_VERSION)
            .build();
        assert!(manifest.validate_layout_version().is_ok());
    }

    #[test]
    fn test_manifest_version_validation_rejected() {
        let manifest = ManifestBuilder::new().layout_version(99).build();
        let err = manifest.validate_layout_version().unwrap_err();
        assert!(err.to_string().contains("99"));
        assert!(err.to_string().contains("only supports up to"));
    }

    #[test]
    fn test_model_dialect_none_default() {
        let json = r#"{"name":"test","columns":[]}"#;
        let model: Arc<Model> = serde_json::from_str(json).unwrap();
        assert!(model.dialect.is_none());
    }

    #[test]
    fn test_model_dialect_roundtrip() {
        let expected = ModelBuilder::new("test")
            .table_reference("test")
            .column(ColumnBuilder::new("id", "integer").build())
            .dialect(DataSource::BigQuery)
            .build();

        let json_str = serde_json::to_string(&expected).unwrap();
        assert!(json_str.contains(r#""dialect":"BIGQUERY""#));
        let actual: Arc<Model> = serde_json::from_str(&json_str).unwrap();
        assert_eq!(actual.dialect, Some(DataSource::BigQuery));
        assert_eq!(actual, expected);
    }

    #[test]
    fn test_model_dialect_case_insensitive() {
        let json = r#"{"name":"test","columns":[],"dialect":"bigquery"}"#;
        let model: Arc<Model> = serde_json::from_str(json).unwrap();
        assert_eq!(model.dialect, Some(DataSource::BigQuery));
    }

    #[test]
    fn test_view_dialect_none_default() {
        let json = r#"{"name":"test","statement":"SELECT 1"}"#;
        let view: Arc<View> = serde_json::from_str(json).unwrap();
        assert!(view.dialect.is_none());
    }

    #[test]
    fn test_view_dialect_roundtrip() {
        let expected = ViewBuilder::new("test")
            .statement("SELECT * FROM test")
            .dialect(DataSource::Postgres)
            .build();

        let json_str = serde_json::to_string(&expected).unwrap();
        assert!(json_str.contains(r#""dialect":"POSTGRES""#));
        let actual: Arc<View> = serde_json::from_str(&json_str).unwrap();
        assert_eq!(actual.dialect, Some(DataSource::Postgres));
        assert_eq!(actual, expected);
    }

    #[test]
    fn test_manifest_with_dialect_models_and_views() {
        let model = ModelBuilder::new("revenue")
            .ref_sql("SELECT * FROM `project.dataset.table`")
            .dialect(DataSource::BigQuery)
            .column(ColumnBuilder::new("amount", "decimal").build())
            .build();

        let view = ViewBuilder::new("summary")
            .statement("SELECT date_trunc('month', created_at) FROM orders")
            .dialect(DataSource::Postgres)
            .build();

        let expected = ManifestBuilder::new()
            .layout_version(2)
            .model(model)
            .view(view)
            .data_source(DataSource::Postgres)
            .build();

        let json_str = serde_json::to_string(&expected).unwrap();
        let actual: Manifest = serde_json::from_str(&json_str).unwrap();
        assert_eq!(actual, expected);
        assert_eq!(actual.layout_version, 2);
        assert_eq!(actual.models[0].dialect, Some(DataSource::BigQuery));
        assert_eq!(actual.views[0].dialect, Some(DataSource::Postgres));
    }

    #[test]
    fn test_manifest_builder_default_layout_version() {
        let manifest = ManifestBuilder::new().build();
        assert_eq!(manifest.layout_version, 1);
    }
}