diff --git a/.claude/commands/onboarding-clean.md b/.claude/commands/onboarding-clean.md
index 36ee2408e..e76648262 100644
--- a/.claude/commands/onboarding-clean.md
+++ b/.claude/commands/onboarding-clean.md
@@ -9,16 +9,18 @@ Write and run data cleaning code for a Data Basis dataset.
 
 ## Folder structure
 
-Work in a folder **external to the `pipelines/` repo**:
+Raw data and output live **external to the `pipelines/` repo** (large files are
+never committed). Cleaning code lives **inside** the repo under the model folder.
 
 ```text
-<dataset_root>/
-├── input/          ← raw files (CSV, Excel, JSON, etc.) — do not modify
-├── output/
-│   └── <table_slug>/
-│       └── ano=<year>/sigla_uf=<uf>/   (municipio/UF tables)
-│       └── ano=<year>/                  (Brasil-level tables)
-└── code/
+<dataset_root>/          ← external working directory (e.g. ~/Downloads/<slug>/)
+├── input/               ← raw files (CSV, Excel, JSON, etc.) — do not modify
+└── output/
+    └── <table_slug>/
+        └── ano=<year>/sigla_uf=<uf>/   (municipio/UF tables)
+        └── ano=<year>/                  (Brasil-level tables)
+
+pipelines/models/<dataset_gcp_id>/code/   ← write cleaning scripts here
     └── clean.py    (one script per dataset if tables share raw source)
     └── clean_<table>.py  (one per table if they don't)
 ```
@@ -35,10 +37,15 @@ Read the architecture tables from Drive (URLs from `databasis-architecture` outp
 
 Read the first 20 rows of each raw file to understand structure. Check:
 - File format (CSV, Excel, JSON, fixed-width, etc.)
-- Encoding (UTF-8, ISO-8859-1, etc.)
+- Encoding: try `utf-8-sig` first, then fall back to `latin1`. If accented column
+  names come out garbled under `latin1`, repair each name with
+  `name.encode("latin1").decode("utf-8", errors="replace")`.
 - Column names and their mapping to architecture names
 - Any header rows, footer rows, or skip rows
 - Date formats
+- Number formatting: Brazilian CSVs often use `.` as thousands separator and `,`
+  as decimal separator (e.g. `"1.234,56"` = 1234.56). Strip `.` first, then
+  replace `,` with `.` before calling `pd.to_numeric`.
 
 ## Step 3 — Write cleaning code
 
@@ -58,13 +65,60 @@ Standard column types:
 - STRING: `.astype(str).str.strip().replace('nan', pd.NA)`
 - DATE: `pd.to_datetime(col, errors='coerce').dt.date`
 
+### Explicit pyarrow schema (required)
+
+Always build an explicit `pa.Schema` and pass it to `pa.Table.from_pandas`. This
+prevents INT64/FLOAT64 mismatches when some partitions have all-integer values
+in columns that should be FLOAT64.
+
+```python
+import pyarrow as pa
+import pyarrow.parquet as pq
+
+PARTITION_COLS = {"ano", "sigla_uf"}  # example; these live in the folder path
+
+def _build_schema() -> pa.Schema:
+    fields = []
+    for col in OUTPUT_COLUMNS:
+        if col in PARTITION_COLS:
+            continue
+        if col in INT_COLS:
+            fields.append(pa.field(col, pa.int64()))
+        elif col in FLOAT_COLS:
+            fields.append(pa.field(col, pa.float64()))
+        elif col in DATE_COLS:
+            fields.append(pa.field(col, pa.date32()))
+        else:
+            fields.append(pa.field(col, pa.string()))
+    return pa.schema(fields)
+
+_SCHEMA = _build_schema()
+
+def write_partition(df, out):  # `out`: partition directory (pathlib.Path)
+    data = df.drop(columns=list(PARTITION_COLS))
+    table = pa.Table.from_pandas(data, schema=_SCHEMA, preserve_index=False)
+    pq.write_table(table, out / "data.parquet", compression="snappy")
+```
+
+### Geometry columns
+
+If the dataset includes a shapefile or WKT geometry:
+- Reproject to EPSG:4674 (SIRGAS 2000), then convert to WKT using `geopandas`.
+- Store as STRING in parquet (`pa.string()`).
+- **Verify the join key** between the shapefile and tabular data — shapefile IDs
+  and tabular IDs are often different systems (e.g. `cd_cnuc` vs `id_uc`).
+  Inspect both before joining.
+- In the DBT model, cast with `ST_GEOGFROMTEXT(col, make_valid => true)` and
+  type the column as GEOGRAPHY, not STRING.
+
 ## Step 4 — Validate subset output
 
 After running on the subset:
-1. Verify column names match architecture exactly
-2. Verify types are correct
-3. Check for unexpected nulls in primary key columns
-4. Print row counts and a sample
+1. Check the parquet schema with `pq.read_schema(path)` — verify all column types
+   match the architecture before uploading.
+2. Verify column names match architecture exactly.
+3. Check for unexpected nulls in primary key columns.
+4. Print row counts and a sample.
 
 Only proceed to full data after subset is verified. Ask the user to confirm.
 
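Reviewer sketch for the Step 2 checks added to `onboarding-clean.md` above (encoding fallback, garbled accents, Brazilian number formatting). It uses only the standard library, so it can be tried on a few raw values before any pandas code is written. The helper names `decode_with_fallback`, `repair_mojibake`, and `parse_br_number` are hypothetical and do not exist in the repo.

```python
from typing import Optional


def decode_with_fallback(raw: bytes) -> tuple[str, str]:
    """Decode raw bytes, trying utf-8-sig first and falling back to latin1.

    Returns (text, encoding_used). latin1 maps every byte value, so the
    fallback itself never raises.
    """
    try:
        return raw.decode("utf-8-sig"), "utf-8-sig"
    except UnicodeDecodeError:
        return raw.decode("latin1"), "latin1"


def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was decoded as latin1 (e.g. 'CÃ³digo').

    Raises UnicodeEncodeError if the text holds characters outside latin1,
    which means it was not latin1-decoded in the first place.
    """
    return text.encode("latin1").decode("utf-8", errors="replace")


def parse_br_number(text: str) -> Optional[float]:
    """Parse '1.234,56' style numbers; return None when parsing fails."""
    # Strip the thousands separator first, then swap the decimal comma.
    cleaned = text.strip().replace(".", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None
```

The order of the two `replace` calls in `parse_br_number` matters: swapping the comma first would leave two periods in the string and the later strip would remove the decimal point as well.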
diff --git a/.claude/commands/onboarding-dbt.md b/.claude/commands/onboarding-dbt.md
index ac7991dce..9b513b8f6 100644
--- a/.claude/commands/onboarding-dbt.md
+++ b/.claude/commands/onboarding-dbt.md
@@ -42,6 +42,17 @@ from
 
 Column order must match the architecture table exactly.
 
+### Geometry columns
+
+If the dataset has a WKT geometry column, cast it to GEOGRAPHY — not STRING:
+
+```sql
+st_geogfromtext(safe_cast(geometria as string), make_valid => true) geometria,
+```
+
+`make_valid => true` handles degenerate polygons (rings with fewer than 3 unique
+vertices) that may exist in source shapefiles.
+
 ## Step 3 — Write schema.yaml
 
 One file: `models/<dataset_slug>/schema.yml`
@@ -52,8 +63,12 @@ Template:
 version: 2
 models:
   - name: <dataset_slug>__<table_slug>
-    description: <description in Portuguese from architecture>
+    description: >
+      <description in Portuguese from architecture — use > block scalar whenever
+      the description spans multiple lines or contains a colon>
     tests:
+      - dbt_utils.unique_combination_of_columns:
+          combination_of_columns: [<partition_col>, <primary_key_col>]
       - not_null_proportion_multiple_columns:
           at_least: 0.05
     columns:
@@ -69,10 +84,30 @@ models:
 ```
 
 Rules:
-- Add `not_null` test to partition columns and primary keys
-- Add `relationships` test to any column with a `directory_column` in the architecture
-- Add `not_null_proportion_multiple_columns` at 0.05 to every model
-- Use Portuguese descriptions from architecture
+- **Always use `>` block scalar** for multi-line descriptions or any description
+  that may contain a `:` — bare scalars with `:` in continuation lines break YAML
+  parsing (e.g. `"Fonte: MMA"` on a continuation line triggers a parse error).
+- Add `not_null` test to partition columns and primary keys.
+- Add `relationships` test to any column with a `directory_column` in the architecture.
+- Add `not_null_proportion_multiple_columns` at 0.05 to every model.
+- Use Portuguese descriptions from architecture.
+- For the uniqueness test, prefer a stable string identifier (e.g. `codigo_uc`)
+  over an integer ID that may be NULL in older snapshots.
+
+### Excluding columns from `not_null_proportion_multiple_columns`
+
+The test macro supports an `ignore_values` parameter (not `exclude`):
+
+```yaml
+- not_null_proportion_multiple_columns:
+    at_least: 0.05
+    ignore_values:
+      - column_that_is_legitimately_empty
+      - another_sparse_column
+```
+
+Use this for columns that are 100% null in the source (header present but never
+populated by the provider).
 
 ## Step 4 — Check dbt_project.yml
 
diff --git a/.claude/commands/onboarding-discover.md b/.claude/commands/onboarding-discover.md
index 5dc774e1b..6ee2b8480 100644
--- a/.claude/commands/onboarding-discover.md
+++ b/.claude/commands/onboarding-discover.md
@@ -17,7 +17,11 @@ Use the `discover_ids` MCP tool (env from argument):
 discover_ids(env=<env>)
 ```
 
-This returns IDs for: status, bigquery_type, entity, area, license, availability, organization.
+This returns IDs for: status, bigquery_type, entity, license, availability, organization, theme.
+
+**Never search the web, hardcode IDs, or guess slugs.** All reference IDs (themes,
+organizations, licenses, tags, entities, statuses) must come from `discover_ids`
+or `lookup_area`. IDs differ between dev and prod environments.
 
 ## Step 2 — Fetch dataset state
 
@@ -47,10 +51,11 @@ Reference IDs:
   entity.year:              <id>
   entity.state:             <id>
   entity.municipality:      <id>
-  entity.financing_phase:   <id>
-  entity.financing_account: <id>
   area.br:                  <id>
   bigquery_type.INT64:       <id>
+  availability.online:      <id>
+  organization.<slug>:      <id>
+  theme.<slug>:             <id>
   ...
 
 Dataset:
diff --git a/models/br_mma_cnuc/br_mma_cnuc__unidades_conservacao.sql b/models/br_mma_cnuc/br_mma_cnuc__unidades_conservacao.sql
new file mode 100644
index 000000000..183834e17
--- /dev/null
+++ b/models/br_mma_cnuc/br_mma_cnuc__unidades_conservacao.sql
@@ -0,0 +1,70 @@
+{{
+    config(
+        alias="unidades_conservacao",
+        schema="br_mma_cnuc",
+        materialized="table",
+    )
+}}
+
+select
+    safe_cast(ano as int64) ano,
+    safe_cast(semestre as int64) semestre,
+    safe_cast(id_uc as int64) id_uc,
+    safe_cast(codigo_uc as string) codigo_uc,
+    safe_cast(nome_uc as string) nome_uc,
+    safe_cast(esfera_administrativa as string) esfera_administrativa,
+    safe_cast(categoria_manejo as string) categoria_manejo,
+    safe_cast(categoria_iucn as string) categoria_iucn,
+    safe_cast(grupo as string) grupo,
+    safe_cast(protecao_integral as int64) protecao_integral,
+    safe_cast(uso_sustentavel as int64) uso_sustentavel,
+    safe_cast(sigla_uf as string) sigla_uf,
+    safe_cast(municipios_abrangidos as string) municipios_abrangidos,
+    safe_cast(ano_criacao as int64) ano_criacao,
+    safe_cast(ano_ato_legal_recente as int64) ano_ato_legal_recente,
+    safe_cast(ato_legal_criacao as string) ato_legal_criacao,
+    safe_cast(outros_atos_legais as string) outros_atos_legais,
+    safe_cast(plano_manejo as string) plano_manejo,
+    safe_cast(conselho_gestor as string) conselho_gestor,
+    safe_cast(orgao_gestor as string) orgao_gestor,
+    safe_cast(informacoes_gerais as string) informacoes_gerais,
+    safe_cast(fonte_area as int64) fonte_area,
+    safe_cast(area_soma_biomas as float64) area_soma_biomas,
+    safe_cast(area_soma_biomas_continental as float64) area_soma_biomas_continental,
+    safe_cast(area_ato_legal_criacao as float64) area_ato_legal_criacao,
+    safe_cast(area_amazonia as float64) area_amazonia,
+    safe_cast(area_caatinga as float64) area_caatinga,
+    safe_cast(area_cerrado as float64) area_cerrado,
+    safe_cast(area_mata_atlantica as float64) area_mata_atlantica,
+    safe_cast(area_pampa as float64) area_pampa,
+    safe_cast(area_pantanal as float64) area_pantanal,
+    safe_cast(area_marinha as float64) area_marinha,
+    safe_cast(bioma_declarado as string) bioma_declarado,
+    safe_cast(biomas_abrangidos as string) biomas_abrangidos,
+    safe_cast(percentual_alem_linha_costa as float64) percentual_alem_linha_costa,
+    safe_cast(recortes as float64) recortes,
+    safe_cast(mar_territorial as float64) mar_territorial,
+    safe_cast(municipio_costeiro as float64) municipio_costeiro,
+    safe_cast(
+        municipio_costeiro_area_marinha as float64
+    ) municipio_costeiro_area_marinha,
+    safe_cast(amazonia_legal as float64) amazonia_legal,
+    safe_cast(lei_mata_atlantica as float64) lei_mata_atlantica,
+    safe_cast(sobreposicao_ti_tq as float64) sobreposicao_ti_tq,
+    safe_cast(programa_projeto as string) programa_projeto,
+    safe_cast(sitios_patrimonio_mundial as string) sitios_patrimonio_mundial,
+    safe_cast(sitios_ramsar as string) sitios_ramsar,
+    safe_cast(mosaico as string) mosaico,
+    safe_cast(reserva_biosfera as string) reserva_biosfera,
+    safe_cast(codigo_wdpa as string) codigo_wdpa,
+    safe_cast(regiao as string) regiao,
+    safe_cast(
+        qualidade_dados_georreferenciados as string
+    ) qualidade_dados_georreferenciados,
+    safe_cast(presente_versao_anterior as string) presente_versao_anterior,
+    safe_cast(diferenca_area as float64) diferenca_area,
+    safe_cast(razao_diferenca_area as float64) razao_diferenca_area,
+    safe_cast(data_publicacao_cnuc as date) data_publicacao_cnuc,
+    safe_cast(data_ultima_certificacao as date) data_ultima_certificacao,
+    st_geogfromtext(safe_cast(geometria as string), make_valid => true) geometria,
+from {{ set_datalake_project("br_mma_cnuc_staging.unidades_conservacao") }} as t
diff --git a/models/br_mma_cnuc/code/clean.py b/models/br_mma_cnuc/code/clean.py
new file mode 100644
index 000000000..ace644961
--- /dev/null
+++ b/models/br_mma_cnuc/code/clean.py
@@ -0,0 +1,389 @@
+"""
+br_mma_cnuc — Unidades de Conservação (CNUC)
+Cleans all biannual CSV snapshots into a single stacked parquet table.
+Geometry (WKT) is merged from shapefiles where available.
+Partitioned by ano + semestre.
+
+Usage:
+    python clean.py                # full run
+    python clean.py --subset       # 2018_1 only for validation
+"""
+
+import argparse
+import re
+from pathlib import Path
+
+import geopandas as gpd
+import pandas as pd
+import pyarrow as pa
+import pyarrow.parquet as pq
+
+# ── Paths ──────────────────────────────────────────────────────────────────
+ROOT = Path.home() / "Downloads" / "CNUC"  # external data dir, as in upload.py
+INPUT_DIR = ROOT / "input"
+OUTPUT_DIR = ROOT / "output" / "unidades_conservacao"
+OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
+
+# ── Shapefile sources: (ano, semestre) → polygon shapefile path ────────────
+# Points-only files (shp_2024_1) are excluded; 2025 shapefiles not yet available.
+SHAPEFILES = {
+    (2024, 2): INPUT_DIR / "shp_2024_2" / "shp_cnuc_2024_10_pol.shp",
+}
+
+# ── Column mapping: raw → bd ───────────────────────────────────────────────
+COLUMN_MAP = {
+    "ID_UC": "id_uc",
+    "Código UC": "codigo_uc",
+    "Nome da UC": "nome_uc",
+    "NOME DA UC": "nome_uc",
+    "Informações Gerais": "informacoes_gerais",
+    "Esfera Administrativa": "esfera_administrativa",
+    "Categoria de Manejo": "categoria_manejo",
+    "Categoria IUCN": "categoria_iucn",
+    "Grupo": "grupo",
+    "PI": "protecao_integral",
+    "US": "uso_sustentavel",
+    "UF": "sigla_uf",
+    "Municípios Abrangidos": "municipios_abrangidos",
+    "Ano de criação": "ano_criacao",
+    "Ano de Criação": "ano_criacao",
+    "Ano do ato legal mais recente": "ano_ato_legal_recente",
+    "Ato Legal de Criação": "ato_legal_criacao",
+    "Outros atos legais": "outros_atos_legais",
+    "Plano de Manejo": "plano_manejo",
+    "Conselho Gestor": "conselho_gestor",
+    "Órgão Gestor": "orgao_gestor",
+    "Fonte da Área: (1 = SHP, 0 = Ato legal)": "fonte_area",
+    # Area totals
+    "Área (ha)": "area_soma_biomas",  # 2018 only — treated as total
+    "Área soma biomas": "area_soma_biomas",
+    "Bioma Área (ha)": None,  # drop — duplicate of area_soma_biomas
+    "Área soma Biomas Continental": "area_soma_biomas_continental",
+    "Área Ato Legal de Criação": "area_ato_legal_criacao",
+    # Biome areas (unit = ha, tracked in measurement_unit field)
+    "Área (ha) Amazônia": "area_amazonia",
+    "Amazônia": "area_amazonia",
+    "Área (ha) Caatinga": "area_caatinga",
+    "Caatinga": "area_caatinga",
+    "Área (ha) Cerrado": "area_cerrado",
+    "Cerrado": "area_cerrado",
+    "Área (ha) Mata Atlântica": "area_mata_atlantica",
+    "Mata Atlântica": "area_mata_atlantica",
+    "Área (ha) Pampa": "area_pampa",
+    "Pampa": "area_pampa",
+    "Área (ha) Pantanal": "area_pantanal",
+    "Pantanal": "area_pantanal",
+    "Área (ha) Área Marinha": "area_marinha",
+    "Área Marinha": "area_marinha",
+    # Biome metadata
+    "Bioma declarado": "bioma_declarado",
+    "Biomas Abrangidos": "biomas_abrangidos",
+    "% Além da linha de costa": "percentual_alem_linha_costa",
+    # Recortes
+    "Recortes (ha)": "recortes",
+    "Mar Territorial": "mar_territorial",
+    "Município Costeiro": "municipio_costeiro",
+    "Município Costeiro + Área Marinha": "municipio_costeiro_area_marinha",
+    "Amazônia Legal": "amazonia_legal",
+    "Lei da Mata Atlântica": "lei_mata_atlantica",
+    "Sobreposição com TI ou TQ": "sobreposicao_ti_tq",
+    # Programs / flags
+    "Programa/Projeto": "programa_projeto",
+    "Sítios do Patrimônio Mundial": "sitios_patrimonio_mundial",
+    "Sítios do Patrimônio Natural": "sitios_patrimonio_mundial",  # old name
+    "Sítios Ramsar": "sitios_ramsar",
+    "Mosaico": "mosaico",
+    "Reserva da Biosfera": "reserva_biosfera",
+    # IDs
+    "Código WDPA": "codigo_wdpa",
+    "WDPAID": "codigo_wdpa",
+    # QA / admin (2024+)
+    "Região": "regiao",
+    "Qualidade dos dados georreferenciados": "qualidade_dados_georreferenciados",
+    "Presente na versão anterior": "presente_versao_anterior",
+    "Diferença Área": "diferenca_area",
+    "Razão Diferença Área": "razao_diferenca_area",
+    "Data da publicação no CNUC": "data_publicacao_cnuc",
+    "Data da última certificação dos dados pelo Órgão Gestor": "data_ultima_certificacao",
+}
+
+# ── Final column order ─────────────────────────────────────────────────────
+OUTPUT_COLUMNS = [
+    "ano",
+    "semestre",
+    "id_uc",
+    "codigo_uc",
+    "nome_uc",
+    "esfera_administrativa",
+    "categoria_manejo",
+    "categoria_iucn",
+    "grupo",
+    "protecao_integral",
+    "uso_sustentavel",
+    "sigla_uf",
+    "municipios_abrangidos",
+    "ano_criacao",
+    "ano_ato_legal_recente",
+    "ato_legal_criacao",
+    "outros_atos_legais",
+    "plano_manejo",
+    "conselho_gestor",
+    "orgao_gestor",
+    "informacoes_gerais",
+    "fonte_area",
+    "area_soma_biomas",
+    "area_soma_biomas_continental",
+    "area_ato_legal_criacao",
+    "area_amazonia",
+    "area_caatinga",
+    "area_cerrado",
+    "area_mata_atlantica",
+    "area_pampa",
+    "area_pantanal",
+    "area_marinha",
+    "bioma_declarado",
+    "biomas_abrangidos",
+    "percentual_alem_linha_costa",
+    "recortes",
+    "mar_territorial",
+    "municipio_costeiro",
+    "municipio_costeiro_area_marinha",
+    "amazonia_legal",
+    "lei_mata_atlantica",
+    "sobreposicao_ti_tq",
+    "programa_projeto",
+    "sitios_patrimonio_mundial",
+    "sitios_ramsar",
+    "mosaico",
+    "reserva_biosfera",
+    "codigo_wdpa",
+    "regiao",
+    "qualidade_dados_georreferenciados",
+    "presente_versao_anterior",
+    "diferenca_area",
+    "razao_diferenca_area",
+    "data_publicacao_cnuc",
+    "data_ultima_certificacao",
+    "geometria",
+]
+
+FLOAT_COLS = {
+    "area_soma_biomas",
+    "area_soma_biomas_continental",
+    "area_ato_legal_criacao",
+    "area_amazonia",
+    "area_caatinga",
+    "area_cerrado",
+    "area_mata_atlantica",
+    "area_pampa",
+    "area_pantanal",
+    "area_marinha",
+    "percentual_alem_linha_costa",
+    "recortes",
+    "mar_territorial",
+    "municipio_costeiro",
+    "municipio_costeiro_area_marinha",
+    "amazonia_legal",
+    "lei_mata_atlantica",
+    "sobreposicao_ti_tq",
+    "diferenca_area",
+    "razao_diferenca_area",
+}
+
+INT_COLS = {
+    "id_uc",
+    "ano_criacao",
+    "ano_ato_legal_recente",
+    "fonte_area",
+    "protecao_integral",
+    "uso_sustentavel",
+}
+
+DATE_COLS = {"data_publicacao_cnuc", "data_ultima_certificacao"}
+
+
+def parse_filename(path: Path) -> tuple[int, int]:
+    m = re.search(r"cnuc_(\d{4})_(\d)", path.name)
+    if not m:
+        raise ValueError(f"Cannot parse ano/semestre from {path.name}")
+    return int(m.group(1)), int(m.group(2))
+
+
+def read_csv(path: Path) -> pd.DataFrame:
+    for enc in ("utf-8-sig", "latin1"):
+        try:
+            return pd.read_csv(
+                path, sep=";", encoding=enc, dtype=str, low_memory=False
+            )
+        except UnicodeDecodeError:
+            continue
+    raise ValueError(f"Cannot decode {path}")
+
+
+def clean_string(s: pd.Series) -> pd.Series:
+    return (
+        s.astype(str)
+        .str.strip()
+        .replace({"nan": pd.NA, "<NA>": pd.NA, "-": pd.NA, "": pd.NA})
+    )
+
+
+def load_geometry(shp_path: Path) -> dict[str, str]:
+    """Return {cd_cnuc: wkt_string} from a polygon shapefile."""
+    gdf = gpd.read_file(shp_path, encoding="latin1")
+    # to_crs() raises on a missing CRS; assume SIRGAS 2000 if none is declared
+    gdf = gdf.set_crs(epsg=4674) if gdf.crs is None else gdf.to_crs(epsg=4674)
+    gdf["_wkt"] = gdf.geometry.to_wkt()
+    return dict(
+        zip(gdf["cd_cnuc"].astype(str).str.strip(), gdf["_wkt"], strict=False)
+    )
+
+
+def clean_file(path: Path, geo_lookup: dict[str, str] | None) -> pd.DataFrame:
+    ano, semestre = parse_filename(path)
+    df = read_csv(path)
+
+    # Drop pandas-generated duplicate columns (.1, .2 suffixes)
+    df = df.loc[:, ~df.columns.str.match(r".*\.\d+$")]
+
+    # Drop columns mapped to None (must run before the rename below)
+    drop = [c for c in df.columns if c in COLUMN_MAP and COLUMN_MAP[c] is None]
+    df = df.drop(columns=drop, errors="ignore")
+
+    # Rename the remaining mapped columns
+    rename = {c: COLUMN_MAP[c] for c in df.columns if c in COLUMN_MAP}
+    df = df.rename(columns=rename)
+
+    # Keep only known columns
+    known = set(OUTPUT_COLUMNS)
+    df = df[[c for c in df.columns if c in known]].copy()
+
+    # Add partition columns
+    df["ano"] = ano
+    df["semestre"] = semestre
+
+    # Add missing output columns as NA
+    for col in OUTPUT_COLUMNS:
+        if col not in df.columns:
+            df[col] = pd.NA
+
+    # Type casts — int
+    for col in INT_COLS:
+        if col in df.columns:
+            cleaned = (
+                df[col]
+                .astype(str)
+                .str.replace(".", "", regex=False)
+                .str.strip()
+            )
+            s = pd.to_numeric(cleaned, errors="coerce")
+            mask = s.isna()
+            arr = s.fillna(0).astype(int).astype("Int64")
+            arr[mask] = pd.NA
+            df[col] = arr
+
+    # Type casts — float (Brazilian formatting: period=thousands, comma=decimal)
+    for col in FLOAT_COLS:
+        if col in df.columns:
+            df[col] = (
+                df[col]
+                .astype(str)
+                .str.replace(".", "", regex=False)
+                .str.replace(",", ".", regex=False)
+            )
+            df[col] = pd.to_numeric(df[col], errors="coerce")
+
+    # Type casts — date
+    for col in DATE_COLS:
+        if col in df.columns:
+            df[col] = pd.to_datetime(
+                df[col], dayfirst=True, errors="coerce"
+            ).dt.date
+
+    # Type casts — string
+    str_cols = [
+        c
+        for c in OUTPUT_COLUMNS
+        if c not in INT_COLS | FLOAT_COLS | DATE_COLS
+        and c not in {"ano", "semestre", "geometria"}
+    ]
+    for col in str_cols:
+        if col in df.columns:
+            df[col] = clean_string(df[col])
+
+    # Merge geometry on codigo_uc (cd_cnuc in shapefile — 100% coverage)
+    if geo_lookup is not None and "codigo_uc" in df.columns:
+        df["geometria"] = (
+            df["codigo_uc"].astype(str).str.strip().map(geo_lookup)
+        )
+    else:
+        df["geometria"] = pd.NA
+
+    return df[OUTPUT_COLUMNS]
+
+
+def _build_schema() -> pa.Schema:
+    """Build a fixed pyarrow schema so all partitions are type-consistent."""
+    fields = []
+    for col in OUTPUT_COLUMNS:
+        if col in ("ano", "semestre"):
+            continue
+        if col in INT_COLS:
+            fields.append(pa.field(col, pa.int64()))
+        elif col in FLOAT_COLS:
+            fields.append(pa.field(col, pa.float64()))
+        elif col in DATE_COLS:
+            fields.append(pa.field(col, pa.date32()))
+        else:
+            fields.append(pa.field(col, pa.string()))
+    return pa.schema(fields)
+
+
+_SCHEMA = _build_schema()
+
+
+def write_partition(df: pd.DataFrame, ano: int, semestre: int):
+    out = OUTPUT_DIR / f"ano={ano}" / f"semestre={semestre}"
+    out.mkdir(parents=True, exist_ok=True)
+    data = df.drop(columns=["ano", "semestre"])
+    table = pa.Table.from_pandas(data, schema=_SCHEMA, preserve_index=False)
+    pq.write_table(table, out / "data.parquet", compression="snappy")
+
+
+def main(subset: bool = False):
+    files = sorted(INPUT_DIR.glob("cnuc_*.csv"))
+    if subset:
+        files = [f for f in files if "2018_1" in f.name]
+
+    # Pre-load geometry lookups
+    geo_cache: dict[tuple[int, int], dict[str, str]] = {}
+    for key, shp_path in SHAPEFILES.items():
+        if shp_path.exists():
+            print(f"  Loading geometry for {key} ...", flush=True)
+            geo_cache[key] = load_geometry(shp_path)
+            print(f"    {len(geo_cache[key]):,} polygons loaded")
+
+    total_rows = 0
+    for path in files:
+        ano, semestre = parse_filename(path)
+        geo = geo_cache.get((ano, semestre))
+        print(f"  Processing {path.name} ...", end=" ", flush=True)
+        df = clean_file(path, geo)
+        write_partition(df, ano, semestre)
+        n_geo = df["geometria"].notna().sum()
+        print(
+            f"{len(df):,} rows (geometry: {n_geo:,}) → ano={ano}/semestre={semestre}"
+        )
+        total_rows += len(df)
+
+    print(f"\nDone. Total rows: {total_rows:,}")
+    print(f"Output: {OUTPUT_DIR}")
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--subset", action="store_true", help="Run on 2018_1 only"
+    )
+    args = parser.parse_args()
+    main(subset=args.subset)
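Reviewer note on the integer casts in `clean_file` above: removing every `.` is safe for values such as `1.234`, but a value exported as `2018.0` would silently become `20180`. Whether any CNUC snapshot contains such values cannot be told from this diff, so the following is only a defensive sketch. `parse_br_int` is a hypothetical helper, not part of `clean.py`.

```python
import re
from typing import Optional

# Either plain digits ("2018") or digits grouped by "." in threes ("1.234.567").
_BR_INT = re.compile(r"-?(?:\d{1,3}(?:\.\d{3})+|\d+)")


def parse_br_int(text: str) -> Optional[int]:
    """Parse a Brazilian-formatted integer; return None for anything else."""
    s = text.strip()
    if not _BR_INT.fullmatch(s):
        # Rejects "2018.0", "1,5", "-", "" instead of guessing a value.
        return None
    return int(s.replace(".", ""))
```

Values that fail the pattern come back as `None`, which could be mapped to `pd.NA` in the same way the current code treats coercion failures. Note that `1.500` is still read as 1500, since the format alone cannot distinguish it from a decimal.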
diff --git a/models/br_mma_cnuc/code/upload.py b/models/br_mma_cnuc/code/upload.py
new file mode 100644
index 000000000..69cc44e0f
--- /dev/null
+++ b/models/br_mma_cnuc/code/upload.py
@@ -0,0 +1,16 @@
+import basedosdados as bd
+
+DATASET_ID = "br_mma_cnuc"
+TABLE_ID = "unidades_conservacao"
+BILLING_PROJECT = "basedosdados-dev"
+
+tb = bd.Table(dataset_id=DATASET_ID, table_id=TABLE_ID)
+
+path_to_data = "/Users/rdahis/Downloads/CNUC/output/unidades_conservacao"
+
+tb.create(
+    path=path_to_data,
+    if_storage_data_exists="replace",
+    if_table_exists="replace",
+    source_format="parquet",
+)
diff --git a/models/br_mma_cnuc/schema.yaml b/models/br_mma_cnuc/schema.yaml
new file mode 100644
index 000000000..efa9a9149
--- /dev/null
+++ b/models/br_mma_cnuc/schema.yaml
@@ -0,0 +1,134 @@
+---
+version: 2
+models:
+  - name: br_mma_cnuc__unidades_conservacao
+    description: >
+      Cadastro Nacional de Unidades de Conservação (CNUC) — registro semestral de
+      todas as unidades de conservação brasileiras, com atributos administrativos,
+      legais, de cobertura de biomas e georreferenciamento.
+      Fonte: Ministério do Meio Ambiente e Mudança do Clima (MMA).
+    tests:
+      - dbt_utils.unique_combination_of_columns:
+          combination_of_columns: [ano, semestre, codigo_uc]
+      - not_null_proportion_multiple_columns:
+          at_least: 0.05
+          ignore_values:
+            - razao_diferenca_area
+            - data_ultima_certificacao
+            - lei_mata_atlantica
+            - recortes
+            - sobreposicao_ti_tq
+            - amazonia_legal
+    columns:
+      - name: ano
+        description: Ano de referência do cadastro
+      - name: semestre
+        description: Semestre de referência (1 ou 2)
+      - name: id_uc
+        description: Identificador interno da unidade de conservação
+      - name: codigo_uc
+        description: Código da unidade de conservação no CNUC
+      - name: nome_uc
+        description: Nome da unidade de conservação
+      - name: esfera_administrativa
+        description: Esfera administrativa (Federal, Estadual, Municipal)
+      - name: categoria_manejo
+        description: Categoria de manejo
+      - name: categoria_iucn
+        description: Categoria IUCN
+      - name: grupo
+        description: Grupo (PI = Proteção Integral, US = Uso Sustentável)
+      - name: protecao_integral
+        description: Indicador de proteção integral (1 = sim)
+      - name: uso_sustentavel
+        description: Indicador de uso sustentável (1 = sim)
+      - name: sigla_uf
+        description: Sigla da unidade federativa
+      - name: municipios_abrangidos
+        description: Municípios abrangidos pela UC
+      - name: ano_criacao
+        description: Ano de criação da UC
+      - name: ano_ato_legal_recente
+        description: Ano do ato legal mais recente
+      - name: ato_legal_criacao
+        description: Ato legal de criação
+      - name: outros_atos_legais
+        description: Outros atos legais
+      - name: plano_manejo
+        description: Existência de plano de manejo
+      - name: conselho_gestor
+        description: Existência de conselho gestor
+      - name: orgao_gestor
+        description: Órgão gestor da UC
+      - name: informacoes_gerais
+        description: Informações gerais
+      - name: fonte_area
+        description: Fonte da área (1 = shapefile, 0 = ato legal)
+      - name: area_soma_biomas
+        description: Área total (soma dos biomas) em hectares
+      - name: area_soma_biomas_continental
+        description: Área continental total (soma dos biomas) em hectares
+      - name: area_ato_legal_criacao
+        description: Área conforme ato legal de criação em hectares
+      - name: area_amazonia
+        description: Área no bioma Amazônia em hectares
+      - name: area_caatinga
+        description: Área no bioma Caatinga em hectares
+      - name: area_cerrado
+        description: Área no bioma Cerrado em hectares
+      - name: area_mata_atlantica
+        description: Área no bioma Mata Atlântica em hectares
+      - name: area_pampa
+        description: Área no bioma Pampa em hectares
+      - name: area_pantanal
+        description: Área no bioma Pantanal em hectares
+      - name: area_marinha
+        description: Área marinha em hectares
+      - name: bioma_declarado
+        description: Bioma declarado pelo órgão gestor
+      - name: biomas_abrangidos
+        description: Biomas abrangidos pela UC
+      - name: percentual_alem_linha_costa
+        description: Percentual além da linha de costa
+      - name: recortes
+        description: Área total de recortes em hectares
+      - name: mar_territorial
+        description: Área no mar territorial em hectares
+      - name: municipio_costeiro
+        description: Área em município costeiro em hectares
+      - name: municipio_costeiro_area_marinha
+        description: Área em município costeiro com área marinha em hectares
+      - name: amazonia_legal
+        description: Área na Amazônia Legal em hectares
+      - name: lei_mata_atlantica
+        description: Área sob Lei da Mata Atlântica em hectares
+      - name: sobreposicao_ti_tq
+        description: Sobreposição com terra indígena ou território quilombola em hectares
+      - name: programa_projeto
+        description: Programa ou projeto associado
+      - name: sitios_patrimonio_mundial
+        description: Sítios do patrimônio mundial
+      - name: sitios_ramsar
+        description: Sítios Ramsar
+      - name: mosaico
+        description: Mosaico de unidades de conservação
+      - name: reserva_biosfera
+        description: Reserva da biosfera
+      - name: codigo_wdpa
+        description: Código WDPA
+      - name: regiao
+        description: Região geográfica
+      - name: qualidade_dados_georreferenciados
+        description: Qualidade dos dados georreferenciados
+      - name: presente_versao_anterior
+        description: Presente na versão anterior do cadastro
+      - name: diferenca_area
+        description: Diferença de área em relação à versão anterior em hectares
+      - name: razao_diferenca_area
+        description: Razão da diferença de área
+      - name: data_publicacao_cnuc
+        description: Data de publicação no CNUC
+      - name: data_ultima_certificacao
+        description: Data da última certificação pelo órgão gestor
+      - name: geometria
+        description: Geometria da UC (SIRGAS 2000 / EPSG:4674)