18 commits
3122c49
feat(app): declarative storage bundle, hybrid Lakebase/Delta backend,…
berrybluecode May 11, 2026
1f6e53d
feat(app): simplify Lakebase backend — connect to databricks_postgres…
berrybluecode May 12, 2026
8fda0ee
Merge branch 'main' into feat/app-refactor-backend-add-lakebase
berrybluecode May 12, 2026
8c8f785
docs: bump v0.13.0 → v0.14.0 GitHub URLs to satisfy fmt check
berrybluecode May 12, 2026
71563f9
fix(deps): make OltpExecutor a real Union type alias and actually use it
berrybluecode May 12, 2026
87c8d38
refactor(app): atomic Pg migrations, fqn() helper, bind/wheel script …
berrybluecode May 12, 2026
233ee20
Merge branch 'feat/app-refactor-backend-add-lakebase' of github-perso…
berrybluecode May 12, 2026
3f247de
fix(ci): restore trailing newline in _metadata.py and bump v0.13.0 Gi…
berrybluecode May 12, 2026
151a52d
fix pytest
berrybluecode May 12, 2026
bde6ab7
fix test
berrybluecode May 13, 2026
06757c4
feat(app): per-run review status, embedded Insights dashboard, and br…
berrybluecode Jun 11, 2026
ee5cc52
feat(app): add Lakebase OLTP backend, contract/AI rule generation, i1…
berrybluecode Jun 22, 2026
011d118
Merge branch 'main' of github-personal:databrickslabs/dqx into feat/a…
berrybluecode Jun 22, 2026
87ab971
updated api docs command
mwojtyczka Jun 24, 2026
e0fb046
Merge branch 'main' into feat/app-ui-i18n-insights-contracts
mwojtyczka Jun 24, 2026
236447a
Address PR review: validate contract rules, cap LLM tokens, i18n, sch…
berrybluecode Jun 24, 2026
cddd404
Merge branch 'feat/app-ui-i18n-insights-contracts' of github-personal…
berrybluecode Jun 24, 2026
e08667c
fix(app): address PR review on review-status, i18n, LLM fan-out, and …
berrybluecode Jun 28, 2026
42 changes: 38 additions & 4 deletions Makefile
@@ -179,6 +179,33 @@ app-test: ## Run app backend pytest suite (K=<expr> filter, COV=1 for coverage)

##@ App deploy (require PROFILE=<databricks-profile>; most also need TARGET=<bundle-target>)

# Minimum Databricks CLI version required to deploy. The ``postgres_roles``
# resource in ``app/databricks.yml`` (one-button Lakebase provisioning) was
# added in CLI v1.4.0 (https://github.com/databricks/cli/pull/5467); older
# CLIs reject the unknown field at ``bundle validate`` with a confusing
# error deep inside the deploy. Fail fast here with an actionable message.
DATABRICKS_MIN_VERSION := 1.4.0

# Preflight: assert the CLI exists and is new enough. Used as a prerequisite
# of app-deploy / app-bind so the version gate runs before any build or
# network call. The sort -V trick: the smallest of {MIN, installed} equals
# MIN exactly when installed >= MIN (handles 1.10 > 1.4 unlike a string sort).
app-check-cli: ## Verify the Databricks CLI meets the minimum version for deploy
@command -v databricks >/dev/null 2>&1 || \
{ echo "ERROR: 'databricks' CLI not found on PATH. Install: https://docs.databricks.com/dev-tools/cli/install.html"; exit 1; }
@ver=$$(databricks --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -1); \
if [ -z "$$ver" ]; then \
echo "ERROR: could not parse a version from 'databricks --version'."; exit 1; \
fi; \
if [ "$$(printf '%s\n%s\n' "$(DATABRICKS_MIN_VERSION)" "$$ver" | sort -V | head -1)" != "$(DATABRICKS_MIN_VERSION)" ]; then \
echo "ERROR: Databricks CLI v$$ver is too old; v$(DATABRICKS_MIN_VERSION)+ is required to deploy DQX Studio."; \
echo " app/databricks.yml uses the 'postgres_roles' resource (one-button Lakebase),"; \
echo " which older CLIs reject at 'bundle validate' with an unknown-field error."; \
echo " Upgrade: brew upgrade databricks (or https://docs.databricks.com/dev-tools/cli/install.html)"; \
exit 1; \
fi; \
echo "✓ Databricks CLI v$$ver (>= $(DATABRICKS_MIN_VERSION))"

# Grant Unity Catalog permissions after bundle deploy.
#
# Usage: make app-grant-permissions PROFILE=my-profile
@@ -216,7 +243,7 @@ app-grant-permissions: ## Run post-deploy GRANTs (called by app-deploy; standalo
# to ``make app-deploy`` — the bind step reads the resolved bundle
# variables to know which instance/schema name to bind to, so an
# override applied only at deploy time would bind the wrong resource.
app-bind: ## Adopt pre-existing UC schemas / volume / Lakebase into bundle (run ONCE per workspace)
app-bind: app-check-cli ## Adopt pre-existing UC schemas / volume / Lakebase into bundle (run ONCE per workspace)
@test -n "$(PROFILE)" || (echo "Usage: make app-bind PROFILE=<databricks-profile> TARGET=<bundle-target>"; exit 1)
@test -n "$(TARGET)" || (echo "Usage: make app-bind PROFILE=<databricks-profile> TARGET=<bundle-target>"; exit 1)
app/scripts/bind_resources.sh -p $(PROFILE) -t $(TARGET) $(if $(BUNDLE_VARS),-- $(BUNDLE_VARS))
@@ -245,10 +272,17 @@ app-bind: ## Adopt pre-existing UC schemas / volume / Lakebase into bundle (run
# after a manual delete (the deploy errors out with "Instance name is
# not unique" otherwise). Per-deploy CLI overrides keep ``databricks.yml``
# clean.
app-deploy: app-build ## Build, deploy bundle, grant permissions, and start app
#
# FORCE=1 appends ``--force`` to ``bundle deploy``. Use it when the deploy
# aborts because a resource was modified in the workspace UI since the last
# deploy (e.g. "dashboard ... has been modified remotely") — ``--force``
# overwrites the remote copy with the bundle's local definition. This DROPS
# any in-UI edits, so only set it once you've confirmed the local version is
# the one you want to ship.
app-deploy: app-check-cli app-build ## Build, deploy bundle, grant permissions, and start app (FORCE=1 to overwrite remote edits)
@test -n "$(PROFILE)" || (echo "Usage: make app-deploy PROFILE=<databricks-profile> TARGET=<bundle-target>"; exit 1)
@test -n "$(TARGET)" || (echo "Usage: make app-deploy PROFILE=<databricks-profile> TARGET=<bundle-target>"; exit 1)
cd app && databricks bundle deploy -p $(PROFILE) -t $(TARGET) $(BUNDLE_VARS)
cd app && databricks bundle deploy -p $(PROFILE) -t $(TARGET) $(if $(FORCE),--force) $(BUNDLE_VARS)
app/scripts/post_deploy_grants.sh -p $(PROFILE) -t $(TARGET) $(if $(BUNDLE_VARS),-- $(BUNDLE_VARS))
cd app && databricks bundle run $(APP_NAME) -p $(PROFILE) -t $(TARGET) $(BUNDLE_VARS)

@@ -290,4 +324,4 @@ fork-sync: ## Mirror a fork PR to a branch in the main repo for full CI (PR=<num
./.github/scripts/fork-sync-pr.sh $(PR)

.DEFAULT: all
.PHONY: help all clean dev lint fmt test integration e2e perf anomaly coverage combine-coverage docs-build docs-serve-dev docs-install docs-serve docs-clean app-install app-build app-start-dev app-stop-dev app-regen-api app-check app-test app-grant-permissions app-bind app-deploy fork-sync build lock-dependencies lock-app-dependencies
.PHONY: help all clean dev lint fmt test integration e2e perf anomaly coverage combine-coverage docs-build docs-serve-dev docs-install docs-serve docs-clean app-install app-build app-start-dev app-stop-dev app-regen-api app-check app-test app-check-cli app-grant-permissions app-bind app-deploy fork-sync build lock-dependencies lock-app-dependencies
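
The `app-check-cli` gate above relies on `sort -V` to compare versions numerically rather than as strings. As a minimal sketch of the same comparison, assuming only that the CLI output contains a `MAJOR.MINOR.PATCH` triple somewhere, the check could be expressed in Python as below. The function names `parse_version` and `meets_minimum` and the sample strings are hypothetical and are not part of the PR.

```python
import re
from typing import Optional, Tuple

# Mirrors DATABRICKS_MIN_VERSION in the Makefile.
MIN_VERSION = "1.4.0"


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Return the first MAJOR.MINOR.PATCH triple in text as integers, or None.

    This matches what the Makefile's grep -oE ... | head -1 extracts.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Return True when the installed version is at least the minimum.

    Tuples of ints compare component by component, so 1.10.x is newer
    than 1.4.x, which a plain string comparison would get wrong.
    """
    installed_v = parse_version(installed)
    minimum_v = parse_version(minimum)
    if installed_v is None or minimum_v is None:
        raise ValueError("could not parse a version triple")
    return installed_v >= minimum_v


if __name__ == "__main__":
    # Illustrative inputs only; real CLI output may be formatted differently.
    for sample in ("v1.10.2", "v1.4.0", "v1.3.9"):
        print(sample, meets_minimum(sample))
```

Comparing integer tuples gives the same ordering that `sort -V` provides for plain three-part versions. Pre-release suffixes are ignored here, as they are by the Makefile's `grep` pattern.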
45 changes: 45 additions & 0 deletions SECURITY.md
@@ -0,0 +1,45 @@
# Security Policy

## Reporting a Vulnerability

**Please do not report security vulnerabilities through public GitHub issues, discussions, or pull requests.**

If you believe you've found a security vulnerability in DQX (the library, the CLI, or the DQX Studio web application), report it privately through one of the following channels:

1. **GitHub Security Advisories (preferred)** — open a private advisory on the [DQX repository](https://github.com/databrickslabs/dqx/security/advisories/new). This keeps the report confidential until a fix is released.
2. **Email** — if you cannot use GitHub Security Advisories, contact the Databricks Labs maintainers at [labs-oss@databricks.com](mailto:labs-oss@databricks.com) with `DQX SECURITY` in the subject line.

### What to include

To help us triage quickly, please include as much of the following as you can:

- A description of the issue and its potential impact.
- Steps to reproduce (proof-of-concept code, sample input, or a minimal repro repo).
- The affected version(s) of DQX and, if relevant, the Databricks Runtime version.
- Any mitigations or workarounds you have identified.

We will acknowledge receipt within **5 business days** and provide a more detailed response within **10 business days** indicating our next steps. We aim to patch confirmed vulnerabilities in a timely manner and will coordinate disclosure with the reporter.

## Supported Versions

Security fixes are guaranteed only for the **latest minor release**, which is developed on the [`main` branch](https://github.com/databrickslabs/dqx) and published on [PyPI](https://pypi.org/project/databricks-labs-dqx/). Where possible, please upgrade to the latest release before reporting an issue.

## Scope

The following are in scope:

- The `databricks-labs-dqx` Python library and CLI.
- The DQX Studio web application under `app/` (FastAPI backend + React frontend).
- The task-runner Databricks Job under `app/tasks/`.
- Example code in `demos/` (informational only — reports welcome but low priority).

Out of scope:

- Vulnerabilities in third-party dependencies that are better reported upstream.
- Issues requiring a privileged user (e.g., workspace admin) to already be compromised.
- Denial-of-service resulting from reasonable use of Databricks platform limits.

## Disclosure Policy

- We follow a **coordinated disclosure** process and will credit reporters in release notes unless they prefer anonymity.
- We request that reporters give us a reasonable window (typically 90 days) to issue a fix before any public disclosure.
13 changes: 12 additions & 1 deletion app/CLAUDE.md
@@ -32,7 +32,18 @@ RBAC is enforced — routes use `require_role(*roles)` from `backend/dependencie
2. **Business user adjusts existing rules** — load → edit → optional dry-run → save (creates new version + approval request)
3. **Engineer reviews and approves rules** — review GUI/YAML → optional dry-run → configure checks storage → approve → export to Delta table
4. **Engineer generates rules via profiler** — select table → configure sampling → run profiler → review candidates → save
5. **User browses and discovers rules** — filter by table/domain/owner/status → view versions → compare → import/export
5. **Engineer pins a schema contract** — open Schema validation flow → pick target table → DDL (snapshot or hand-write) or reference table → strict/compatible mode → dry-run → save
6. **Data product owner imports rules** — Import rules page → pick **From DQX YAML** or **From data contract** tab → review preview → save drafts
7. **User browses and discovers rules** — filter by table/domain/owner/status → view versions → compare → import/export

### Dataset-level rule convention (`__sql_check__/<name>`)

Single-table rules carry a real `table_fqn`. Dataset-level rules (cross-table SQL checks and `has_valid_schema` schema-validation rules) instead use the synthetic prefix `__sql_check__/<name>` so they all bucket under the **Cross-table rules** group in the UI catalog and edit-router.

- **SQL checks** store their query body inline; the runner reads it from `arguments.sql_query`.
- **`has_valid_schema` checks** store the real table FQN in `user_metadata.target_table`; the runner creates the input view from *that* table while keeping the synthetic FQN for history grouping, and dispatches through the standard row-level engine (NOT the SQL fast-path).

Both code paths live in `backend/routes/v1/dryrun.py` and `backend/services/scheduler_service.py`. If you add another dataset-level rule kind, follow the same convention and update both dispatchers in lock-step.
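
The convention above can be sketched as a single dispatch decision. This is an illustration under stated assumptions, not the code in `dryrun.py` or `scheduler_service.py`: the rule is modelled as a plain mapping with `function`, `arguments`, and `user_metadata` keys, and `DispatchPlan` and `plan_dispatch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Synthetic prefix used for dataset-level rules (from the convention above).
SQL_CHECK_PREFIX = "__sql_check__/"


@dataclass(frozen=True)
class DispatchPlan:
    """Hypothetical summary of how a runner would handle one rule."""

    kind: str                           # "single_table", "schema_check" or "sql_check"
    history_fqn: str                    # FQN used for history grouping
    source_table: Optional[str] = None  # table the input view is built from
    sql_query: Optional[str] = None     # inline query body for SQL checks


def plan_dispatch(table_fqn: str, rule: Mapping[str, Any]) -> DispatchPlan:
    """Decide which path a rule takes, keeping the synthetic FQN for history."""
    if not table_fqn.startswith(SQL_CHECK_PREFIX):
        # Single-table rule: the FQN is a real table.
        return DispatchPlan("single_table", table_fqn, source_table=table_fqn)

    if rule.get("function") == "has_valid_schema":
        # Schema check: the real table lives in user_metadata.target_table.
        target = (rule.get("user_metadata") or {}).get("target_table")
        if not target:
            raise ValueError("has_valid_schema rule is missing user_metadata.target_table")
        return DispatchPlan("schema_check", table_fqn, source_table=target)

    # Cross-table SQL check: the query body is stored inline.
    query = (rule.get("arguments") or {}).get("sql_query")
    if not query:
        raise ValueError("SQL check rule is missing arguments.sql_query")
    return DispatchPlan("sql_check", table_fqn, sql_query=query)
```

Keeping the decision in one function is one way to hold the two dispatchers in lock-step, since a new dataset-level rule kind would then need only one new branch.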

## Internal Storage
