mlrun · GiladShapira94 · Apr 15, 2026 · Apr 15, 2026 · Apr 16, 2026 · Apr 16, 2026
diff --git a/.claude/skills/bump/SKILL.md b/.claude/skills/bump/SKILL.md
@@ -0,0 +1,24 @@
+---
+name: bump
+description: Bump the chart version in charts/mlrun-ce/Chart.yaml (patch, minor, or rc)
+allowed-tools: Read(charts/mlrun-ce/Chart.yaml) Edit(charts/mlrun-ce/Chart.yaml) Read(charts/mlrun-ce/README.md) Edit(charts/mlrun-ce/README.md)
+---
+
+Bump the version in `charts/mlrun-ce/Chart.yaml`.
+
+Usage: /bump <patch|minor|rc>
+
+- `patch` — increment the patch digit: `0.11.0` → `0.11.1`
+- `minor` — increment the minor digit and reset patch: `0.11.3` → `0.12.0`
+- `rc` — increment the RC counter on the current version: `0.11.0-rc.34` → `0.11.0-rc.35`
+  - If the current version has no RC suffix, add `-rc.1`: `0.11.0` → `0.11.0-rc.1`
+
+Steps:
+1. Read the current version from `charts/mlrun-ce/Chart.yaml` (the `version:` field).
+2. Compute the new version according to the argument above.
+3. Show the user: "Bumping `<old>` → `<new>`" and ask for confirmation before writing.
+4. On confirmation, update the `version:` field in `charts/mlrun-ce/Chart.yaml` in-place.
+5. Remind the user: version bumps must be committed before opening a PR, and the PR title must follow `[Scope] description` format.
+6. Update the MLRun CE version under Version Matrix in `charts/mlrun-ce/README.md`.
+
+If no argument is given, show the current version and list the three options with the resulting version for each, then ask which to apply.
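
Review note on `.claude/skills/bump/SKILL.md` (commentary, not part of the patch): the three bump rules above can be sketched as a small POSIX shell function. `bump_version` is a hypothetical name, not something the repository provides. The skill does not say what `patch` or `minor` should do when the current version carries an `-rc.N` suffix; this sketch assumes the suffix is dropped, which is only one possible reading.

```shell
# Hypothetical helper: compute the next chart version for /bump.
# Usage: bump_version <current-version> <patch|minor|rc>
bump_version() {
  current=$1
  kind=$2

  # Split an optional "-rc.N" suffix from the MAJOR.MINOR.PATCH core.
  case $current in
    *-rc.*) core=${current%-rc.*}; rc=${current##*-rc.} ;;
    *)      core=$current;         rc=0 ;;
  esac

  major=${core%%.*}
  rest=${core#*.}
  minor=${rest%%.*}
  patch=${rest#*.}

  case $kind in
    # Assumption: patch and minor bumps drop any existing -rc.N suffix.
    patch) echo "$major.$minor.$(($patch + 1))" ;;
    minor) echo "$major.$(($minor + 1)).0" ;;
    # No suffix counts as rc.0, so the first RC becomes -rc.1.
    rc)    echo "$core-rc.$(($rc + 1))" ;;
    *)     echo "usage: bump_version <version> <patch|minor|rc>" >&2
           return 1 ;;
  esac
}
```

For example, `bump_version 0.11.0-rc.34 rc` prints `0.11.0-rc.35`, and `bump_version 0.11.0 rc` prints `0.11.0-rc.1`, matching the examples in the skill.
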
diff --git a/.claude/skills/pr/SKILL.md b/.claude/skills/pr/SKILL.md
@@ -0,0 +1,107 @@
+---
+name: pr
+description: Analyze branch changes and generate a fully filled PR description ready to paste into GitHub
+allowed-tools: Bash(git diff*) Bash(git log*)
+---
+
+Analyze the current branch changes and generate a fully filled PR description ready to paste into GitHub.
+
+## Steps
+
+1. **Gather context** — run these in parallel:
+   - `git diff upstream/development...HEAD` — full diff against the base branch
+   - `git log upstream/development..HEAD --oneline` — commit list
+   - `git diff upstream/development...HEAD --name-only` — changed files
+
+2. **Analyze the diff** carefully:
+   - What components or templates were changed? (check which `templates/` subdirs, `values.yaml` sections, `requirements.yaml`, `Chart.yaml`)
+   - Were any new values keys added? Do they need to be reflected in the three install-mode values files?
+   - Were any Secrets, ConfigMaps, or port numbers changed? (potential breaking changes)
+   - Was `Chart.yaml` version bumped? If not, flag it.
+   - Were `requirements.yaml` or `requirements.lock` changed?
+   - Does `charts/mlrun-ce/README.md` need updating (new NodePort, new component, new install step)?
+
+3. **Detect breaking changes** — flag as breaking if any of:
+   - A value key was renamed or removed
+   - A Secret or ConfigMap name changed
+   - A NodePort number changed
+   - A sub-chart was upgraded with a major version bump
+   - The storage credentials structure changed
+   - Any hook annotation or hook-weight changed in a way that affects upgrade order
+
+4. **Suggest a PR title** following the `[Scope] description` format, where Scope is one of: `['feature', 'fix', 'docs', 'improvement', 'revert', 'breaking', 'ci']`. For example: `[Feature] Add Redis support to mlrun-ce`.
+5. **Fill the PR template**: produce the complete filled template below. Be specific and concrete; do not use placeholder text.
+
+---
+
+Apply these checklist rules before writing the output:
+- `[x]` — you can confirm this item is satisfied from the diff alone
+- `[ ]` — requires human action, judgment, or external system access
+
+Specific rules:
+- "tested" → always `[ ]`
+- "documentation PR" → always `[ ]`
+- "QA tests / Jira ticket" → always `[ ]`
+- "installation verified" → always `[ ]`
+- `Chart.yaml` version bump → `[x]` if diff shows version changed, otherwise `[ ]` and add to Warnings
+- Multi-namespace values files → `[x]` if all three are in the diff OR the change has no effect on install-mode values; `[ ]` with a note if a new value was added only to `values.yaml`
+- README update → `[x]` if `charts/mlrun-ce/README.md` is in the diff OR no new NodePorts/components were added; otherwise `[ ]`
+
+Output exactly this structure with real content (no placeholder text):
+
+```markdown
+### 📝 Description
+<2-4 sentences: what changed, why, and what it affects>
+
+---
+
+### 🛠️ Changes Made
+<concrete bullet list — file paths, value keys, resource names>
+
+---
+
+### ✅ Checklist
+- [ ] I have tested the changes in this PR
+- [ ] I confirmed whether my changes require a change in documentation and if so, I created another PR in MLRun for the relevant documentation.
+- [ ] I confirmed whether my changes require changes in QA tests, for example: credentials changes, resources naming change and if so, I updated the relevant Jira ticket for QA.
+- [ ] I increased the Chart version in `charts/mlrun-ce/Chart.yaml`.
+- [ ] I confirmed that the installation works both on a local Docker Desktop environment and on a real cluster when using the required [prerequisites](https://docs.mlrun.org/en/stable/install-mlrun-ce/kubernetes-install.html#prerequisites).
+  - [ ] If installation issues were found, I updated the relevant Jira ticket with the issue and steps to reproduce, or updated the prerequisites documentation if the issue is related to missing or outdated prerequisites.
+- [ ] If needed, update https://github.com/mlrun/ce/blob/development/charts/mlrun-ce/README.md with the relevant installation instructions and version Matrix.
+- [ ] If needed, update the following values files for multi namespace support:
+  - [ ] [Admin values](https://github.com/mlrun/ce/blob/development/charts/mlrun-ce/admin_installation_values.yaml)
+  - [ ] [User values Node Port](https://github.com/mlrun/ce/blob/development/charts/mlrun-ce/non_admin_installation_values.yaml)
+  - [ ] [User values ClusterIP](https://github.com/mlrun/ce/blob/development/charts/mlrun-ce/non_admin_cluster_ip_installation_values.yaml)
+
+---
+
+### 🧪 Testing
+<what was tested: lint, helm template dry-run, Kind cluster, manual — based on nature of changes>
+
+---
+
+### 🔗 References
+- Ticket link:
+- External links:
+- Design docs links (Optional):
+
+---
+
+### 🚨 Breaking Changes?
+
+- [ ] Yes (explain below)
+- [ ] No
+
+<if breaking: bullet list of what downstream consumers must change — value keys to rename, Secrets to recreate, ports to update>
+
+---
+
+### 🔍️ Additional Notes
+<follow-up tasks, known issues, affected areas — omit if nothing to add>
+```
+
+Then replace each `[ ]` with `[x]` on items you can confirm from the diff, following the rules above.
+
+After outputting the filled template, add a short **"Warnings"** section (outside the template) listing anything that needs human attention before opening the PR (missing version bump, unsynced values files, potential breaking changes, etc.).
+
+After every sentence that ends with a `.`, add two newlines to make the output more readable.
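
Review note on the checklist rules in `.claude/skills/pr/SKILL.md` (commentary, not part of the patch): two of the rules that depend only on the diff can be sketched as shell filters. The function names `version_bump_mark` and `values_files_in_diff` are hypothetical. The second function covers only the "all three are in the diff" half of the multi-namespace rule; deciding that a change "has no effect on install-mode values" still needs judgment.

```shell
# Hypothetical helpers; neither function exists in the repository.

# Reads a unified diff of Chart.yaml on stdin.
# Prints "[x]" if an added line sets the top-level `version:` field, else "[ ]".
version_bump_mark() {
  if grep -Eq '^\+version:'; then
    echo '[x]'
  else
    echo '[ ]'
  fi
}

# Reads `git diff --name-only` output on stdin.
# Succeeds only if all three install-mode values files appear in the list.
values_files_in_diff() {
  changed=$(cat)
  for f in admin_installation_values.yaml \
           non_admin_installation_values.yaml \
           non_admin_cluster_ip_installation_values.yaml; do
    # -F: fixed string, -x: whole-line match, so the admin file name does
    # not match as a substring of the non-admin paths.
    printf '%s\n' "$changed" | grep -Fxq "charts/mlrun-ce/$f" || return 1
  done
}
```

A possible invocation, using the same ranges the skill already runs: `git diff upstream/development...HEAD -- charts/mlrun-ce/Chart.yaml | version_bump_mark` and `git diff upstream/development...HEAD --name-only | values_files_in_diff`.
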
diff --git a/.cursorrules b/.cursorrules
@@ -0,0 +1,65 @@
+# MLRun Community Edition — Cursor Rules
+
+## Source of Truth
+
+`AGENTS.md` is the authoritative reference for this project. Read it before making any suggestions. It covers architecture, design patterns, template conventions, component dependencies, how to add new components, and common debugging scenarios.
+
+`CONTRIBUTING.md` covers the development workflow, commit format, and PR process.
+
+## Preferred Response Patterns
+<!-- Mirrors CLAUDE.md — duplicated here because Cursor has no @file import support -->
+
+- Helm install commands: always include `--namespace mlrun --wait`
+- Values changes: show `--set` flags or a patch values file overlay, not edits to `values.yaml` directly
+- New templates: show the complete file including the `{{- if .Values.<component>.enabled }}` guard and `include "mlrun-ce.common.labels"` call
+- Service references within templates: use `{{ .Release.Namespace }}`, never hardcode namespace strings
+- After any `requirements.yaml` change: remind the user to run `make helm-update-dependencies` and commit `requirements.lock`
+- If a change affects the default installation, remind the user to update all three values files (`admin_installation_values.yaml`, `non_admin_installation_values.yaml`, `non_admin_cluster_ip_installation_values.yaml`) with the appropriate default
+- If a change adds a new component, changes a component version, or changes the installation process, remind the user to update `charts/mlrun-ce/README.md`
+
+## Common Tasks (Claude Code has `/render`, `/bump`, `/pr` skills for these)
+
+When a user asks you to help with the following tasks, use the commands below. They are the manual equivalents of the Claude Code skills defined in `.claude/skills/`.
+
+**Render chart templates** (`/render` in Claude Code)
+```bash
+# Full chart
+helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml
+
+# Single template or directory
+helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml \
+  --show-only templates/<path>
+
+# With a values overlay (e.g. non-admin install)
+helm template mlrun charts/mlrun-ce \
+  -f charts/mlrun-ce/values.yaml \
+  -f charts/mlrun-ce/non_admin_installation_values.yaml
+```
+
+**Bump Chart version** (`/bump` in Claude Code)
+Read the current version from `charts/mlrun-ce/Chart.yaml` and increment:
+- `patch` — `0.11.0` → `0.11.1`
+- `minor` — `0.11.3` → `0.12.0`
+- `rc` — `0.11.0-rc.34` → `0.11.0-rc.35` (or append `-rc.1` if no suffix)
+
+Always show the old → new version before writing and confirm with the user.
+
+**Generate PR description** (`/pr` in Claude Code)
+Run `git diff upstream/development...HEAD`, `git log upstream/development..HEAD --oneline`, and `git diff upstream/development...HEAD --name-only`, then fill in `.github/pull_request_template.md` based on the changes. Check `[x]` on checklist items confirmable from the diff; leave `[ ]` on items requiring human action. Flag missing version bumps, unsynced values files, and breaking changes.
+
+## Do Not Suggest
+
+- `helm upgrade --install` without running `make helm-update-dependencies` first
+- Adding a new sub-chart to `requirements.yaml` for custom resources — add templates to `charts/mlrun-ce/templates/<component>/` instead
+- `kubectl apply` for resources managed by this chart
+- `Chart.yaml` apiVersion v2 dependency blocks (this chart uses apiVersion v1 + `requirements.yaml`)
+- Creating a second credentials Secret — mount the existing `storage-credentials` Secret via `envFrom`
+- Hardcoding namespace names in templates — use `{{ .Release.Namespace }}`
+- Using `kafka.enabled + strimzi-kafka-operator.enabled` as a combined condition — the template guard is only `kafka.enabled`; Strimzi is a prerequisite, not a co-guard
+- Treating `seaweedfs-s3-config` as a SeaweedFS dependency — SeaweedFS *creates* it; Pipelines and MLRun *consume* it
+
+## Workflow (from CONTRIBUTING.md)
+
+- Fork-based workflow: PRs target `upstream/development`, not `origin/development`
+- Branch naming: `<scope>/<short-description-or-ticket>` — e.g. `feature/add-redis-support`, `fix/CE-111`
+- Always bump `charts/mlrun-ce/Chart.yaml` version before opening a PR
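
Review note on the Workflow section of `.cursorrules` (commentary, not part of the patch): the `<scope>/<short-description-or-ticket>` branch convention could be checked locally with a one-line pattern. `valid_branch_name` is a hypothetical helper, and the allowed character set is an assumption inferred from the two examples (`feature/add-redis-support`, `fix/CE-111`); `CONTRIBUTING.md` remains the authority on what is accepted.

```shell
# Hypothetical check for the <scope>/<short-description-or-ticket> shape.
# Assumption: scope is lowercase letters; the rest is letters, digits,
# dots, underscores, or hyphens.
valid_branch_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z]+/[A-Za-z0-9._-]+$'
}
```

It could be run against the current branch with `valid_branch_name "$(git rev-parse --abbrev-ref HEAD)"`, which returns a non-zero status for names such as `development` that have no scope prefix.
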
diff --git a/.gitignore b/.gitignore
@@ -4,3 +4,6 @@ charts/mlrun-ce/charts/*
 **/.DS_Store
 *.DS_Store
 **/__pycache__
+
+# Claude Code local settings (machine-specific, not for commit)
+.claude/settings.local.json
diff --git a/AGENTS.md b/AGENTS.md
@@ -18,14 +18,6 @@ make helm-repo-add
 
 # Package the chart as a tarball
 make package
-
-# Run full local end-to-end test on a Kind cluster (requires docker, kind, kubectl, helm)
-./tests/kind-test.sh full          # Create Kind cluster + install chart
-./tests/kind-test.sh create        # Create cluster only
-./tests/kind-test.sh install       # Install chart (assumes cluster exists)
-./tests/kind-test.sh verify        # Verify installation
-./tests/kind-test.sh delete        # Delete Kind cluster
-CLEANUP_ON_EXIT=true ./tests/kind-test.sh  # Auto-cleanup after test
 ```
 
 ## Architecture
@@ -39,27 +31,91 @@ CLEANUP_ON_EXIT=true ./tests/kind-test.sh  # Auto-cleanup after test
 
 ### Template Organization (`charts/mlrun-ce/templates/`)
 
-- `config/` — ConfigMaps and Secrets shared across components: MLRun env config, Jupyter env config, S3 credentials secret, Pipelines config, Spark config, Grafana dashboards
+- `config/` — ConfigMaps and Secrets shared across components: MLRun env config, Jupyter env config, storage credentials secret, Pipelines config, Spark config, Grafana dashboards
 - `seaweedfs/` — SeaweedFS-specific resources: S3 IAM config secret, bucket init job, admin UI NodePort service, ingress
 - `kafka/` — Kafka Strimzi custom resources: KafkaNodePool, Kafka cluster CR, bootstrap alias Service, RBAC, NetworkPolicy
 - `timescaledb/` — TimescaleDB Deployment, Service, PVC
 - `jupyter-notebook/` — Jupyter Deployment and supporting resources
 - `pipelines/` — Kubeflow Pipelines resources
+- `spark-operator/` — Spark controller RBAC
 - `persistency/` — PVC definitions
 - `aws/` — AWS-specific resources
 
 ### Key Design Patterns
 
-**S3 credentials propagation**: The top-level `s3.accessKey`/`s3.secretKey`/`s3.bucket` values flow into a `s3-credentials` Secret (created by `templates/config/s3-credentials-secret.yaml`), which is then mounted via `envFrom` in MLRun API and Jupyter pods. SeaweedFS uses the same credentials via the `seaweedfs-s3-config` Secret.
+**S3 credentials propagation**: The top-level `storage.s3.accessKey`/`storage.s3.secretKey`/`storage.s3.bucket` values flow into a `storage-credentials` Secret (created by `templates/config/storage-secret.yaml`), which is then mounted via `envFrom` in MLRun API and Jupyter pods. SeaweedFS uses the same credentials via the `seaweedfs-s3-config` Secret.
 
 **Global registry anchor**: `global.registry: &userRegistry` in `values.yaml` uses YAML anchors to multiplex the same docker registry config to both `nuclio.global.registry` and `mlrun.global.registry`.
 
-**SeaweedFS as S3 backend**: SeaweedFS replaced MinIO. The helpers in `_helpers.tpl` (`mlrun-ce.s3.*`) generate the SeaweedFS service URL. Legacy `mlrun-ce.minio.*` helpers are kept as aliases pointing to the SeaweedFS helpers.
-
 **Component enable/disable**: Most components can be disabled via `<component>.enabled: false`. The Kafka setup requires the Strimzi operator (deployed as a sub-chart via `strimzi-kafka-operator`) and custom Strimzi CRs in `templates/kafka/`.
 
 ### Values Files
 
 - `charts/mlrun-ce/admin_installation_values.yaml` — admin install
 - `charts/mlrun-ce/non_admin_installation_values.yaml` — non-admin install
 - `charts/mlrun-ce/non_admin_cluster_ip_installation_values.yaml` — non-admin with ClusterIP
+
+## Quick-Start Dev Workflow
+
+From a fresh clone to a linted chart:
+
+1. `make helm-repo-add` — adds all external repos (reads `requirements.yaml`; idempotent)
+2. `make helm-update-dependencies` — downloads sub-chart tarballs into `charts/mlrun-ce/charts/` (must run before any lint or template render)
+3. `make helm-lint` — runs `helm lint charts/mlrun-ce` + `ct lint --target-branch development`
+   - `ct` only lints charts with changes relative to the target branch; always run from a feature branch, not directly on `development`
+4. Render all templates locally (no cluster needed):
+   ```bash
+   helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml
+   ```
+5. Render a single template file:
+   ```bash
+   helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml --show-only templates/kafka/kafka-cluster.yaml
+   ```
+6. Schema-validate without a cluster:
+   ```bash
+   helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml | kubectl apply --dry-run=client -f -
+   ```
+
+## Component Dependency Map
+
+| Component | Enabled by | Runtime dependencies | Key templates / notes |
+|---|---|---|---|
+| MLRun API + UI + DB | `mlrun.enabled` | `storage-credentials` Secret, `mlrun-common-env` ConfigMap; mlrun-db (MySQL) is bundled inside the mlrun sub-chart | `templates/config/mlrun-env-configmap.yaml`; rest is in the `mlrun` sub-chart |
+| Jupyter | `jupyterNotebook.enabled` | `storage-credentials` Secret, `jupyter-common-env` ConfigMap | `templates/jupyter-notebook/` |
+| Nuclio | always on (no `enabled` guard in umbrella) | `global.registry` must be set | sub-chart only — no custom templates |
+| MPI Operator | always on (no `enabled` guard in umbrella) | none | sub-chart only — no custom templates |
+| SeaweedFS | `seaweedfs.enabled` | PVC for data storage; creates `seaweedfs-s3-config` Secret consumed by Pipelines and MLRun | `templates/seaweedfs/`; `seaweedfs.s3.enableAuth: true` must be set or the Secret is skipped |
+| Spark Operator | `spark-operator.enabled` | none | sub-chart + `templates/spark-operator/spark-controller-rbac.yaml` |
+| Kafka | `kafka.enabled` | Strimzi CRDs — `strimzi-kafka-operator` sub-chart must also be enabled as a prerequisite; CRs use post-install hooks to wait for CRDs | `templates/kafka/` |
+| Pipelines | `pipelines.enabled` | SeaweedFS (`seaweedfs.enabled` checked at render time; adds init container to wait for it), `mlrun-pipelines-config` ConfigMap | `templates/pipelines/`, `templates/config/mlrun-pipelines-config.yaml` |
+| TimescaleDB | `timescaledb.enabled` | none; uses its own `<release>-timescaledb-secret` for the DB password | `templates/timescaledb/` — custom StatefulSet, not a sub-chart |
+| Prometheus + Grafana | `kube-prometheus-stack.enabled` | none at runtime; model monitoring dashboards are pre-loaded as static JSON ConfigMaps | sub-chart config in `values.yaml`; dashboards in `templates/config/model-monitoring-*.yml` |
+
+## How to Add a New Component
+
+1. Add a top-level block to `values.yaml`:
+   ```yaml
+   myComponent:
+     enabled: true
+   ```
+2. Create `charts/mlrun-ce/templates/myComponent/`.
+3. Every template file must open with `{{- if .Values.myComponent.enabled }}` and close with `{{- end }}`.
+4. Add label helpers to `_helpers.tpl` following the `mlrun-ce.<component>.labels` / `mlrun-ce.<component>.selectorLabels` pattern.
+5. NodePort selection — avoid all currently occupied ports:
+   - 30010 Grafana, 30020 Prometheus, 30040 Jupyter, 30050 Nuclio
+   - 30060 MLRun UI, 30070 MLRun API, 30093 SeaweedFS Admin, 30094 SeaweedFS S3
+   - 30100 Pipelines, 30110 TimescaleDB
+6. NodePort services must be optional and only created when the component is enabled.
+7. Create a NodePort service only if the component exposes a user-facing UI or API that should be reachable from outside the cluster. If the component is internal-only, use a ClusterIP service instead.
+8. Storage credentials — mount the existing `storage-credentials` Secret via `envFrom.secretRef`; do not create a second credentials secret.
+9. CRD dependencies: if the component depends on CRDs from a sub-chart, use `helm.sh/hook: post-install,post-upgrade` with an appropriate `hook-weight` on the CRs (see `templates/kafka/` for the established pattern).
+10. Update all three values files to explicitly set `myComponent.enabled: true/false` as appropriate for each install mode.
+11. Add the component's service URL to `templates/NOTES.txt` using the existing conditional pattern.
+12. Update `charts/mlrun-ce/README.md` if a new NodePort is exposed.
+13. Bump the version in `charts/mlrun-ce/Chart.yaml`.
+14. Keep Secret, ConfigMap, and environment variable naming consistent with existing patterns (`storage-credentials` Secret, `mlrun-common-env` ConfigMap, etc.).
+15. Add a section to this AGENTS.md file describing the component's architecture, dependencies, and any special design patterns used.
+16. Try to reuse existing patterns and templates as much as possible — for example, if the component needs a ConfigMap of environment variables, add them to `templates/config/` and follow the same pattern as `mlrun-common-env` or `jupyter-common-env`.
+17. Prefer configuring the component through `values.yaml` rather than hardcoding values in the templates. For example, if the component needs a port number, add a `myComponent.port` value and reference it in the template.
+18. For every Kubernetes resource that supports resource requests and limits, expose them in the values file and reference them in the template, or rely on the sub-chart defaults if the sub-chart already supports them.
+19. Run `make helm-lint` and fix any lint errors before opening a PR.
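
Review note on the NodePort selection step in `AGENTS.md` (commentary, not part of the patch): the occupied-port list lends itself to a quick pre-check before picking a port for a new component. `nodeport_is_free` and `OCCUPIED_NODEPORTS` are hypothetical names. The port list is copied from the step above; the 30000-32767 bounds are the Kubernetes default NodePort range, which a cluster may be configured to override.

```shell
# Ports listed as occupied in AGENTS.md (Grafana, Prometheus, Jupyter,
# Nuclio, MLRun UI, MLRun API, SeaweedFS Admin, SeaweedFS S3, Pipelines,
# TimescaleDB).
OCCUPIED_NODEPORTS="30010 30020 30040 30050 30060 30070 30093 30094 30100 30110"

# Hypothetical helper: succeeds if the port is inside the default NodePort
# range and not already used by an existing component.
nodeport_is_free() {
  port=$1

  # Assumption: the cluster uses the default 30000-32767 NodePort range.
  [ "$port" -ge 30000 ] 2>/dev/null || return 1
  [ "$port" -le 32767 ] || return 1

  # Pad with spaces so that only whole port numbers match.
  case " $OCCUPIED_NODEPORTS " in
    *" $port "*) return 1 ;;
    *)           return 0 ;;
  esac
}
```

With this list, `nodeport_is_free 30120` succeeds while `nodeport_is_free 30093` fails because SeaweedFS Admin already uses that port. The list has to be kept in sync with `AGENTS.md` by hand, so it is a convenience check rather than a source of truth.
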