28 commits
3e26ccb
first commit
GiladShapira94 Apr 15, 2026
91aa535
fix
GiladShapira94 Apr 15, 2026
355728a
fix run issue
GiladShapira94 Apr 16, 2026
ba88c03
fix run issue
GiladShapira94 Apr 16, 2026
5ed1f83
fix run issue
GiladShapira94 Apr 16, 2026
9640e9c
remove label
GiladShapira94 Apr 16, 2026
cbe71fc
fix after review
GiladShapira94 Apr 20, 2026
6264877
Merge remote-tracking branch 'upstream/development' into ce-worfklows
GiladShapira94 Apr 26, 2026
3413976
fix after review
GiladShapira94 Apr 26, 2026
8ec1302
Merge remote-tracking branch 'upstream/development' into ce-worfklows
GiladShapira94 Apr 26, 2026
5364f3c
first commit
GiladShapira94 Apr 27, 2026
61988fb
fix installation issue
GiladShapira94 Apr 27, 2026
ae6842e
Merge pull request #1 from GiladShapira94/ce-worfklows
GiladShapira94 Apr 27, 2026
3ac86b3
Update release.yml
GiladShapira94 Apr 27, 2026
9618888
change chart version
GiladShapira94 Apr 28, 2026
c95a662
Update pr-validation.yml
GiladShapira94 Apr 28, 2026
59655d2
Update pr-validation.yml
GiladShapira94 Apr 28, 2026
e945709
Merge pull request #3 from GiladShapira94/CEML-696
GiladShapira94 Apr 28, 2026
55691a6
Update release.yml
GiladShapira94 Apr 28, 2026
efa5752
[Fix] testing fix
GiladShapira94 Apr 28, 2026
a3dbc06
Merge remote-tracking branch 'origin/development' into development
GiladShapira94 Apr 28, 2026
252af2c
print the release rc
GiladShapira94 Apr 28, 2026
c6b4d1f
Merge remote-tracking branch 'upstream/development' into development
GiladShapira94 May 3, 2026
f46516c
Add Claude and Cursor instructions files and skills
GiladShapira94 May 4, 2026
d69ac44
fix make helm-line
GiladShapira94 May 4, 2026
eaad867
commit
GiladShapira94 May 4, 2026
b6083e3
fix small changes
GiladShapira94 May 4, 2026
f0b314a
fix small changes
GiladShapira94 May 4, 2026
24 changes: 24 additions & 0 deletions .claude/skills/bump/SKILL.md
@@ -0,0 +1,24 @@
---
name: bump
description: Bump the chart version in charts/mlrun-ce/Chart.yaml (patch, minor, or rc)
allowed-tools: Read(charts/mlrun-ce/Chart.yaml) Edit(charts/mlrun-ce/Chart.yaml) Read(charts/mlrun-ce/README.md) Edit(charts/mlrun-ce/README.md)
---

Bump the version in `charts/mlrun-ce/Chart.yaml`.

Usage: /bump <patch|minor|rc>

- `patch` - increment the patch digit: `0.11.0` → `0.11.1`
- `minor` - increment the minor digit and reset patch: `0.11.3` → `0.12.0`
- `rc` - increment the RC counter on the current version: `0.11.0-rc.34` → `0.11.0-rc.35`
- If the current version has no RC suffix, add `-rc.1`: `0.11.0` → `0.11.0-rc.1`

Steps:
1. Read the current version from `charts/mlrun-ce/Chart.yaml` (the `version:` field).
2. Compute the new version according to the argument above.
3. Show the user: "Bumping `<old>` → `<new>`" and ask for confirmation before writing.
4. On confirmation, update the `version:` field in `charts/mlrun-ce/Chart.yaml` in-place.
5. Remind the user: version bumps must be committed before opening a PR, and the PR title must follow `[Scope] description` format.
6. Update the MLRun CE version under Version Matrix in `charts/mlrun-ce/README.md`.

If no argument is given, show the current version and list the three options with the resulting version for each, then ask which to apply.
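
The bump rules above can be sketched as a small shell function. This is only an illustration: `bump_version` is a hypothetical helper, not something defined in the repository, and dropping an existing `-rc.N` suffix on a `patch` or `minor` bump is an assumption, since the skill text does not say what should happen in that case.

```bash
# Hypothetical helper illustrating the /bump rules; not part of the chart repo.
# Usage: bump_version <patch|minor|rc> <current-version>
bump_version() {
  local kind="$1" version="$2"
  local core="${version%%-rc.*}"   # version without any -rc.N suffix, e.g. 0.11.0
  local rc=""
  case "$version" in
    *-rc.*) rc="${version##*-rc.}" ;;   # current RC counter, e.g. 34
  esac

  # Split the core version into its three numeric parts.
  local major="${core%%.*}"
  local rest="${core#*.}"
  local minor="${rest%%.*}"
  local patch="${rest#*.}"

  case "$kind" in
    patch) echo "${major}.${minor}.$((patch + 1))" ;;   # assumption: RC suffix is dropped
    minor) echo "${major}.$((minor + 1)).0" ;;          # assumption: RC suffix is dropped
    rc)
      if [ -n "$rc" ]; then
        echo "${core}-rc.$((rc + 1))"
      else
        echo "${core}-rc.1"
      fi
      ;;
    *) echo "usage: bump_version <patch|minor|rc> <version>" >&2; return 1 ;;
  esac
}
```

For example, `bump_version rc 0.11.0` prints the version with `-rc.1` appended, matching the last rule in the list.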
107 changes: 107 additions & 0 deletions .claude/skills/pr/SKILL.md
@@ -0,0 +1,107 @@
---
name: pr
description: Analyze branch changes and generate a fully filled PR description ready to paste into GitHub
allowed-tools: Bash(git diff*) Bash(git log*)
---

Analyze the current branch changes and generate a fully filled PR description ready to paste into GitHub.

## Steps

1. **Gather context** - run these in parallel:
- `git diff upstream/development...HEAD` - full diff against the base branch
- `git log upstream/development..HEAD --oneline` - commit list
- `git diff upstream/development...HEAD --name-only` - changed files

2. **Analyze the diff** carefully:
- What components or templates were changed? (check which `templates/` subdirs, `values.yaml` sections, `requirements.yaml`, `Chart.yaml`)
- Were any new values keys added? Do they need to be reflected in the three install-mode values files?
- Were any Secrets, ConfigMaps, or port numbers changed? (potential breaking changes)
- Was `Chart.yaml` version bumped? If not, flag it.
- Were `requirements.yaml` or `requirements.lock` changed?
- Does `charts/mlrun-ce/README.md` need updating (new NodePort, new component, new install step)?

3. **Detect breaking changes** - flag as breaking if any of the following apply:
- A value key was renamed or removed
- A Secret or ConfigMap name changed
- A NodePort number changed
- A sub-chart was upgraded with a major version bump
- The storage credentials structure changed
- Any hook annotation or hook-weight changed in a way that affects upgrade order

4. **Suggest a PR title** following the `[Scope] description` format, where Scope is one of: `['feature', 'fix', 'docs', 'improvement', 'revert', 'breaking', 'ci']`. For example: `[Feature] Add Redis support to mlrun-ce`.
5. **Fill the PR template** - produce the complete filled template below. Be specific and concrete; do not use placeholder text.
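
As a rough illustration of the title format, the sketch below checks a candidate title against the scope list given above. `valid_pr_title` is a hypothetical helper, and matching the scope case-insensitively is an assumption based on the list being lowercase while the example title uses `[Feature]`.

```bash
# Hypothetical helper: succeeds when the title looks like "[Scope] description"
# with Scope taken from the list in the step above (compared case-insensitively).
valid_pr_title() {
  local title="$1" scope
  case "$title" in
    "["*"] "?*) ;;            # bracketed scope, a space, then some description
    *) return 1 ;;
  esac
  scope="${title#\[}"         # drop the leading "["
  scope="${scope%%\]*}"       # keep only the text before the first "]"
  scope="$(printf '%s' "$scope" | tr '[:upper:]' '[:lower:]')"
  case "$scope" in
    feature|fix|docs|improvement|revert|breaking|ci) return 0 ;;
    *) return 1 ;;
  esac
}
```

A call such as `valid_pr_title "[Feature] Add Redis support to mlrun-ce"` succeeds, while a title without a bracketed scope fails.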

---

Apply these checklist rules before writing the output:
- `[x]` - you can confirm this item is satisfied from the diff alone
- `[ ]` - requires human action, judgment, or external system access

Specific rules:
- "tested" → always `[ ]`
- "documentation PR" → always `[ ]`
- "QA tests / Jira ticket" → always `[ ]`
- "installation verified" → always `[ ]`
- `Chart.yaml` version bump → `[x]` if diff shows version changed, otherwise `[ ]` and add to Warnings
- Multi-namespace values files → `[x]` if all three are in the diff OR the change has no effect on install-mode values; `[ ]` with a note if a new value was added only to `values.yaml`
- README update → `[x]` if `charts/mlrun-ce/README.md` is in the diff OR no new NodePorts/components were added; otherwise `[ ]`
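
The `Chart.yaml` rule above lends itself to a mechanical check. The following is a minimal sketch, assuming the diff is in the usual `git diff` unified format; `version_bump_mark` is a hypothetical name, and the check only looks for an added top-level `version:` line inside `charts/mlrun-ce/Chart.yaml`.

```bash
# Hypothetical helper: reads a unified diff on stdin and prints "[x]" if it adds
# a "version:" line within charts/mlrun-ce/Chart.yaml, otherwise "[ ]".
version_bump_mark() {
  awk '
    /^diff --git / { in_chart = ($0 ~ /charts\/mlrun-ce\/Chart\.yaml$/) }
    in_chart && /^\+version:/ { found = 1 }
    END { print (found ? "[x]" : "[ ]") }
  '
}
```

It could be fed from the same command the skill already runs, for example `git diff upstream/development...HEAD | version_bump_mark`.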

Output exactly this structure with real content (no placeholder text):

```markdown
### 📝 Description
<2-4 sentences: what changed, why, and what it affects>

---

### 🛠️ Changes Made
<concrete bullet list - file paths, value keys, resource names>

---

### ✅ Checklist
- [ ] I have tested the changes in this PR
- [ ] I confirmed whether my changes require a change in documentation and if so, I created another PR in MLRun for the relevant documentation.
- [ ] I confirmed whether my changes require a changes in QA tests, for example: credentials changes, resources naming change and if so, I updated the relevant Jira ticket for QA.
- [ ] I increased the Chart version in `charts/mlrun-ce/Chart.yaml`.
- [ ] I confirmed that the installation works both on a local Docker Desktop environment and on a real cluster when using the required [prerequisites](https://docs.mlrun.org/en/stable/install-mlrun-ce/kubernetes-install.html#prerequisites).
- [ ] If installation issues were found, I updated the relevant Jira ticket with the issue and steps to reproduce, or updated the prerequisites documentation if the issue is related to missing or outdated prerequisites.
- [ ] If needed, update https://github.com/mlrun/ce/blob/development/charts/mlrun-ce/README.md with the relevant installation instructions and version Matrix.
- [ ] If needed, update the following values files for multi namespace support:
- [ ] [Admin values](https://github.com/mlrun/ce/blob/development/charts/mlrun-ce/admin_installation_values.yaml)
- [ ] [User values Node Port](https://github.com/mlrun/ce/blob/development/charts/mlrun-ce/non_admin_installation_values.yaml)
- [ ] [User values ClusterIP](https://github.com/mlrun/ce/blob/development/charts/mlrun-ce/non_admin_cluster_ip_installation_values.yaml)

---

### 🧪 Testing
<what was tested: lint, helm template dry-run, Kind cluster, manual - based on nature of changes>

---

### 🔗 References
- Ticket link:
- External links:
- Design docs links (Optional):

---

### 🚨 Breaking Changes?

- [ ] Yes (explain below)
- [ ] No

<if breaking: bullet list of what downstream consumers must change - value keys to rename, Secrets to recreate, ports to update>

---

### 🔍️ Additional Notes
<follow-up tasks, known issues, affected areas - omit if nothing to add>
```

Then replace each `[ ]` with `[x]` on items you can confirm from the diff, following the rules above.

After outputting the filled template, add a short **"Warnings"** section (outside the template) listing anything that needs human attention before opening the PR (missing version bump, unsynced values files, potential breaking changes, etc.).

After every sentence that ends with a `.`, insert two newlines to make the output more readable.
65 changes: 65 additions & 0 deletions .cursorrules
@@ -0,0 +1,65 @@
# MLRun Community Edition - Cursor Rules

## Source of Truth

`AGENTS.md` is the authoritative reference for this project. Read it before making any suggestions. It covers architecture, design patterns, template conventions, component dependencies, how to add new components, and common debugging scenarios.

`CONTRIBUTING.md` covers the development workflow, commit format, and PR process.

## Preferred Response Patterns
<!-- Mirrors CLAUDE.md; duplicated here because Cursor has no @file import support -->

- Helm install commands: always include `--namespace mlrun --wait`
- Values changes: show `--set` flags or a patch values file overlay, not edits to `values.yaml` directly
- New templates: show the complete file including the `{{- if .Values.<component>.enabled }}` guard and `include "mlrun-ce.common.labels"` call
- Service references within templates: use `{{ .Release.Namespace }}`, never hardcode namespace strings
- After any `requirements.yaml` change: remind the user to run `make helm-update-dependencies` and commit `requirements.lock`
- If a change affects the default installation, remind the user to update all three values files (`admin_installation_values.yaml`, `non_admin_installation_values.yaml`, `non_admin_cluster_ip_installation_values.yaml`) with the appropriate default
- If a change adds a new component, changes a component version, or changes the installation process, remind the user to update `charts/mlrun-ce/README.md`
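
The template-guard convention in the list above can be spot-checked with a short script. This is a sketch under simplifying assumptions: `has_enabled_guard` is a hypothetical helper, and it only recognizes the literal `{{- if .Values.<component>.enabled }}` form, so guards written differently (for example with `index` for hyphenated keys, or combined conditions) would be reported as missing.

```bash
# Hypothetical helper: succeeds when the first non-blank line of a template is a
# "{{- if .Values.<component>.enabled }}" guard and the last one is "{{- end }}".
has_enabled_guard() {
  local file="$1" first last
  first="$(grep -v '^[[:space:]]*$' "$file" | head -n 1)"
  last="$(grep -v '^[[:space:]]*$' "$file" | tail -n 1)"
  case "$first" in
    "{{- if .Values."*".enabled }}"*) ;;
    *) return 1 ;;
  esac
  case "$last" in
    "{{- end }}"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

It takes a single template path as its argument and reports the result through its exit status.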

## Common Tasks (Claude Code has `/render`, `/bump`, `/pr` skills for these)

When a user asks you to help with the following tasks, use the commands below. These are the manual equivalents of the Claude Code skills defined in `.claude/skills/`.

**Render chart templates** (`/render` in Claude Code)
```bash
# Full chart
helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml

# Single template or directory
helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml \
--show-only templates/<path>

# With a values overlay (e.g. non-admin install)
helm template mlrun charts/mlrun-ce \
-f charts/mlrun-ce/values.yaml \
-f charts/mlrun-ce/non_admin_installation_values.yaml
```

**Bump Chart version** (`/bump` in Claude Code)
Read the current version from `charts/mlrun-ce/Chart.yaml` and increment:
- `patch` - `0.11.0` → `0.11.1`
- `minor` - `0.11.3` → `0.12.0`
- `rc` - `0.11.0-rc.34` → `0.11.0-rc.35` (or append `-rc.1` if no suffix)

Always show the old → new version before writing and confirm with the user.

**Generate PR description** (`/pr` in Claude Code)
Run `git diff upstream/development...HEAD`, `git log upstream/development..HEAD --oneline`, and `git diff upstream/development...HEAD --name-only`, then fill in `.github/pull_request_template.md` based on the changes. Check `[x]` on checklist items confirmable from the diff; leave `[ ]` on items requiring human action. Flag missing version bumps, unsynced values files, and breaking changes.

## Do Not Suggest

- `helm upgrade --install` without running `make helm-update-dependencies` first
- Adding a new sub-chart to `requirements.yaml` for custom resources; add templates to `charts/mlrun-ce/templates/<component>/` instead
- `kubectl apply` for resources managed by this chart
- `Chart.yaml` apiVersion v2 dependency blocks (this chart uses apiVersion v1 + `requirements.yaml`)
- Creating a second credentials Secret; mount the existing `storage-credentials` Secret via `envFrom`
- Hardcoding namespace names in templates; use `{{ .Release.Namespace }}`
- Using `kafka.enabled + strimzi-kafka-operator.enabled` as a combined condition. The template guard is only `kafka.enabled`; Strimzi is a prerequisite, not a co-guard
- Treating `seaweedfs-s3-config` as a SeaweedFS dependency. SeaweedFS *creates* it; Pipelines and MLRun *consume* it

## Workflow (from CONTRIBUTING.md)

- Fork-based workflow: PRs target `upstream/development`, not `origin/development`
- Branch naming: `<scope>/<short-description-or-ticket>`, e.g. `feature/add-redis-support`, `fix/CE-111`
- Always bump `charts/mlrun-ce/Chart.yaml` version before opening a PR
3 changes: 3 additions & 0 deletions .gitignore
@@ -4,3 +4,6 @@ charts/mlrun-ce/charts/*
**/.DS_Store
*.DS_Store
**/__pycache__

# Claude Code local settings (machine-specific, not for commit)
.claude/settings.local.json
80 changes: 68 additions & 12 deletions AGENTS.md
@@ -18,14 +18,6 @@ make helm-repo-add

# Package the chart as a tarball
make package

-# Run full local end-to-end test on a Kind cluster (requires docker, kind, kubectl, helm)
-./tests/kind-test.sh full # Create Kind cluster + install chart
-./tests/kind-test.sh create # Create cluster only
-./tests/kind-test.sh install # Install chart (assumes cluster exists)
-./tests/kind-test.sh verify # Verify installation
-./tests/kind-test.sh delete # Delete Kind cluster
-CLEANUP_ON_EXIT=true ./tests/kind-test.sh # Auto-cleanup after test
```

## Architecture
@@ -39,27 +31,91 @@ CLEANUP_ON_EXIT=true ./tests/kind-test.sh # Auto-cleanup after test

### Template Organization (`charts/mlrun-ce/templates/`)

-- `config/` - ConfigMaps and Secrets shared across components: MLRun env config, Jupyter env config, S3 credentials secret, Pipelines config, Spark config, Grafana dashboards
+- `config/` - ConfigMaps and Secrets shared across components: MLRun env config, Jupyter env config, storage credentials secret, Pipelines config, Spark config, Grafana dashboards
- `seaweedfs/` - SeaweedFS-specific resources: S3 IAM config secret, bucket init job, admin UI NodePort service, ingress
- `kafka/` - Kafka Strimzi custom resources: KafkaNodePool, Kafka cluster CR, bootstrap alias Service, RBAC, NetworkPolicy
- `timescaledb/` - TimescaleDB Deployment, Service, PVC
- `jupyter-notebook/` - Jupyter Deployment and supporting resources
- `pipelines/` - Kubeflow Pipelines resources
- `spark-operator/` - Spark controller RBAC
- `persistency/` - PVC definitions
- `aws/` - AWS-specific resources

### Key Design Patterns

-**S3 credentials propagation**: The top-level `s3.accessKey`/`s3.secretKey`/`s3.bucket` values flow into a `s3-credentials` Secret (created by `templates/config/s3-credentials-secret.yaml`), which is then mounted via `envFrom` in MLRun API and Jupyter pods. SeaweedFS uses the same credentials via the `seaweedfs-s3-config` Secret.
+**S3 credentials propagation**: The top-level `storage.s3.accessKey`/`storage.s3.secretKey`/`storage.s3.bucket` values flow into a `storage-credentials` Secret (created by `templates/config/storage-secret.yaml`), which is then mounted via `envFrom` in MLRun API and Jupyter pods. SeaweedFS uses the same credentials via the `seaweedfs-s3-config` Secret.

**Global registry anchor**: `global.registry: &userRegistry` in `values.yaml` uses YAML anchors to multiplex the same docker registry config to both `nuclio.global.registry` and `mlrun.global.registry`.

**SeaweedFS as S3 backend**: SeaweedFS replaced MinIO. The helpers in `_helpers.tpl` (`mlrun-ce.s3.*`) generate the SeaweedFS service URL. Legacy `mlrun-ce.minio.*` helpers are kept as aliases pointing to the SeaweedFS helpers.

**Component enable/disable**: Most components can be disabled via `<component>.enabled: false`. The Kafka setup requires the Strimzi operator (deployed as a sub-chart via `strimzi-kafka-operator`) and custom Strimzi CRs in `templates/kafka/`.

### Values Files

- `charts/mlrun-ce/admin_installation_values.yaml` - admin install
- `charts/mlrun-ce/non_admin_installation_values.yaml` - non-admin install
- `charts/mlrun-ce/non_admin_cluster_ip_installation_values.yaml` - non-admin with ClusterIP

## Quick-Start Dev Workflow

From a fresh clone to a linted chart:

1. `make helm-repo-add` - adds all external repos (reads `requirements.yaml`; idempotent)
2. `make helm-update-dependencies` - downloads sub-chart tarballs into `charts/mlrun-ce/charts/` (must run before any lint or template render)
3. `make helm-lint` - runs `helm lint charts/mlrun-ce` + `ct lint --target-branch development`
- `ct` only lints charts with changes relative to the target branch; always run from a feature branch, not directly on `development`
4. Render all templates locally (no cluster needed):
```bash
helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml
```
5. Render a single template file:
```bash
helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml --show-only templates/kafka/kafka-cluster.yaml
```
6. Schema-validate without a cluster:
```bash
helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml | kubectl apply --dry-run=client -f -
```

## Component Dependency Map

| Component | Enabled by | Runtime dependencies | Key templates / notes |
|---|---|---|---|
| MLRun API + UI + DB | `mlrun.enabled` | `storage-credentials` Secret, `mlrun-common-env` ConfigMap; mlrun-db (MySQL) is bundled inside the mlrun sub-chart | `templates/config/mlrun-env-configmap.yaml`; rest is in the `mlrun` sub-chart |
| Jupyter | `jupyterNotebook.enabled` | `storage-credentials` Secret, `jupyter-common-env` ConfigMap | `templates/jupyter-notebook/` |
| Nuclio | always on (no `enabled` guard in umbrella) | `global.registry` must be set | sub-chart only, no custom templates |
| MPI Operator | always on (no `enabled` guard in umbrella) | none | sub-chart only, no custom templates |
| SeaweedFS | `seaweedfs.enabled` | PVC for data storage; creates `seaweedfs-s3-config` Secret consumed by Pipelines and MLRun | `templates/seaweedfs/`; `seaweedfs.s3.enableAuth: true` must be set or the Secret is skipped |
| Spark Operator | `spark-operator.enabled` | none | sub-chart + `templates/spark-operator/spark-controller-rbac.yaml` |
| Kafka | `kafka.enabled` | Strimzi CRDs; the `strimzi-kafka-operator` sub-chart must also be enabled as a prerequisite; CRs use post-install hooks to wait for CRDs | `templates/kafka/` |
| Pipelines | `pipelines.enabled` | SeaweedFS (`seaweedfs.enabled` checked at render time; adds init container to wait for it), `mlrun-pipelines-config` ConfigMap | `templates/pipelines/`, `templates/config/mlrun-pipelines-config.yaml` |
| TimescaleDB | `timescaledb.enabled` | none; uses its own `<release>-timescaledb-secret` for the DB password | `templates/timescaledb/` (custom StatefulSet, not a sub-chart) |
| Prometheus + Grafana | `kube-prometheus-stack.enabled` | none at runtime; model monitoring dashboards are pre-loaded as static JSON ConfigMaps | sub-chart config in `values.yaml`; dashboards in `templates/config/model-monitoring-*.yml` |

## How to Add a New Component

1. Add a top-level block to `values.yaml`:
```yaml
myComponent:
enabled: true
```
2. Create `charts/mlrun-ce/templates/myComponent/`.
3. Every template file must open with `{{- if .Values.myComponent.enabled }}` and close with `{{- end }}`.
4. Add label helpers to `_helpers.tpl` following the `mlrun-ce.<component>.labels` / `mlrun-ce.<component>.selectorLabels` pattern.
5. NodePort selection - avoid all currently occupied ports:
- 30010 Grafana, 30020 Prometheus, 30040 Jupyter, 30050 Nuclio
- 30060 MLRun UI, 30070 MLRun API, 30093 SeaweedFS Admin, 30094 SeaweedFS S3
- 30100 Pipelines, 30110 TimescaleDB
6. NodePort services must be optional and only created when the component is enabled.
7. Create a NodePort service only if the component exposes a user-facing UI or API that should be accessible outside the cluster. If the component is internal-only, use a ClusterIP service instead.
8. Storage credentials - mount the existing `storage-credentials` Secret via `envFrom.secretRef`; do not create a second credentials secret.
9. CRD dependencies - if the component depends on CRDs from a sub-chart, use `helm.sh/hook: post-install,post-upgrade` with an appropriate `hook-weight` on the CRs (see `templates/kafka/` for the established pattern).
10. Update all three values files to explicitly set `myComponent.enabled: true/false` as appropriate for each install mode.
11. Add the component's service URL to `templates/NOTES.txt` using the existing conditional pattern.
12. Update `charts/mlrun-ce/README.md` if a new NodePort is exposed.
13. Bump the version in `charts/mlrun-ce/Chart.yaml`.
14. Keep the naming of Secrets and env ConfigMaps consistent with existing patterns (`storage-credentials` Secret, `mlrun-common-env` ConfigMap, etc.).
15. Add a section to this AGENTS.md file describing the component's architecture, dependencies, and any special design patterns used.
16. Reuse existing patterns and templates as much as possible. For example, if the component needs a ConfigMap of environment variables, add it to `templates/config/` and follow the same pattern as `mlrun-common-env` or `jupyter-common-env`.
17. Configure the component through `values.yaml` rather than hardcoding values in the templates. For example, if the component needs a port number, add a `myComponent.port` value and reference it in the template, rather than hardcoding a port.
18. For every Kubernetes resource that supports resource requests and limits, expose them in the values file and template, or rely on the sub-chart's defaults if it already provides them.
19. Run `make helm-lint` and fix any lint errors before opening a PR.
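
For the NodePort selection step, a candidate port can be checked against the occupied ports listed above before it is added to `values.yaml`. The snippet below is a minimal sketch: `nodeport_is_free` and `OCCUPIED_NODEPORTS` are hypothetical names, and the port list is copied from this section, so it would need updating whenever a component claims a new port.

```bash
# Ports listed as occupied in the NodePort selection step of this section.
OCCUPIED_NODEPORTS="30010 30020 30040 30050 30060 30070 30093 30094 30100 30110"

# Hypothetical helper: succeeds when the candidate port is not in the list above.
nodeport_is_free() {
  local candidate="$1" port
  for port in $OCCUPIED_NODEPORTS; do
    if [ "$port" = "$candidate" ]; then
      return 1
    fi
  done
  return 0
}
```

For example, `nodeport_is_free 30040` fails because that port is already used by Jupyter.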