giantswarm · pipo02mix · May 21, 2026 · May 19, 2026 · May 19, 2026 · May 19, 2026
@@ -12,7 +12,7 @@
 user_questions:
   - How can I deploy Ray clusters on Giant Swarm using KubeRay?
   - How do I configure KubeRay for distributed machine learning workloads?
-last_review_date: 2025-10-21
+last_review_date: 2026-05-19
 ---
 
 [Ray](https://www.ray.io/) is a unified framework for scaling AI and Python applications. It provides a simple, universal API for building distributed applications and includes libraries for machine learning, reinforcement learning, and hyperparameter tuning. [KubeRay](https://ray-project.github.io/kuberay/) is the official Kubernetes operator for Ray that automates the deployment, scaling, and management of Ray clusters on Kubernetes.
@@ -51,30 +51,32 @@
   --name=kuberay-operator \
   --organization=${ORGANIZATION} \
   --target-namespace=kuberay-system \
-  --version=1.1.0 > kuberay-operator.yaml
+  --version=1.1.0 2>/dev/null > kuberay-operator.yaml
 
 kubectl apply -f kuberay-operator.yaml
 ```
 
+**Note**: Recent releases of `kubectl gs` may print a deprecation banner when you run `kubectl gs template app`, related to a transition in how apps are deployed. That is why the command above redirects `stderr` to `/dev/null`.
+
 ### Verifying the installation
 
 Check that the KubeRay operator is running:
 
-```bash
-kubectl get pods -n kuberay-system
+```nohighlight
+$ kubectl get pods -n kuberay-system
 
 NAME                                READY   STATUS    RESTARTS   AGE
 kuberay-operator-7b5c8f6d4b-xyz12   1/1     Running   0          2m
 ```
 
 Verify that the Custom Resource Definitions (CRDs) are installed:
 
-```bash
-kubectl get crd | grep ray
+```nohighlight
+$ kubectl get crd | grep ray
 
-rayclusters.ray.io                    2025-10-12T10:00:00Z
-rayjobs.ray.io                        2025-10-12T10:00:00Z
-rayservices.ray.io                    2025-10-12T10:00:00Z
+rayclusters.ray.io                    2026-05-19T10:00:00Z
+rayjobs.ray.io                        2026-05-19T10:00:00Z
+rayservices.ray.io                    2026-05-19T10:00:00Z
 ```
 
 ## Deploying a Ray cluster
@@ -83,26 +85,53 @@
 
 ### Basic Ray cluster configuration
 
-Create a basic Ray cluster configuration:
+Create a basic Ray cluster configuration. The manifest below works on a standard Giant Swarm workload cluster where Kyverno enforces the restricted Pod Security Standards (PSS) profile, which is the default on most installations:
 
 ```yaml
-apiVersion: ray.io/v1alpha1
+apiVersion: ray.io/v1
 kind: RayCluster
 metadata:
   name: sample-raycluster
   namespace: default
 spec:
   rayVersion: '2.50.1'
   enableInTreeAutoscaling: true
+  # The operator injects an autoscaler sidecar into the head pod when
+  # enableInTreeAutoscaling is true. PSS-restricted clusters reject it
+  # unless we set its securityContext explicitly.
+  autoscalerOptions:
+    securityContext:
+      runAsNonRoot: true
+      runAsUser: 1000
+      allowPrivilegeEscalation: false
+      seccompProfile:
+        type: RuntimeDefault
+      capabilities:
+        drop: [ALL]
   headGroupSpec:
     rayStartParams:
       dashboard-host: '0.0.0.0'
       block: 'true'
     template:
       spec:
+        securityContext:
+          runAsNonRoot: true
+          runAsUser: 1000
+          runAsGroup: 100
+          fsGroup: 100
+          seccompProfile:
+            type: RuntimeDefault
         containers:
         - name: ray-head
           image: rayproject/ray:2.50.1
+          securityContext:
+            runAsNonRoot: true
+            runAsUser: 1000
+            allowPrivilegeEscalation: false
+            seccompProfile:
+              type: RuntimeDefault
+            capabilities:
+              drop: [ALL]
           ports:
           - containerPort: 6379
             name: gcs-server
@@ -111,12 +140,15 @@
           - containerPort: 10001
             name: client
           resources:
+            # 4Gi memory is the practical minimum for the head: the Ray
+            # dashboard subprocesses sit at roughly 1.94Gi when idle, so a
+            # 2Gi limit OOMs the moment you submit a job.
             limits:
               cpu: "2"
-              memory: "2Gi"
+              memory: "4Gi"
             requests:
               cpu: "1"
-              memory: "1Gi"
+              memory: "2Gi"
   workerGroupSpecs:
   - replicas: 2
     minReplicas: 1
@@ -125,24 +157,24 @@
     rayStartParams: {}
     template:
       spec:
-        affinity:
-          nodeAffinity:
-            preferredDuringSchedulingIgnoredDuringExecution:
-            - weight: 1
-              preference:
-                matchExpressions:
-                - key: nvidia.com/gpu.present
-                  operator: In
-                  values:
-                  - "true"
-        runtimeClassName: nvidia
-        tolerations:
-          - key: nvidia.com/gpu
-            value: "true"
-            effect: NoSchedule
+        securityContext:
+          runAsNonRoot: true
+          runAsUser: 1000
+          runAsGroup: 100
+          fsGroup: 100
+          seccompProfile:
+            type: RuntimeDefault
         containers:
         - name: ray-worker
           image: rayproject/ray:2.50.1
+          securityContext:
+            runAsNonRoot: true
+            runAsUser: 1000
+            allowPrivilegeEscalation: false
+            seccompProfile:
+              type: RuntimeDefault
+            capabilities:
+              drop: [ALL]
           resources:
             limits:
               cpu: "2"
@@ -158,28 +190,32 @@
 kubectl apply -f ray-cluster.yaml
 ```
 
+**Note**: The manifest above schedules Ray workers on any node. If you want workers to land on GPU nodes, add `runtimeClassName: nvidia` plus a toleration for the `nvidia.com/gpu` taint to the worker `template.spec`. Leave those settings out on non-GPU clusters, where they prevent the workers from being scheduled.
+
 ### Verifying the Ray cluster deployment
 
 Check the status of your Ray cluster:
 
-```bash
-kubectl get raycluster
+```nohighlight
+$ kubectl get raycluster
 
-NAME                 DESIRED WORKERS   AVAILABLE WORKERS   STATUS   AGE
-sample-raycluster    2                 2                   ready    3m
+NAME                DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
+sample-raycluster   2                 2                   6      6Gi      0      ready    3m
 ```
 
 List the Ray cluster pods:
 
-```bash
-kubectl get pods -l ray.io/cluster=sample-raycluster
+```nohighlight
+$ kubectl get pods -l ray.io/cluster=sample-raycluster
 
 NAME                                          READY   STATUS    RESTARTS   AGE
-sample-raycluster-head-xxxxx                  1/1     Running   0          3m
-sample-raycluster-worker-small-group-xxxxx    1/1     Running   0          3m
-sample-raycluster-worker-small-group-yyyyy    1/1     Running   0          3m
+sample-raycluster-head-xxxxx                  2/2     Running   0          3m
+sample-raycluster-small-group-worker-xxxxx    1/1     Running   0          3m
+sample-raycluster-small-group-worker-yyyyy    1/1     Running   0          3m
 ```
 
+The head pod shows `2/2` containers because the operator injects an autoscaler sidecar alongside the Ray head when `enableInTreeAutoscaling` is set to `true`.
+
 ## Accessing the Ray cluster
 
 ### Using the Ray Dashboard
@@ -190,37 +226,64 @@
 kubectl port-forward service/sample-raycluster-head-svc 8265:8265
 ```
 
+`sample-raycluster-head-svc` is a headless service (`ClusterIP: None`), but `kubectl port-forward` resolves it to the head pod and works the same way.
+
 Open your browser and navigate to `http://localhost:8265` to access the Ray Dashboard.
 
 ![Ray UI](ray-ui.png)
 
 ## Running a test job
 
-Once your Ray cluster is running, you can submit a computing job using the Ray Job Submission SDK to test the cluster capabilities.
+Once your Ray cluster is running, submit a computing job to validate it. We'll calculate the value of π using the Monte Carlo method. The Python script lives [in this gist](https://gist.githubusercontent.com/pipo02mix/a32771ec8358d338426c915e2b7a8078/raw/9bb509f37dba7edf09f042cee5e71f78aa0ccb10/dt.py).
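
For readers unfamiliar with the technique, here is a minimal single-process sketch of a Monte Carlo π estimate. It is not the gist's script, and the function name `estimate_pi` and its parameters are illustrative assumptions. It only shows the idea that a distributed job would split across workers: sample random points in the unit square and count how many fall inside the quarter circle.

```python
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # A point lies inside the quarter circle when x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # (quarter-circle area) / (square area) = pi / 4
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(200_000))
```

Because every sample is independent, the loop can be cut into chunks that run on different workers and whose counts are summed at the end, which is what makes this a convenient smoke test for a Ray cluster.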
 
-First, make sure you have the Ray client on your local machine:
+Make sure the dashboard port is still forwarded:
 
 ```bash
-pip install -U "ray[default]"
+kubectl port-forward service/sample-raycluster-head-svc 8265:8265
 ```
 
-Set up port forwarding to access your Ray cluster:
+You can submit the job in two ways.
+
+### Option A: Ray CLI
+
+Install the Ray client if you don't already have it:
 
 ```bash
-kubectl port-forward service/sample-raycluster-head-svc 8265:8265
+pip install -U "ray[default]"
 ```
 
-Let's calculate the value of pi using the Monte Carlo method. The Python script can be found [in this gist file](https://gist.githubusercontent.com/pipo02mix/a32771ec8358d338426c915e2b7a8078/raw/9bb509f37dba7edf09f042cee5e71f78aa0ccb10/dt.py). You can use this command to submit the job to the Ray cluster API.
+Then submit the job. The `working_dir` points at the gist so you don't need a local copy:
 
 ```bash
-# Submit a job using Ray CLI
 ray job submit \
   --address="http://localhost:8265" \
-  --runtime-env-json='{"pip": ["numpy"], "working_dir": "."}' \
-  -- python https://gist.githubusercontent.com/pipo02mix/a32771ec8358d338426c915e2b7a8078/raw/9bb509f37dba7edf09f042cee5e71f78aa0ccb10/dt.py
+  --runtime-env-json='{"pip": ["numpy"], "working_dir": "https://gist.githubusercontent.com/pipo02mix/a32771ec8358d338426c915e2b7a8078/archive/9bb509f37dba7edf09f042cee5e71f78aa0ccb10.zip"}' \
+  -- python dt.py
+```
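
If you prefer to submit from Python rather than from the shell, the sketch below does the same thing through Ray's `JobSubmissionClient`. The helper names `build_runtime_env` and `submit_pi_job` are hypothetical, and the sketch assumes `ray[default]` is installed locally and the port-forward is active.

```python
from typing import Any, Dict, List


def build_runtime_env(working_dir: str, pip_packages: List[str]) -> Dict[str, Any]:
    """Build the same runtime_env that the CLI receives via --runtime-env-json."""
    return {"pip": list(pip_packages), "working_dir": working_dir}


def submit_pi_job(address: str, working_dir: str) -> str:
    """Submit `python dt.py` to the cluster and return the submission ID."""
    # Imported lazily so build_runtime_env stays usable without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(
        entrypoint="python dt.py",
        runtime_env=build_runtime_env(working_dir, ["numpy"]),
    )
```

Call `submit_pi_job("http://localhost:8265", working_dir)` with the same `working_dir` URL used in the CLI command above.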
+
+### Option B: REST API (no Python required)
+
+If you don't have Python or `ray` installed locally, submit the job directly with `curl`:
+
+```bash
+curl -X POST http://localhost:8265/api/jobs/ \
+  -H "Content-Type: application/json" \
+  -d '{
+    "entrypoint": "python dt.py",
+    "runtime_env": {
+      "pip": ["numpy"],
+      "working_dir": "https://gist.githubusercontent.com/pipo02mix/a32771ec8358d338426c915e2b7a8078/archive/9bb509f37dba7edf09f042cee5e71f78aa0ccb10.zip"
+    }
+  }'
+```
+
+The response includes a `submission_id`. Poll the status with:
+
+```bash
+curl -s http://localhost:8265/api/jobs/<submission_id>
 ```
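
If Python is available on the machine doing the polling, the loop can be automated. The following is a rough sketch, not an official client: `fetch_status` and `wait_for_job` are hypothetical helper names, and it assumes the job endpoint returns JSON with a `status` field whose terminal values are `SUCCEEDED`, `FAILED`, or `STOPPED`.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal job states; any other value is treated as "still in progress".
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def fetch_status(base_url: str, submission_id: str) -> str:
    """GET /api/jobs/<submission_id> and return its `status` field."""
    url = f"{base_url.rstrip('/')}/api/jobs/{submission_id}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["status"]


def wait_for_job(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    max_polls: int = 300,
) -> str:
    """Poll until the job reaches a terminal status, then return that status."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

With the port-forward running, `wait_for_job(lambda: fetch_status("http://localhost:8265", "<submission_id>"))` blocks until the job finishes. Passing the status getter as a callable keeps the loop testable without a live cluster.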
 
-Observe in the dashboard how the job is executed in parallel and how resources are scaled based on load.
+Either way, watch the dashboard to see how the job runs in parallel and how resources scale with the load.
 
 ![Ray Job UI](job-ui.png)