Closed
109 commits
d622357
[DOC] Make RVS parallel execution in all test runner example YAML
yansun1996 Oct 28, 2025
d95fd0a
[Feature] Support offline driver build for OpenShift (#1010)
yansun1996 Oct 29, 2025
eccb82b
[Feature] Update driver build source image description and related do…
yansun1996 Oct 31, 2025
961986e
update utils container (#1019)
biluriuday Oct 29, 2025
eca5d4f
add suspend and resume functionality for remediation workflows (#996)
biluriuday Nov 4, 2025
0d3c53d
[Fix] fix helm e2e and DeviceConfig field rendering (#1029)
yansun1996 Nov 11, 2025
eeefd20
[Helm] Update NFD rules to label vnics along with nics (#968)
yuva29 Sep 23, 2025
3ff0ddf
[Feature] Add default PrometheusRule in OLM bundle to support OCP COO…
yansun1996 Nov 18, 2025
891fd1f
[DOC] do not add toleration to devicePlugin to make it auto-restart a…
yansun1996 Nov 25, 2025
9682a19
[DOC] Add docs for ROCm 7.1.1 new test runner recipes (#1052)
yansun1996 Dec 5, 2025
e0862cf
[DOC] Add notification for new amdgpu versioning scheme (#1065)
yansun1996 Dec 11, 2025
1f16a2d
[Fix] Auto assign PROJECT_VERSION to be default operand image tag (#1…
yansun1996 Dec 15, 2025
cb7f2ae
[Build] update helm lint command to match with helm v3/v4
yansun1996 Dec 17, 2025
33c86e2
remediation e2e tests and bug fixes (#1054)
biluriuday Dec 5, 2025
102ffbc
make max parallel workflows configurable for auto remediation (#1058)
biluriuday Dec 16, 2025
24feaa4
[DOC] Add known limitations and platform supported
yansun1996 Dec 19, 2025
63cc876
[dcm] Documentation updates for DCM instinct docs (#1073)
nikhilsk Dec 17, 2025
e23b156
[Fix] fix blocked helm upgrade due to unexpected node status (#1085)
yansun1996 Jan 5, 2026
60acce7
Adding DCM to list of components in README (#1088)
nikhilsk Jan 7, 2026
58e4e39
[Fix] fix base img registry template in dockerfile
yansun1996 Jan 8, 2026
ce42b4b
remediation e2e tests for suspend and resume actions (#1068)
biluriuday Jan 13, 2026
b590e00
[DOC] Add known issue description for recent Device Plugin fix
yansun1996 Jan 14, 2026
c121134
[CI] Setup hourly build for operator utils image
yansun1996 Jan 20, 2026
b26eeb7
auto remediation - use public test runner image as default image (#1110)
biluriuday Jan 23, 2026
90496de
GPUOP-525 update auto node remediation documentation (#1109)
biluriuday Jan 23, 2026
9cd3392
customize auto node remediation options (#1107)
biluriuday Jan 30, 2026
dc01fcc
[DOC] Fix docs sanity
yansun1996 Feb 5, 2026
422706e
[Build] Deprecate helm charts for OpenShift
yansun1996 Feb 4, 2026
c6d1a7c
GPUOP-540 GPUOP-543 (#1119)
biluriuday Feb 4, 2026
26b0404
[Fix] Add missing node watch RBAC to exporter ClusterRole
yansun1996 Feb 9, 2026
6e3d1a3
[DOC] Make ubuntu precompile dockerfile work for older docker version
yansun1996 Feb 17, 2026
d53b8bd
[Fix] Support building driver on dirty build kernel
yansun1996 Feb 17, 2026
ae48e0c
Add PCI ID for AMD Radeon RX 9060 XT (#1142)
bhatturu Feb 18, 2026
541ac7b
reduce timeout for e2e sanity job (#1117)
biluriuday Feb 16, 2026
e1b6556
update docs and configmap field name (#1132)
biluriuday Feb 17, 2026
40e9f20
auto-node-remediation - provide auto start option to user (#1134)
biluriuday Feb 19, 2026
dce4f78
handle argo crds in helm charts (#1120)
biluriuday Feb 20, 2026
74e3389
Add Radeon AI PRO R9700 to gpu-nfd-default-rule files (#416)
tzmtl Feb 24, 2026
25cd316
docs: fix broken DCM ConfigMap link and Mangaer typo in applying-part…
ci-penbot-01 Feb 24, 2026
bec31e2
gpuop-566 fix for maxAllowedRuns exceeding set limit (#1152) (#443)
ci-penbot-01 Feb 24, 2026
f43667f
docs: add OpenShift/OCP guidance to DCM partitioning docs (#449)
nikhilsk Feb 25, 2026
7079260
[Fix] amend missing RBAC for ANR to work on OpenShift and fix templat…
yansun1996 Feb 26, 2026
c9fb0a0
[Fix] fix the empty pod error and slowness of techsupport script (#11…
yansun1996 Feb 27, 2026
5ba7be9
remediation device config section validation (#1146) (#448)
biluriuday Mar 2, 2026
bb4255a
[Fix] allow overriding TEST_ARGS through ENV (#1165) (#453)
ci-penbot-01 Mar 2, 2026
e26f98d
[DOC] Add ca-certificates to installation prerequisites (#1154) (#447)
ci-penbot-01 Mar 2, 2026
d2419ea
Add DRA Driver support in Operator CRD (#1127)
bhatnitish Feb 9, 2026
a151d06
Add DRA Driver Support in GPU Operator (#1139)
bhatnitish Feb 25, 2026
7a8a2b4
Enhance e2e test debugging with per-test artifact collection (#1162)
bhatnitish Feb 26, 2026
20871f9
e2e test enhancements (#1164)
bhatnitish Mar 2, 2026
568b196
Add DRA Driver docs to sphinx TOC
bhatnitish Mar 3, 2026
31055de
[DOC] Add openshift installation instructions by CLI
yansun1996 Mar 3, 2026
f9b17a7
[DOC] Fix for air-gapped installation docs for Ubuntu
yansun1996 Mar 3, 2026
4c97ff3
Add default DRA Driver image
bhatnitish Mar 5, 2026
a835210
feat: add configurable PodResourceAPISocketPath for device plugin
uMetalooper Mar 3, 2026
f09c330
amend changes for adding podResourceAPISocketPath
yansun1996 Mar 5, 2026
3889f2c
fix: keep container MountPath hardcoded for kubelet device plugin socket
uMetalooper Mar 5, 2026
1c2ef6b
move back to KubeletSocketPath
yansun1996 Mar 6, 2026
ae3b034
[Feature] OpenSource GPU Validation Cluster (#463)
yansun1996 Mar 6, 2026
701291a
Add default DRA Driver image (#1178) (#461)
ci-penbot-01 Mar 9, 2026
4b345a6
GPUOP-569 cleanup crd during helm uninstall (#1177) (#462)
ci-penbot-01 Mar 9, 2026
864d30c
Add cmdLineArguments docs and CLI options reference for DRA driver (#…
bhatnitish Mar 11, 2026
184875c
auto node remediation fixes and npd documentation update (#466)
biluriuday Mar 11, 2026
1849706
[Fix] Amend Radeon Pro W7900D PCI ID into NFD rule (#468)
yansun1996 Mar 12, 2026
89a9a99
[Fix] fix the AutoNodeRemediation pods check in OpenShift (#1187) (#470)
ci-penbot-01 Mar 12, 2026
a2d5641
update troubleshooting documentation section (#1190) (#471)
ci-penbot-01 Mar 12, 2026
92e0d5d
feat: add HostNetwork field to DevicePlugin and MetricsExporter (#465)
uMetalooper Mar 13, 2026
55b25ff
[Fix] fix the values.yaml handling for auto node remediation (#1188) …
yansun1996 Mar 13, 2026
2edd32e
GPUOP-419 document device plugin crash to known issues list (#1194) (…
ci-penbot-01 Mar 13, 2026
80db4d8
[Feature] Add support to disable watch for KMM resources to totally d…
ci-penbot-01 Mar 17, 2026
3be6623
GPUOP-583 584 ANR fixes (#1202) (#475)
ci-penbot-01 Mar 18, 2026
5597725
[Feature] Global image pull secrets injection (#1196) (#476)
yansun1996 Mar 19, 2026
3049a51
Address Feedback of GPU validation cluster (#1206) (#478)
ci-penbot-01 Mar 23, 2026
1950fbe
GPUOP-588 filter out workflow related pods from drain step (#1210) (#…
ci-penbot-01 Mar 24, 2026
e063875
[Fix] collect KMM builder/signer pod logs in techsupport (#1225) (#486)
yansun1996 Mar 26, 2026
5a9da19
[Build] Misc fix on build and versioning (#485)
yansun1996 Mar 27, 2026
1d12fff
Remove EPEL, BaseOS and AppStream repos from KMM module build in Core…
LaVLaS Mar 30, 2026
301bba3
Add DRA driver support to techsupport collection (#1239) (#487)
ci-penbot-01 Mar 31, 2026
1838126
Add helm uninstall hang troubleshooting section (#1240) (#489)
ci-penbot-01 Mar 31, 2026
f4882b9
fix gpuop-595 and 601 (#1246) (#491)
ci-penbot-01 Mar 31, 2026
8364180
[Docs] fix incorrect pod label selector in DRA driver verification co…
ci-penbot-01 Mar 31, 2026
84b9a49
handle imagePullSecrets in ANR workflow test runner and kmmmodule (#1…
ci-penbot-01 Mar 31, 2026
bd0756d
[Fix] Base image upgrade to reduce CVE (#1227) (#493)
yansun1996 Apr 1, 2026
0f60b65
[CP 1235] upgrade Argo workflow CRDs and controller to v4.0.3 (#495)
ci-penbot-01 Apr 2, 2026
a80ca2f
update ignore file and dir list (#497)
spraveenio Apr 2, 2026
ac2086b
skip non rendering files and directories (#498)
spraveenio Apr 3, 2026
5abf58b
[Fix] GPUOP-607 fail the ANR workflow when imagePullBackOff (#1274) (…
ci-penbot-01 Apr 3, 2026
6705daa
[CP 1283] GPUOP-618 fix helm upgrade issue with latest Argo CRDs (#501)
ci-penbot-01 Apr 3, 2026
b18f660
anr - fixes (#1268) (#502)
ci-penbot-01 Apr 3, 2026
114969c
[Fix] GPUOP-608 read workflow directly from apisrv to avoid race cond…
ci-penbot-01 Apr 3, 2026
8ccaa90
[Fix] upgrade grpc to address CVE-2026-33186 (#1272) (#505)
yansun1996 Apr 3, 2026
1b9635b
ignore file list update (#517)
spraveenio Apr 10, 2026
2d94c88
enable npd and anr e2e sims (#1245) (#506)
ci-penbot-01 Apr 21, 2026
15957b7
Fix Node Assignments calculations on operator restart (#1242) (#507)
ci-penbot-01 Apr 21, 2026
21a7cc1
DCM: mount default ConfigMap when spec.configManager.config is omitte…
ci-penbot-01 Apr 21, 2026
9dd94e0
feat(tests/k8s-e2e): add GPU Operator e2e test suite with DME verific…
ci-penbot-01 Apr 21, 2026
498f0b4
disable upgrade jobs - will be running them from v1.5.0 throttle (#135…
ci-penbot-01 Apr 21, 2026
36662f0
[DOC] Add SecurityContextConstraints RBAC for OpenShift support in ma…
ci-penbot-01 Apr 21, 2026
0a74ce0
GPUOP-640 update remediation documentation (#1372) (#1379) (#529)
ci-penbot-01 Apr 21, 2026
d130246
metricsclient cli change (#1293) (#521)
ci-penbot-01 Apr 21, 2026
11ef5cd
[Fix] Add workflow and workflow-triggered pod collection to techsuppo…
ci-penbot-01 Apr 21, 2026
4e279a2
[Documentation] Update OpenShift (OLM) installation instruction to in…
LaVLaS Apr 21, 2026
e2d7a4c
e2e test infra enhancements (#1172)
bhatnitish Apr 24, 2026
4d5058e
Add DRA driver RBAC for OpenShift support (#1344)
bhatnitish Apr 16, 2026
0296a94
Create DeviceClass from operator code on OpenShift when DRA is enable…
bhatnitish Apr 22, 2026
77f6744
add support for detecting SLES nodes and automatically selecting appr…
Priyankasaggu11929 Oct 23, 2025
db84bdf
add SLES Dockerfile template (DockerfileTemplate.sles) using prebuilt…
Priyankasaggu11929 Oct 23, 2025
f322b40
tests: update internal/utils_test.go for added support for SLES 15 SP*
Priyankasaggu11929 Oct 23, 2025
7dfec5e
use "registry.suse.com" as the default base image registry if OS == "…
Priyankasaggu11929 Oct 27, 2025
8 changes: 2 additions & 6 deletions .github/workflows/linting.yml
@@ -2,15 +2,11 @@ name: Linting

on:
push:
branches:
- develop
branches:
- main
- staging
pull_request:
branches:
- develop
branches:
- main
- staging

jobs:
call-workflow-passing-data:
22 changes: 20 additions & 2 deletions .gitignore
@@ -3,7 +3,6 @@
 *.tgz
 /*gpu-operator*.tgz
 /helm-charts-k8s/charts/*.tgz
-/helm-charts-openshift/charts/*.tgz
 *.out
 
 # node app for e2e test
@@ -12,4 +11,23 @@ nodeapp
 
 # Sphinx documentation
 _build/
-.venv/
+.venv/
+
+.job.yml
+branch_policy.yml
+dev.env
+box.rb
+entrypoint.sh
+asset-build/*
+ci-internal/*
+devops/*
+tests/jobs/*
+tests/pytests/*
+
+.cline_storage
+.vscode
+branch_policy.yml
+tools-internal/*
+CLAUDE.md
+.claude/*
+.copilot/*
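The added block lists `branch_policy.yml` twice. Git tolerates duplicate patterns, so this is harmless, but a quick check can flag such repeats during review. The Python sketch below is a hypothetical helper, not something in the repository; it assumes simplified parsing (blank lines and lines starting with `#` are skipped, trailing whitespace is stripped) and ignores gitignore subtleties such as backslash-escaped trailing spaces.

```python
from collections import Counter


def duplicate_patterns(gitignore_text: str) -> list[str]:
    """Return patterns that occur more than once, in first-seen order.

    Simplified parsing (assumption): blank lines and lines starting with '#'
    are skipped, and trailing whitespace is stripped before comparing.
    """
    patterns = [
        line.rstrip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    counts = Counter(patterns)
    duplicates: list[str] = []
    for pattern in patterns:
        if counts[pattern] > 1 and pattern not in duplicates:
            duplicates.append(pattern)
    return duplicates


# Tail of the new .gitignore, taken from the second hunk of this diff.
NEW_TAIL = """\
# Sphinx documentation
_build/
.venv/

.job.yml
branch_policy.yml
dev.env
box.rb
entrypoint.sh
asset-build/*
ci-internal/*
devops/*
tests/jobs/*
tests/pytests/*

.cline_storage
.vscode
branch_policy.yml
tools-internal/*
CLAUDE.md
.claude/*
.copilot/*
"""

print(duplicate_patterns(NEW_TAIL))  # ['branch_policy.yml']
```

Comparing after `rstrip()` means entries that differ only by trailing spaces are reported as duplicates, which matches how Git treats unescaped trailing spaces.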
297 changes: 297 additions & 0 deletions .job.yml

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions .markdownlint-cli2.jsonc
@@ -0,0 +1,16 @@
{
  "globs": ["**/*.md"],
  "ignores": [
    "**/vendor/**",
    "**/.git/**",
    "**/ci-internal/**",
    "**/docs/cicd/**",
    "**/tests/e2e/**",
    "**/helm-charts-k8s/README.md",
    "**/internal-example/**",
    "**/.claude/**",
    "**/tests/pytests/**",
    "**/knowledge/**",
    "**/tests/test-plan/**"
  ]
}
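To see which Markdown files this config is meant to lint, the include/exclude logic can be approximated in Python. This is only a sketch under stated assumptions: markdownlint-cli2 resolves `globs` and `ignores` with its own glob library, whose rules (dot files, braces, negation, `?`) are richer than what is modelled here. The `glob_to_regex` and `is_linted` helpers are hypothetical and handle only `*`, `**/`, and a trailing `/**`.

```python
import re

# Copied from the "globs" and "ignores" arrays in .markdownlint-cli2.jsonc.
GLOBS = ["**/*.md"]
IGNORES = [
    "**/vendor/**",
    "**/.git/**",
    "**/ci-internal/**",
    "**/docs/cicd/**",
    "**/tests/e2e/**",
    "**/helm-charts-k8s/README.md",
    "**/internal-example/**",
    "**/.claude/**",
    "**/tests/pytests/**",
    "**/knowledge/**",
    "**/tests/test-plan/**",
]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob into a regex for use with fullmatch().

    Supported (assumption, not the tool's full syntax): '**/' (zero or more
    directories), a trailing '/**' (anything below a directory), and '*'
    (any run of characters within a single path segment).
    """
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("/**", i) and i + 3 == len(pattern):
            parts.append("/.*")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_linted(path: str) -> bool:
    """True if a repo-relative POSIX path matches a glob and no ignore pattern."""
    included = any(glob_to_regex(g).fullmatch(path) for g in GLOBS)
    ignored = any(glob_to_regex(p).fullmatch(path) for p in IGNORES)
    return included and not ignored


print(is_linted("docs/index.md"))              # True
print(is_linted("helm-charts-k8s/README.md"))  # False (explicitly ignored)
print(is_linted("tests/e2e/README.md"))        # False (under an ignored directory)
```

Treating `**/` as an optional directory prefix is what lets `**/helm-charts-k8s/README.md` match the chart README at the repository root as well as in nested copies.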