feat: add imagePullSecrets support for container-based skills#1725
feat: add imagePullSecrets support for container-based skills#1725ppeau wants to merge 3 commits intokagent-dev:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds first-class support for authenticating OCI pulls for container-based skills by allowing agents to reference kubernetes.io/dockerconfigjson secrets and wiring those credentials into the skills-init workflow.
Changes:
- Extend the Agent/SandboxAgent
spec.skillsschema and Go types withimagePullSecrets. - Update the agent manifest translation to optionally prepend a
docker-auth-initinitContainer that merges multiple dockerconfigjson secrets into a single Dockerconfig.json, and setDOCKER_CONFIGforskills-init. - Update the
skills-initimage to includejq, and add unit/e2e coverage for the new behavior.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| helm/kagent-crds/templates/kagent.dev_sandboxagents.yaml | Exposes spec.skills.imagePullSecrets in the Helm-rendered SandboxAgent CRD schema. |
| helm/kagent-crds/templates/kagent.dev_agents.yaml | Exposes spec.skills.imagePullSecrets in the Helm-rendered Agent CRD schema. |
| go/core/test/e2e/invoke_api_test.go | Adds an e2e test verifying docker-auth-init is injected and the agent still functions end-to-end. |
| go/core/internal/controller/translator/agent/manifest_builder.go | Passes ImagePullSecrets through and adapts to buildSkillsInitContainer returning multiple init containers. |
| go/core/internal/controller/translator/agent/git_skills_test.go | Adds translator unit tests validating volumes/mounts/env for imagePullSecrets. |
| go/core/internal/controller/translator/agent/adk_api_translator.go | Implements docker-auth-init, merge script generation, volume/mount wiring, and DOCKER_CONFIG env injection. |
| go/api/v1alpha2/zz_generated.deepcopy.go | Regenerates DeepCopy to include the new ImagePullSecrets field. |
| go/api/v1alpha2/agent_types.go | Adds ImagePullSecrets []LocalObjectReference to SkillForAgent. |
| go/api/config/crd/bases/kagent.dev_sandboxagents.yaml | Updates the base CRD schema for SandboxAgent to include imagePullSecrets. |
| go/api/config/crd/bases/kagent.dev_agents.yaml | Updates the base CRD schema for Agent to include imagePullSecrets. |
| docker/skills-init/Dockerfile | Installs jq so the merge init container can build a combined Docker config. |
| .gitattributes | Forces LF endings for *.sh.tmpl to avoid cross-platform template/script breakage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| container, skillsVolumes, err := buildSkillsInitContainer( | ||
| gitRefs, | ||
| spec.Skills.GitAuthSecretRef, | ||
| skills, | ||
| spec.Skills.InsecureSkipVerify, | ||
| manifestCtx.deployment.SecurityContext, | ||
| initEnv, | ||
| getDefaultResources(initResources), | ||
| spec.Skills.ImagePullSecrets, | ||
| ) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to build skills init container: %w", err) | ||
| } | ||
|
|
||
| *volumes = append(*volumes, skillsVolumes...) | ||
| return []corev1.Container{container}, nil | ||
| return container, nil |
There was a problem hiding this comment.
buildSkillsInitContainer now returns a slice of containers, but the receiving variable is still named container, which makes the call site harder to read. Consider renaming it to containers (or similar) to reflect the type and avoid confusion.
| for _, secret := range imagePullSecrets { | ||
| volName := "pull-secret-" + secret.Name | ||
| volumes = append(volumes, corev1.Volume{ | ||
| Name: volName, | ||
| VolumeSource: corev1.VolumeSource{ | ||
| Secret: &corev1.SecretVolumeSource{ | ||
| SecretName: secret.Name, | ||
| }, | ||
| }, | ||
| }) |
There was a problem hiding this comment.
Volume names are derived from the Secret name ("pull-secret-" + secret.Name). Kubernetes Secret names may contain characters (notably '.') and/or length that are invalid for Pod volume names, which can make the generated Deployment fail admission. Consider generating a safe volume name (e.g., index-based or hashed) and keep the SecretName field pointing at the original secret.
| // Shared EmptyDir volume for the merged Docker config. | ||
| volumes = append(volumes, corev1.Volume{ | ||
| Name: "kagent-docker-config", | ||
| VolumeSource: corev1.VolumeSource{ | ||
| EmptyDir: &corev1.EmptyDirVolumeSource{}, | ||
| }, | ||
| }) | ||
|
|
||
| // Mount each imagePullSecret as a read-only directory under /docker-secrets/<name>. | ||
| authInitVolumeMounts := []corev1.VolumeMount{ | ||
| {Name: "kagent-docker-config", MountPath: "/docker-config-out"}, | ||
| } | ||
| for _, secret := range imagePullSecrets { | ||
| volName := "pull-secret-" + secret.Name | ||
| volumes = append(volumes, corev1.Volume{ | ||
| Name: volName, | ||
| VolumeSource: corev1.VolumeSource{ | ||
| Secret: &corev1.SecretVolumeSource{ | ||
| SecretName: secret.Name, | ||
| }, | ||
| }, | ||
| }) | ||
| authInitVolumeMounts = append(authInitVolumeMounts, corev1.VolumeMount{ | ||
| Name: volName, | ||
| MountPath: "/docker-secrets/" + secret.Name, | ||
| ReadOnly: true, | ||
| }) | ||
| } | ||
|
|
||
| mergeScript := buildDockerAuthMergeScript(imagePullSecrets) | ||
| dockerAuthInitContainer := corev1.Container{ | ||
| Name: "docker-auth-init", | ||
| Image: DefaultSkillsInitImageConfig.Image(), | ||
| Command: []string{"/bin/sh", "-c", mergeScript}, | ||
| VolumeMounts: authInitVolumeMounts, | ||
| } | ||
| containers = append(containers, dockerAuthInitContainer) | ||
|
|
||
| // Mount the merged config into skills-init so krane picks it up via DOCKER_CONFIG. | ||
| volumeMounts = append(volumeMounts, corev1.VolumeMount{ | ||
| Name: "kagent-docker-config", | ||
| MountPath: "/.kagent/.docker", | ||
| ReadOnly: true, | ||
| }) |
There was a problem hiding this comment.
imagePullSecrets entries with an empty Name (LocalObjectReference allows this) or repeated names will currently produce invalid SecretVolumeSource.SecretName values and/or duplicate volume names. It would be safer to validate that every reference has a non-empty name and to de-duplicate (or error) before creating volumes/mounts.
| // Shared EmptyDir volume for the merged Docker config. | |
| volumes = append(volumes, corev1.Volume{ | |
| Name: "kagent-docker-config", | |
| VolumeSource: corev1.VolumeSource{ | |
| EmptyDir: &corev1.EmptyDirVolumeSource{}, | |
| }, | |
| }) | |
| // Mount each imagePullSecret as a read-only directory under /docker-secrets/<name>. | |
| authInitVolumeMounts := []corev1.VolumeMount{ | |
| {Name: "kagent-docker-config", MountPath: "/docker-config-out"}, | |
| } | |
| for _, secret := range imagePullSecrets { | |
| volName := "pull-secret-" + secret.Name | |
| volumes = append(volumes, corev1.Volume{ | |
| Name: volName, | |
| VolumeSource: corev1.VolumeSource{ | |
| Secret: &corev1.SecretVolumeSource{ | |
| SecretName: secret.Name, | |
| }, | |
| }, | |
| }) | |
| authInitVolumeMounts = append(authInitVolumeMounts, corev1.VolumeMount{ | |
| Name: volName, | |
| MountPath: "/docker-secrets/" + secret.Name, | |
| ReadOnly: true, | |
| }) | |
| } | |
| mergeScript := buildDockerAuthMergeScript(imagePullSecrets) | |
| dockerAuthInitContainer := corev1.Container{ | |
| Name: "docker-auth-init", | |
| Image: DefaultSkillsInitImageConfig.Image(), | |
| Command: []string{"/bin/sh", "-c", mergeScript}, | |
| VolumeMounts: authInitVolumeMounts, | |
| } | |
| containers = append(containers, dockerAuthInitContainer) | |
| // Mount the merged config into skills-init so krane picks it up via DOCKER_CONFIG. | |
| volumeMounts = append(volumeMounts, corev1.VolumeMount{ | |
| Name: "kagent-docker-config", | |
| MountPath: "/.kagent/.docker", | |
| ReadOnly: true, | |
| }) | |
| validImagePullSecrets := make([]corev1.LocalObjectReference, 0, len(imagePullSecrets)) | |
| seenImagePullSecrets := make(map[string]struct{}, len(imagePullSecrets)) | |
| for _, secret := range imagePullSecrets { | |
| if secret.Name == "" { | |
| continue | |
| } | |
| if _, seen := seenImagePullSecrets[secret.Name]; seen { | |
| continue | |
| } | |
| seenImagePullSecrets[secret.Name] = struct{}{} | |
| validImagePullSecrets = append(validImagePullSecrets, secret) | |
| } | |
| if len(validImagePullSecrets) > 0 { | |
| // Shared EmptyDir volume for the merged Docker config. | |
| volumes = append(volumes, corev1.Volume{ | |
| Name: "kagent-docker-config", | |
| VolumeSource: corev1.VolumeSource{ | |
| EmptyDir: &corev1.EmptyDirVolumeSource{}, | |
| }, | |
| }) | |
| // Mount each imagePullSecret as a read-only directory under /docker-secrets/<name>. | |
| authInitVolumeMounts := []corev1.VolumeMount{ | |
| {Name: "kagent-docker-config", MountPath: "/docker-config-out"}, | |
| } | |
| for _, secret := range validImagePullSecrets { | |
| volName := "pull-secret-" + secret.Name | |
| volumes = append(volumes, corev1.Volume{ | |
| Name: volName, | |
| VolumeSource: corev1.VolumeSource{ | |
| Secret: &corev1.SecretVolumeSource{ | |
| SecretName: secret.Name, | |
| }, | |
| }, | |
| }) | |
| authInitVolumeMounts = append(authInitVolumeMounts, corev1.VolumeMount{ | |
| Name: volName, | |
| MountPath: "/docker-secrets/" + secret.Name, | |
| ReadOnly: true, | |
| }) | |
| } | |
| mergeScript := buildDockerAuthMergeScript(validImagePullSecrets) | |
| dockerAuthInitContainer := corev1.Container{ | |
| Name: "docker-auth-init", | |
| Image: DefaultSkillsInitImageConfig.Image(), | |
| Command: []string{"/bin/sh", "-c", mergeScript}, | |
| VolumeMounts: authInitVolumeMounts, | |
| } | |
| containers = append(containers, dockerAuthInitContainer) | |
| // Mount the merged config into skills-init so krane picks it up via DOCKER_CONFIG. | |
| volumeMounts = append(volumeMounts, corev1.VolumeMount{ | |
| Name: "kagent-docker-config", | |
| MountPath: "/.kagent/.docker", | |
| ReadOnly: true, | |
| }) | |
| } |
| Name: "docker-auth-init", | ||
| Image: DefaultSkillsInitImageConfig.Image(), | ||
| Command: []string{"/bin/sh", "-c", mergeScript}, | ||
| VolumeMounts: authInitVolumeMounts, |
There was a problem hiding this comment.
docker-auth-init is created without SecurityContext or resource requirements, while skills-init uses the pod/deployment securityContext and configured resources. This can cause PodSecurity admission failures or unexpected resource usage differences. Consider applying the same initSecCtx and resources (or a deliberate minimal set) to docker-auth-init as well.
| Name: "docker-auth-init", | |
| Image: DefaultSkillsInitImageConfig.Image(), | |
| Command: []string{"/bin/sh", "-c", mergeScript}, | |
| VolumeMounts: authInitVolumeMounts, | |
| Name: "docker-auth-init", | |
| Image: DefaultSkillsInitImageConfig.Image(), | |
| Command: []string{"/bin/sh", "-c", mergeScript}, | |
| VolumeMounts: authInitVolumeMounts, | |
| SecurityContext: initSecCtx, | |
| Resources: resources, |
2129a46 to
58d8a73
Compare
EItanya
left a comment
There was a problem hiding this comment.
Overall this makes sense, but before we go down the road of adding a new API, is there anyway we can re-use the ImagePullSecrets which already get used for image pulling, or do those remain on the node and never mounted into the pod themselves?
|
Hi @EItanya, great question. We actually explored reusing the existing deployment imagePullSecrets first before adding a new field. On the technical side, imagePullSecrets defined on a pod spec are consumed exclusively by the kubelet to pull container images. They are never mounted into the pod or made accessible to running containers. This means krane, executing inside the skills-init init container, has no way to read those credentials. We hit this wall directly during testing. Even if it were technically possible, there is a design reason why it would not be the right approach in enterprise environments. The registry used to deploy the kagent system and the registry hosting skill images are typically owned by completely different teams with different security boundaries. The platform/ops team manages the kagent deployment and its registry credentials, while the line-of-business team produces and owns the skill images, hosted in their own private registry (Artifactory, ACR, ECR, etc.). Reusing the deployment imagePullSecrets would couple these two security contexts together, violate the principle of least privilege, and make skills effectively unusable for any team whose registry differs from the one used to deploy kagent. That is the common case at scale. The new imagePullSecrets field under spec.skills directly mirrors the standard Kubernetes pattern where a pod can reference multiple imagePullSecrets for different registries. It introduces no new concept, just applies the same model at the skill level. Happy to discuss further if needed! |
Ok I buy that logic. What do you think about renaming the field to |
Good point, I can see both sides here. For keeping imagePullSecrets: Skills are stored and pulled as OCI artifacts using the exact same kubernetes.io/dockerconfigjson secrets as normal container images. The name imagePullSecrets is the standard Kubernetes convention, so it feels familiar right away. Anyone who’s used Kubernetes already knows what it means and how to set it up. For renaming to pullSecrets: Skills aren’t executed as containers, they’re more like content or configuration that gets pulled. The original imagePullSecrets name is specifically tied to pulling runnable container images at the pod level, so using it in this context could feel a bit overloaded. pullSecrets is more neutral and probably more accurate here. Both options are valid. Just say the word and I’ll rename the field right away. 👍 |
Ok I buy that logic, let's stick with it for now. Just resolve merge conflicts and we'll get this merged |
Add authentication support for pulling skill images from private registries (Artifactory, ACR, ECR, etc.) by introducing a new imagePullSecrets field under spec.skills. When imagePullSecrets is set, a docker-auth-init init container is prepended that merges all kubernetes.io/dockerconfigjson secrets into a single config.json using jq. The skills-init container then reads that config via the DOCKER_CONFIG env var, which krane picks up automatically when pulling skill images. Closes kagent-dev#1222 Signed-off-by: ppeau <patrice.peau@gmail.com>
acde26f to
2689d6c
Compare
Done! ✅ |
| func buildDockerAuthMergeScript(imagePullSecrets []corev1.LocalObjectReference) string { | ||
| var sb strings.Builder | ||
| sb.WriteString(`set -e | ||
| mkdir -p /docker-config-out | ||
| merged='{"auths":{}}' | ||
| `) | ||
| for _, secret := range imagePullSecrets { | ||
| sb.WriteString(`if [ -f /docker-secrets/` + secret.Name + `/.dockerconfigjson ]; then | ||
| merged="$(printf '%s\n%s\n' "$merged" "$(cat /docker-secrets/` + secret.Name + `/.dockerconfigjson)" | jq -s '.[0].auths * .[1].auths | {"auths": .}')" | ||
| fi | ||
| `) | ||
| } | ||
| sb.WriteString(`printf '%s' "$merged" > /docker-config-out/config.json | ||
| `) | ||
| return sb.String() |
There was a problem hiding this comment.
Sorry for the continue reviews, is there anyway you could put this into a tmpl file similar to this existing one. In the future I want to move away from these scripts altogether, but I think they're a bit simpler to understand for now.
There was a problem hiding this comment.
Perfect, I'll check that out right away. I'll get back to you as soon as it's ready.
There was a problem hiding this comment.
Hi @Eltanya, done! The docker-auth-init script has been moved to a dedicated docker-auth-init.sh.tmpl file, following the same pattern as skills-init.sh.tmpl (//go:embed + template.Must + typed data struct). All existing tests pass including the imagePullSecrets ones.
I'll let you resolve the conversation if it looks good to you 👍
…n-minor-patch group (kagent-dev#1793) Bumps the python-minor-patch group in /python with 1 update: [authlib](https://github.com/authlib/authlib). Updates `authlib` from 1.7.0 to 1.7.1 <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/authlib/authlib/releases">authlib's releases</a>.</em></p> <blockquote> <h2>v1.7.1</h2> <h2>What's Changed</h2> <ul> <li>Fix authlib.jose deprecation warning poping from _joserfc_helpers by <a href="https://github.com/azmeuk"><code>@azmeuk</code></a> in <a href="https://redirect.github.com/authlib/authlib/pull/881">authlib/authlib#881</a></li> <li>Fix redirecting to unvalidated <code>redirect_uri</code> on <code>InvalidScopeError</code> in <code>OpenIDImplicitGrant</code> and <code>OpenIDHybridGrant</code>.</li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/authlib/authlib/compare/v1.7.0...v1.7.1">https://github.com/authlib/authlib/compare/v1.7.0...v1.7.1</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/authlib/authlib/commit/485016a31b64e5aae93e5ce21b34517ce59ecf1d"><code>485016a</code></a> chore: bump to 1.7.1</li> <li><a href="https://github.com/authlib/authlib/commit/7b4ecd7c88ac96ba7b759187a4b6d109d9031ee0"><code>7b4ecd7</code></a> fix: redirecting to unvalidated redirect_uri on InvalidScopeError in OIDC grants</li> <li><a href="https://github.com/authlib/authlib/commit/c304a216512e449d1440b159960ba319f4300d2b"><code>c304a21</code></a> Merge pull request <a href="https://redirect.github.com/authlib/authlib/issues/881">#881</a> from azmeuk/880-deprecation-warnings</li> <li><a href="https://github.com/authlib/authlib/commit/4165ada169ef9dfa41081b7071bc82420b665c1e"><code>4165ada</code></a> fix: authlib.jose deprecation warning poping from _joserfc_helpers</li> <li>See full diff in <a href="https://github.com/authlib/authlib/compare/v1.7.0...1.7.1">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself) - `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself) - `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself) - `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency - `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: ppeau <patrice.peau@gmail.com>
Signed-off-by: ppeau <patrice.peau@gmail.com>
daaa447 to
9a9ce6e
Compare
Closes #1222
Problem
Container-based skills using
kraneto pull OCI images had no way to authenticate against private registries (Artifactory, ACR, ECR, etc.). TheimagePullSecretsdefined on the agent deployment were not passed to theskills-initinit container, causing authentication failures like:No matching credentials were found for "docker.artifactory.dev.example.com"
Error: pulling ...: Authentication is required
Solution
Follows the approach discussed in #1222 by @s10gopal:
imagePullSecretsfield underspec.skillsaccepting a list ofkubernetes.io/dockerconfigjsonsecretsimagePullSecretsis set, a newdocker-auth-initinit container is prepended — it merges all referenced secrets into a singleconfig.jsonusingjqskills-initcontainer reads that merged config via theDOCKER_CONFIGenv var, whichkranepicks up automatically when pulling skill imagesChanges
go/api/v1alpha2/agent_types.go: addImagePullSecrets []corev1.LocalObjectReferencetoSkillForAgentstructgo/api/v1alpha2/zz_generated.deepcopy.go: regenerated DeepCopy for new fieldgo/core/internal/controller/translator/agent/adk_api_translator.go:buildSkillsInitContainernow returns[]Container, prependsdocker-auth-initwhenimagePullSecretsare presentdocker/skills-init/Dockerfile: addjqto the Alpine base image.gitattributes: enforce LF line endings on*.sh.tmplfiles (prevents shell script breakage on Windows contributors)Usage
Testing
Validated end-to-end on a local Kubernetes cluster with a private registry protected by htpasswd authentication:
Skill image hosted on the private registry, inaccessible without credentials
Agent configured with imagePullSecrets referencing a dockerconfigjson secret
docker-auth-init merged the credentials, skills-init pulled the image successfully via krane
Skill was correctly loaded and executed by the agent