OCPBUGS-64841: passwd & group: Add containers user & group #224

Merged
dustymabe merged 2 commits into coreos:main from travier:main-fix-containers-user-group on Mar 28, 2026

@travier
Member

@travier travier commented Mar 26, 2026

group: Add openvswitch to hugetlbfs group

The openvswitch user and group have been part of the passwd & group
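files for, at least, as long as we've published RHCOS sources publicly:
- https://github.com/openshift/os/blame/bdb5b8153ed68c88e2485d9e7bd66ea6eb54d6c1/passwd#L27
- https://github.com/openshift/os/blame/release-4.19/group#L47
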
We did not remove them when we re-visited our fixed UIDs/GID in the
split between the RHEL boot image and the new OCP node image ([1], [2] &
[3]). Thus they are now part of the base RHEL boot image, even though
the openvswitch package is not included there.

Although technically unnecessary, this is fine and simplifies things a bit
as we do not have to update the user & group entries during the node
image build, which is currently a problematic topic (see [4]).

Thus instead of adding openvswitch to hugetlbfs group in the node image
build, we add it here directly to simplify the logic.

[1] openshift/os#1661
[2] #29
[3] #31
[4] openshift/os#1917
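
As a rough illustration of the change described in this commit message, the sketch below appends a member to a `group(5)`-style line (`name:password:GID:member_list`). The function name `add_group_member` and the GID `1234` are hypothetical and chosen only for the example; they are not taken from the repository's `group` file.

```python
def add_group_member(group_line: str, member: str) -> str:
    """Return a group(5) line with `member` appended to its member list.

    The line format is name:password:GID:comma_separated_members.
    The call is idempotent: an existing member is not added twice.
    """
    name, password, gid, members = group_line.rstrip("\n").split(":")
    current = [m for m in members.split(",") if m]
    if member not in current:
        current.append(member)
    return ":".join([name, password, gid, ",".join(current)])


# NOTE: 1234 is a placeholder GID, not the value used in the real file.
before = "hugetlbfs:x:1234:"
after = add_group_member(before, "openvswitch")
print(after)  # hugetlbfs:x:1234:openvswitch
```

Doing this once in the base image's `group` file is what lets the node image build skip the edit entirely.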


passwd & group: Add containers user & group

Adding users and groups during a container image layered build is
currently non-ergonomic with bootable containers. Thus instead of doing
that in openshift/os for the node layer, we directly include the user &
group here, which also guarantees that the UID/GID remain stable.

See openshift/os#1917 for the original version
of this change and the full details about what makes adding user/group
in the node layer non-ergonomic.

Unfortunately we can not use the UID/GID that were used in the last
"full" RHCOS image (4.18) as those are now used for dnsmasq (see [1]).
Thus use the first UID & GID available for both user and group, going
downward.

[1] openshift/os#1917 (comment)

Fixes: https://redhat.atlassian.net/browse/OCPBUGS-64841
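
The "first UID & GID available for both user and group, going downward" selection can be sketched as follows. This is only one plausible reading of that sentence, not the procedure actually used for this PR: the helper names, the `start` and `floor` parameters, and the sample entries are all assumptions made for illustration.

```python
def used_ids(file_text: str) -> set[int]:
    """Collect the numeric ID (third field) from passwd- or group-style lines.

    In passwd(5) the third field is the UID; in group(5) it is the GID.
    """
    ids = set()
    for line in file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ids.add(int(line.split(":")[2]))
    return ids


def first_free_id_downward(passwd_text: str, group_text: str,
                           start: int, floor: int = 1) -> int:
    """Return the highest ID <= start that is unused as both a UID and a GID."""
    taken = used_ids(passwd_text) | used_ids(group_text)
    for candidate in range(start, floor - 1, -1):
        if candidate not in taken:
            return candidate
    raise ValueError(f"no free ID between {floor} and {start}")


# Made-up sample data; these are not the real RHCOS entries.
PASSWD = "alpha:x:999:999::/:/sbin/nologin\nbeta:x:998:997::/:/sbin/nologin\n"
GROUP = "alpha:x:999:\nbeta:x:997:\ngamma:x:996:\n"

print(first_free_id_downward(PASSWD, GROUP, start=999))  # 995
```

Requiring the ID to be free in both files is what allows the user and its group to share one number.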

travier added 2 commits March 26, 2026 11:34
The openvswitch user and group have been part of the passwd & group
files for, at least, as long as we've published RHCOS sources publicly:
- https://github.com/openshift/os/blame/bdb5b8153ed68c88e2485d9e7bd66ea6eb54d6c1/passwd#L27
- https://github.com/openshift/os/blame/release-4.19/group#L47

We did not remove them when we re-visited our fixed UIDs/GID in the
split between the RHEL boot image and the new OCP node image ([1], [2] &
[3]). Thus they are now part of the base RHEL boot image, even though
the openvswitch package is not included there.

Although technically unnecessary, this is fine and simplifies things a bit
as we do not have to update the user & group entries during the node
image build, which is currently a problematic topic (see [4]).

Thus instead of adding openvswitch to hugetlbfs group in the node image
build, we add it here directly to simplify the logic.

[1] openshift/os#1661
[2] coreos#29
[3] coreos#31
[4] openshift/os#1917
@openshift-ci-robot

@travier: This pull request references Jira Issue OCPBUGS-64841, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new 'containers' user and group, both assigned ID 790, which are intended for rootless container operations. Additionally, the 'hugetlbfs' group entry has been updated to include 'openvswitch' as a member. There is no feedback to provide.

@travier travier force-pushed the main-fix-containers-user-group branch from b687d93 to 5d3577a Compare March 26, 2026 11:31
@travier travier requested review from dustymabe and jlebon March 26, 2026 11:32
travier added a commit to travier/os that referenced this pull request Mar 26, 2026
We are moving the group inclusion directly to the RHEL base image
instead of working around it here in the OCP node layer.

See: openshift#1917
See: coreos/rhel-coreos-config#224
See: https://redhat.atlassian.net/browse/OCPBUGS-64841
@travier
Member Author

travier commented Mar 26, 2026

Workaround removal for the node layer: openshift/os#1918

@dustymabe
Member

This looks good from my perspective but would love someone who knows more to review/approve too.

I will note that I assume we want to apply this to the rhel-9.6 branch too? It's not 100% clear, but rhel-9.6 builds differently (via the legacy path, not via container), so it's worth confirming the behavior there too when we do that backport.

/approve

@travier
Member Author

travier commented Mar 27, 2026

Yes, we'll need it in RHEL 9.6 as well.

@travier
Member Author

travier commented Mar 27, 2026

× kola-runext-72.service
     Loaded: loaded (/etc/systemd/system/kola-runext-72.service; static)
     Active: failed (Result: exit-code) since Thu 2026-03-26 12:45:16 UTC; 414ms ago
   Duration: 1.768s
 Invocation: 4199947779ce40f695e81cafbe5d1b5a
    Process: 23786 ExecStart=/usr/local/bin/kola-runext-file-context-policy-match (code=exited, status=1/FAILURE)
   Main PID: 23786 (code=exited, status=1/FAILURE)
   Mem peak: 57.5M
        CPU: 1.576s

Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: Would relabel /var/home/core/.local/share/containers/storage/overlay-containers/containers.lock from system_u:object_r:data_home_t:s0 to system_u:object_r:container_ro_file_t:s0
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: Would relabel /var/home/core/.local/share/containers/storage/overlay-containers/volatile-containers.json from system_u:object_r:data_home_t:s0 to system_u:object_r:container_ro_file_t:s0'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + '[' -n 'Would relabel /var/home/core/.local/share/containers/storage/overlay-containers from system_u:object_r:data_home_t:s0 to system_u:object_r:container_ro_file_t:s0
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: Would relabel /var/home/core/.local/share/containers/storage/overlay-containers/containers.lock from system_u:object_r:data_home_t:s0 to system_u:object_r:container_ro_file_t:s0
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: Would relabel /var/home/core/.local/share/containers/storage/overlay-containers/volatile-containers.json from system_u:object_r:data_home_t:s0 to system_u:object_r:container_ro_file_t:s0' ']'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + exceptions=(['/var/opt/cni']='1' ['/etc/iscsi/initiatorname.iscsi']='1')
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + declare -A exceptions
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23792]: ++ echo 'Would relabel /var/home/core/.local/share/containers/storage/overlay-containers from system_u:object_r:data_home_t:s0 to system_u:object_r:container_ro_file_t:s0
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23792]: Would relabel /var/home/core/.local/share/containers/storage/overlay-containers/containers.lock from system_u:object_r:data_home_t:s0 to system_u:object_r:container_ro_file_t:s0
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23792]: Would relabel /var/home/core/.local/share/containers/storage/overlay-containers/volatile-containers.json from system_u:object_r:data_home_t:s0 to system_u:object_r:container_ro_file_t:s0'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23793]: ++ grep 'Would relabel'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23794]: ++ cut -d ' ' -f 3
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + paths='/var/home/core/.local/share/containers/storage/overlay-containers
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: /var/home/core/.local/share/containers/storage/overlay-containers/containers.lock
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: /var/home/core/.local/share/containers/storage/overlay-containers/volatile-containers.json'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + found=
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + read -r path
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + [[ noexception == \n\o\e\x\c\e\p\t\i\o\n ]]
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + echo 'Unexpected mislabeled file found: /var/home/core/.local/share/containers/storage/overlay-containers'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: Unexpected mislabeled file found: /var/home/core/.local/share/containers/storage/overlay-containers
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + found=1
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + read -r path
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + [[ noexception == \n\o\e\x\c\e\p\t\i\o\n ]]
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + echo 'Unexpected mislabeled file found: /var/home/core/.local/share/containers/storage/overlay-containers/containers.lock'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: Unexpected mislabeled file found: /var/home/core/.local/share/containers/storage/overlay-containers/containers.lock
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + found=1
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + read -r path
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + [[ noexception == \n\o\e\x\c\e\p\t\i\o\n ]]
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + echo 'Unexpected mislabeled file found: /var/home/core/.local/share/containers/storage/overlay-containers/volatile-containers.json'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: Unexpected mislabeled file found: /var/home/core/.local/share/containers/storage/overlay-containers/volatile-containers.json
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + found=1
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + read -r path
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + '[' 1 == 1 ']'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + fatal 'Some unexpected mislabeled files were found.'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + echo 'Some unexpected mislabeled files were found.'
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: Some unexpected mislabeled files were found.
Mar 26 12:45:16 qemu0 kola-runext-file-context-policy-match[23786]: + exit 1
Mar 26 12:45:16 qemu0 systemd[1]: kola-runext-72.service: Main process exited, code=exited, status=1/FAILURE
Mar 26 12:45:16 qemu0 systemd[1]: kola-runext-72.service: Failed with result 'exit-code'.
Mar 26 12:45:16 qemu0 systemd[1]: kola-runext-72.service: Consumed 1.576s CPU time, 57.5M memory peak.

Looks like coreos/fedora-coreos-tracker#2095 (also openshift/os#1916 (comment))
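
For readers who do not want to follow the shell trace line by line, the check it shows (keep the `Would relabel` lines, take the third space-separated field as the path, then fail on any path not in the exceptions list) can be approximated like this. It is a sketch reconstructed from the trace only, not the test's actual source; the function name is hypothetical and the second sample line is invented to exercise the exception path.

```python
def unexpected_mislabeled(output: str, exceptions: set[str]) -> list[str]:
    """Return paths from 'Would relabel' lines that are not in `exceptions`."""
    paths = []
    for line in output.splitlines():
        if "Would relabel" not in line:  # mirrors: grep 'Would relabel'
            continue
        fields = line.split(" ")  # mirrors: cut -d ' ' -f 3
        if len(fields) < 3:
            continue
        path = fields[2]
        if path not in exceptions:
            paths.append(path)
    return paths


# Exceptions as they appear in the trace.
EXCEPTIONS = {"/var/opt/cni", "/etc/iscsi/initiatorname.iscsi"}

sample = "\n".join([
    # First line copied from the log above.
    "Would relabel /var/home/core/.local/share/containers/storage/overlay-containers"
    " from system_u:object_r:data_home_t:s0 to system_u:object_r:container_ro_file_t:s0",
    # Invented line, only to show that an excepted path is skipped.
    "Would relabel /var/opt/cni from a_t to b_t",
])

found = unexpected_mislabeled(sample, EXCEPTIONS)
print(found)
# A non-empty result corresponds to the 'exit 1' at the end of the trace.
```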

@travier
Member Author

travier commented Mar 27, 2026

/retest

@dustymabe
Member

The ext.config.shared.selinux.file-context-policy-match failure is a red herring: we know the problem and the fix is in flight. It only fails when run alongside other tests, so if it's the only test that fails in a run, the rerun will pass and we won't get a reported failure.

The problem here is that two tests failed:

 --- FAIL: rpmostree.install-uninstall (24.59s)
        harness.go:1935: mach.Start() failed: machine 9e0c55b3-b1cb-4fe6-a92a-cfb4f25b0660 entered emergency.target in initramfs

Member

@jlebon jlebon left a comment

> Unfortunately we can not use the UID/GID that were used in the last "full" RHCOS image (4.18) as those are now used for dnsmasq (see [1]).

I think that's fine, but just the fact that it can happen is... 😢

We really need to get away from this setup.

@dustymabe
Member

/lgtm

@openshift-ci
Contributor

openshift-ci bot commented Mar 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dustymabe, jlebon, travier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [dustymabe,jlebon,travier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dustymabe
Member

/jira refresh

@openshift-ci-robot

@dustymabe: This pull request references Jira Issue OCPBUGS-64841, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

@dustymabe dustymabe merged commit 49b67de into coreos:main Mar 28, 2026
12 of 13 checks passed
@openshift-ci-robot

@travier: Jira Issue OCPBUGS-64841: Some pull requests linked via external trackers have merged:

The following pull request, linked via external tracker, has not merged:

All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-64841 has not been moved to the MODIFIED state.

@travier travier deleted the main-fix-containers-user-group branch April 7, 2026 08:12
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/os that referenced this pull request Apr 10, 2026
We are moving the group inclusion directly to the RHEL base image
instead of working around it here in the OCP node layer.

See: openshift#1917
See: coreos/rhel-coreos-config#224
See: https://redhat.atlassian.net/browse/OCPBUGS-64841
@openshift-merge-robot
Contributor

Fix included in release 4.22.0-0.nightly-2026-04-11-163821
