LCORE-1872: Fix llama-stack container startup issues#1800

Merged
tisnik merged 2 commits into
lightspeed-core:mainfrom
anik120:fix-llama-stack-container-start-issue
May 27, 2026
@anik120 anik120 commented May 26, 2026

Description

Fixes two issues that prevented the llama-stack container from starting:

  1. SQLite compatibility error - Switch the base image from ubi9/ubi-minimal to ubi9/python-312. The minimal image ships with SQLite < 3.30.0, which lacks the sqlite3_deserialize support required by Python 3.12's _sqlite3 module. The python-312 base image includes a compatible SQLite version and properly configured Python dependencies.

  2. Read-only filesystem error - The config enrichment script now writes to /tmp/enriched-run.yaml instead of /opt/app-root/run.yaml. The latter is mounted read-only from the host, so the enrichment script raised an OSError when it tried to write the enriched configuration.
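
The second fix boils down to keeping the read-only input and the enriched output on separate paths. Below is a minimal Python sketch of that pattern. The `enrich_config` helper and the plain-text "enrichment" are hypothetical stand-ins, since the real enrichment script is not shown in this PR; only the two path names come from the description above.

```python
import tempfile
from pathlib import Path


def enrich_config(source: Path, output: Path, extra: str) -> Path:
    """Read the source config and write an enriched copy to a separate path.

    Hypothetical helper: the source file is only read, never written,
    so it can safely live on a read-only mount.
    """
    original = source.read_text(encoding="utf-8")
    output.write_text(original.rstrip("\n") + "\n" + extra + "\n", encoding="utf-8")
    return output


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    source = workdir / "run.yaml"  # stands in for /opt/app-root/run.yaml
    source.write_text("version: 2\n", encoding="utf-8")
    # The output path stands in for /tmp/enriched-run.yaml.
    enriched = enrich_config(source, workdir / "enriched-run.yaml", "enriched: true")
    print(enriched.read_text(encoding="utf-8"))
```

Because the source is never opened for writing, the same call works whether or not the input is mounted read-only.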

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

  • Chores
    • Updated container build to use Python 3.12 UBI base image for improved runtime consistency.
    • Adjusted entrypoint behavior to write enriched configuration to a different runtime location.
  • Tests
    • Test manifests updated to run the enrichment step using the project's virtualenv Python interpreter.

coderabbitai Bot commented May 26, 2026

Walkthrough

Rebase test container onto Red Hat UBI Python 3.12 with dnf-installed build tools, change entrypoint to write enriched config to /tmp/enriched-run.yaml, and update test/prow manifests to invoke the enrichment script using the virtualenv Python.

Changes

Infrastructure and Deployment Updates

| Layer | File(s) | Summary |
|---|---|---|
| Container base image and build tools | deploy/llama-stack/test.containerfile | Containerfile rebased to Red Hat UBI Python 3.12 image; build dependencies (git, tar, gcc, gcc-c++, make) installed via dnf install with dnf clean all, replacing prior microdnf-based Python/tool installation. |
| Enriched config output path | scripts/llama-stack-entrypoint.sh | ENRICHED_CONFIG updated to /tmp/enriched-run.yaml; enrichment block writes enriched config to this path. |
| Manifests: use venv Python for enrichment | tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml, tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml | Pod/container startup commands updated to run llama_stack_configuration.py via /opt/app-root/.venv/bin/python3 instead of python3 during Lightspeed config enrichment. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • radofuchs
  • tisnik
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly reflects the two main fixes in the changeset: replacing the base image to resolve SQLite compatibility and updating the config path to fix the read-only filesystem error. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/llama-stack-entrypoint.sh`:
- Line 8: The ENRICHED_CONFIG path is inconsistent: the script sets
ENRICHED_CONFIG="/tmp/enriched-run.yaml" but the E2E manifests still hardcode
the old "/opt/app-root/run.yaml"; update the manifests to use
ENRICHED_CONFIG="/tmp/enriched-run.yaml" to match the script (or refactor to a
single shared source for the enrichment path) so the enrichment behavior cannot
diverge; look for the ENRICHED_CONFIG variable and any hardcoded
"/opt/app-root/run.yaml" occurrences in the llama-stack entrypoint and the E2E
manifest templates and make them use the same "/tmp/enriched-run.yaml" value (or
reference the centralized variable).
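
One way to act on this finding is to scan the repository for any remaining hardcoded occurrences of the old path. The sketch below is a hypothetical helper, not part of the PR; the `find_occurrences` name and the default suffix list are assumptions chosen for illustration.

```python
from pathlib import Path


def find_occurrences(root: Path, needle: str, suffixes=(".sh", ".yaml", ".yml")):
    """Return (path, line number, line) for each line under root containing needle."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, number, line.strip()))
    return hits


if __name__ == "__main__":
    # Run from the repository root to list leftover references to the old path.
    for path, number, line in find_occurrences(Path("."), "/opt/app-root/run.yaml"):
        print(f"{path}:{number}: {line}")
```

Note that a hit is not necessarily a bug: /opt/app-root/run.yaml is still the read-only input, so each match needs to be checked for whether it is used as an input or as the enrichment output.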

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0ab071ab-5175-4264-aa92-1c69444cb139

📥 Commits

Reviewing files that changed from the base of the PR and between 42844d0 and 9c1b725.

📒 Files selected for processing (2)
  • deploy/llama-stack/test.containerfile
  • scripts/llama-stack-entrypoint.sh
📜 Review details
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-05-12T15:14:34.788Z
Learnt from: syedriko
Repo: lightspeed-core/lightspeed-stack PR: 1727
File: scripts/konflux_requirements.sh:9-15
Timestamp: 2026-05-12T15:14:34.788Z
Learning: In this repo, the `.konflux/` directory is committed/tracked and is guaranteed to exist in a fresh clone. Therefore, shell scripts that write output under `.konflux/` (e.g., create files like `.konflux/<...>`) should not waste effort by calling `mkdir -p .konflux` first. Only add directory-creation logic if the script may run in an environment/repo state where `.konflux/` might not be present.

Applied to files:

  • scripts/llama-stack-entrypoint.sh
🔇 Additional comments (2)
deploy/llama-stack/test.containerfile (2)

1-2: Clarify SQLite compatibility claim for the ubi9/python-312 base image

Red Hat docs I found don’t state the bundled SQLite version or guarantee the sqlite3_deserialize capability, so the “>= 3.30.0 with sqlite3_deserialize support” PR objective needs an in-container check for the exact image tag used at deploy/llama-stack/test.containerfile (lines 1-2):

  • python -c "import sqlite3; print(sqlite3.sqlite_version)"
  • python -c "import sqlite3; print(hasattr(sqlite3.Connection, 'deserialize'))"
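
The two one-liners above can be combined into a small script that also exercises the feature instead of only checking that the attribute exists. This is a sketch under the assumption that the interpreter is Python 3.11 or newer, where `sqlite3.Connection.serialize` and `deserialize` are exposed when the underlying SQLite library supports them; the function names are illustrative.

```python
import sqlite3


def sqlite_report() -> dict:
    """Collect the SQLite facts relevant to the compatibility question."""
    return {
        "sqlite_version": sqlite3.sqlite_version,
        "has_deserialize": hasattr(sqlite3.Connection, "deserialize"),
    }


def roundtrip_works() -> bool:
    """Serialize an in-memory database and load it into a second connection."""
    if not hasattr(sqlite3.Connection, "deserialize"):
        return False
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE t (x INTEGER)")
    src.execute("INSERT INTO t VALUES (42)")
    src.commit()
    data = src.serialize()
    dst = sqlite3.connect(":memory:")
    dst.deserialize(data)
    value = dst.execute("SELECT x FROM t").fetchone()[0]
    src.close()
    dst.close()
    return value == 42


if __name__ == "__main__":
    print(sqlite_report())
    print("serialize/deserialize round trip:", roundtrip_works())
```

Running it with the interpreter inside the built image (for this PR, presumably the virtualenv Python) reports what that specific image tag provides.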

6-9: ⚡ Quick win

llslibdev dependencies likely won’t need cmake/cargo in deploy/llama-stack/test.containerfile (lines 6-9)
The llslibdev dependency group in pyproject.toml doesn’t directly include fastuuid/maturin/cargo, and uv.lock provides manylinux wheels for fastuuid (so uv sync --locked --group llslibdev should not require a Rust toolchain on typical Linux platforms). Only targets without matching wheels would fall back to the fastuuid sdist and need cargo/build tooling.

Comment thread scripts/llama-stack-entrypoint.sh

@tisnik tisnik left a comment

LGTM

@tisnik tisnik requested a review from radofuchs May 26, 2026 14:15
@anik120 anik120 force-pushed the fix-llama-stack-container-start-issue branch from 9c1b725 to da6a5cf Compare May 26, 2026 14:16
Fixes two issues preventing llama-stack container from starting:

1. SQLite compatibility error - Switch base image from ubi9/ubi-minimal to ubi9/python-312.
The minimal image ships with SQLite < 3.30.0 which lacks sqlite3_deserialize support required
by Python 3.12's _sqlite3 module. The python-312 base image includes a compatible SQLite version
and properly configured Python dependencies.

2. Read-only filesystem error - Config enrichment script now writes to /tmp/enriched-run.yaml
instead of /opt/app-root/run.yaml. The latter is mounted read-only from the host, causing OSError
when the enrichment script attempts to write the enriched configuration.

Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
@anik120 anik120 force-pushed the fix-llama-stack-container-start-issue branch from da6a5cf to 3b50327 Compare May 26, 2026 14:41
The enrichment script was failing with `ModuleNotFoundError: No module
named 'yaml'` because it was invoked with system Python instead of
virtualenv Python where dependencies are installed.

Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
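
A quick way to diagnose this class of failure is to ask the interpreter itself whether it is running inside a virtualenv and whether it can find the module. The sketch below uses only the standard library; the helper names are illustrative and not part of the PR.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


def module_available(name: str) -> bool:
    """True when the named top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print(f"interpreter: {sys.executable}")
    print(f"virtualenv:  {in_virtualenv()}")
    print(f"yaml found:  {module_available('yaml')}")
```

Running the script once with `python3` and once with `/opt/app-root/.venv/bin/python3` inside the container should show which interpreter has the `yaml` package installed.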
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deploy/llama-stack/test.containerfile`:
- Line 2: The Dockerfile uses the mutable tag
"registry.access.redhat.com/ubi9/python-312" in the FROM instruction; replace
that tag with the corresponding immutable digest by finding the correct sha256
for the image and updating the FROM line to use the digest form (e.g., FROM
registry.access.redhat.com/ubi9/python-312@sha256:<digest>), ensuring you pick
the exact digest that matches the desired image variant/architecture for
reproducible builds and supply-chain traceability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8b7f2dad-9e5e-4d8d-97a8-6475fba17b5c

📥 Commits

Reviewing files that changed from the base of the PR and between 9c1b725 and 5dd5008.

📒 Files selected for processing (4)
  • deploy/llama-stack/test.containerfile
  • scripts/llama-stack-entrypoint.sh
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: Pylinter
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-02-19T10:06:50.647Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:50.647Z
Learning: In the rhoai tests under tests/e2e-prow/rhoai/manifests, avoid static ConfigMap definitions for mock-jwks-script and mcp-mock-server-script since these ConfigMaps are created dynamically by the pipeline.sh deployment script using 'oc create configmap'. Ensure there are no static ConfigMap resources for these names in the manifests. If such ConfigMaps are added in the future, coordinate with the pipeline to reflect dynamic creation or adjust tests to rely on the dynamic provisioning.

Applied to files:

  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml
📚 Learning: 2026-05-12T15:14:34.788Z
Learnt from: syedriko
Repo: lightspeed-core/lightspeed-stack PR: 1727
File: scripts/konflux_requirements.sh:9-15
Timestamp: 2026-05-12T15:14:34.788Z
Learning: In this repo, the `.konflux/` directory is committed/tracked and is guaranteed to exist in a fresh clone. Therefore, shell scripts that write output under `.konflux/` (e.g., create files like `.konflux/<...>`) should not waste effort by calling `mkdir -p .konflux` first. Only add directory-creation logic if the script may run in an environment/repo state where `.konflux/` might not be present.

Applied to files:

  • scripts/llama-stack-entrypoint.sh
🪛 Trivy (0.69.3)
deploy/llama-stack/test.containerfile

[warning] 2-2: ':latest' tag used

Specify a tag in the 'FROM' statement for image 'registry.access.redhat.com/ubi9/python-312'

Rule: DS-0001

(IaC/Dockerfile)


[info] 1-1: No HEALTHCHECK defined

Add HEALTHCHECK instruction in your Dockerfile

Rule: DS-0026

(IaC/Dockerfile)

🔇 Additional comments (4)
deploy/llama-stack/test.containerfile (1)

7-9: LGTM!

scripts/llama-stack-entrypoint.sh (1)

8-8: LGTM!

Also applies to: 15-15

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml (1)

160-160: LGTM!

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-prow.yaml (1)

150-150: LGTM!

-# Upstream llama-stack built from Red Hat UBI
-FROM registry.access.redhat.com/ubi9/ubi-minimal
+# Upstream llama-stack built from Red Hat UBI Python 3.12 image
+FROM registry.access.redhat.com/ubi9/python-312
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What is the current sha256 digest for registry.access.redhat.com/ubi9/python-312, and does Red Hat recommend digest pinning for reproducible builds?

💡 Result:

The container image registry.access.redhat.com/ubi9/python-312 is a mutable tag that points to updated image versions over time [1][2]. Consequently, it does not have a single, static SHA256 digest. Because Red Hat frequently updates these images to address security vulnerabilities and bug fixes, the digest associated with the ubi9/python-312 tag changes whenever a new image is pushed to the registry [1][3].

To find the current digest for your specific environment, you can use the command:

podman inspect registry.access.redhat.com/ubi9/python-312 --format '{{.Digest}}'

Red Hat explicitly recommends digest pinning for scenarios requiring reproducible builds and robust supply chain security [4][2][5]. While tags are convenient for general development, they can change at any time, which introduces ambiguity and prevents reproducibility [2]. By referencing an image by its immutable SHA256 digest, you ensure that your builds consistently use the exact same image content, effectively eliminating discrepancies caused by tag updates [2][5].

Furthermore, for certain Red Hat ecosystem requirements, such as Operator bundle certification, using image digests instead of tags is mandatory [5].

Pin the base image to an immutable digest (deploy/llama-stack/test.containerfile:2).

registry.access.redhat.com/ubi9/python-312 is a mutable tag, so the image contents/digest can change over time, hurting reproducibility and supply-chain traceability. Red Hat recommends digest pinning for reproducible builds and stronger supply-chain security; replace the tag with the appropriate sha256 digest.

Suggested change
-FROM registry.access.redhat.com/ubi9/python-312
+FROM registry.access.redhat.com/ubi9/python-312@sha256:<resolved_digest>
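
To keep this from regressing, a small check could flag any `FROM` line that is not pinned to a digest. The following is a rough sketch, not an existing tool: `unpinned_base_images` is a hypothetical helper, and it deliberately ignores multi-stage aliases (a `FROM builder` line that refers to an earlier stage would be reported as unpinned).

```python
import re

# A digest reference ends in @sha256: followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(containerfile_text: str) -> list[str]:
    """Return image references from FROM lines that are not digest-pinned."""
    unpinned = []
    for raw in containerfile_text.splitlines():
        parts = raw.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip optional flags such as --platform=...
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        if image != "scratch" and not DIGEST_RE.search(image):
            unpinned.append(image)
    return unpinned


if __name__ == "__main__":
    sample = "FROM registry.access.redhat.com/ubi9/python-312\n"
    print(unpinned_base_images(sample))
```

A CI step could read deploy/llama-stack/test.containerfile, pass its text to the function, and fail when the returned list is non-empty.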
@radofuchs radofuchs left a comment

LGTM

@tisnik tisnik merged commit 9700c70 into lightspeed-core:main May 27, 2026
31 checks passed
3 participants