Fix OTEL telemetry collection for OpenShell backend #81
Caution
Review failed
Pull request was closed or merged during review

📝 Walkthrough
Security note: review OTEL binding and subprocess environment handling for CWE-922 (Insecure Storage of Sensitive Information) and CWE-78 / CWE-20 (OS Command Injection / Input Validation) risks.

This PR extends OpenTelemetry support to the OpenShell backend by threading OTEL port and log path through the backend contract, making the collector bind address configurable, implementing sandbox-local OTEL log capture and download in OpenShellBackend, introducing a harness gateway host constant, and wiring OTEL initialization through the CLI.

The integration allows the OpenShell sandbox to run an embedded OTEL collector, export telemetry locally, and return logs to the host for summary reporting. Podman backend signatures are updated for contract compliance, and the e2e test documentation is extended with OTEL verification procedures.

Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (9 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/agentic_ci/otel.py (1)
113-125: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Stop masking collector bind failures behind a timeout.
A bad `bind_addr` or bind conflict will make the child process exit before it writes `otel-port`. Right now that collapses into a generic 5s timeout because `stderr` is discarded and the loop never checks whether the subprocess already died. Fail fast on `proc.poll()` and surface stderr/return code here.
Suggested fix
 proc = subprocess.Popen(
     [sys.executable, "-m", "agentic_ci.otel"],
     env=env,
-    stderr=subprocess.DEVNULL,
+    stderr=subprocess.PIPE,
+    text=True,
 )
 for _ in range(50):
+    if proc.poll() is not None:
+        stderr = proc.stderr.read().strip() if proc.stderr else ""
+        raise RuntimeError(
+            f"OTEL collector exited before startup: {stderr or proc.returncode}"
+        )
     if os.path.exists(port_file):
         break
     time.sleep(0.1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/agentic_ci/otel.py` around lines 113 - 125, The current startup loop masks collector bind failures because stderr is discarded and the code never checks if the child process exited; update the Popen invocation that creates proc (the subprocess.Popen call running "agentic_ci.otel") to capture stderr (e.g., stderr=subprocess.PIPE) and then in the waiting loop that checks os.path.exists(port_file) poll proc with proc.poll(); if proc.poll() is not None (process exited) immediately read the captured stderr and raise a RuntimeError that includes proc.returncode and the decoded stderr to surface bind/exit errors; only kill the process and raise the generic timeout if the child is still alive after the wait.
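The fail-fast startup wait described in this comment can be sketched in isolation as follows. This is a minimal illustration and not the project's code: the helper names `wait_for_port_file` and `demo_failing_child` are hypothetical, and the child process here is a stand-in that exits immediately to mimic a bind failure.

```python
import os
import subprocess
import sys
import time


def wait_for_port_file(proc, port_file, attempts=50, delay=0.1):
    """Wait for port_file to appear, failing fast if proc exits first."""
    for _ in range(attempts):
        if os.path.exists(port_file):
            return
        if proc.poll() is not None:
            # The child is gone: report its exit code and stderr right away
            # instead of waiting out the full timeout.
            stderr = proc.stderr.read().strip() if proc.stderr else ""
            raise RuntimeError(
                f"collector exited with code {proc.returncode} before startup: {stderr}"
            )
        time.sleep(delay)
    # Still alive but never wrote the file: this is the genuine timeout case.
    proc.kill()
    proc.wait()
    raise TimeoutError(f"collector did not write {port_file} in time")


def demo_failing_child():
    """Spawn a child that dies immediately, standing in for a bind failure."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stderr.write('bind failed'); sys.exit(3)"],
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        wait_for_port_file(proc, "/nonexistent/otel-port")
    except RuntimeError as exc:
        return str(exc)
    return None
```

Calling `demo_failing_child()` returns the error text with the exit code and the child's stderr, so an early exit and a slow start produce different errors.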
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/skills/test-e2e-openshell/SKILL.md:
- Line 338: The command `cd /tmp/e2e-workdir` references a directory that is
never created (Section E uses `/tmp/workdir-test`) and Section F omits
`--workdir`, so either remove the `cd /tmp/e2e-workdir` step, change it to `cd
/tmp/workdir-test` to match Section E, or explicitly create `/tmp/e2e-workdir`
and/or add `--workdir /tmp/e2e-workdir` to the command in Section F; update the
`cd /tmp/e2e-workdir` occurrence accordingly so the working directory used by
the skill matches the created path or is not required.
- Line 335: In SKILL.md Section F1 fix the failing directory change and harden
the external image input: ensure the script creates the expected working
directory before cd by adding mkdir -p /tmp/e2e-workdir (or remove the cd to
/tmp/e2e-workdir) so that the step using /tmp/e2e-workdir does not fail; also
replace the unpinned OPENSHELL_SUPERVISOR_IMAGE value
(OPENSHELL_SUPERVISOR_IMAGE=quay.io/mprpic/openshell-supervisor:pr1763) with a
digest-pinned reference and add a note requiring provenance/approval for the
Quay source for CI/E2E to prevent using an untrusted tag.
In `@src/agentic_ci/backends/openshell/__init__.py`:
- Around line 21-42: The OTEL collector start is fire-and-forget and failures
are suppressed: change the logic around _SANDBOX_OTEL_COLLECTOR and the sandbox
fetch path so we verify the collector actually started and surface errors when
otel_enabled is true; specifically, replace the background-shell-with-& pattern
by launching the collector process via a controllable mechanism (or run
foreground with subprocess and PID), poll/connect to
127.0.0.1:_SANDBOX_OTEL_PORT until it accepts connections (with a short timeout
loop) to confirm startup, and make _download_otel_log() propagate or log non-OK
exec/fetch/timeout errors instead of swallowing them so the caller (when
otel_enabled) can fail fast or emit a visible warning; ensure references to
_SANDBOX_OTEL_LOG and the sandbox exec/cat path are preserved while adding
explicit error handling and clear messages when startup or fetch fails.
In `@src/agentic_ci/cli.py`:
- Around line 97-100: The OTEL_RATE_FILE environment cleanup is currently
guarded by the otel_proc truthiness so setting otel_proc = None prevents the env
var from being cleared; update the cleanup block in the shutdown/cleanup
sequence (the code that calls otel.stop_collector and handles otel_proc and
otel_log) to always unset os.environ["OTEL_RATE_FILE"] (and remove it with
os.environ.pop if present) regardless of whether otel_proc is truthy, and ensure
otel_proc is still set to None after stopping the collector; reference
otel_proc, otel.stop_collector, otel_log and OTEL_RATE_FILE to locate and modify
the code.
---
Outside diff comments:
In `@src/agentic_ci/otel.py`:
- Around line 113-125: The current startup loop masks collector bind failures
because stderr is discarded and the code never checks if the child process
exited; update the Popen invocation that creates proc (the subprocess.Popen call
running "agentic_ci.otel") to capture stderr (e.g., stderr=subprocess.PIPE) and
then in the waiting loop that checks os.path.exists(port_file) poll proc with
proc.poll(); if proc.poll() is not None (process exited) immediately read the
captured stderr and raise a RuntimeError that includes proc.returncode and the
decoded stderr to surface bind/exit errors; only kill the process and raise the
generic timeout if the child is still alive after the wait.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 5dd768fb-ba90-47b6-99e3-3873ecd074b1
📒 Files selected for processing (8)
- .claude/skills/test-e2e-openshell/SKILL.md
- src/agentic_ci/backend.py
- src/agentic_ci/backends/openshell/__init__.py
- src/agentic_ci/backends/podman.py
- src/agentic_ci/cli.py
- src/agentic_ci/harness.py
- src/agentic_ci/otel.py
- tests/test_harness.py
podman exec \
  -e ANTHROPIC_VERTEX_PROJECT_ID=<your-project-id> \
  -e CLOUD_ML_REGION=global \
  -e OPENSHELL_SUPERVISOR_IMAGE=quay.io/mprpic/openshell-supervisor:pr1763 \
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify if this supervisor image reference exists in other files or documentation
rg -n "openshell-supervisor:pr1763" --type md
Repository: opendatahub-io/agentic-ci
Length of output: 265
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file=".claude/skills/test-e2e-openshell/SKILL.md"
echo "== Context around /tmp/e2e-workdir in $file =="
sed -n '320,420p' "$file" | nl -ba | sed -n '1,120p'
echo
echo "== Search for /tmp/e2e-workdir creation/usage =="
rg -n "/tmp/e2e-workdir" "$file"
echo
echo "== Search for mkdir -p /tmp/e2e-workdir =="
rg -n "mkdir\s+-p\s+/tmp/e2e-workdir" "$file" || true
echo
echo "== Search for PR 1763 image refs =="
rg -n "openshell-supervisor:pr1763" -S --hidden .
echo
echo "== Inspect docs/backends/openshell.md around the PR merge note =="
target="docs/backends/openshell.md"
sed -n '1,120p' "$target" | nl -ba | sed -n '1,120p'
sed -n '180,260p' "$target" | nl -ba | sed -n '1,120p'
Repository: opendatahub-io/agentic-ci
Length of output: 214
🌐 Web query:
quay.io mprpic openshell-supervisor pr1763
💡 Result:
The query "quay.io mprpic openshell-supervisor pr1763" appears to conflate several distinct projects and references within the NVIDIA OpenShell ecosystem [1][2]. There is no evidence of a pull request #1763 in the NVIDIA/OpenShell repository specifically authored by "mprpic" (Martin Prpič) related to an "openshell-supervisor" image on quay.io [1][3][4]. Here is the context regarding the components mentioned:
1. OpenShell Supervisor: NVIDIA's OpenShell project utilizes an openshell-supervisor component, which is a specialized OCI image used for sandboxing [1][5]. While the project maintains its own container images (e.g., hosted at github.com/NVIDIA/OpenShell/pkgs/container/openshell/supervisor), it is distinct from the Quay.io container registry service [1][6].
2. Martin Prpič (mprpic): A software developer known for contributions to Red Hat, Opendatahub, and other open-source projects [3][4]. There is no record of him being a primary contributor to the NVIDIA OpenShell supervisor image infrastructure [1][3][4].
3. Quay.io: This is a container image registry service provided by Red Hat, not a component of the NVIDIA OpenShell software [6][7].
It is possible that "PR 1763" refers to a recent, ongoing, or closed pull request within the NVIDIA/OpenShell repository, but it does not involve the entities or registry paths identified in your query [2]. Other related PRs in the OpenShell repository, such as #1767, have recently addressed Kubernetes and AppArmor configurations for the supervisor [2].
Citations:
- 1: feat: minimal changes to run OpenShell in an existing Kubernetes cluster NVIDIA/OpenShell#996
- 2: feat: Kubernetes support on AppArmor-enabled host nodes NVIDIA/OpenShell#1643
- 3: https://github.com/mprpic
- 4: https://github.com/mprpic?tab=repositories
- 5: NVIDIA/OpenShell@8a813ab
- 6: https://www.projectquay.io/
- 7: https://quay.io/
🏁 Script executed:
#!/bin/bash
set -euo pipefail
SKILL=".claude/skills/test-e2e-openshell/SKILL.md"
BACKEND="docs/backends/openshell.md"
python3 - <<'PY'
from pathlib import Path
path = Path(".claude/skills/test-e2e-openshell/SKILL.md")
lines = path.read_text().splitlines()
def show(start,end):
    for i in range(start-1,end):
        if 0 <= i < len(lines):
            print(f"{i+1}:{lines[i]}")
print("== SKILL.md: lines 320-360 ==")
show(320,360)
print("\n== SKILL.md: lines containing /tmp/e2e-workdir / e2e-workdir ==")
for i,l in enumerate(lines, start=1):
    if "/tmp/e2e-workdir" in l or "e2e-workdir" in l:
        print(f"{i}:{l}")
print("\n== SKILL.md: lines containing openshell-supervisor:pr1763 ==")
for i,l in enumerate(lines, start=1):
    if "openshell-supervisor:pr1763" in l:
        print(f"{i}:{l}")
PY
python3 - <<'PY'
from pathlib import Path
path = Path("docs/backends/openshell.md")
lines = path.read_text().splitlines()
def show(start,end):
    for i in range(start-1,end):
        if 0 <= i < len(lines):
            print(f"{i+1}:{lines[i]}")
print("== openshell.md: lines 40-90 (supervisor_image var) ==")
show(40,90)
print("\n== openshell.md: lines 200-240 (PR merge note / tag usage) ==")
show(200,240)
PY
echo
echo "== Quick grep for /tmp/e2e-workdir creation =="
rg -n "mkdir\s+-p\s+/tmp/e2e-workdir|/tmp/e2e-workdir.*(mkdir|create|rm|cleanup)" "$SKILL" || true
echo
echo "== Quick grep for PR tag mention =="
rg -n "pr1763|`#1763`|PR\s*#?1763" "$SKILL" "$BACKEND" || true
Repository: opendatahub-io/agentic-ci
Length of output: 6848
🏁 Script executed:
#!/bin/bash
set -euo pipefail
SKILL=".claude/skills/test-e2e-openshell/SKILL.md"
python3 - <<'PY'
from pathlib import Path
path = Path(".claude/skills/test-e2e-openshell/SKILL.md")
lines = path.read_text().splitlines()
def show(a,b):
    for i in range(a-1,b):
        if 0 <= i < len(lines):
            print(f"{i+1}:{lines[i]}")
print("== SKILL.md: occurrences of workdir/workdir-test ==")
import re
for i,l in enumerate(lines, start=1):
    if re.search(r'workdir', l, re.IGNORECASE):
        print(f"{i}:{l}")
PY
echo
echo "== SKILL.md: around 'Before you start' (first 260 lines search) =="
python3 - <<'PY'
from pathlib import Path
path=Path(".claude/skills/test-e2e-openshell/SKILL.md")
lines=path.read_text().splitlines()
for idx,l in enumerate(lines, start=1):
    if "Before you start" in l:
        start=max(1, idx-20); end=idx+60
        for i in range(start-1,end):
            if i < len(lines):
                print(f"{i+1}:{lines[i]}")
        break
else:
    print("No 'Before you start' heading found in first scan.")
PY
echo
echo "== SKILL.md: around section E and F commands (270-380) =="
python3 - <<'PY'
from pathlib import Path
p=Path(".claude/skills/test-e2e-openshell/SKILL.md')
lines=p.read_text().splitlines()
for i in range(250-1, 390):
    if 0<=i<len(lines):
        print(f"{i+1}:{lines[i]}")
PY
Repository: opendatahub-io/agentic-ci
Length of output: 4392
🏁 Script executed:
#!/bin/bash
set -euo pipefail
SKILL=".claude/skills/test-e2e-openshell/SKILL.md"
python3 - <<'PY'
from pathlib import Path
path = Path(".claude/skills/test-e2e-openshell/SKILL.md")
lines = path.read_text().splitlines()
import re
print("== SKILL.md: occurrences of workdir/workdir-test ==")
for i,l in enumerate(lines, start=1):
    if re.search(r'workdir', l, re.IGNORECASE) or "/tmp/" in l and "workdir" in l:
        print(f"{i}:{l}")
PY
echo
echo "== SKILL.md: around 'Before you start' heading =="
python3 - <<'PY'
from pathlib import Path
p=Path(".claude/skills/test-e2e-openshell/SKILL.md")
lines=p.read_text().splitlines()
for idx,l in enumerate(lines, start=1):
    if "Before you start" in l:
        start=max(1, idx-20); end=idx+70
        for i in range(start-1, min(end, len(lines))):
            print(f"{i+1}:{lines[i]}")
        break
else:
    print("No 'Before you start' heading found.")
PY
echo
echo "== SKILL.md: around section E and F commands (250-380) =="
python3 - <<'PY'
from pathlib import Path
p=Path(".claude/skills/test-e2e-openshell/SKILL.md")
lines=p.read_text().splitlines()
for i in range(250-1, min(390, len(lines))):
    print(f"{i+1}:{lines[i]}")
PY
Repository: opendatahub-io/agentic-ci
Length of output: 8395
Harden OPENSHELL_SUPERVISOR_IMAGE supply-chain input + fix invalid cd in Section F
- `.claude/skills/test-e2e-openshell/SKILL.md` (Section F1): `cd /tmp/e2e-workdir` is used, but the skill only creates `/tmp/workdir-test`; add `mkdir -p /tmp/e2e-workdir` (or remove the `cd`) to prevent runtime failure (CWE-703).
- The `quay.io/mprpic/openshell-supervisor:pr1763` tag is intentional per `docs/backends/openshell.md` (it's tied to NVIDIA/OpenShell PR `#1763` for the GCE metadata emulator), but it's still an unpinned external OCI tag (CWE-494, CWE-829). Pin the image by digest and require provenance/approval for the Quay source for CI/E2E.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/test-e2e-openshell/SKILL.md at line 335, In SKILL.md Section
F1 fix the failing directory change and harden the external image input: ensure
the script creates the expected working directory before cd by adding mkdir -p
/tmp/e2e-workdir (or remove the cd to /tmp/e2e-workdir) so that the step using
/tmp/e2e-workdir does not fail; also replace the unpinned
OPENSHELL_SUPERVISOR_IMAGE value
(OPENSHELL_SUPERVISOR_IMAGE=quay.io/mprpic/openshell-supervisor:pr1763) with a
digest-pinned reference and add a note requiring provenance/approval for the
Quay source for CI/E2E to prevent using an untrusted tag.
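A pre-flight check for the digest-pinning recommendation above might look like the following sketch. The function name `is_digest_pinned` is hypothetical, and the check only looks for a trailing `@sha256:<64 hex>` suffix; it does not verify that the digest exists in any registry or cover other digest algorithms.

```python
import re

# A digest-pinned OCI reference ends in "@sha256:" followed by 64 hex digits.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref):
    """Return True if image_ref carries a sha256 digest, False for tag-only refs."""
    return bool(_DIGEST_RE.search(image_ref))
```

With this, the tag-only reference `quay.io/mprpic/openshell-supervisor:pr1763` from Section F would be rejected, while a reference carrying a full digest would pass.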
  -e OPENSHELL_SUPERVISOR_IMAGE=quay.io/mprpic/openshell-supervisor:pr1763 \
  -e SANDBOX_IMAGE="$CLAUDE_SANDBOX_IMAGE" \
  openshell-e2e bash -c '
    cd /tmp/e2e-workdir && \
Command references uncreated directory.
cd /tmp/e2e-workdir attempts to change to a directory that is never created in the skill instructions. Section E uses /tmp/workdir-test (different path). Since Section F doesn't specify --workdir, this cd is likely unnecessary and will cause the command to fail.
🐛 Proposed fix
 openshell-e2e bash -c '
-  cd /tmp/e2e-workdir && \
   agentic-ci run \
📝 Committable suggestion
cd /tmp/e2e-workdir && \
openshell-e2e bash -c '
agentic-ci run \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/skills/test-e2e-openshell/SKILL.md at line 338, The command `cd
/tmp/e2e-workdir` references a directory that is never created (Section E uses
`/tmp/workdir-test`) and Section F omits `--workdir`, so either remove the `cd
/tmp/e2e-workdir` step, change it to `cd /tmp/workdir-test` to match Section E,
or explicitly create `/tmp/e2e-workdir` and/or add `--workdir /tmp/e2e-workdir`
to the command in Section F; update the `cd /tmp/e2e-workdir` occurrence
accordingly so the working directory used by the skill matches the created path
or is not required.
_SANDBOX_OTEL_COLLECTOR = r"""
python3 -c '
import json,os
from http.server import HTTPServer,BaseHTTPRequestHandler
from datetime import datetime,timezone
class H(BaseHTTPRequestHandler):
    def do_POST(self):
        l=int(self.headers.get("Content-Length",0))
        b=self.rfile.read(l) if l else b""
        try: p=json.loads(b) if b else {}
        except json.JSONDecodeError: p={"raw":b.decode("utf-8",errors="replace")}
        r={"ts":datetime.now(timezone.utc).isoformat(),"path":self.path,"payload":p}
        with open("OTEL_LOG","a") as f: f.write(json.dumps(r)+"\n")
        self.send_response(200)
        self.send_header("Content-Type","application/json")
        self.end_headers()
        self.wfile.write(b'"'"'{"partialSuccess":{}}'"'"')
    def log_message(self,*a): pass
HTTPServer(("127.0.0.1",OTEL_PORT),H).serve_forever()
' &
sleep 1
""".replace("OTEL_LOG", _SANDBOX_OTEL_LOG).replace("OTEL_PORT", str(_SANDBOX_OTEL_PORT))
OTEL capture fails silently when the sandbox collector or fetch step breaks.
The collector is started with fire-and-forget & plus a fixed sleep 1, and _download_otel_log() suppresses timeout/exec failures. If python3 is missing, 127.0.0.1:4318 is unavailable, or openshell sandbox exec ... cat fails, the command path still returns success and telemetry is just dropped. When otel_enabled is true, verify the collector actually came up and emit a hard failure or at least a visible warning on fetch failure.
Also applies to: 183-205
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/agentic_ci/backends/openshell/__init__.py` around lines 21 - 42, The OTEL
collector start is fire-and-forget and failures are suppressed: change the logic
around _SANDBOX_OTEL_COLLECTOR and the sandbox fetch path so we verify the
collector actually started and surface errors when otel_enabled is true;
specifically, replace the background-shell-with-& pattern by launching the
collector process via a controllable mechanism (or run foreground with
subprocess and PID), poll/connect to 127.0.0.1:_SANDBOX_OTEL_PORT until it
accepts connections (with a short timeout loop) to confirm startup, and make
_download_otel_log() propagate or log non-OK exec/fetch/timeout errors instead
of swallowing them so the caller (when otel_enabled) can fail fast or emit a
visible warning; ensure references to _SANDBOX_OTEL_LOG and the sandbox exec/cat
path are preserved while adding explicit error handling and clear messages when
startup or fetch fails.
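The "poll until the collector accepts connections" step suggested here could replace the fixed `sleep 1` with something like the sketch below. It is only an illustration: `wait_for_listener` is a hypothetical helper, and it assumes the check runs where `127.0.0.1:<port>` is reachable (for the sandbox collector, that means inside the sandbox).

```python
import socket
import time


def wait_for_listener(host, port, timeout=5.0, interval=0.1):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

A caller with `otel_enabled` set could treat a `False` result as a startup failure and raise or log a visible warning, instead of continuing and silently dropping telemetry.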
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/agentic_ci/cli.py (1)
135-135: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Fix incomplete OTEL_RATE_FILE cleanup path (CWE-459).
Line 96 sets `otel_proc = None` on success, so Line 135 never runs and `OTEL_RATE_FILE` remains set. Move env cleanup outside the `if otel_proc` guard.
♻️ Proposed fix
 finally:
     if otel_proc:
         otel.stop_collector(otel_proc)
-        os.environ.pop("OTEL_RATE_FILE", None)
+    os.environ.pop("OTEL_RATE_FILE", None)
     if not args.keep:
         backend.stop()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/agentic_ci/cli.py` at line 135, The cleanup for OTEL_RATE_FILE is currently guarded by the truthiness of otel_proc so when otel_proc is set to None (e.g., success path) the env var is left behind; move the os.environ.pop("OTEL_RATE_FILE", None) call out of the if otel_proc guard so it always runs (for example, place it after the surrounding shutdown/cleanup block or in a finally) ensuring OTEL_RATE_FILE is removed regardless of otel_proc's value; reference the otel_proc variable and the literal "OTEL_RATE_FILE" when locating where to relocate the pop call.
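The cleanup pattern this comment asks for can be shown in a self-contained form. The sketch below is not the CLI's code: `run_with_rate_file`, `body`, `collector`, and `stop_collector` are hypothetical stand-ins for the run, `otel_proc`, and `otel.stop_collector`.

```python
import os


def run_with_rate_file(rate_file, body, collector=None, stop_collector=None):
    """Set OTEL_RATE_FILE for the duration of body() and always clear it."""
    os.environ["OTEL_RATE_FILE"] = rate_file
    try:
        return body()
    finally:
        # Stopping the collector is conditional on there being one to stop.
        if collector is not None and stop_collector is not None:
            stop_collector(collector)
        # The env cleanup is unconditional, so it also runs when the
        # collector handle was already reset to None or body() raised.
        os.environ.pop("OTEL_RATE_FILE", None)
```

Keeping the `pop` at the top level of `finally` means the variable is removed on both the success path and the error path.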
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/agentic_ci/cli.py`:
- Around line 74-77: The OTEL collector is being bound to 0.0.0.0 when
args.backend == "openshell" (bind_addr used in the otel.start_collector call),
which exposes the OTLP receiver; change this so that for "openshell" you either
(a) resolve and use the gateway/interface IP instead of "0.0.0.0" (e.g., obtain
the host/gateway address at startup and assign it to bind_addr before calling
otel.start_collector(run_dir, bind_addr=bind_addr)) or (b) implement token-based
authentication/authorization inside otel.start_collector (or the OTLP HTTP
handler it registers) and require a secret token for incoming telemetry; update
the logic around bind_addr and otel.start_collector accordingly so the collector
is not publicly exposed without auth.
---
Duplicate comments:
In `@src/agentic_ci/cli.py`:
- Line 135: The cleanup for OTEL_RATE_FILE is currently guarded by the
truthiness of otel_proc so when otel_proc is set to None (e.g., success path)
the env var is left behind; move the os.environ.pop("OTEL_RATE_FILE", None) call
out of the if otel_proc guard so it always runs (for example, place it after the
surrounding shutdown/cleanup block or in a finally) ensuring OTEL_RATE_FILE is
removed regardless of otel_proc's value; reference the otel_proc variable and
the literal "OTEL_RATE_FILE" when locating where to relocate the pop call.
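Option (b) of the `0.0.0.0` bind finding in the inline comments above, token-based authentication on the OTLP HTTP receiver, might be sketched as follows. This is an assumption-laden illustration rather than the project's collector: `make_handler`, `post_status`, `demo`, and the `s3cret` token are all hypothetical, and the handler discards the payload instead of logging it.

```python
import hmac
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(expected_token):
    """Build a handler class that rejects POSTs lacking the bearer token."""
    expected = f"Bearer {expected_token}".encode("utf-8")

    class TokenHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Drain the body first so the connection closes cleanly either way.
            length = int(self.headers.get("Content-Length", 0))
            if length:
                self.rfile.read(length)
            supplied = self.headers.get("Authorization", "").encode("utf-8")
            # Constant-time comparison avoids leaking the token via timing.
            if not hmac.compare_digest(supplied, expected):
                self.send_response(401)
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(b'{"partialSuccess":{}}')

        def log_message(self, *args):
            pass  # keep the demo quiet

    return TokenHandler


def post_status(port, token=None):
    """POST an empty JSON body to the local server and return the HTTP status."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/metrics", data=b"{}", method="POST"
    )
    req.add_header("Content-Type", "application/json")
    if token is not None:
        req.add_header("Authorization", f"Bearer {token}")
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def demo():
    """Return statuses for: no token, wrong token, correct token."""
    server = HTTPServer(("127.0.0.1", 0), make_handler("s3cret"))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return post_status(port), post_status(port, "wrong"), post_status(port, "s3cret")
    finally:
        server.shutdown()
        server.server_close()
```

On the exporting side, an OTLP client would also need to send the header, for instance through the standard `OTEL_EXPORTER_OTLP_HEADERS` variable. Whether that fits the sandbox setup here is not something the review establishes.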
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 84a6c9bf-e1c2-472e-9ad5-ef15c7777771
📒 Files selected for processing (9)
- .claude/skills/test-e2e-openshell/SKILL.md
- src/agentic_ci/backend.py
- src/agentic_ci/backends/openshell/__init__.py
- src/agentic_ci/backends/openshell/sandbox.py
- src/agentic_ci/backends/podman.py
- src/agentic_ci/cli.py
- src/agentic_ci/harness.py
- src/agentic_ci/otel.py
- tests/test_harness.py
✅ Files skipped from review due to trivial changes (1)
- .claude/skills/test-e2e-openshell/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (3)
- src/agentic_ci/otel.py
- tests/test_harness.py
- src/agentic_ci/harness.py
The OTEL collector binds to 0.0.0.0 on the host but was unreachable from inside the sandbox because the harness pointed the OTEL endpoint at 10.200.0.1 (the gateway bridge IP), which doesn't route to the host network stack.

Use host.openshell.internal instead, which resolves to the host inside the sandbox (added in OpenShell PR #1279). The sandbox policy now includes host.openshell.internal:<port> so the gateway proxy forwards OTEL exports to the host-side collector.

The OpenShell backend handles OTEL env vars directly (instead of delegating to the harness) because it needs the OpenShell-specific hostname and a shorter export interval (5s vs 10s) to capture metrics from short-lived runs.

Also adds Section F to the OpenShell e2e test skill for verifying OTEL collection works end-to-end.

Signed-off-by: Emilien Macchi <emacchi@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
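As a rough illustration of the env handling described in this commit message, the sketch below builds the sandbox-side OTEL settings and the matching policy entry. The helper names are hypothetical, and the exact variables the backend sets are not shown in this PR page; `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_METRIC_EXPORT_INTERVAL` (milliseconds) are the standard OpenTelemetry SDK names and are assumed here.

```python
def sandbox_otel_env(port, host="host.openshell.internal", export_interval_s=5):
    """Build OTEL env vars pointing the sandbox at the host-side collector."""
    return {
        # The endpoint uses the hostname that resolves to the host inside the sandbox.
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"http://{host}:{port}",
        # The SDK expects milliseconds; 5s lets short-lived runs export at least once.
        "OTEL_METRIC_EXPORT_INTERVAL": str(export_interval_s * 1000),
    }


def sandbox_policy_endpoint(port, host="host.openshell.internal"):
    """Return the host:port entry to allow in the sandbox network policy."""
    return f"{host}:{port}"
```

For a collector listening on port 4318, this yields the endpoint `http://host.openshell.internal:4318` with a 5000 ms export interval, and the policy entry `host.openshell.internal:4318`.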
Summary
- The harness pointed the OTEL endpoint at `10.200.0.1` (gateway bridge IP), which doesn't route to the host network stack
- Use `host.openshell.internal` instead, which resolves to the host inside the sandbox (OpenShell PR #1279)
- Add `host.openshell.internal:<port>` to the sandbox network policy so the gateway proxy forwards OTEL exports to the host-side collector
Test plan
- `tox -e py313`: 501 tests pass
- `tox -e lint`: clean
- `tox -e check-format`: clean
- `tox -e typecheck`: clean
- `host.openshell.internal` resolves correctly and HTTP traffic reaches the host collector through the gateway proxy
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests