Fix OTEL telemetry collection for OpenShell backend #81
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -143,7 +143,7 @@ Verify: | |||||||
| - Output shows `Created sandbox: ci` | ||||||||
| - Output shows `Running Claude Code (claude-haiku-4-5) via openshell backend` | ||||||||
| - Claude's response contains `A1_OK` | ||||||||
| - Token metrics show non-zero counts, cost around `$0.04` | ||||||||
| - Token metrics show non-zero counts and cost is non-zero (e.g. `$0.04`) | ||||||||
| - `Agent exit code: 0` | ||||||||
| - `Sandbox deleted` and `Gateway stopped` at the end | ||||||||
|
|
||||||||
|
|
@@ -174,7 +174,7 @@ Verify: | |||||||
| - Output shows `Auth: API key` | ||||||||
| - Output shows `Creating Anthropic API key provider` | ||||||||
| - Claude's response contains `B1_OK` | ||||||||
| - Token metrics show non-zero counts and cost | ||||||||
| - Token metrics show non-zero counts and cost is non-zero | ||||||||
| - `Agent exit code: 0` | ||||||||
|
|
||||||||
| --- | ||||||||
|
|
@@ -311,6 +311,53 @@ podman exec openshell-e2e rm -rf /tmp/workdir-test | |||||||
|
|
||||||||
| --- | ||||||||
|
|
||||||||
| ## Section F: OTEL telemetry collection | ||||||||
|
|
||||||||
| Verifies that the sandbox-local OTEL collector receives metrics from the | ||||||||
| agent and prints a token/cost summary. Uses Vertex AI auth and Claude Code | ||||||||
| (the only harness that supports OTEL). | ||||||||
|
|
||||||||
| The OpenShell sandbox network isolation prevents reaching an external OTEL | ||||||||
| collector, so agentic-ci embeds a lightweight OTLP receiver inside the | ||||||||
| sandbox on localhost. After the run, the OTEL log is downloaded from the | ||||||||
| sandbox and the summary is printed on the host. | ||||||||
|
|
||||||||
| Requires `OPENSHELL_SUPERVISOR_IMAGE` (see "Before you start"). | ||||||||
|
|
||||||||
| Run cleanup first. | ||||||||
|
|
||||||||
| ### F1. Run with OTEL enabled | ||||||||
|
|
||||||||
| ```bash | ||||||||
| podman exec \ | ||||||||
| -e ANTHROPIC_VERTEX_PROJECT_ID=<your-project-id> \ | ||||||||
| -e CLOUD_ML_REGION=global \ | ||||||||
| -e OPENSHELL_SUPERVISOR_IMAGE=quay.io/mprpic/openshell-supervisor:pr1763 \ | ||||||||
| -e SANDBOX_IMAGE="$CLAUDE_SANDBOX_IMAGE" \ | ||||||||
| openshell-e2e bash -c ' | ||||||||
| cd /tmp/e2e-workdir && \ | ||||||||
|
Review comment: Command references uncreated directory.
🐛 Proposed fix: remove the `cd` line, because `/tmp/e2e-workdir` is never created:
  openshell-e2e bash -c '
- cd /tmp/e2e-workdir && \
  agentic-ci run \
||||||||
| agentic-ci run \ | ||||||||
| --backend openshell \ | ||||||||
| --harness claude-code \ | ||||||||
| --image "$SANDBOX_IMAGE" \ | ||||||||
| --model claude-haiku-4-5 \ | ||||||||
| "Respond with exactly: F1_OK" | ||||||||
| ' | ||||||||
| ``` | ||||||||
|
|
||||||||
| Note: no `--no-otel` flag. | ||||||||
|
|
||||||||
| Verify: | ||||||||
| - Output shows `Running Claude Code (claude-haiku-4-5) via openshell backend` | ||||||||
| - Agent runs and completes with `F1_OK` in the response | ||||||||
| - Output shows `Token/Cost Summary (OpenTelemetry)` section | ||||||||
| - Token counts are non-zero (input tokens, output tokens, cache) | ||||||||
| - Cost is non-zero (e.g. `$0.04`) | ||||||||
| - `Agent exit code: 0` | ||||||||
| - `Sandbox deleted` and `Gateway stopped` at the end | ||||||||
|
|
||||||||
| --- | ||||||||
|
|
||||||||
| ## Final cleanup | ||||||||
|
|
||||||||
| ```bash | ||||||||
|
|
@@ -319,7 +366,7 @@ podman rm -f openshell-e2e | |||||||
|
|
||||||||
| ## Running the full suite | ||||||||
|
|
||||||||
| Execute sections in order (A through E), running the cleanup step before each | ||||||||
| Execute sections in order (A through F), running the cleanup step before each | ||||||||
| section. Skip sections whose prerequisites are not met. If any step fails, | ||||||||
| check the gateway log inside the container: | ||||||||
|
|
||||||||
|
|
||||||||
🧩 Analysis chain
🌐 Web query:
quay.io mprpic openshell-supervisor pr1763

💡 Result:
The query "quay.io mprpic openshell-supervisor pr1763" appears to conflate several distinct projects and references within the NVIDIA OpenShell ecosystem [1][2]. There is no evidence of a pull request #1763 in the NVIDIA/OpenShell repository specifically authored by "mprpic" (Martin Prpič) related to an "openshell-supervisor" image on quay.io [1][3][4]. Here is the context regarding the components mentioned:
1. OpenShell Supervisor: NVIDIA's OpenShell project utilizes an `openshell-supervisor` component, which is a specialized OCI image used for sandboxing [1][5]. While the project maintains its own container images (e.g., hosted at `github.com/NVIDIA/OpenShell/pkgs/container/openshell/supervisor`), it is distinct from the Quay.io container registry service [1][6].
2. Martin Prpič (mprpic): A software developer known for contributions to Red Hat, Opendatahub, and other open-source projects [3][4]. There is no record of him being a primary contributor to the NVIDIA OpenShell supervisor image infrastructure [1][3][4].
3. Quay.io: This is a container image registry service provided by Red Hat, not a component of the NVIDIA OpenShell software [6][7].
It is possible that "PR 1763" refers to a recent, ongoing, or closed pull request within the NVIDIA/OpenShell repository, but it does not involve the entities or registry paths identified in your query [2]. Other related PRs in the OpenShell repository, such as #1767, have recently addressed Kubernetes and AppArmor configurations for the supervisor [2].
Harden `OPENSHELL_SUPERVISOR_IMAGE` supply-chain input + fix invalid `cd` in Section F

- `.claude/skills/test-e2e-openshell/SKILL.md` (Section F1): `cd /tmp/e2e-workdir` is used, but the skill only creates `/tmp/workdir-test`; add `mkdir -p /tmp/e2e-workdir` (or remove the `cd`) to prevent runtime failure (CWE-703).
- The `quay.io/mprpic/openshell-supervisor:pr1763` tag is intentional per `docs/backends/openshell.md` (it’s tied to NVIDIA/OpenShell PR #1763 for the GCE metadata emulator), but it’s still an unpinned external OCI tag (CWE-494, CWE-829). Pin the image by digest and require provenance/approval for the Quay source for CI/E2E.
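
To illustrate the digest-pinning suggestion, here is a minimal sketch of a guard that a test script might run before using `OPENSHELL_SUPERVISOR_IMAGE`. The function name `is_digest_pinned` is hypothetical and not part of agentic-ci or OpenShell; it only checks the shape of the reference (`name@sha256:<64 hex chars>`) and does not verify that the digest exists in the registry or where the image came from.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of agentic-ci): succeed only if the image
# reference ends in a sha256 digest rather than a mutable tag.
is_digest_pinned() {
  local ref="$1"
  # Keep the regex in a variable so bash treats it as a pattern, not a literal.
  local re='@sha256:[0-9a-f]{64}$'
  [[ "$ref" =~ $re ]]
}
```

With this in place, a script could refuse to continue when `is_digest_pinned "$OPENSHELL_SUPERVISOR_IMAGE"` fails. The tag form currently in Section F (`...:pr1763`) would be rejected; the actual digest to pin would have to be looked up in the registry and is deliberately not guessed here.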