feat: 1st class OpenTelemetry support by cpsievert · Pull Request #310 · posit-dev/chatlas

cpsievert · 2026-05-12T20:45:23Z

Summary

Chatlas now emits OpenTelemetry spans that capture the full structure of multi-turn conversations and tool execution — without requiring any provider-specific instrumentor libraries. When a TracerProvider is configured, every chat()/stream() call automatically produces a 3-level span hierarchy:

invoke_agent                      # wraps the full chat loop
├── chat gpt-4o                   # each model API call
├── execute_tool get_weather      # each tool invocation
├── chat gpt-4o                   # follow-up model call
└── ...

Users opt in with pip install "chatlas[otel]" and a standard TracerProvider setup (console exporter, Logfire, or any OTLP-compatible backend). The approach is consistent with Shiny for Python's OTel story — same [otel] extra pattern, same recommended tools, same config-module pattern.

Spans follow the GenAI semantic conventions and record token usage, response model/ID, and optionally full message content (gated by OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT). These framework spans complement (not replace) provider-specific SDK instrumentors like opentelemetry-instrumentation-openai-v2.

New chatlas/_otel.py module with span lifecycle functions
Hooks into all 6 core Chat methods (sync + async for agent/chat/tool spans)
7 tests with VCR cassettes covering span hierarchy, token usage, content capture, tool errors, streaming, and no-op behavior
Updated docs/get-started/monitor.qmd with framework-level tracing docs

Test plan

pytest tests/test_otel.py — 7/7 passing
pyright chatlas/_otel.py — 0 errors
ruff check and ruff format — clean
Manual verification with a real exporter (Logfire or console) and live API key

Adds a new chatlas/_otel.py module that emits OpenTelemetry spans for the chat lifecycle: invoke_agent (top-level), chat (per model call), and execute_tool (per tool invocation). Spans follow the GenAI semantic conventions with attributes like gen_ai.usage.input_tokens, gen_ai.response.model, and optional message content capture controlled by the OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT env var. Also adds an `otel` optional dependency extra (`pip install chatlas[otel]`).

Wires the _otel span functions into the six core Chat methods: _chat_impl/_chat_impl_async (agent spans), _submit_turns/ _submit_turns_async (chat spans), and _invoke_tool/ _invoke_tool_async (tool spans). Parent context is passed explicitly via _otel_parent to avoid async context hazards.

Adds 7 tests covering span hierarchy, token usage, content capture (on/off), tool error recording, streaming lifecycle, and no-op behavior. Includes VCR cassettes for replay without live API keys. Updates docs/get-started/monitor.qmd with a new framework-level tracing section (console quickstart, Logfire production path, config-module pattern) before the existing provider-specific content.

The OTel API is ~212KB with no heavy transitive deps, and its default ProxyTracer already no-ops when no SDK is configured. Making it a hard dep lets us drop the lazy initialization (cache_tracer/initialized/is_tracing guards) and always create spans, relying on the no-op machinery for zero overhead when nobody is collecting. Removes the `chatlas[otel]` extra — the API is now always available. Users still opt in to collection by installing opentelemetry-sdk and configuring a TracerProvider.

…amework-spans # Conflicts: # pyproject.toml

Activate chatlas's chat and execute_tool spans in the OTel context around the bounded provider call and tool invocation. This lets spans created by others nest under ours via ambient context: a provider HTTP instrumentor's span becomes a child of the chat span, and work done inside a tool becomes a child of its execute_tool span. Previously those started disconnected root traces. The agent span is deliberately left unactivated -- it brackets the whole streaming loop, and holding context active across a yield would leak it into the consumer's scope (matching ellmer's approach). Adds nesting tests for the provider-HTTP and tool-internal cases (sync + async).

The context activation also makes spans emitted inside a tool nest under its execute_tool span, not just provider HTTP spans under chat spans. Spell out both in the monitor guide.

The per-function 'from ._otel import ...' statements were load-bearing only while opentelemetry was an optional dependency; once it became a hard dep, they were vestigial. _otel.py imports no chatlas modules at top level, so there's no circular-import risk in importing it eagerly.

Provider failures now mark the chat and invoke_agent spans as errored with the GenAI error.type attribute and an exception event, mirroring how tool failures are already recorded. Previously a failed LLM call ended with UNSET status, indistinguishable from one that never finished (and streaming-iteration errors were missed entirely, since use_span's auto-recording only covered the bounded chat_perform call). Generalize record_tool_error to record_error (identical body, now used for chat, tool, and agent spans) and disable use_span's exception recording so record_error is the single source of error truth -- this also fixes a pre-existing double exception event on tool spans.

Type the OTel span parent honestly: _otel_parent is now Optional[Span] instead of Any throughout _chat.py, and start_chat_span/start_tool_span take Optional[Span] instead of Span (they legitimately receive None from the parallel path and direct tool-invocation tests). The parent context is now built only when a parent exists, so a None parent uses the ambient context (root if none active) rather than relying on set_span_in_context accepting None -- which its type signature disallows. Hoist _otel.py's function-body imports (orjson and the _content/_turn types used in isinstance checks) to module top level; neither _content nor _turn imports _otel, so there is no circular-import risk.

Generator and async-generator tools run their body during iteration, not when the tool function is first called. Since iteration happened outside the activate_span scope, any spans such tools emitted between yields did not nest under their execute_tool span. Wrap each next()/__anext__() step in activate_span so the generator body executes while the tool span is active. Use res.__anext__() rather than the anext() builtin to preserve Python 3.9 support (anext is 3.10+).

Add sync and async tests asserting that a span emitted inside a generator tool nests under its execute_tool span. These invoke the tool directly (no VCR needed) and fail against the pre-fix iteration that ran outside the activate_span scope, locking in the df4b0de behavior.

The system-turn/history split feeding start_chat_span was duplicated verbatim in _submit_turns and _submit_turns_async. Hoist it into a single helper.

- Fix the instrumentation scope name to co.posit (Posit's domain is posit.co), matching the reverse-DNS convention. - Only set gen_ai.usage.* token attributes when there are tokens to report, mirroring ellmer. - Keep structured (non-string) tool-result values structured via a json_safe helper, so they nest inside the gen_ai.*.messages JSON rather than being embedded as a double-encoded JSON string. - Add tests for the structured/unserializable tool-response paths and a CHANGELOG entry for the framework-level OpenTelemetry feature.

The system-turn/history split was duplicated in _submit_turns and _submit_turns_async only to feed start_chat_span. Move it into start_chat_span (an OTel-semconv concern) so both call sites just pass the full turn list, removing the duplication and the Chat helper.

Address PR review feedback: - Add a `chatlas[otel]` optional-dependency group (opentelemetry-sdk), so the documented install path actually works (the API is a hard dep; the SDK is what users need to configure an exporter). - Point the monitoring guide at `pip install "chatlas[otel]"` instead of installing opentelemetry-sdk directly. - Use monkeypatch.setattr for _otel.capture_content so the global flag is always restored, even if an assertion fails mid-test.

The OTel spec says instrumentation libraries SHOULD NOT set status to OK and SHOULD leave it UNSET unless there is an error. Setting OK also makes it final, which would mask a later error recorded on the same span. Drop the explicit OK on chat spans (errors still go through record_error) and add a regression test. Note this intentionally diverges from ellmer, which sets "ok".

For streaming, chat_perform only returns the response iterator; the actual network I/O happens while iterating it, which was outside the activate_span(chat_span) scope. So a provider HTTP instrumentor's per-chunk spans started a disconnected trace instead of nesting under the chat span. Activate the chat span around each next()/__anext__() (the bounded fetch) -- not across the yield, so context still can't leak to the consumer. Mirrors the generator-tool span fix; uses __anext__() to stay Python 3.9-compatible.

The streaming loop now activates the chat span around each chunk fetch (not just the initial request), so the old "must not run while the span is active" wording was inaccurate. Clarify that activation is scoped to the bounded provider calls and never spans a yield.

Make the OpenTelemetry guide approachable to newcomers while staying useful to experienced users: - Lead with the production-visibility problem and a short "what is a trace?" primer (span/trace), plus a note on the api-vs-sdk split. - Add a worked travel-assistant example and embed a real trace waterfall (docs/images/otel-trace.svg) captured from an actual run, showing the httpx and tool-internal spans nesting under chatlas's own spans. - Unify the lower-level instrumentation story into one ladder (httpx -> OpenLLMetry -> official per-provider libs) and connect it to the worked example, instead of an orphaned "combining" note far from it. - Document the GenAI semantic-convention attributes accurately and keep the content-capture env var documented in a single place. The figure is regenerated by scripts/gen_otel_trace_svg.py.

… example

…names

cpsievert added 8 commits May 12, 2026 15:39

Merge remote-tracking branch 'origin/main' into worktree-feat+otel-fr…

e63a92c

…amework-spans # Conflicts: # pyproject.toml

docs(otel): note that tool-internal spans nest under execute_tool

cdcff49

The context activation also makes spans emitted inside a tool nest under its execute_tool span, not just provider HTTP spans under chat spans. Spell out both in the monitor guide.

cpsievert requested a review from Copilot June 2, 2026 20:09

Copilot started reviewing on behalf of cpsievert June 2, 2026 20:09 View session

This comment was marked as resolved.

Sign in to view

cpsievert added 7 commits June 2, 2026 15:25

refactor(otel): extract _otel_start_chat_span helper

fee23d9

The system-turn/history split feeding start_chat_span was duplicated verbatim in _submit_turns and _submit_turns_async. Hoist it into a single helper.

cpsievert changed the title ~~Add framework-level OpenTelemetry tracing~~ feat: add 1st class OpenTelemetry tracing Jun 2, 2026

cpsievert requested a review from Copilot June 2, 2026 21:13

Copilot started reviewing on behalf of cpsievert June 2, 2026 21:13 View session

cpsievert changed the title ~~feat: add 1st class OpenTelemetry tracing~~ feat: add 1st class OpenTelemetry support Jun 2, 2026

This comment was marked as resolved.

Sign in to view

cpsievert added 5 commits June 2, 2026 16:24

fix(otel): trace early tool failures and lazy content capture

25a399a

cpsievert requested a review from Copilot June 2, 2026 22:28

cpsievert changed the title ~~feat: add 1st class OpenTelemetry support~~ feat: 1st class OpenTelemetry support Jun 2, 2026

Copilot started reviewing on behalf of cpsievert June 2, 2026 22:28 View session

cpsievert added 2 commits June 2, 2026 17:30

docs(otel): lead the changelog note with user-facing value

4569c6e

test(otel): refresh VCR recordings

ecab24c

This comment was marked as resolved.

Sign in to view

cpsievert added 2 commits June 2, 2026 17:48

docs(otel): add environment-based exporter config, with Connect as an…

c9b6de1

… example

docs(otel): make trace SVG generator resilient to HTTP semconv key re…

6d19840

…names

cpsievert merged commit 2f66dc0 into main Jun 2, 2026
9 checks passed

cpsievert deleted the worktree-feat+otel-framework-spans branch June 3, 2026 13:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: 1st class OpenTelemetry support#310

feat: 1st class OpenTelemetry support#310
cpsievert merged 25 commits into
mainfrom
worktree-feat+otel-framework-spans

cpsievert commented May 12, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cpsievert commented May 12, 2026

Summary

Test plan

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants