Skip to content

feat: 1st class OpenTelemetry support#310

Merged
cpsievert merged 25 commits into
mainfrom
worktree-feat+otel-framework-spans
Jun 2, 2026
Merged

feat: 1st class OpenTelemetry support#310
cpsievert merged 25 commits into
mainfrom
worktree-feat+otel-framework-spans

Conversation

@cpsievert
Copy link
Copy Markdown
Collaborator

Summary

Chatlas now emits OpenTelemetry spans that capture the full structure of multi-turn conversations and tool execution — without requiring any provider-specific instrumentor libraries. When a TracerProvider is configured, every chat()/stream() call automatically produces a 3-level span hierarchy:

invoke_agent                      # wraps the full chat loop
├── chat gpt-4o                   # each model API call
├── execute_tool get_weather      # each tool invocation
├── chat gpt-4o                   # follow-up model call
└── ...

Users opt in with pip install "chatlas[otel]" and a standard TracerProvider setup (console exporter, Logfire, or any OTLP-compatible backend). The approach is consistent with Shiny for Python's OTel story — same [otel] extra pattern, same recommended tools, same config-module pattern.

Spans follow the GenAI semantic conventions and record token usage, response model/ID, and optionally full message content (gated by OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT). These framework spans complement (not replace) provider-specific SDK instrumentors like opentelemetry-instrumentation-openai-v2.

  • New chatlas/_otel.py module with span lifecycle functions
  • Hooks into all 6 core Chat methods (sync + async for agent/chat/tool spans)
  • 7 tests with VCR cassettes covering span hierarchy, token usage, content capture, tool errors, streaming, and no-op behavior
  • Updated docs/get-started/monitor.qmd with framework-level tracing docs

Test plan

  • pytest tests/test_otel.py — 7/7 passing
  • pyright chatlas/_otel.py — 0 errors
  • ruff check and ruff format — clean
  • Manual verification with a real exporter (Logfire or console) and live API key

cpsievert added 8 commits May 12, 2026 15:39
Adds a new chatlas/_otel.py module that emits OpenTelemetry spans for
the chat lifecycle: invoke_agent (top-level), chat (per model call),
and execute_tool (per tool invocation). Spans follow the GenAI semantic
conventions with attributes like gen_ai.usage.input_tokens,
gen_ai.response.model, and optional message content capture controlled
by the OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT env var.

Also adds an `otel` optional dependency extra (`pip install chatlas[otel]`).
Wires the _otel span functions into the six core Chat methods:
_chat_impl/_chat_impl_async (agent spans), _submit_turns/
_submit_turns_async (chat spans), and _invoke_tool/
_invoke_tool_async (tool spans). Parent context is passed
explicitly via _otel_parent to avoid async context hazards.
Adds 7 tests covering span hierarchy, token usage, content capture
(on/off), tool error recording, streaming lifecycle, and no-op
behavior. Includes VCR cassettes for replay without live API keys.

Updates docs/get-started/monitor.qmd with a new framework-level
tracing section (console quickstart, Logfire production path,
config-module pattern) before the existing provider-specific content.
The OTel API is ~212KB with no heavy transitive deps, and its
default ProxyTracer already no-ops when no SDK is configured.
Making it a hard dep lets us drop the lazy initialization
(cache_tracer/initialized/is_tracing guards) and always create
spans, relying on the no-op machinery for zero overhead when
nobody is collecting.

Removes the `chatlas[otel]` extra — the API is now always available.
Users still opt in to collection by installing opentelemetry-sdk
and configuring a TracerProvider.
Activate chatlas's chat and execute_tool spans in the OTel context around
the bounded provider call and tool invocation. This lets spans created by
others nest under ours via ambient context: a provider HTTP instrumentor's
span becomes a child of the chat span, and work done inside a tool becomes a
child of its execute_tool span. Previously those started disconnected root
traces.

The agent span is deliberately left unactivated -- it brackets the whole
streaming loop, and holding context active across a yield would leak it into
the consumer's scope (matching ellmer's approach).

Adds nesting tests for the provider-HTTP and tool-internal cases (sync + async).
The context activation also makes spans emitted inside a tool nest under its
execute_tool span, not just provider HTTP spans under chat spans. Spell out
both in the monitor guide.
The per-function 'from ._otel import ...' statements were load-bearing only
while opentelemetry was an optional dependency; once it became a hard dep,
they were vestigial. _otel.py imports no chatlas modules at top level, so
there's no circular-import risk in importing it eagerly.

This comment was marked as resolved.

cpsievert added 7 commits June 2, 2026 15:25
Provider failures now mark the chat and invoke_agent spans as errored
with the GenAI error.type attribute and an exception event, mirroring
how tool failures are already recorded. Previously a failed LLM call
ended with UNSET status, indistinguishable from one that never finished
(and streaming-iteration errors were missed entirely, since use_span's
auto-recording only covered the bounded chat_perform call).

Generalize record_tool_error to record_error (identical body, now used
for chat, tool, and agent spans) and disable use_span's exception
recording so record_error is the single source of error truth -- this
also fixes a pre-existing double exception event on tool spans.
Type the OTel span parent honestly: _otel_parent is now Optional[Span]
instead of Any throughout _chat.py, and start_chat_span/start_tool_span
take Optional[Span] instead of Span (they legitimately receive None from
the parallel path and direct tool-invocation tests). The parent context
is now built only when a parent exists, so a None parent uses the ambient
context (root if none active) rather than relying on set_span_in_context
accepting None -- which its type signature disallows.

Hoist _otel.py's function-body imports (orjson and the _content/_turn
types used in isinstance checks) to module top level; neither _content
nor _turn imports _otel, so there is no circular-import risk.
Generator and async-generator tools run their body during iteration, not
when the tool function is first called. Since iteration happened outside
the activate_span scope, any spans such tools emitted between yields did
not nest under their execute_tool span. Wrap each next()/__anext__() step
in activate_span so the generator body executes while the tool span is
active.

Use res.__anext__() rather than the anext() builtin to preserve Python
3.9 support (anext is 3.10+).
Add sync and async tests asserting that a span emitted inside a
generator tool nests under its execute_tool span. These invoke the tool
directly (no VCR needed) and fail against the pre-fix iteration that ran
outside the activate_span scope, locking in the df4b0de behavior.
The system-turn/history split feeding start_chat_span was duplicated
verbatim in _submit_turns and _submit_turns_async. Hoist it into a single
helper.
- Fix the instrumentation scope name to co.posit (Posit's domain is
  posit.co), matching the reverse-DNS convention.
- Only set gen_ai.usage.* token attributes when there are tokens to
  report, mirroring ellmer.
- Keep structured (non-string) tool-result values structured via a
  json_safe helper, so they nest inside the gen_ai.*.messages JSON rather
  than being embedded as a double-encoded JSON string.
- Add tests for the structured/unserializable tool-response paths and a
  CHANGELOG entry for the framework-level OpenTelemetry feature.
The system-turn/history split was duplicated in _submit_turns and
_submit_turns_async only to feed start_chat_span. Move it into
start_chat_span (an OTel-semconv concern) so both call sites just pass
the full turn list, removing the duplication and the Chat helper.
@cpsievert cpsievert changed the title Add framework-level OpenTelemetry tracing feat: add 1st class OpenTelemetry tracing Jun 2, 2026
Address PR review feedback:
- Add a `chatlas[otel]` optional-dependency group (opentelemetry-sdk), so
  the documented install path actually works (the API is a hard dep; the
  SDK is what users need to configure an exporter).
- Point the monitoring guide at `pip install "chatlas[otel]"` instead of
  installing opentelemetry-sdk directly.
- Use monkeypatch.setattr for _otel.capture_content so the global flag is
  always restored, even if an assertion fails mid-test.
@cpsievert cpsievert requested a review from Copilot June 2, 2026 21:13
@cpsievert cpsievert changed the title feat: add 1st class OpenTelemetry tracing feat: add 1st class OpenTelemetry support Jun 2, 2026

This comment was marked as resolved.

cpsievert added 5 commits June 2, 2026 16:24
The OTel spec says instrumentation libraries SHOULD NOT set status to OK
and SHOULD leave it UNSET unless there is an error. Setting OK also makes
it final, which would mask a later error recorded on the same span. Drop
the explicit OK on chat spans (errors still go through record_error) and
add a regression test. Note this intentionally diverges from ellmer,
which sets "ok".
For streaming, chat_perform only returns the response iterator; the
actual network I/O happens while iterating it, which was outside the
activate_span(chat_span) scope. So a provider HTTP instrumentor's
per-chunk spans started a disconnected trace instead of nesting under the
chat span. Activate the chat span around each next()/__anext__() (the
bounded fetch) -- not across the yield, so context still can't leak to
the consumer. Mirrors the generator-tool span fix; uses __anext__() to
stay Python 3.9-compatible.
The streaming loop now activates the chat span around each chunk fetch
(not just the initial request), so the old "must not run while the span
is active" wording was inaccurate. Clarify that activation is scoped to
the bounded provider calls and never spans a yield.
Make the OpenTelemetry guide approachable to newcomers while staying useful
to experienced users:

- Lead with the production-visibility problem and a short "what is a trace?"
  primer (span/trace), plus a note on the api-vs-sdk split.
- Add a worked travel-assistant example and embed a real trace waterfall
  (docs/images/otel-trace.svg) captured from an actual run, showing the
  httpx and tool-internal spans nesting under chatlas's own spans.
- Unify the lower-level instrumentation story into one ladder
  (httpx -> OpenLLMetry -> official per-provider libs) and connect it to the
  worked example, instead of an orphaned "combining" note far from it.
- Document the GenAI semantic-convention attributes accurately and keep the
  content-capture env var documented in a single place.

The figure is regenerated by scripts/gen_otel_trace_svg.py.
@cpsievert cpsievert requested a review from Copilot June 2, 2026 22:28
@cpsievert cpsievert changed the title feat: add 1st class OpenTelemetry support feat: 1st class OpenTelemetry support Jun 2, 2026

This comment was marked as resolved.

@cpsievert cpsievert merged commit 2f66dc0 into main Jun 2, 2026
9 checks passed
@cpsievert cpsievert deleted the worktree-feat+otel-framework-spans branch June 3, 2026 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants