fix(ai_guard): record model output in LLMObs when blocked after model call #18585
avara1986 wants to merge 4 commits into
… call

When AI Guard blocked a request AFTER the model call completed, the OpenAI and Anthropic integrations dropped the model output from the LLMObs span: output extraction is gated on `not span.error`, and the block errors the span. The response was already produced (and is visible in the AI Guard UI), but LLMObs recorded an empty output (APPSEC-68147).

The contrib patches now flag the span with an `AI_GUARD_BLOCKED` ctx item when a `DDBlockException` is raised after a successful model call (Anthropic also attaches the response to the request event, which was previously only set after the after-hook). The OpenAI (chat + responses) and Anthropic output extractors honour the flag and still record the model output even though the span is errored by the block. Behaviour is unchanged for genuine model/API errors, where no response exists.

Verified end-to-end via the AI Guard OpenAI dogfooding scenario: on an after-model block the LLMObs span output goes from empty (pre-fix) to the full model response (post-fix).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
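The patch-side flow described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the actual ddtrace code: `FakeSpan`, `FakeEvent`, `BlockedError` (standing in for `DDBlockException`), `call_with_after_guard`, and the string value of `AI_GUARD_BLOCKED` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Assumed placeholder key; the real constant lives in llmobs/_constants.py.
AI_GUARD_BLOCKED = "ai_guard_blocked"


class BlockedError(Exception):
    """Hypothetical stand-in for DDBlockException."""


@dataclass
class FakeSpan:
    """Hypothetical span: an error flag plus a dict of ctx items."""
    error: int = 0
    ctx: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeEvent:
    """Hypothetical request event that carries the model response."""
    response: Optional[Any] = None


def call_with_after_guard(
    span: FakeSpan,
    event: FakeEvent,
    model_call: Callable[[], Any],
    after_hook: Callable[[Any], None],
) -> Any:
    # A genuine model/API error propagates from here; no response exists.
    resp = model_call()
    # Attach the response before the after-hook so a block cannot lose it.
    event.response = resp
    try:
        after_hook(resp)
    except BlockedError:
        # The block errors the span, so flag it to let extractors keep the output.
        span.ctx[AI_GUARD_BLOCKED] = True
        span.error = 1
        raise
    return resp
```

With this shape, a block raised by the after-hook still leaves `event.response` populated and the span flagged, while an exception from `model_call` itself leaves both untouched.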
@codex review
Codex Review: Didn't find any major issues. More of your lovely PRs please.
    ctx.span._set_ctx_item(AI_GUARD_BLOCKED, True)
    ctx.dispatch_ended_event(*sys.exc_info())
    raise
    event.response = resp
Can we just always set `event.response = resp` before we do the AI Guard `.after` core dispatch? This way we can decouple AI Guard errors from non-`None` responses.
    # Record output when a response exists. ``span.error`` normally
    # suppresses output, but an AI Guard block after the model call errors
    # the span while still having a valid response (APPSEC-68147).
    if response is not None and (not span.error or span._get_ctx_item(AI_GUARD_BLOCKED)):
We should just gate this on `if response is not None`; it should be independent of `span.error` or AI Guard blocked being true.
    - if span.error or not messages:
    + # ``span.error`` normally suppresses output, but an AI Guard block after the
    + # model call errors the span while a valid response exists (APPSEC-68147).
    + if (span.error and not span._get_ctx_item(AI_GUARD_BLOCKED)) or not messages:
What's stopping us from just gating this as `if not messages`?
Description
JIRA: APPSEC-68147
When AI Guard blocks a request after the model call completes, the OpenAI and Anthropic integrations dropped the model output from the LLM Observability span. Output extraction is gated on `not span.error`, and the AI Guard block errors the span, so even though the model response was produced (and is visible in the AI Guard UI), LLM Obs recorded an empty output.

Root causes (same symptom, two mechanisms):

- Anthropic (`contrib/internal/anthropic/patch.py`): `event.response = resp` was only set after the `.after` dispatch, so on a block the response was never attached to the event and the ended-event handler recorded `response=None`.
- OpenAI (`llmobs/_integrations/utils.py`): the response is available, but `openai_set_meta_tags_from_chat`/`_from_response` blank the output whenever `span.error` is set, which the block triggers.

Fix

- New ctx item `AI_GUARD_BLOCKED` (`llmobs/_constants.py`).
- The contrib patches set it on the span when a `DDBlockException` is raised after a successful model call (Anthropic also now attaches the response to the request event before the after-hook).

Testing
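As a rough illustration of the intended behaviour, the output gating can be checked in isolation. The sketch below mirrors the condition shown in the review diff (`response is not None and (not span.error or blocked)`); the helper names `should_record_output` and `recorded_output` are hypothetical and do not exist in ddtrace.

```python
from typing import Any, Optional


def should_record_output(response: Optional[Any], span_error: bool, blocked: bool) -> bool:
    """Return True when the model output should be written to the span.

    Hypothetical helper mirroring the post-fix condition: an errored span
    normally suppresses output, unless the error came from an AI Guard block
    raised after the model had already responded.
    """
    return response is not None and (not span_error or blocked)


def recorded_output(response: Optional[Any], span_error: bool, blocked: bool) -> str:
    """Return the output that would be recorded, or an empty string if suppressed."""
    if should_record_output(response, span_error, blocked):
        return str(response)
    return ""


# Normal call: no error, output is recorded.
assert recorded_output("model answer", span_error=False, blocked=False) == "model answer"
# Genuine model/API error: no response exists, output stays empty.
assert recorded_output(None, span_error=True, blocked=False) == ""
# AI Guard block after the model call: span is errored but output is kept.
assert recorded_output("model answer", span_error=True, blocked=True) == "model answer"
```

The second case is the one that must stay unchanged: without a response there is nothing to record, regardless of the flag.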
Before this PR
After this PR: