fix(ai_guard): record model output in LLMObs when blocked after model call #18585

Open

avara1986 wants to merge 4 commits into main from fix/ai-guard-llmobs-output-after-block
@avara1986 avara1986 commented Jun 11, 2026

Description

JIRA: APPSEC-68147

When AI Guard blocked a request after the model call completed, the OpenAI and Anthropic integrations dropped the model output from the LLM Observability span. Output extraction is gated on `not span.error`, and the AI Guard block errors the span, so even though the model response was produced (and is visible in the AI Guard UI), LLMObs recorded an empty output.

Root causes (same symptom, two mechanisms):

  • Anthropic (`contrib/internal/anthropic/patch.py`): `event.response = resp` was only set after the `.after` dispatch, so on a block the response was never attached to the event and the ended-event handler recorded `response=None`.
  • OpenAI (`llmobs/_integrations/utils.py`): the response is available, but `openai_set_meta_tags_from_chat` / `_from_response` blank the output whenever `span.error` is set, which the block triggers.

Fix

  • Add a span ctx-item flag `AI_GUARD_BLOCKED` (`llmobs/_constants.py`).
  • The contrib patches set the flag when a `DDBlockException` is raised after a successful model call (Anthropic also now attaches the response to the request event before the after-hook).
  • The OpenAI (chat + responses) and Anthropic output extractors honour the flag: when the span is errored but a valid response exists due to an AI Guard block, the model output is still recorded. Behaviour is unchanged for genuine model/API errors (no response exists).
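
The gating change described above can be illustrated with a minimal, self-contained sketch. It is not the ddtrace implementation: `FakeSpan`, `extract_output_old`, `extract_output_new`, and the string value given to `AI_GUARD_BLOCKED` are hypothetical stand-ins, and only the shape of the condition is taken from the diff in this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical value: the PR adds AI_GUARD_BLOCKED to llmobs/_constants.py,
# but the actual constant value is not shown in this thread.
AI_GUARD_BLOCKED = "ai_guard_blocked"


@dataclass
class FakeSpan:
    """Stand-in for a tracer span: an error flag plus a ctx-item store."""

    error: int = 0
    ctx_items: Dict[str, Any] = field(default_factory=dict)

    def _set_ctx_item(self, key: str, value: Any) -> None:
        self.ctx_items[key] = value

    def _get_ctx_item(self, key: str) -> Any:
        return self.ctx_items.get(key)


def extract_output_old(span: FakeSpan, response: Optional[str]) -> str:
    """Pre-fix gating: any span error blanks the output."""
    if span.error or response is None:
        return ""
    return response


def extract_output_new(span: FakeSpan, response: Optional[str]) -> str:
    """Post-fix gating: an errored span still records output if it was blocked."""
    if response is not None and (not span.error or span._get_ctx_item(AI_GUARD_BLOCKED)):
        return response
    return ""


ok = FakeSpan()
api_error = FakeSpan(error=1)   # genuine failure: no response exists
blocked = FakeSpan(error=1)     # AI Guard block after a successful model call
blocked._set_ctx_item(AI_GUARD_BLOCKED, True)

# Each entry is (old output, new output).
results = {
    "success": (extract_output_old(ok, "hello"), extract_output_new(ok, "hello")),
    "api_error": (extract_output_old(api_error, None), extract_output_new(api_error, None)),
    "ai_guard_block": (extract_output_old(blocked, "hello"), extract_output_new(blocked, "hello")),
}
```

Only the `ai_guard_block` case changes between the two extractors; the success and genuine-error cases are unaffected, which matches the "behaviour is unchanged" claim.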

Testing

Before this PR:

[image]

After this PR:

[image]

… call

When AI Guard blocked a request AFTER the model call completed, the OpenAI and
Anthropic integrations dropped the model output from the LLMObs span: output
extraction is gated on `not span.error`, and the block errors the span. The
response was already produced (and is visible in the AI Guard UI), but LLMObs
recorded an empty output — APPSEC-68147.

The contrib patches now flag the span with an `AI_GUARD_BLOCKED` ctx item when a
`DDBlockException` is raised after a successful model call (Anthropic also
attaches the response to the request event, which was previously only set after
the after-hook). The OpenAI (chat + responses) and Anthropic output extractors
honour the flag and still record the model output even though the span is
errored by the block. Behaviour is unchanged for genuine model/API errors,
where no response exists.

Verified end-to-end via the AI Guard OpenAI dogfooding scenario: on an
after-model block the LLMObs span output goes from empty (pre-fix) to the full
model response (post-fix).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cit-pr-commenter-54b7da Bot commented Jun 11, 2026

Codeowners resolved as

tests/appsec/ai_guard/anthropic/test_anthropic.py                       @DataDog/asm-python

datadog-prod-us1-6 Bot commented Jun 11, 2026

⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64, 1]

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741491-d2b8243-musllinux_1_2_x86_64, 1]

DataDog/apm-reliability/dd-trace-py | build linux serverless: [arm64, cp315-cp315, v113741357-d2b8243-manylinux2014_aarch64, 1]

View all 8 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🔗 Commit SHA: 1c85f5c

@avara1986


@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. More of your lovely PRs please.

@avara1986 avara1986 marked this pull request as ready for review June 12, 2026 13:40
@avara1986 avara1986 requested review from a team as code owners June 12, 2026 13:40
@avara1986 avara1986 requested review from dubloom and sabrenner June 12, 2026 13:40
ctx.span._set_ctx_item(AI_GUARD_BLOCKED, True)
ctx.dispatch_ended_event(*sys.exc_info())
raise
event.response = resp

@Yun-Kim Yun-Kim Jun 12, 2026

Can we just always set `event.response = resp` before we do the AI Guard after-hook core dispatch? That way we decouple AI Guard errors from non-None responses.
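
The ordering question raised here can be sketched in isolation. This is an illustration under assumptions, not the code in `patch.py`: `Event`, `BlockError` (standing in for `DDBlockException`), and `traced_call` are hypothetical names, and the block is recorded in a boolean instead of being re-raised so that the event state can be inspected afterwards.

```python
from typing import Callable, Optional, Tuple


class BlockError(Exception):
    """Stand-in for DDBlockException."""


class Event:
    """Stand-in for the request event read by the ended-event handler."""

    def __init__(self) -> None:
        self.response: Optional[str] = None


def traced_call(
    model_call: Callable[[], str],
    after_hook: Callable[[str], None],
    attach_first: bool,
) -> Tuple[Optional[str], bool]:
    """Return (response seen on the event, whether the after-hook blocked)."""
    event = Event()
    resp = model_call()
    if attach_first:
        # Suggested ordering: attach before the after-hook runs.
        event.response = resp
    blocked = False
    try:
        after_hook(resp)
    except BlockError:
        # The real code would dispatch the ended event and re-raise here.
        blocked = True
    else:
        # Original ordering: attach only once the after-hook has passed.
        event.response = resp
    return event.response, blocked


def allow(_resp: str) -> None:
    return None


def block(_resp: str) -> None:
    raise BlockError("blocked by policy")


def model() -> str:
    return "model text"


late_attach_blocked = traced_call(model, block, attach_first=False)
early_attach_blocked = traced_call(model, block, attach_first=True)
late_attach_allowed = traced_call(model, allow, attach_first=False)
```

With the response attached before the hook, the event carries it whether or not the hook blocks, so the handler no longer needs to know why the span errored in order to find the response.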

# Record output when a response exists. ``span.error`` normally
# suppresses output, but an AI Guard block after the model call errors
# the span while still having a valid response (APPSEC-68147).
if response is not None and (not span.error or span._get_ctx_item(AI_GUARD_BLOCKED)):

We should just gate this on `if response is not None`; it should be independent of `span.error` or the AI Guard blocked flag being true.
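
To see what the simpler gate would change, the two conditions can be compared over every input combination. The sketch below is pure boolean logic; `pr_gate` and `simple_gate` are hypothetical names for the condition in the diff and the one proposed in this comment.

```python
from itertools import product


def pr_gate(has_response: bool, span_error: bool, ai_guard_blocked: bool) -> bool:
    """Condition in the diff: response present and (no error or AI Guard block)."""
    return has_response and (not span_error or ai_guard_blocked)


def simple_gate(has_response: bool, span_error: bool, ai_guard_blocked: bool) -> bool:
    """Proposed condition: response present, nothing else considered."""
    return has_response


# Every (has_response, span_error, ai_guard_blocked) combination where the gates disagree.
differences = [
    combo
    for combo in product([False, True], repeat=3)
    if pr_gate(*combo) != simple_gate(*combo)
]
```

The gates disagree in exactly one case: a response exists on an errored span that was not flagged as blocked. Whether the integrations can ever reach that state is not established in this thread; if they cannot, the two conditions are equivalent in practice.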

if span.error or not messages:
# ``span.error`` normally suppresses output, but an AI Guard block after the
# model call errors the span while a valid response exists (APPSEC-68147).
if (span.error and not span._get_ctx_item(AI_GUARD_BLOCKED)) or not messages:

What's stopping us from just gating this as `if not messages`?
