Skip to content

fix(ai_guard): scan Anthropic document content blocks#18576

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 5 commits into
mainfrom
fix/ai-guard-anthropic-document-content
Jun 12, 2026
Merged

fix(ai_guard): scan Anthropic document content blocks#18576
gh-worker-dd-mergequeue-cf854d[bot] merged 5 commits into
mainfrom
fix/ai-guard-anthropic-document-content

Conversation

@avara1986

@avara1986 avara1986 commented Jun 11, 2026

Copy link
Copy Markdown
Member

Description

JIRA: APMSP-3286

The Anthropic AI Guard converter (ddtrace/appsec/_ai_guard/_anthropic.py) listed document in _DROPPED_BLOCK_TYPES, so document content blocks were dropped before evaluation. Anthropic document blocks carry model-visible content:

  • source.type == "text" → plain text in source.data
  • source.type == "content" → nested text/image blocks
  • title / context → model-visible strings

Combined with the before-hook's skip-when-empty path (if not ai_guard_messages: return None), a document-only prompt produced no convertible messages and evaluation was skipped entirely; a benign-text + malicious-document prompt was scanned only on the surrounding text. Either way an attacker who can place document content into a traced Anthropic call bypasses the AI Guard prompt-injection / security check. Streaming and non-streaming hooks share the behavior via the same converter.

This change removes document from _DROPPED_BLOCK_TYPES and adds _format_document_block():

  • text / content sources (and title/context) are extracted and scanned.
  • Binary (base64) / remote (url) sources — which AI Guard cannot read as text — emit a [non-text document] placeholder so a document-only message still yields an evaluable payload (no silent skip), without pretending to OCR binary PDFs.

Resolves APMSP-3286. This is the Anthropic counterpart of the Strands fix in #18574 (APMSP-3089).

Testing

In tests/appsec/ai_guard/anthropic/test_anthropic.py:

  • Updated two tests that asserted the old drop-document behavior (one now uses redacted_thinking to keep empty-wrapper-suppression coverage; the other asserts the binary-document marker).
  • Added converter tests: text source, content source, title/context, document-only evaluability, and binary-source → marker.
  • Added a before-hook regression proving a document-only prompt now reaches client.evaluate instead of being skipped.

test_anthropic.py (82 passed / 5 version-skipped) and test_streaming.py (7 passed / 2 skipped) pass on Python 3.11. lint fmt, typing, and spelling pass.

Risks

Low. Behavior is unchanged for text-only conversations and for genuinely non-scannable blocks (redacted_thinking, etc.). Document blocks now contribute text (or a short placeholder) to the AI Guard payload, which can cause evaluation to run where it previously did not — the intended fix.

Additional Notes

The placeholder keeps binary/remote document sources from silently bypassing evaluation; forwarding richer representations (e.g. document images as image_url parts) could be a follow-up.

🤖 Generated with Claude Code

The Anthropic AI Guard converter treated `document` blocks as non-scannable
and dropped them before evaluation. Anthropic document blocks carry
model-visible content: `source.type == "text"` holds plain text,
`source.type == "content"` nests text/image blocks, and `title`/`context`
are model-visible strings. A document-only prompt therefore produced no
convertible messages and the before-hook skipped evaluation entirely, while
a benign-text + malicious-document prompt was scanned only on the surrounding
text — an AI Guard bypass (APMSP-3286). Streaming and non-streaming hooks
shared the behavior via the same converter.

Document blocks are now converted: readable text sources are scanned, and
binary (`base64`) / remote (`url`) sources emit a `[non-text document]`
placeholder so a document-only message still yields an evaluable payload.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codeowners resolved as

ddtrace/appsec/_ai_guard/_anthropic.py                                  @DataDog/asm-python
releasenotes/notes/ai-guard-anthropic-document-content-9d4b0c945a0db0f7.yaml  @DataDog/apm-python
tests/appsec/ai_guard/anthropic/test_anthropic.py                       @DataDog/asm-python

@datadog-datadog-prod-us1-2

datadog-datadog-prod-us1-2 Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Pipelines  Tests

Fix all issues with BitsAI

⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64, 1]   View in Datadog   GitLab

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741491-d2b8243-musllinux_1_2_x86_64, 1]   View in Datadog   GitLab

DataDog/apm-reliability/dd-trace-py | build linux serverless: [arm64, cp315-cp315, v113741357-d2b8243-manylinux2014_aarch64, 1]   View in Datadog   GitLab

View all 8 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 93f77e2 | Docs | Datadog PR Page | Give us feedback!

@pr-commenter

pr-commenter Bot commented Jun 11, 2026

Copy link
Copy Markdown

Benchmarks

Benchmark execution time: 2026-06-11 14:26:16

Comparing candidate commit 93f77e2 in PR branch fix/ai-guard-anthropic-document-content with baseline commit 42c7b35 in branch main.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 83 metrics, 0 unstable metrics.

scenario:iastaspectsospath-ospathbasename_aspect

  • 🟥 execution_time [+99.411µs; +109.074µs] or [+23.255%; +25.516%]

@avara1986 avara1986 marked this pull request as ready for review June 11, 2026 13:48
@avara1986 avara1986 requested review from a team as code owners June 11, 2026 13:48
@avara1986

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7a22bb57c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread ddtrace/appsec/_ai_guard/_anthropic.py
@avara1986

Copy link
Copy Markdown
Member Author

/merge

@gh-worker-devflow-routing-ef8351

gh-worker-devflow-routing-ef8351 Bot commented Jun 12, 2026

Copy link
Copy Markdown

View all feedbacks in Devflow UI.

2026-06-12 07:26:19 UTC ℹ️ Start processing command /merge


2026-06-12 07:26:24 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 55m (p90).


2026-06-12 08:09:34 UTC ℹ️ MergeQueue: This merge request was merged

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit 28814a8 into main Jun 12, 2026
666 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the fix/ai-guard-anthropic-document-content branch June 12, 2026 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants