fix(ai_guard): scan Anthropic document content blocks#18576
fix(ai_guard): scan Anthropic document content blocks#18576gh-worker-dd-mergequeue-cf854d[bot] merged 5 commits into
Conversation
The Anthropic AI Guard converter treated `document` blocks as non-scannable and dropped them before evaluation. Anthropic document blocks carry model-visible content: `source.type == "text"` holds plain text, `source.type == "content"` nests text/image blocks, and `title`/`context` are model-visible strings. A document-only prompt therefore produced no convertible messages and the before-hook skipped evaluation entirely, while a benign-text + malicious-document prompt was scanned only on the surrounding text — an AI Guard bypass (APMSP-3286). Streaming and non-streaming hooks shared the behavior via the same converter. Document blocks are now converted: readable text sources are scanned, and binary (`base64`) / remote (`url`) sources emit a `[non-text document]` placeholder so a document-only message still yields an evaluable payload. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codeowners resolved as |
|
BenchmarksBenchmark execution time: 2026-06-11 14:26:16 Comparing candidate commit 93f77e2 in PR branch Found 0 performance improvements and 1 performance regressions! Performance is the same for 83 metrics, 0 unstable metrics. scenario:iastaspectsospath-ospathbasename_aspect
|
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7a22bb57c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|
Description
JIRA: APMSP-3286
The Anthropic AI Guard converter (
ddtrace/appsec/_ai_guard/_anthropic.py) listeddocumentin_DROPPED_BLOCK_TYPES, so document content blocks were dropped before evaluation. Anthropicdocumentblocks carry model-visible content:source.type == "text"→ plain text insource.datasource.type == "content"→ nestedtext/imageblockstitle/context→ model-visible stringsCombined with the before-hook's skip-when-empty path (
if not ai_guard_messages: return None), a document-only prompt produced no convertible messages and evaluation was skipped entirely; a benign-text + malicious-document prompt was scanned only on the surrounding text. Either way an attacker who can place document content into a traced Anthropic call bypasses the AI Guard prompt-injection / security check. Streaming and non-streaming hooks share the behavior via the same converter.This change removes
documentfrom_DROPPED_BLOCK_TYPESand adds_format_document_block():text/contentsources (andtitle/context) are extracted and scanned.base64) / remote (url) sources — which AI Guard cannot read as text — emit a[non-text document]placeholder so a document-only message still yields an evaluable payload (no silent skip), without pretending to OCR binary PDFs.Resolves APMSP-3286. This is the Anthropic counterpart of the Strands fix in #18574 (APMSP-3089).
Testing
In
tests/appsec/ai_guard/anthropic/test_anthropic.py:documentbehavior (one now usesredacted_thinkingto keep empty-wrapper-suppression coverage; the other asserts the binary-document marker).textsource,contentsource,title/context, document-only evaluability, and binary-source → marker.client.evaluateinstead of being skipped.test_anthropic.py(82 passed / 5 version-skipped) andtest_streaming.py(7 passed / 2 skipped) pass on Python 3.11.lint fmt,typing, andspellingpass.Risks
Low. Behavior is unchanged for text-only conversations and for genuinely non-scannable blocks (
redacted_thinking, etc.). Document blocks now contribute text (or a short placeholder) to the AI Guard payload, which can cause evaluation to run where it previously did not — the intended fix.Additional Notes
The placeholder keeps binary/remote document sources from silently bypassing evaluation; forwarding richer representations (e.g. document images as
image_urlparts) could be a follow-up.🤖 Generated with Claude Code