v3.5.3: KB-tools API for agent-driven and LLM-driven detector routing by yzhao062 · Pull Request #688 · yzhao062/pyod

yzhao062 · 2026-05-19T22:08:16Z

Summary

This PR adds a KB-tools API for agent-driven and LLM-driven detector routing: two new surfaces on ADEngine, with the new plan_detection parameters keyword-only and every v3.5.2 caller pattern unchanged. It was reviewed via four rounds of /implement-review with Codex (rounds 1 and 2 each surfaced real findings that were addressed before merging; rounds 3 and 4 cleared with no new actionable items).

Surface 1 (agent tools). ADEngine.get_kb_for_routing(profile, top_k=3, constraints=None) returns a structured KB snapshot of every shipped detector (strengths, weaknesses, best_for, avoid_when, complexity, benchmark_rank, modality_match) filtered by constraints.exclude_detectors and constraints.data_type_strict, sorted by modality-specific benchmark rank. ADEngine.make_plan(detector_choices, justifications=None, params=None) validates a caller-chosen ordered detector list against the KB and returns a closed-schema DetectionPlan consumable by build_detector / run. Together they let an agent runtime (Claude Code, Codex CLI, MCP tool clients) reason over the KB directly and commit a routing decision without going through hand-coded rules.
Surface 2 (programmatic LLM client). ADEngine.plan_detection(profile, llm_client=callable, top_k=3, llm_strict=None) accepts a user-supplied (prompt: str) -> str callable wrapping any LLM SDK (Anthropic, OpenAI, vLLM, self-hosted). The engine builds the routing prompt internally, invokes the callable, parses the response, and returns the same DetectionPlan shape. On LLM-call or parse failure, the engine falls back to rule-driven routing with a RuntimeWarning; per-call llm_strict=True (or the PYOD3_LLM_STRICT=1 env var) re-raises instead.
top_k generalization. plan_detection now exposes the previously hard-coded valid[1:3] alternatives slice as a top_k parameter (default 3 preserves v3.5.2 behavior; values < 1 are clamped to 1).

This release does not close a numbered issue. It is the library-side substrate for the agentic routing evidence in the PyOD 3 paper (KDD 2027 ADS Cycle 1) §5.4 three-tier comparison.

Approach

Surface 1: `get_kb_for_routing` + `make_plan`

get_kb_for_routing returns one entry per shipped detector for the requested modality, with the same shape across modalities so an agent prompt can render every entry uniformly. Filtering is two-stage: constraints.exclude_detectors is a hard set difference (case-sensitive); constraints.data_type_strict (default True) drops detectors whose modality_match does not include the requested modality. Sort order is the modality-specific benchmark rank with ADBench_overall as the universal fallback (see "Round 1 reviewer fix (b)" below for the rank-key table).
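The two-stage filter and rank sort described above can be illustrated with a minimal standalone sketch. It does not call PyOD; the helper name filter_and_sort, the toy KB entries, and the rank numbers are all hypothetical placeholders, and the per-entry fallback to ADBench_overall is an assumption about how the fallback is applied.

```python
# Toy KB entries; the rank numbers are placeholders, not benchmark results.
TOY_KB = [
    {"name": "IForest", "modality_match": ["tabular", "time_series"],
     "benchmark_rank": {"ADBench_overall": 2, "TSB_AD_overall": 5}},
    {"name": "ECOD", "modality_match": ["tabular"],
     "benchmark_rank": {"ADBench_overall": 1}},
    {"name": "KNN", "modality_match": ["tabular", "time_series"],
     "benchmark_rank": {"ADBench_overall": 4, "TSB_AD_overall": 3}},
]


def filter_and_sort(entries, modality, rank_keys, exclude=(), data_type_strict=True):
    """Hypothetical helper: hard-exclude names, filter by modality, sort by rank."""
    excluded = set(exclude)  # case-sensitive set difference
    kept = [
        entry for entry in entries
        if entry["name"] not in excluded
        and (not data_type_strict or modality in entry["modality_match"])
    ]

    def rank_of(entry):
        # Use the first rank key that has a value; unranked entries sort last.
        for key in rank_keys:
            rank = entry["benchmark_rank"].get(key)
            if rank is not None:
                return rank
        return float("inf")

    return sorted(kept, key=rank_of)


ts_keys = ["TSB_AD_overall", "ADBench_overall"]
strict_names = [e["name"] for e in filter_and_sort(TOY_KB, "time_series", ts_keys)]
loose_names = [
    e["name"]
    for e in filter_and_sort(TOY_KB, "time_series", ts_keys, data_type_strict=False)
]
```

With data_type_strict=False the tabular-only entry is kept and ranked through the fallback key, which is why it can appear ahead of modality-ranked entries in this toy data.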

make_plan is the structured commit step. It accepts an ordered detector list, validates each name against the shipped KB (unknown names raise ValueError), overlays per-detector params with engine contamination resolution, and returns a DetectionPlan dict with detector_choice, params, and a justifications field. The returned plan is the same shape consumed by ADEngine.build_detector / run, so an agent that uses get_kb_for_routing + make_plan lands in the same downstream code path as the rule router.
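As a rough illustration of the commit step, the sketch below validates names against a set of known detectors and overlays caller params on a default contamination. It is not the PyOD implementation: make_plan_sketch, the default contamination of 0.1, and the exact plan layout are assumptions made for the example.

```python
def make_plan_sketch(detector_choices, kb_names, justifications=None,
                     params=None, contamination=0.1):
    """Hypothetical stand-in for the validation that make_plan is described as doing."""
    if not isinstance(detector_choices, list) or not detector_choices:
        raise ValueError("detector_choices must be a non-empty list")
    unknown = [name for name in detector_choices if name not in kb_names]
    if unknown:
        # Matching is case-sensitive: "iforest" is not "IForest".
        raise ValueError(f"unknown detectors: {unknown}")

    params = params or {}
    resolved = {}
    for name in detector_choices:
        merged = {"contamination": contamination}  # assumed engine-level default
        merged.update(params.get(name, {}))        # caller params take priority
        resolved[name] = merged

    return {
        "detector_choice": list(detector_choices),
        "params": resolved,
        "justifications": dict(justifications or {}),
    }


KB_NAMES = {"IForest", "ECOD", "KNN"}
plan = make_plan_sketch(
    ["IForest", "ECOD"],
    KB_NAMES,
    justifications={"IForest": "scales to large tabular data"},
    params={"ECOD": {"contamination": 0.05}},
)
```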

Surface 2: `plan_detection(..., llm_client=callable)`

llm_client is a Protocol (LLMCallable: (prompt: str) -> str) rather than a provider-specific adapter; PyOD ships no SDK glue. Users wrap their own Anthropic / OpenAI / vLLM / self-hosted SDK. The engine builds the routing prompt internally via pyod.utils._llm.build_routing_prompt(kb_context, top_k) and parses the response via pyod.utils._llm.parse_routing_response(response, kb, top_k). The parser tolerates surrounding prose and markdown fences, skips unknown detector names with a logged warning, dedupes, and truncates to top_k. It raises RoutingParseError if no JSON array is extractable or no valid detector survives KB validation.
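To make the parser contract concrete, here is a minimal standalone sketch of a tolerant response parser driven by a stub client. The LLMCallable and RoutingParseError classes below are local stand-ins defined for the example, not imports from pyod.utils._llm; parse_routing_sketch, the "detector" key for object entries, and the stub response are hypothetical.

```python
import json
import logging
from typing import Protocol

logger = logging.getLogger(__name__)


class LLMCallable(Protocol):
    """Local stand-in: any callable mapping a prompt string to a response string."""
    def __call__(self, prompt: str) -> str: ...


class RoutingParseError(ValueError):
    """Local stand-in for the parse-failure exception."""


def extract_json_array(text):
    """Return the first JSON array embedded anywhere in text."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "[":
            continue
        try:
            value, _ = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue  # not valid JSON from this bracket; keep scanning
        if isinstance(value, list):
            return value
    raise RoutingParseError("no JSON array found in LLM response")


def parse_routing_sketch(response, kb_names, top_k):
    picked = []
    for item in extract_json_array(response):
        # Accept bare strings, or objects with an assumed "detector" key.
        name = item.get("detector") if isinstance(item, dict) else item
        if not isinstance(name, str) or name not in kb_names:
            logger.warning("skipping unknown detector %r", name)
            continue
        if name not in picked:  # dedupe while preserving order
            picked.append(name)
    if not picked:
        raise RoutingParseError("no valid detector survived KB validation")
    return picked[:top_k]


def stub_client(prompt: str) -> str:
    fence = "`" * 3  # built at runtime so the example can sit inside a fenced block
    return f'Here is my pick:\n{fence}json\n["IForest", "Bogus", "IForest", "ECOD", "KNN"]\n{fence}'


picked = parse_routing_sketch(
    stub_client("route this profile"), {"IForest", "ECOD", "KNN"}, top_k=2
)
```

Scanning with json.JSONDecoder.raw_decode avoids a regular expression for bracket matching, so nested arrays and surrounding prose are handled by the JSON parser itself.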

Failure handling is tunable per call. With no override set, the engine falls back to rule-driven routing with a RuntimeWarning on either LLM-call failure or parse failure. Explicit llm_strict=True re-raises instead; explicit llm_strict=False always falls back; llm_strict=None (default) defers to the PYOD3_LLM_STRICT env var. This three-way precedence replaces the env-only switch added during Round 1 review, which was process-global and therefore incorrect for concurrent callers in the same process.
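The three-way precedence (explicit kwarg, then env var, then default) might look like the following sketch. The function names are hypothetical, and treating only the literal value "1" as enabling strict mode is an assumption about how the env var is read.

```python
import os
import warnings


def resolve_llm_strict(llm_strict=None, environ=None):
    """Explicit True/False wins; None defers to PYOD3_LLM_STRICT; default is False."""
    if llm_strict is not None:
        return bool(llm_strict)
    env = os.environ if environ is None else environ
    return env.get("PYOD3_LLM_STRICT") == "1"  # assumed parsing rule


def route_with_fallback(llm_route, rule_route, llm_strict=None, environ=None):
    """Try the LLM path; on failure either re-raise (strict) or warn and fall back."""
    try:
        return llm_route()
    except Exception as exc:
        if resolve_llm_strict(llm_strict, environ):
            raise
        warnings.warn(
            f"LLM routing failed ({exc!r}); falling back to rule-driven routing",
            RuntimeWarning,
        )
        return rule_route()


def failing_llm_route():
    raise ValueError("malformed response")
```

Passing environ explicitly keeps the sketch testable without mutating the process environment, which is the same concern that motivated the per-call kwarg.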

`top_k` generalization

plan_detection(..., top_k=3) exposes the previously hard-coded valid[1:3] alternatives slice as a parameter. Default 3 preserves v3.5.2 behavior; values < 1 are clamped to 1; the alternatives slice in the returned plan respects the new top_k. The new top_k, llm_client, and llm_strict parameters are keyword-only via a * separator in the plan_detection signature (added in Round 2 review).
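A small sketch of the clamp-and-slice behavior, assuming valid is the ordered list of candidate detector names with the primary choice first. The helper split_primary_and_alternatives is hypothetical and only mirrors the slice arithmetic described here.

```python
def split_primary_and_alternatives(valid, top_k=3):
    """Clamp top_k to at least 1, then take valid[1:top_k] as the alternatives."""
    top_k = max(1, int(top_k))
    return valid[0], valid[1:top_k]


candidates = ["IForest", "ECOD", "KNN", "LOF"]
default_split = split_primary_and_alternatives(candidates)          # same as valid[1:3]
clamped_split = split_primary_and_alternatives(candidates, top_k=0)  # clamped to 1
wide_split = split_primary_and_alternatives(candidates, top_k=10)    # slice stops at list end
```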

Round 1 reviewer fixes (Codex via `/implement-review auto`)

(a) High: _plan_via_llm did not enforce the constrained KB context after parsing. Previously the LLM path validated returned detector names only against the global KB, so a response could bypass the hard constraints.exclude_detectors filter. Fix: if the LLM returns a detector excluded by constraints.exclude_detectors or filtered out by data_type_strict, the engine raises RoutingParseError and falls back to rule routing with a RuntimeWarning.
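The post-parse guard can be sketched as a membership check against the constrained KB rather than the global one. Both enforce_constrained_context and the local RoutingParseError class are stand-ins written for this example, not the engine's internals.

```python
class RoutingParseError(ValueError):
    """Local stand-in for the parse-failure exception."""


def enforce_constrained_context(chosen, constrained_kb_names):
    """Reject any LLM-chosen detector that the constraints already filtered out."""
    allowed = set(constrained_kb_names)
    violations = [name for name in chosen if name not in allowed]
    if violations:
        raise RoutingParseError(
            f"LLM returned detectors outside the constrained KB: {violations}"
        )
    return list(chosen)


global_kb = {"IForest", "ECOD", "KNN"}
constrained_kb = global_kb - {"KNN"}  # e.g. exclude_detectors=["KNN"]

accepted = enforce_constrained_context(["IForest", "ECOD"], constrained_kb)
```

A name such as "KNN" passes validation against global_kb but fails against constrained_kb, which is the bypass the fix closes.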

(b) Medium: get_kb_for_routing sorted non-tabular modalities alphabetically. The legacy sort key was {modality}.title() + '_overall', which does not match the KB's actual rank fields for any non-tabular modality. Fix: _MODALITY_RANK_KEYS table maps time_series → TSB_AD_overall / TSB_AD_overall_iforest, graph → BOND_deep / BOND_overall, text → NLP_ADBench_overall, image → MVTec_overall, with ADBench_overall as the universal fallback for every modality.
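To see why the legacy key never matched, and how a rank-key table resolves instead, consider this standalone sketch. The table contents are taken from the description above, but reading "A / B" as an ordered preference list is an assumption, and resolve_rank is a hypothetical helper.

```python
MODALITY_RANK_KEYS = {
    "time_series": ["TSB_AD_overall", "TSB_AD_overall_iforest"],
    "graph": ["BOND_deep", "BOND_overall"],
    "text": ["NLP_ADBench_overall"],
    "image": ["MVTec_overall"],
}
FALLBACK_RANK_KEY = "ADBench_overall"


def legacy_rank_key(modality):
    """The old key construction, kept only to show the mismatch."""
    return modality.title() + "_overall"


def resolve_rank(benchmark_rank, modality):
    """Return (rank, key) for the first key with a value, else (None, None)."""
    keys = MODALITY_RANK_KEYS.get(modality, []) + [FALLBACK_RANK_KEY]
    for key in keys:
        rank = benchmark_rank.get(key)
        if rank is not None:
            return rank, key
    return None, None


old_key = legacy_rank_key("time_series")  # "Time_Series_overall", not a KB field
ts_rank = resolve_rank({"TSB_AD_overall": 10, "ADBench_overall": 4}, "time_series")
```

When every entry misses the sort key, a stable sort leaves the input order untouched, which is consistent with the alphabetical ordering the reviewer observed if the KB is stored alphabetically.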

(c) Medium: new per-call kwarg plan_detection(..., llm_strict: bool | None = None). Adds a per-call alternative to the process-global PYOD3_LLM_STRICT env var, with the three-way precedence described above.

Round 2 reviewer fixes (Codex via `/implement-review auto`)

(d) Medium: keyword-only signature. plan_detection's new top_k, llm_client, and llm_strict parameters are now actually keyword-only via a * separator before them in the signature, matching the release-notes contract.
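The effect of the * separator can be checked on a stub with the same parameter layout as the plan_detection signature quoted in the commit message. The stub body is a placeholder; only the signature shape is the point.

```python
import inspect


def plan_detection_stub(profile, priority="balanced", constraints=None, *,
                        top_k=3, llm_client=None, llm_strict=None):
    """Placeholder body; parameters after * can only be passed by keyword."""
    return {"priority": priority, "top_k": top_k}


def keyword_only_names(func):
    params = inspect.signature(func).parameters
    return [
        name for name, param in params.items()
        if param.kind is inspect.Parameter.KEYWORD_ONLY
    ]


positional_error = None
try:
    # A fourth positional argument is rejected instead of silently binding to top_k.
    plan_detection_stub({}, "balanced", None, 5)
except TypeError as exc:
    positional_error = str(exc)

by_keyword = plan_detection_stub({}, top_k=5)
```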

(e) Medium: get_kb_for_routing now stamps each returned detector entry with resolved_rank and resolved_rank_key fields carrying the modality-specific benchmark rank it used for sorting. build_routing_prompt reads those fields so the LLM-facing prompt renders e.g. rank=10 (TSB_AD_overall) for time-series detectors, instead of the empty rank= it would otherwise render under the corrected modality rank-key table.
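A sketch of how stamped fields could feed the prompt line. The field names resolved_rank and resolved_rank_key come from the description above; stamp_entry, render_entry, the "n/a" placeholder, and the overall line format are hypothetical and may differ from what build_routing_prompt emits.

```python
def stamp_entry(entry, rank, rank_key):
    """Return a copy of the KB entry carrying the rank actually used for sorting."""
    stamped = dict(entry)
    stamped["resolved_rank"] = rank
    stamped["resolved_rank_key"] = rank_key
    return stamped


def render_entry(entry):
    """Render one prompt line from the stamped fields (format is illustrative)."""
    rank = entry.get("resolved_rank")
    if rank is None:
        return f"- {entry['name']}: rank=n/a"
    return f"- {entry['name']}: rank={rank} ({entry['resolved_rank_key']})"


# "SomeTSDetector" is a placeholder name, and 10 is a placeholder rank.
stamped = stamp_entry({"name": "SomeTSDetector"}, 10, "TSB_AD_overall")
line = render_entry(stamped)
```

Reading the stamped fields keeps the prompt builder independent of the rank-key table, so the sort and the prompt cannot disagree about which benchmark was used.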

API compatibility

Every v3.5.2 caller pattern produces identical output:

plan_detection(profile) — unchanged
plan_detection(profile, priority=...) — unchanged
plan_detection(profile, constraints=...) — unchanged

The new top_k, llm_client, and llm_strict parameters are keyword-only with backward-compatible defaults. LLMCallable is a Protocol, not an inheritance-required base class. No breaking API changes.

Out of scope for v3.5.3:

routing_rules.json rule authoring (rules remain the offline fallback).
LLM-decided top_k (caller decides).
Built-in CLI adapter classes for Codex / Claude Code (users wrap subscriptions themselves; the LLMCallable protocol is the integration point).
Async llm_client.

Tests

44 new in pyod/test/test_kb_router_surface1.py covering schema, filters, ordering, KB validation, top_k clamping, stub LLM client canned plan, top_k truncation of LLM response, malformed response fallback, PYOD3_LLM_STRICT=1 re-raise, prose tolerance, markdown-fence tolerance, dedupe, and bare-string entries.
6 Round 1 regression tests covering the constraint-bypass High fix, modality rank-key ordering for time_series and graph, and the three-way llm_strict precedence (True / False / None).
3 Round 2 regression tests covering the keyword-only signature contract, prompt rank annotation under time_series, and the text-modality fallback path when the KB has no rank data.

All 205 existing ADEngine tests continue to pass. Total new test count this release: 53.

Files touched

pyod/utils/ad_engine.py — get_kb_for_routing, make_plan, plan_detection keyword-only params (top_k, llm_client, llm_strict), _plan_via_llm helper, _MODALITY_RANK_KEYS, resolved_rank / resolved_rank_key stamping.
pyod/utils/_llm.py (new) — LLMCallable Protocol, RoutingParseError, build_routing_prompt(kb_context, top_k), parse_routing_response(response, kb, top_k).
pyod/test/test_kb_router_surface1.py (new) — 44 Surface-1 / Surface-2 tests; 6 + 3 regression tests for Round 1 + Round 2 fixes are co-located in the same file.
pyod/version.py — bumped to 3.5.3.
CHANGES.txt — v3.5.3 entry.

Test plan

CI passes (Linux, macOS, Windows). Pre-existing Windows MKL failures on TestFastABOD / TestKnnNearestNeighborsConfig / TestLUNARNearestNeighborsConfig / TestSODNearestNeighborsConfig reproduce on a clean tree without these changes and are unrelated.
Optional downstream: confirm in the PyOD 3 §5.4 evidence harness (pyod3-experiments framework_on_kb) that ADEngine.get_kb_for_routing + ADEngine.make_plan round-trip produces the same DetectionPlan as the harness fallback path on the 5-task ADBench pilot.

Surface 1 (agent tools):
- ADEngine.get_kb_for_routing(profile, top_k=3, constraints=None) returns a structured KB snapshot (every shipped detector with strengths, weaknesses, best_for, avoid_when, complexity, benchmark_rank, modality_match) filtered by exclude_detectors + data_type_strict and sorted by modality-specific benchmark rank keys (TSB_AD_overall for time-series, BOND_deep for graph, NLP_ADBench_overall for text, MVTec_overall for image, ADBench_overall fallback). Each entry carries resolved_rank + resolved_rank_key for downstream tools.
- ADEngine.make_plan(detector_choices, justifications=None, params=None) validates names against the KB (case-sensitive, must be shipped) and returns a closed-schema DetectionPlan consumable by build_detector / run.

Surface 2 (programmatic API):
- ADEngine.plan_detection(profile, priority='balanced', constraints=None, *, top_k=3, llm_client=None, llm_strict=None) accepts a (prompt: str) -> str callable. Engine builds the routing prompt internally, invokes the callable, parses the response, enforces the constrained KB context post-parse (LLM cannot bypass exclude_detectors / data_type_strict), and returns a DetectionPlan with note='llm-driven via plan_detection(...)' + evidence=['llm_routing']. On LLM call or parse failure, falls back to rule routing with a RuntimeWarning. llm_strict=True per-call overrides the PYOD3_LLM_STRICT env var; precedence is explicit kwarg > env > default.
- pyod/utils/_llm.py: new module with LLMCallable Protocol, RoutingParseError, build_routing_prompt(kb_context, top_k), parse_routing_response(response, kb, top_k). Parser tolerates surrounding prose, markdown fences, BOM/CRLF, skips unknown / non-shipped detectors with a logged warning, dedupes, truncates to top_k.

top_k generalization: plan_detection's previous valid[1:3] hard cap is now valid[1:top_k]. Default 3 preserves v3.5.2 behavior. Values < 1 clamp to 1.

Tests: 44 new in pyod/test/test_kb_router_surface1.py covering Surface 1 schema / filters / KB validation / ordering / clamp, make_plan single + multi detector / unknown / non-list / contamination overlay / build_detector consumption, plan_detection top_k variants, llm_client stub canned plan / top_k truncation / malformed fallback / strict re-raise / None preserves rule plan, parser robustness (prose / markdown / dedupe / truncate / no-array raise / all-invalid raise / bare-string), the post-parse constrained KB guard (catches both exclude_detectors and data_type_strict bypass), per-call llm_strict three-way precedence, modality-specific rank-key ordering (time-series + graph), keyword-only signature, prompt builder modality annotation. All 205 existing ADEngine-related tests still pass (171 in test_ad_engine.py + test_ad_engine_v3.py + test_ad_engine_compare.py).

Backward compatibility: every v3.5.2 caller pattern produces identical output. The new top_k, llm_client, and llm_strict parameters are keyword-only via a * separator before them in the signature.

Reviewed via /implement-review auto across 4 Codex rounds (3 substantive fixes + 2 nit fixes); Round 4 verdict: zero new findings, commit-ready.

Out of scope: routing_rules.json rule authoring; LLM-decided top_k; built-in CLI adapters for Codex / Claude Code; async llm_client. No breaking API changes.

coveralls · 2026-05-19T22:25:16Z

Coverage Report for CI Build 26128282430

Coverage increased (+0.06%) to 93.874%

Details

Coverage increased (+0.06%) from the base build.
Patch coverage: 19 uncovered changes across 3 files (483 of 502 lines covered, 96.22%).
No coverage regressions found.

Uncovered Changes

File	Changed	Covered	%
pyod/utils/_llm.py	91	75	82.42%
pyod/utils/ad_engine.py	104	102	98.08%
pyod/test/test_kb_router_surface1.py	307	306	99.67%

Coverage Regressions

No coverage regressions found.

Coverage Stats


Relevant Lines:	19802
Covered Lines:	18589
Line Coverage:	93.87%
Coverage Strength:	10.3 hits per line

yzhao062 wants to merge 1 commit into master from development
