Skip to content

v3.5.3: KB-tools API for agent-driven and LLM-driven detector routing #688

Open
yzhao062 wants to merge 1 commit into master from development

@yzhao062
Owner

Summary

KB-tools API for agent-driven and LLM-driven detector routing. Two new surfaces on ADEngine, both keyword-only and backward-compatible with every v3.5.2 caller pattern. Reviewed via four rounds of /implement-review with Codex (rounds 1 and 2 each surfaced real findings that were addressed before merging; rounds 3 and 4 cleared with no new actionable items).

  • Surface 1 (agent tools). ADEngine.get_kb_for_routing(profile, top_k=3, constraints=None) returns a structured KB snapshot of every shipped detector (strengths, weaknesses, best_for, avoid_when, complexity, benchmark_rank, modality_match) filtered by constraints.exclude_detectors and constraints.data_type_strict, sorted by modality-specific benchmark rank. ADEngine.make_plan(detector_choices, justifications=None, params=None) validates a caller-chosen ordered detector list against the KB and returns a closed-schema DetectionPlan consumable by build_detector / run. Together they let an agent runtime (Claude Code, Codex CLI, MCP tool clients) reason over the KB directly and commit a routing decision without going through hand-coded rules.
  • Surface 2 (programmatic LLM client). ADEngine.plan_detection(profile, llm_client=callable, top_k=3, llm_strict=None) accepts a user-supplied (prompt: str) -> str callable wrapping any LLM SDK (Anthropic, OpenAI, vLLM, self-hosted). The engine builds the routing prompt internally, invokes the callable, parses the response, and returns the same DetectionPlan shape. On LLM-call or parse failure, falls back to rule-driven routing with a RuntimeWarning; per-call llm_strict=True (or the PYOD3_LLM_STRICT=1 env var) re-raises instead.
  • top_k generalization. plan_detection now exposes the previously hard-coded valid[1:3] alternatives slice as a top_k parameter (default 3 preserves v3.5.2 behavior; values < 1 are clamped to 1).

This release does not close a numbered issue. It is the library-side substrate for the agentic routing evidence in the PyOD 3 paper (KDD 2027 ADS Cycle 1) §5.4 three-tier comparison.

Approach

Surface 1: get_kb_for_routing + make_plan

get_kb_for_routing returns one entry per shipped detector for the requested modality, with the same shape across modalities so an agent prompt can render every entry uniformly. Filtering is two-stage: constraints.exclude_detectors is a hard set difference (case-sensitive); constraints.data_type_strict (default True) drops detectors whose modality_match does not include the requested modality. Sort order is the modality-specific benchmark rank with ADBench_overall as the universal fallback (see "Round 1 reviewer fix (b)" below for the rank-key table).
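
The two-stage filter described above can be sketched in a few lines. This is an illustration only, not the code in pyod/utils/ad_engine.py: the filter_kb helper and the toy KB entries are hypothetical, and only the field names modality_match, exclude_detectors, and data_type_strict come from the description.

```python
def filter_kb(kb, modality, constraints=None):
    """Hypothetical sketch of the two-stage filter; not PyOD's actual code."""
    constraints = constraints or {}
    excluded = set(constraints.get("exclude_detectors", ()))
    strict = constraints.get("data_type_strict", True)

    kept = {}
    for name, entry in kb.items():
        # Stage 1: hard, case-sensitive set difference on detector names.
        if name in excluded:
            continue
        # Stage 2: drop detectors that do not list the requested modality.
        if strict and modality not in entry.get("modality_match", ()):
            continue
        kept[name] = entry
    return kept


# Toy KB entries, made up for illustration.
toy_kb = {
    "IForest": {"modality_match": ["tabular", "time_series"]},
    "ECOD": {"modality_match": ["tabular"]},
    "DOMINANT": {"modality_match": ["graph"]},
}

print(sorted(filter_kb(toy_kb, "tabular", {"exclude_detectors": ["ECOD"]})))
# Exclusion is case-sensitive, so a lowercase name excludes nothing here.
print(sorted(filter_kb(toy_kb, "tabular", {"exclude_detectors": ["ecod"]})))
```

Because the exclusion is a case-sensitive set difference, a caller that passes "ecod" instead of "ECOD" silently excludes nothing, which is worth checking on the caller side.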

make_plan is the structured commit step. It accepts an ordered detector list, validates each name against the shipped KB (unknown names raise ValueError), overlays per-detector params with engine contamination resolution, and returns a DetectionPlan dict with detector_choice, params, and a justifications field. The returned plan is the same shape consumed by ADEngine.build_detector / run, so an agent that uses get_kb_for_routing + make_plan lands in the same downstream code path as the rule router.
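
As a rough illustration of the commit step, the sketch below validates an ordered detector list and overlays per-detector params. It is not the ADEngine.make_plan implementation: the make_plan_sketch name, the exact plan layout, the default contamination value, and the rule that caller params override the resolved contamination are all assumptions made for this example.

```python
def make_plan_sketch(detector_choices, kb_names, justifications=None,
                     params=None, contamination=0.1):
    """Hypothetical stand-in for the validation step; not PyOD's code."""
    if not isinstance(detector_choices, list) or not detector_choices:
        raise ValueError("detector_choices must be a non-empty list")
    unknown = [name for name in detector_choices if name not in kb_names]
    if unknown:
        raise ValueError(f"unknown detectors: {unknown}")

    params = params or {}
    resolved = {}
    for name in detector_choices:
        # Assumed overlay order: engine contamination first, caller params on top.
        merged = {"contamination": contamination}
        merged.update(params.get(name, {}))
        resolved[name] = merged

    return {
        "detector_choice": list(detector_choices),
        "params": resolved,
        "justifications": dict(justifications or {}),
    }


plan = make_plan_sketch(
    ["IForest", "ECOD"],
    kb_names={"IForest", "ECOD", "KNN"},
    justifications={"IForest": "robust default for tabular data"},
    params={"IForest": {"n_estimators": 200}},
)
print(plan["params"]["IForest"])
```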

Surface 2: plan_detection(..., llm_client=callable)

llm_client is a Protocol (LLMCallable: (prompt: str) -> str) rather than a provider-specific adapter; PyOD ships no SDK glue. Users wrap their own Anthropic / OpenAI / vLLM / self-hosted SDK. The engine builds the routing prompt internally via pyod.utils._llm.build_routing_prompt(kb_context, top_k) and parses the response via pyod.utils._llm.parse_routing_response(response, kb, top_k). The parser tolerates surrounding prose and markdown fences, skips unknown detector names with a logged warning, dedupes, and truncates to top_k. It raises RoutingParseError if no JSON array is extractable or if no valid detector survives KB validation.
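
To make the parser contract concrete, here is a self-contained sketch of a callable protocol plus a tolerant parser. It is not the code in pyod/utils/_llm.py: parse_routing_sketch, stub_client, the "detector" key for object entries, and the local LLMCallable / RoutingParseError definitions are stand-ins written for this example.

```python
import json
import logging
from typing import Protocol

logger = logging.getLogger(__name__)


class LLMCallable(Protocol):
    """Any callable that takes a prompt string and returns the raw reply."""

    def __call__(self, prompt: str) -> str: ...


class RoutingParseError(ValueError):
    """Local stand-in for the error type named in the text."""


def parse_routing_sketch(response, kb_names, top_k):
    """Illustrative parser: extract, validate, dedupe, truncate."""
    decoder = json.JSONDecoder()
    items = None
    # Try to decode a JSON value at every "[" so that leading prose and
    # markdown fences around the array are ignored.
    for start, char in enumerate(response):
        if char != "[":
            continue
        try:
            candidate, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            continue
        if isinstance(candidate, list):
            items = candidate
            break
    if items is None:
        raise RoutingParseError("no JSON array found in LLM response")

    chosen = []
    for item in items:
        # Accept bare strings, or objects with an assumed "detector" key.
        name = item.get("detector") if isinstance(item, dict) else item
        if not isinstance(name, str) or name not in kb_names:
            logger.warning("skipping unknown detector: %r", name)
            continue
        if name not in chosen:
            chosen.append(name)
    if not chosen:
        raise RoutingParseError("no valid detector survived KB validation")
    return chosen[:top_k]


def stub_client(prompt: str) -> str:
    """Canned reply with prose, a fenced block, a duplicate, and a bad name."""
    fence = "`" * 3  # built indirectly so this example stays fence-safe
    return (
        "Sure, here is my ranking:\n"
        f"{fence}json\n"
        '["IForest", "NotADetector", "IForest", "ECOD", "KNN"]\n'
        f"{fence}"
    )


picked = parse_routing_sketch(
    stub_client("route this profile"), {"IForest", "ECOD", "KNN"}, top_k=2
)
print(picked)  # ['IForest', 'ECOD']
```

Because the protocol is structural, stub_client satisfies it without inheriting from anything; a wrapper around a real SDK would only need the same (prompt: str) -> str shape.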

Failure handling is tunable per call. By default, the engine falls back to rule-driven routing with a RuntimeWarning on either LLM-call failure or parse failure. An explicit llm_strict=True re-raises instead; an explicit llm_strict=False always falls back; llm_strict=None (the default) defers to the PYOD3_LLM_STRICT env var. This three-way precedence replaces the env-only switch added during Round 1 review, which was process-global and incorrect for concurrent callers in the same process.
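
The precedence (explicit kwarg, then env var, then default) can be sketched as follows. The helper names resolve_llm_strict and route_with_fallback are hypothetical, and treating only the exact value "1" as strict is an assumption; the text above only states that PYOD3_LLM_STRICT=1 enables strict mode.

```python
import os
import warnings


def resolve_llm_strict(llm_strict=None, environ=None):
    """Explicit True/False wins; None defers to the environment."""
    environ = os.environ if environ is None else environ
    if llm_strict is not None:
        return bool(llm_strict)
    # Assumption: only the literal value "1" turns strict mode on.
    return environ.get("PYOD3_LLM_STRICT") == "1"


def route_with_fallback(llm_route, rule_route, llm_strict=None, environ=None):
    """Try the LLM path; re-raise in strict mode, else warn and fall back."""
    try:
        return llm_route()
    except Exception as exc:
        if resolve_llm_strict(llm_strict, environ):
            raise
        warnings.warn(
            f"LLM routing failed ({exc!r}); using rule-driven routing",
            RuntimeWarning,
        )
        return rule_route()


def failing_llm_route():
    raise ValueError("malformed reply")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = route_with_fallback(
        failing_llm_route, lambda: "rule-plan", environ={}
    )
print(result, [w.category.__name__ for w in caught])
```

Resolving strictness per call, rather than reading the environment once at import time, is what lets two callers in the same process choose different failure behavior.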

top_k generalization

plan_detection(..., top_k=3) exposes the previously hard-coded valid[1:3] alternatives slice as a parameter. Default 3 preserves v3.5.2 behavior; values < 1 are clamped to 1; the alternatives slice in the returned plan respects the new top_k. The new top_k, llm_client, and llm_strict parameters are keyword-only via a * separator in the plan_detection signature (added in Round 2 review).
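
A small sketch of the clamping and the keyword-only contract, under stated assumptions: plan_sketch is a hypothetical function, the "alternatives" key name is a guess at the plan layout, and the valid list stands in for the router's ranked candidates.

```python
def plan_sketch(valid, *, top_k=3):
    """Hypothetical: primary pick plus a top_k-bounded alternatives slice."""
    top_k = max(1, top_k)  # values < 1 are clamped to 1
    return {"detector_choice": valid[0], "alternatives": valid[1:top_k]}


ranked = ["IForest", "ECOD", "KNN", "LOF"]

print(plan_sketch(ranked))            # default 3 keeps the old valid[1:3] slice
print(plan_sketch(ranked, top_k=0))   # clamped to 1, so no alternatives
print(plan_sketch(ranked, top_k=4))

try:
    plan_sketch(ranked, 2)  # positional top_k is rejected by the * separator
except TypeError as exc:
    print("TypeError:", exc)
```

The * separator means a v3.5.2-style positional call cannot accidentally bind a value to one of the new parameters.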

Round 1 reviewer fixes (Codex via /implement-review auto)

(a) High: _plan_via_llm did not enforce the constrained KB context after parsing. Previously, the LLM path validated returned detector names only against the global KB, so an LLM response could bypass the hard constraints.exclude_detectors filter. Fix: if the LLM returns a detector excluded by constraints.exclude_detectors or filtered by data_type_strict, the engine raises RoutingParseError and falls back to rule routing with a RuntimeWarning.
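
The shape of this guard can be illustrated with a short sketch. The enforce_constrained_kb helper and the detector sets are hypothetical, and RoutingParseError is redefined locally so the example runs on its own; the actual check lives in _plan_via_llm and may differ.

```python
class RoutingParseError(ValueError):
    """Local stand-in for the error type named in the text."""


def enforce_constrained_kb(parsed_names, constrained_kb):
    """Reject any LLM pick that is absent from the filtered KB context."""
    leaked = [name for name in parsed_names if name not in constrained_kb]
    if leaked:
        raise RoutingParseError(f"LLM chose filtered-out detectors: {leaked}")
    return list(parsed_names)


global_kb = {"IForest", "ECOD", "KNN"}
# Constrained context after applying exclude_detectors=["ECOD"].
constrained_kb = global_kb - {"ECOD"}

print(enforce_constrained_kb(["IForest", "KNN"], constrained_kb))

try:
    # "ECOD" is valid in the global KB but was excluded by the caller.
    enforce_constrained_kb(["ECOD", "IForest"], constrained_kb)
except RoutingParseError as exc:
    print("rejected:", exc)
```

Validating against the constrained context rather than the global KB is the whole fix: a name can be a real shipped detector and still be off limits for this call.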

(b) Medium: get_kb_for_routing sorted non-tabular modalities alphabetically. The legacy sort key was {modality}.title() + '_overall', which does not match the KB's actual rank fields for any non-tabular modality. Fix: a _MODALITY_RANK_KEYS table maps time_series → TSB_AD_overall / TSB_AD_overall_iforest, graph → BOND_deep / BOND_overall, text → NLP_ADBench_overall, and image → MVTec_overall, with ADBench_overall as the universal fallback for every modality.
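
The rank-key lookup can be sketched as below. The key names are taken from the fix description; everything else is an assumption made for illustration: that benchmark_rank is a dict keyed by benchmark name, that the keys listed per modality are tried in order, that unranked detectors sort last, and the toy rank values themselves.

```python
MODALITY_RANK_KEYS = {
    "time_series": ("TSB_AD_overall", "TSB_AD_overall_iforest"),
    "graph": ("BOND_deep", "BOND_overall"),
    "text": ("NLP_ADBench_overall",),
    "image": ("MVTec_overall",),
}
FALLBACK_RANK_KEY = "ADBench_overall"


def resolve_rank(entry, modality):
    """Return (rank, key) for the first populated rank key, else (None, None)."""
    ranks = entry.get("benchmark_rank", {})
    for key in MODALITY_RANK_KEYS.get(modality, ()) + (FALLBACK_RANK_KEY,):
        if ranks.get(key) is not None:
            return ranks[key], key
    return None, None


def sort_by_rank(kb, modality):
    """Order detector names by resolved rank; unranked entries go last."""
    def sort_key(name):
        rank, _ = resolve_rank(kb[name], modality)
        return (rank is None, rank if rank is not None else 0, name)
    return sorted(kb, key=sort_key)


# Toy entries with made-up ranks.
toy_kb = {
    "A_detector": {"benchmark_rank": {"ADBench_overall": 5}},
    "B_detector": {"benchmark_rank": {"TSB_AD_overall": 2}},
    "C_detector": {"benchmark_rank": {}},
}

# The legacy key never matched a real rank field for time series.
print("time_series".title() + "_overall")  # Time_Series_overall
print(sort_by_rank(toy_kb, "time_series"))
```

With the legacy key, no entry resolves a rank, so the sort degenerates to name order, which matches the alphabetical behavior the reviewer reported.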

(c) Medium: new per-call kwarg plan_detection(..., llm_strict: bool | None = None). Adds a per-call alternative to the process-global PYOD3_LLM_STRICT env var, with the three-way precedence described above.

Round 2 reviewer fixes (Codex via /implement-review auto)

(d) Medium: keyword-only signature. plan_detection's new top_k, llm_client, and llm_strict parameters are now actually keyword-only via a * separator before them in the signature, matching the release-notes contract.

(e) Medium: get_kb_for_routing now stamps each returned detector entry with resolved_rank and resolved_rank_key fields carrying the modality-specific benchmark rank it used for sorting. build_routing_prompt reads those fields so the LLM-facing prompt renders e.g. rank=10 (TSB_AD_overall) for time-series detectors, instead of the empty rank= it would otherwise render under the corrected modality rank-key table.
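
A minimal sketch of how a prompt builder might read the stamped fields. The render_entry helper, the line layout, and the "unranked" placeholder are assumptions; only the resolved_rank and resolved_rank_key field names and the rank=10 (TSB_AD_overall) rendering come from the description above.

```python
def render_entry(name, entry):
    """Render one KB entry line using the stamped rank fields."""
    rank = entry.get("resolved_rank")
    key = entry.get("resolved_rank_key")
    if rank is None:
        # Assumed placeholder for entries with no rank data.
        return f"- {name}: rank=unranked"
    return f"- {name}: rank={rank} ({key})"


stamped = {
    "IForest": {"resolved_rank": 10, "resolved_rank_key": "TSB_AD_overall"},
    "SomeTextDetector": {"resolved_rank": None, "resolved_rank_key": None},
}

for detector_name, detector_entry in stamped.items():
    print(render_entry(detector_name, detector_entry))
```

Stamping the resolved values onto each entry keeps the prompt builder from having to repeat the modality-to-key lookup.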

API compatibility

Every v3.5.2 caller pattern produces identical output:

  • plan_detection(profile) — unchanged
  • plan_detection(profile, priority=...) — unchanged
  • plan_detection(profile, constraints=...) — unchanged

The new top_k, llm_client, and llm_strict parameters are keyword-only with backward-compatible defaults. LLMCallable is a Protocol, not an inheritance-required base class. No breaking API changes.

Out of scope for v3.5.3:

  • routing_rules.json rule authoring (rules remain the offline fallback).
  • LLM-decided top_k (caller decides).
  • Built-in CLI adapter classes for Codex / Claude Code (users wrap subscriptions themselves; the LLMCallable protocol is the integration point).
  • Async llm_client.

Tests

  • 44 new in pyod/test/test_kb_router_surface1.py covering schema, filters, ordering, KB validation, top_k clamping, stub LLM client canned plan, top_k truncation of LLM response, malformed response fallback, PYOD3_LLM_STRICT=1 re-raise, prose tolerance, markdown-fence tolerance, dedupe, and bare-string entries.
  • 6 Round 1 regression tests covering the constraint-bypass High fix, modality rank-key ordering for time_series and graph, and the three-way llm_strict precedence (True / False / None).
  • 3 Round 2 regression tests covering the keyword-only signature contract, prompt rank annotation under time_series, and the text-modality fallback path when the KB has no rank data.

All 205 existing ADEngine tests continue to pass. Total new test count this release: 53.

Files touched

  • pyod/utils/ad_engine.py: get_kb_for_routing, make_plan, plan_detection keyword-only params (top_k, llm_client, llm_strict), _plan_via_llm helper, _MODALITY_RANK_KEYS, resolved_rank / resolved_rank_key stamping.
  • pyod/utils/_llm.py (new) — LLMCallable Protocol, RoutingParseError, build_routing_prompt(kb_context, top_k), parse_routing_response(response, kb, top_k).
  • pyod/test/test_kb_router_surface1.py (new) — 44 Surface-1 / Surface-2 tests; 6 + 3 regression tests for Round 1 + Round 2 fixes are co-located in the same file.
  • pyod/version.py — bumped to 3.5.3.
  • CHANGES.txt — v3.5.3 entry.

Test plan

  • CI passes (Linux, macOS, Windows). Pre-existing Windows MKL failures on TestFastABOD / TestKnnNearestNeighborsConfig / TestLUNARNearestNeighborsConfig / TestSODNearestNeighborsConfig reproduce on a clean tree without these changes and are unrelated.
  • Optional downstream: confirm in the PyOD 3 §5.4 evidence harness (pyod3-experiments framework_on_kb) that ADEngine.get_kb_for_routing + ADEngine.make_plan round-trip produces the same DetectionPlan as the harness fallback path on the 5-task ADBench pilot.

Surface 1 (agent tools):
- ADEngine.get_kb_for_routing(profile, top_k=3, constraints=None) returns a
  structured KB snapshot (every shipped detector with strengths, weaknesses,
  best_for, avoid_when, complexity, benchmark_rank, modality_match) filtered
  by exclude_detectors + data_type_strict and sorted by modality-specific
  benchmark rank keys (TSB_AD_overall for time-series, BOND_deep for graph,
  NLP_ADBench_overall for text, MVTec_overall for image, ADBench_overall
  fallback). Each entry carries resolved_rank + resolved_rank_key for
  downstream tools.
- ADEngine.make_plan(detector_choices, justifications=None, params=None)
  validates names against the KB (case-sensitive, must be shipped) and
  returns a closed-schema DetectionPlan consumable by build_detector / run.

Surface 2 (programmatic API):
- ADEngine.plan_detection(profile, priority='balanced', constraints=None, *,
  top_k=3, llm_client=None, llm_strict=None) accepts a (prompt: str) -> str
  callable. Engine builds the routing prompt internally, invokes the
  callable, parses the response, enforces the constrained KB context
  post-parse (LLM cannot bypass exclude_detectors / data_type_strict), and
  returns a DetectionPlan with note='llm-driven via plan_detection(...)' +
  evidence=['llm_routing']. On LLM call or parse failure, falls back to
  rule routing with a RuntimeWarning. llm_strict=True per-call overrides
  the PYOD3_LLM_STRICT env var; precedence is explicit kwarg > env > default.
- pyod/utils/_llm.py: new module with LLMCallable Protocol, RoutingParseError,
  build_routing_prompt(kb_context, top_k), parse_routing_response(response,
  kb, top_k). Parser tolerates surrounding prose, markdown fences, BOM/CRLF,
  skips unknown / non-shipped detectors with a logged warning, dedupes,
  truncates to top_k.

top_k generalization: plan_detection's previous valid[1:3] hard cap is now
valid[1:top_k]. Default 3 preserves v3.5.2 behavior. Values < 1 clamp to 1.

Tests: 44 new in pyod/test/test_kb_router_surface1.py covering Surface 1
schema / filters / KB validation / ordering / clamp, make_plan single + multi
detector / unknown / non-list / contamination overlay / build_detector
consumption, plan_detection top_k variants, llm_client stub canned plan /
top_k truncation / malformed fallback / strict re-raise / None preserves
rule plan, parser robustness (prose / markdown / dedupe / truncate / no-array
raise / all-invalid raise / bare-string), the post-parse constrained KB
guard (catches both exclude_detectors and data_type_strict bypass), per-call
llm_strict three-way precedence, modality-specific rank-key ordering
(time-series + graph), keyword-only signature, prompt builder modality
annotation. All 205 existing ADEngine-related tests still pass (171 in
test_ad_engine.py + test_ad_engine_v3.py + test_ad_engine_compare.py).

Backward compatibility: every v3.5.2 caller pattern produces identical
output. The new top_k, llm_client, and llm_strict parameters are
keyword-only via a * separator before them in the signature.

Reviewed via /implement-review auto across 4 Codex rounds (3 substantive
fixes + 2 nit fixes); Round 4 verdict: zero new findings, commit-ready.

Out of scope: routing_rules.json rule authoring; LLM-decided top_k; built-in
CLI adapters for Codex / Claude Code; async llm_client. No breaking API
changes.
@coveralls

coveralls commented May 19, 2026

Coverage Report for CI Build 26128282430

Coverage increased (+0.06%) to 93.874%

Details

  • Coverage increased (+0.06%) from the base build.
  • Patch coverage: 19 uncovered changes across 3 files (483 of 502 lines covered, 96.22%).
  • No coverage regressions found.

Uncovered Changes

| File | Changed | Covered | % |
|---|---|---|---|
| pyod/utils/_llm.py | 91 | 75 | 82.42% |
| pyod/utils/ad_engine.py | 104 | 102 | 98.08% |
| pyod/test/test_kb_router_surface1.py | 307 | 306 | 99.67% |

Coverage Regressions

No coverage regressions found.


Coverage Stats

Relevant Lines: 19802
Covered Lines: 18589
Line Coverage: 93.87%
Coverage Strength: 10.3 hits per line

💛 - Coveralls
