v3.5.3: KB-tools API for agent-driven and LLM-driven detector routing #688
Open
yzhao062 wants to merge 1 commit into
Surface 1 (agent tools):
- ADEngine.get_kb_for_routing(profile, top_k=3, constraints=None) returns a structured KB snapshot (every shipped detector with strengths, weaknesses, best_for, avoid_when, complexity, benchmark_rank, modality_match) filtered by exclude_detectors + data_type_strict and sorted by modality-specific benchmark rank keys (TSB_AD_overall for time-series, BOND_deep for graph, NLP_ADBench_overall for text, MVTec_overall for image, ADBench_overall fallback). Each entry carries resolved_rank + resolved_rank_key for downstream tools.
- ADEngine.make_plan(detector_choices, justifications=None, params=None) validates names against the KB (case-sensitive, must be shipped) and returns a closed-schema DetectionPlan consumable by build_detector / run.

Surface 2 (programmatic API):
- ADEngine.plan_detection(profile, priority='balanced', constraints=None, *, top_k=3, llm_client=None, llm_strict=None) accepts a (prompt: str) -> str callable. The engine builds the routing prompt internally, invokes the callable, parses the response, enforces the constrained KB context post-parse (the LLM cannot bypass exclude_detectors / data_type_strict), and returns a DetectionPlan with note='llm-driven via plan_detection(...)' + evidence=['llm_routing']. On an LLM call or parse failure it falls back to rule routing with a RuntimeWarning. A per-call llm_strict value overrides the PYOD3_LLM_STRICT env var; precedence is explicit kwarg > env > default.
- pyod/utils/_llm.py: new module with the LLMCallable Protocol, RoutingParseError, build_routing_prompt(kb_context, top_k), and parse_routing_response(response, kb, top_k). The parser tolerates surrounding prose, markdown fences, and BOM/CRLF; skips unknown / non-shipped detectors with a logged warning; dedupes; and truncates to top_k.

top_k generalization: plan_detection's previous valid[1:3] hard cap is now valid[1:top_k]. The default of 3 preserves v3.5.2 behavior. Values < 1 clamp to 1.
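The parser behavior listed for pyod/utils/_llm.py can be illustrated with a minimal, self-contained sketch. It is not the module's actual code; it only reuses the names from this PR. It assumes the LLM reply contains a JSON array whose items are either bare detector-name strings or objects carrying a "detector" key. That key name is a hypothetical choice.

```python
import json
import logging

logger = logging.getLogger("routing_sketch")


class RoutingParseError(ValueError):
    """Stand-in for the PR's RoutingParseError (sketch only)."""


def parse_routing_response(response: str, kb, top_k: int) -> list:
    """Recover an ordered, de-duplicated detector list from an LLM reply.

    Assumption: array items are either "Name" strings or {"detector": "Name"}.
    """
    # Drop a leading BOM and normalise CRLF line endings.
    text = response.lstrip("\ufeff").replace("\r\n", "\n")
    decoder = json.JSONDecoder()
    items = None
    # Try each "[" as a candidate start. raw_decode stops at the end of the
    # first complete JSON value, so surrounding prose and markdown fences
    # never reach the JSON decoder.
    for pos, char in enumerate(text):
        if char != "[":
            continue
        try:
            items, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            continue
        break
    if items is None:
        raise RoutingParseError("no JSON array found in the LLM response")

    chosen = []
    for item in items:
        name = item.get("detector") if isinstance(item, dict) else item
        if not isinstance(name, str) or name not in kb:
            # Unknown or non-shipped names are skipped, not fatal.
            logger.warning("skipping unknown detector %r", name)
            continue
        if name not in chosen:  # dedupe while keeping the first occurrence
            chosen.append(name)
    if not chosen:
        raise RoutingParseError("no valid detector survived KB validation")
    # Truncate to top_k; assumption: top_k < 1 is treated like 1.
    return chosen[: max(1, top_k)]
```

In this sketch, a reply that wraps the array in prose and a json fence still parses. A reply with no array, or with only unknown names, raises RoutingParseError, which is the failure signal that makes plan_detection fall back to rule routing.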
Tests: 44 new tests in pyod/test/test_kb_router_surface1.py covering Surface 1 schema / filters / KB validation / ordering / clamp; make_plan single + multi detector / unknown / non-list / contamination overlay / build_detector consumption; plan_detection top_k variants; llm_client stub canned plan / top_k truncation / malformed fallback / strict re-raise / None preserves rule plan; parser robustness (prose / markdown / dedupe / truncate / no-array raise / all-invalid raise / bare-string); the post-parse constrained KB guard (catches both exclude_detectors and data_type_strict bypass); per-call llm_strict three-way precedence; modality-specific rank-key ordering (time-series + graph); keyword-only signature; and prompt builder modality annotation. All 205 existing ADEngine-related tests still pass (171 in test_ad_engine.py + test_ad_engine_v3.py + test_ad_engine_compare.py).

Backward compatibility: every v3.5.2 caller pattern produces identical output. The new top_k, llm_client, and llm_strict parameters are keyword-only via a * separator before them in the signature.

Reviewed via /implement-review auto across 4 Codex rounds (3 substantive fixes + 2 nit fixes); Round 4 verdict: zero new findings, commit-ready.

Out of scope: routing_rules.json rule authoring; LLM-decided top_k; built-in CLI adapters for Codex / Claude Code; async llm_client. No breaking API changes.
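The keyword-only signature and the three-way llm_strict precedence can be illustrated with a toy stand-in. The following pieces are all hypothetical:
- plan_detection_sketch and resolve_llm_strict
- the fixed rule ranking
- the "alternatives" key
- the env test hook

Only these come from this PR:
- the parameter names
- the PYOD3_LLM_STRICT variable
- the RuntimeWarning fallback
- the valid[1:top_k] slice

The sketch also assumes that only the value "1" turns the env switch on.

```python
import json
import os
import warnings
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class LLMCallable(Protocol):
    """Any (prompt: str) -> str callable fits; no base class is required."""

    def __call__(self, prompt: str) -> str: ...


def resolve_llm_strict(llm_strict: Optional[bool], env=None) -> bool:
    """Precedence: explicit kwarg > PYOD3_LLM_STRICT env var > default False."""
    if llm_strict is not None:
        return llm_strict
    env = os.environ if env is None else env
    return env.get("PYOD3_LLM_STRICT", "") == "1"  # assumption: only "1" enables


# Hypothetical rule-router output, best first.
_RULE_RANKING = ["IForest", "ECOD", "KNN", "LOF"]


def plan_detection_sketch(profile, priority="balanced", constraints=None, *,
                          top_k=3, llm_client=None, llm_strict=None, env=None):
    """Toy stand-in for ADEngine.plan_detection; `env` is a test hook.

    priority and constraints are accepted for signature shape only.
    """
    top_k = max(1, top_k)  # values < 1 clamp to 1
    ranked, note = list(_RULE_RANKING), "rule-driven"
    if llm_client is not None:
        try:
            # Placeholder prompt; the real engine builds its own routing prompt.
            reply = llm_client(f"Pick up to {top_k} detectors for {profile!r}.")
            # Simplified parsing: plain json.loads, no prose or fence tolerance.
            picked = [d for d in json.loads(reply) if d in _RULE_RANKING]
            if not picked:
                raise ValueError("no known detector in the LLM response")
            ranked, note = picked, "llm-driven"
        except Exception as exc:  # LLM-call or parse failure
            if resolve_llm_strict(llm_strict, env):
                raise
            warnings.warn(f"LLM routing failed ({exc}); using rule routing",
                          RuntimeWarning, stacklevel=2)
    # The old valid[1:3] slice, generalised to valid[1:top_k].
    return {"detector_choice": ranked[0], "alternatives": ranked[1:top_k],
            "note": note}
```

Passing top_k positionally, as in plan_detection_sketch(profile, "balanced", None, 2), raises TypeError because of the * separator. That is the property the keyword-only requirement protects.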
Coverage Report for CI Build 26128282430 (Coveralls): coverage increased (+0.06%) to 93.874%. No coverage regressions found.
Summary
KB-tools API for agent-driven and LLM-driven detector routing. It adds two new surfaces on ADEngine. All new plan_detection parameters are keyword-only, and every v3.5.2 caller pattern keeps working unchanged. Reviewed via four rounds of /implement-review with Codex: rounds 1 and 2 each surfaced real findings that were addressed before merging, and rounds 3 and 4 cleared with no new actionable items.

- Surface 1 (agent tools): ADEngine.get_kb_for_routing(profile, top_k=3, constraints=None) returns a structured KB snapshot of every shipped detector (strengths, weaknesses, best_for, avoid_when, complexity, benchmark_rank, modality_match). The snapshot is filtered by constraints.exclude_detectors and constraints.data_type_strict and sorted by modality-specific benchmark rank. ADEngine.make_plan(detector_choices, justifications=None, params=None) validates a caller-chosen, ordered detector list against the KB and returns a closed-schema DetectionPlan consumable by build_detector / run. Together they let an agent runtime (Claude Code, Codex CLI, MCP tool clients) reason over the KB directly and commit a routing decision without going through hand-coded rules.
- Surface 2 (programmatic API): ADEngine.plan_detection(profile, llm_client=callable, top_k=3, llm_strict=None) accepts a user-supplied (prompt: str) -> str callable wrapping any LLM SDK (Anthropic, OpenAI, vLLM, self-hosted). The engine builds the routing prompt internally, invokes the callable, parses the response, and returns the same DetectionPlan shape. On an LLM-call or parse failure it falls back to rule-driven routing with a RuntimeWarning. A per-call llm_strict=True (or the PYOD3_LLM_STRICT=1 env var) re-raises instead.
- top_k generalization: plan_detection now exposes the previously hard-coded valid[1:3] alternatives slice as a top_k parameter. The default of 3 preserves v3.5.2 behavior, and values < 1 are clamped to 1.

This release does not close a numbered issue. It is the library-side substrate for the agentic routing evidence in the §5.4 three-tier comparison of the PyOD 3 paper (KDD 2027 ADS Cycle 1).
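The following is a minimal sketch of the Surface 1 filter-and-sort step, not PyOD's get_kb_for_routing. It assumes a plain dict KB keyed by detector name, where each entry carries modality_match and benchmark_rank fields. kb_snapshot_sketch and resolve_rank are hypothetical helpers. The rank-key lists follow the _MODALITY_RANK_KEYS description in this PR. The rule that unranked detectors sort last, with ties broken by name, is an assumption.

```python
import math

# Rank keys tried in order per modality, following this PR's description of
# _MODALITY_RANK_KEYS; ADBench_overall is the universal fallback.
MODALITY_RANK_KEYS = {
    "time_series": ["TSB_AD_overall", "TSB_AD_overall_iforest"],
    "graph": ["BOND_deep", "BOND_overall"],
    "text": ["NLP_ADBench_overall"],
    "image": ["MVTec_overall"],
}
FALLBACK_RANK_KEY = "ADBench_overall"


def resolve_rank(entry, modality):
    """Return (rank, key) for the first rank key present, else (None, None)."""
    ranks = entry.get("benchmark_rank", {})
    for key in MODALITY_RANK_KEYS.get(modality, []) + [FALLBACK_RANK_KEY]:
        if ranks.get(key) is not None:
            return ranks[key], key
    return None, None


def kb_snapshot_sketch(kb, modality, exclude_detectors=(), data_type_strict=True):
    """Toy Surface 1 snapshot: filter, stamp resolved_rank, sort."""
    excluded = set(exclude_detectors)  # hard, case-sensitive set difference
    entries = []
    for name, info in kb.items():
        if name in excluded:
            continue
        if data_type_strict and modality not in info.get("modality_match", []):
            continue
        rank, key = resolve_rank(info, modality)
        entries.append({"name": name, **info,
                        "resolved_rank": rank, "resolved_rank_key": key})
    # Lower rank is better; unranked detectors go last, ties broken by name.
    entries.sort(key=lambda e: (math.inf if e["resolved_rank"] is None
                                else e["resolved_rank"], e["name"]))
    return entries
```

On a toy KB, a time_series request ranks entries by TSB_AD_overall, then TSB_AD_overall_iforest, then ADBench_overall. Any rank values in such a toy KB are made up for illustration and are not benchmark results.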
Approach
Surface 1:
get_kb_for_routing + make_plan

get_kb_for_routing returns one entry per shipped detector for the requested modality. Every entry has the same shape across modalities, so an agent prompt can render all of them uniformly. Filtering has two stages:
- constraints.exclude_detectors is a hard, case-sensitive set difference.
- constraints.data_type_strict (default True) drops detectors whose modality_match does not include the requested modality.

Entries are then sorted by the modality-specific benchmark rank, with ADBench_overall as the universal fallback (see "Round 1 reviewer fix (b)" below for the rank-key table).

make_plan is the structured commit step. It accepts an ordered detector list and validates each name against the shipped KB; unknown names raise ValueError. It then overlays per-detector params with the engine's contamination resolution and returns a DetectionPlan dict with detector_choice, params, and justifications fields. The returned plan has the same shape that ADEngine.build_detector / run consume. An agent that uses get_kb_for_routing + make_plan therefore lands in the same downstream code path as the rule router.

Surface 2:
plan_detection(..., llm_client=callable)

llm_client is a Protocol (LLMCallable: (prompt: str) -> str) rather than a provider-specific adapter. PyOD ships no SDK glue, so users wrap their own Anthropic / OpenAI / vLLM / self-hosted SDK. The engine builds the routing prompt internally via pyod.utils._llm.build_routing_prompt(kb_context, top_k) and parses the response via pyod.utils._llm.parse_routing_response(response, kb, top_k). The parser tolerates surrounding prose and markdown fences, skips unknown detector names with a logged warning, dedupes, and truncates to top_k. It raises RoutingParseError if no JSON array is extractable or no valid detector survives KB validation.

Failure handling is tunable per call. By default, an LLM-call failure or a parse failure falls back to rule-driven routing with a RuntimeWarning. Explicit llm_strict=True re-raises instead; explicit llm_strict=False always falls back; llm_strict=None (the default) defers to the PYOD3_LLM_STRICT env var. This three-way precedence replaces the env-only switch added during Round 1 review. That switch was process-global and therefore wrong for concurrent callers in the same process.

top_k generalization

plan_detection(..., top_k=3) exposes the previously hard-coded valid[1:3] alternatives slice as a parameter. The default of 3 preserves v3.5.2 behavior, values < 1 are clamped to 1, and the alternatives slice in the returned plan respects the new top_k. The new top_k, llm_client, and llm_strict parameters are keyword-only via a * separator in the plan_detection signature (added in Round 2 review).

Round 1 reviewer fixes (Codex via /implement-review auto)

(a) High: _plan_via_llm did not enforce the constrained KB context after parsing. Previously the LLM path validated returned detector names only against the global KB, so it could bypass hard constraints such as constraints.exclude_detectors. Fix: if the LLM returns a detector that is excluded by constraints.exclude_detectors or filtered out by data_type_strict, the engine raises RoutingParseError and falls back to rule routing with a RuntimeWarning.

(b) Medium: get_kb_for_routing sorted non-tabular modalities alphabetically. The legacy sort key was {modality}.title() + '_overall', which does not match the KB's actual rank fields for any non-tabular modality. Fix: the _MODALITY_RANK_KEYS table maps the modalities as follows, with ADBench_overall as the universal fallback for every modality:
- time_series → TSB_AD_overall / TSB_AD_overall_iforest
- graph → BOND_deep / BOND_overall
- text → NLP_ADBench_overall
- image → MVTec_overall

(c) Medium: new per-call kwarg plan_detection(..., llm_strict: bool | None = None). It adds a per-call alternative to the process-global PYOD3_LLM_STRICT env var, with the three-way precedence described above.

Round 2 reviewer fixes (Codex via /implement-review auto)

(d) Medium: keyword-only signature. plan_detection's new top_k, llm_client, and llm_strict parameters are now actually keyword-only via a * separator before them in the signature, matching the release-notes contract.

(e) Medium: get_kb_for_routing now stamps each returned detector entry with resolved_rank and resolved_rank_key fields. These carry the modality-specific benchmark rank used for sorting. build_routing_prompt reads those fields, so the LLM-facing prompt renders e.g. rank=10 (TSB_AD_overall) for time-series detectors. Without them it would render an empty rank= under the corrected modality rank-key table.

API compatibility
Every v3.5.2 caller pattern produces identical output:
- plan_detection(profile): unchanged
- plan_detection(profile, priority=...): unchanged
- plan_detection(profile, constraints=...): unchanged

The new top_k, llm_client, and llm_strict parameters are keyword-only with backward-compatible defaults. LLMCallable is a Protocol, not an inheritance-required base class. No breaking API changes.

Out of scope for v3.5.3:
- routing_rules.json rule authoring (rules remain the offline fallback).
- LLM-decided top_k (the caller decides).
- Built-in CLI adapters for Codex / Claude Code (the LLMCallable protocol is the integration point).
- Async llm_client.

Tests
- 44 new tests in pyod/test/test_kb_router_surface1.py. They cover schema, filters, ordering, KB validation, top_k clamping, a stub LLM client's canned plan, top_k truncation of the LLM response, malformed-response fallback, PYOD3_LLM_STRICT=1 re-raise, prose tolerance, markdown-fence tolerance, dedupe, and bare-string entries.
- 6 regression tests for the Round 1 fixes. They cover the post-parse constrained-KB guard, modality-specific rank-key ordering for time_series and graph, and the three-way llm_strict precedence (True / False / None).
- 3 regression tests for the Round 2 fixes. They cover the keyword-only signature, the prompt builder's rank annotation for time_series, and the text-modality fallback path when the KB has no rank data.

All 205 existing ADEngine tests continue to pass. Total new test count this release: 53.

Files touched
- pyod/utils/ad_engine.py: get_kb_for_routing, make_plan, plan_detection keyword-only params (top_k, llm_client, llm_strict), _plan_via_llm helper, _MODALITY_RANK_KEYS, resolved_rank / resolved_rank_key stamping.
- pyod/utils/_llm.py (new): LLMCallable Protocol, RoutingParseError, build_routing_prompt(kb_context, top_k), parse_routing_response(response, kb, top_k).
- pyod/test/test_kb_router_surface1.py (new): 44 Surface-1 / Surface-2 tests; the 6 + 3 regression tests for the Round 1 + Round 2 fixes are co-located in the same file.
- pyod/version.py: bumped to 3.5.3.
- CHANGES.txt: v3.5.3 entry.

Test plan
- Pre-existing failures in TestFastABOD / TestKnnNearestNeighborsConfig / TestLUNARNearestNeighborsConfig / TestSODNearestNeighborsConfig reproduce on a clean tree without these changes and are unrelated.
- Check (framework_on_kb) that the ADEngine.get_kb_for_routing + ADEngine.make_plan round-trip produces the same DetectionPlan as the harness fallback path on the 5-task ADBench pilot.
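As a rough illustration of the commit step that the round-trip item exercises, here is a toy stand-in for make_plan. It is not PyOD's implementation. The following are all assumptions:
- the make_plan_sketch name
- the "alternatives" field
- the default contamination of 0.1
- the choice to let caller-supplied per-detector params override the engine's contamination
- raising TypeError for a non-list input

Only the detector_choice / params / justifications fields and the case-sensitive ValueError on unknown names come from this PR.

```python
from typing import Dict, List, Optional


def make_plan_sketch(detector_choices: List[str], kb: Dict[str, dict],
                     justifications: Optional[List[str]] = None,
                     params: Optional[Dict[str, dict]] = None,
                     contamination: float = 0.1) -> dict:
    """Toy stand-in for ADEngine.make_plan returning a closed-schema dict."""
    if not isinstance(detector_choices, list) or not detector_choices:
        # Assumption: the real method's exception type here may differ.
        raise TypeError("detector_choices must be a non-empty list of names")
    unknown = [name for name in detector_choices if name not in kb]
    if unknown:  # case-sensitive: "iforest" does not match "IForest"
        raise ValueError(f"not shipped in the KB: {unknown}")
    params = params or {}
    return {
        "detector_choice": detector_choices[0],
        "alternatives": detector_choices[1:],  # hypothetical field name
        # Assumption: engine contamination is the base value and the
        # caller's per-detector params are layered on top of it.
        "params": {name: {"contamination": contamination, **params.get(name, {})}
                   for name in detector_choices},
        "justifications": list(justifications or []),
    }
```

Because the output depends only on the ordered list, the KB, and the params, two paths that commit the same detector order produce equal plans in this sketch. That is the kind of equality the round-trip check compares.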