docs: design proposal for adapter lifecycle (#929) — review channel, do not merge#1080
docs: design proposal for adapter lifecycle (#929) — review channel, do not merge#1080planetf1 wants to merge 27 commits into
Conversation
…-computing#929) Draft proposal for agreeing the end-state shape of the intrinsic adapter refactor covered by Epic generative-computing#929. Part I covers problem / goals / terminology / rough end result for high-level review; Part II contains supporting detail (reality breakdown, full thread mapping, observability, docs/tests/tutorials, migration phases, open questions). Branch intended as a preservation point while the design is circulated; not a PR candidate yet. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
- ASCII side-by-side of the three "where weights live" realities - Mermaid sequence diagram for the adapter_scope lifecycle - Mermaid tree for the intrinsic.* span hierarchy No content change; visuals make the key shape-agreement points scannable. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…ve-computing#929) The ASCII three-realities comparison had a one-column drift on the Reality C top border. Mermaid flowchart with three subgraphs makes the layout self-aligning and more readable. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…ing#929) Reword seven instances of "lands" / "landing" to more precise alternatives (merges, ships, is agreed, is added, include). No semantic change. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
- Title and prose shift to "adapter" as primary term; "intrinsic" noted as legacy vocabulary with a rename-scope decision in Part I §5 Q6. - Reality B clarified: weights ship in the same HF repo as the base model (not "baked in"); both LoRA and aLoRA supported via the technology field on each embedded adapter. - Reality C reframed: OpenAI-compatible support is already kept for embedded (PR generative-computing#881); the gap is non-embedded server-mediated, which may or may not re-land in 0.6.0. - Add §10.1 "Why adapters need bespoke observability" with four concrete failure modes, two already seen in production. - Goals elevate customer-built adapters as a first-class target. - Tests section adds an optional per-adapter qualitative effectiveness suite (opt-in via @pytest.mark.qualitative). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…tive-computing#929) Replace the generic "qualitative effectiveness suite" description with concrete tools already available: - TestBasedEval (in-tree Mellea component, JSON + LLM-as-judge, has CLI via `m eval run`) as the default per-adapter mechanism. - BenchDrift (IBM/BenchDrift) as an optional variation-testing extension for adapters where phrasing-invariance matters (answerability, context-relevance, requirement-check, Guardian). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
- Prominent linked header to Epic generative-computing#929 + reference table of all related issues/PRs at the top of the doc so context is immediate on share. - §1: add evidence table of seven recent fix-up commits in the adapter area to demonstrate the friction is real. - §2: rewrite customer-adapter goal to distinguish current partial state (m alora train exists; consuming needs hacks; generative-computing#424 open) from the refactor's first-class intent. - §4.1: new backend × reality matrix showing Reality A/B/C support today and planned (with generative-computing#1018 for HF + embedded, generative-computing#27 for vLLM). - New §6 "Impact and blast radius": API surface, user-archetype impact table, code reach, release planning, blocking/unblocking, performance notes, risk register. - §6.3: fix vLLM/aLoRA factual error — upstream vLLM never had aLoRA; we ran a custom vLLM build; PR generative-computing#543 was triggered by upstream declining the aLoRA PR, not by vLLM "dropping" support. Cite generative-computing#27 as the live tracking item. - §5 reviewer questions: drop telemetry coupling as a reviewer question (move to implementation note); reframe Reality C question around generative-computing#27 upstream status rather than C1-vs-C2 choice. - §11: acknowledge TestBasedEval and BenchDrift are part of the broader LLM-unit-testing conversation, not invented for this refactor. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…puting#929) - Break up top-of-doc banner into visibly separated lines (blank blockquote rows) so fields don't run together on render. - Move the referenced-issues table and sequencing note out of the top (too much noise before the reader gets any narrative) into a late "Appendix — Referenced issues and PRs" section with sub-tables for tracking items, rework evidence, and related work. - §4.1 backend matrix expanded to cover all five Mellea backends: LocalHFBackend, OpenAIBackend, OllamaBackend, WatsonxBackend, LiteLLMBackend. Only the first two are in scope for adapter support under this design; the others are out-of-scope with a stated reason. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Reorder to lead readers gently from context → problem → goals → shape → decisions → impact before descending into detail. No content lost; sections rearranged so reviewers can stop at any point and still have learned something useful. Part I changes: - §1 evidence table compressed to one sentence + appendix pointer - §2 goals compressed from 7 detailed bullets to 4 core outcomes - §3 Terminology replaced with a 4-term "key terms" list; full glossary moved to Part II §7 - §4 end result trimmed: keeps the composition box, the three-reality flowchart, and the one-line simplified flow. Binding verb table and lifecycle sequence diagram moved to Part II §9. Full backend matrix moved to Part II §10. - §4.1 is now a one-paragraph summary with a §10 pointer Part II changes: - New §7 Terminology (full glossary moved from Part I §3) - §8 Three realities (subsection numbers corrected) - New §9 End-state design detail (binding table + sequence diagram) - New §10 Backend × reality matrix (full version) - §11-§17 are the former §7-§13 with renumbered intra-doc refs Appendix unchanged in content; now reached after a narrative that builds up to it. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…ease version (generative-computing#929) - Remove three re-emergences of the "lands"/"landed" status vocabulary (§1, §6 API surface, appendix sequencing note). - §17 open-question 4 now back-references §5 Q3 rather than duplicating it with slightly different framing. Reviewers see one question about Reality C, not two. - §6 release planning no longer version-pins ("0.6.0 target") since §16 explicitly says the detail is deferred. Uses "target release (minor, exact number TBD)" / "follow-on minor release" instead. Review feedback captured from an independent review pass. Further should-fix items (§11 overlap with §1, §4 density, ⏳ symbol conflation of different blocking types) left for the author's judgement. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
- §4 simplified-invocation pseudocode moved to Part II §9.1; Part I §4 now stays at the shape-agreement level without a code block. - §9 sub-numbering adjusted: §9.1 invocation, §9.2 verbs, §9.3 sequence. - §11 rewritten to name the structural cause (backend-keyed dispatch) rather than re-enumerate the three symptoms already listed in §1. - §10 backend matrix uses distinct ⏳🔽 (blocked by this proposal) and ⏳🔼 (blocked upstream) symbols so reviewers see the difference at a glance; legend updated accordingly. - §5 Q5 split into independent Q5a/Q5b/Q5c so each rename level can be agreed or declined on its own. - §6 "Optional, was mandatory — the adapter catalogue" expanded with a concrete clause explaining what changes for callers. - §17 item 5 now includes a one-sentence gloss of what rewind means before raising the placement question. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
generative-computing#929) - Add "Proposal at a glance" summary at top of Part I so the end-state and five gate decisions are visible immediately without reading the full problem/goals preamble (Paul's readability concern) - Q2: record Jacob's position — auto-load yes, auto-unload no; note LocalHFBackend is primarily single-user/local so multi-tenancy concern is lower priority - Q3: update Reality C framing — vLLM path confirmed blocked by Paul; recommendation is to design the slot, leave it empty, not invest in stubs - Q5: surface Jacob's competing model ("Intrinsic" must stay as IBM term; "adapter" = backend artefact, "intrinsic" = user abstraction) and Paul's "create new, deprecate old" preference; present as open question requiring alignment rather than three independent yes/no flags - Add backend consumption detail: clarify that the weights binding (not the backend) owns activation logic — EmbeddedBinding renders controls, LocalFileBinding calls PEFT load_adapter; backend calls binding.activate() uniformly (Jacob's missing detail) - Note granite-common / granite-formatters boundary constraint on io_contract.build_prompt() and io_contract.parse() (Jacob) - Flag io_contract naming as open discussion (Jacob prefers io_config) - Update §8.3 Reality C: vLLM status confirmed blocked, not "deferred" - Update §10 LocalHFBackend: note individual/local use context (Paul) Assisted-by: Claude Code
…erative-computing#929) Moves docs/dev/issue-929-adapter-design-proposal.md to docs/dev/proposals/929-adapter-lifecycle.md to introduce a proposals/ subdir convention and to make the file's draft/review status legible from its path. The proposal itself is unchanged by this commit. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
|
|
||
| | Backend | Reality A (Local PEFT) | Reality B (Embedded) | Reality C (Server-mediated) | Notes | | ||
| | ------------------- | :--------------------: | :------------------: | :-------------------------: | --- | | ||
| | `LocalHFBackend` | ✅ today | ⏳🔽 [#1018](https://github.com/generative-computing/mellea/issues/1018) | — | Primary local backend; only one with aLoRA support today. Primarily used for individual/local deployments rather than multi-tenant environments (Paul). | |
There was a problem hiding this comment.
From @psschwei in Slack:
on adapter lifecycle: my understanding is the huggingface backends are more for individual/local use (or at least that's what I've gather from others), so I'm not sure they would be used in a multi-tenant environment
Reflected in this row's note: "Primarily used for individual/local deployments rather than multi-tenant environments (Paul)".
There was a problem hiding this comment.
@psschwei — captured in §10 LocalHFBackend row note. Suggest marking resolved when you're happy with the wording.
|
|
||
| 1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. | ||
| 2. **Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. **Position received (Jacob):** auto-load yes, auto-unload no — once activated, leave the adapter loaded; let the caller or session teardown trigger explicit `release()`. The multi-tenancy concern is reduced for `LocalHFBackend`, which is primarily a single-user/local backend (see §10). | ||
| 3. **Reality C target shape.** The aLoRA-on-vLLM path ([#27](https://github.com/generative-computing/mellea/issues/27)) is currently blocked: vLLM has declined to upstream aLoRA support (see §8.3 for history). **Position received (Paul):** leave non-switch vLLM adapters alone; no near-term path there. Recommendation: design the `ServerMediatedBinding` slot so the interface is clean when/if the upstream situation changes, but leave it empty and do not invest in stubs. |
There was a problem hiding this comment.
From @psschwei in Slack:
on vllm: vllm rejected upstreaming aloras so I don't believe there's a path there anymore
Captured here (Q3 "Position received (Paul)") and in §8.3 ("Current status (confirmed by Paul)").
There was a problem hiding this comment.
@psschwei — captured in Q3 body and §8.3. Resolve when you've confirmed the framing reads correctly.
| 1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. | ||
| 2. **Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. **Position received (Jacob):** auto-load yes, auto-unload no — once activated, leave the adapter loaded; let the caller or session teardown trigger explicit `release()`. The multi-tenancy concern is reduced for `LocalHFBackend`, which is primarily a single-user/local backend (see §10). | ||
| 3. **Reality C target shape.** The aLoRA-on-vLLM path ([#27](https://github.com/generative-computing/mellea/issues/27)) is currently blocked: vLLM has declined to upstream aLoRA support (see §8.3 for history). **Position received (Paul):** leave non-switch vLLM adapters alone; no near-term path there. Recommendation: design the `ServerMediatedBinding` slot so the interface is clean when/if the upstream situation changes, but leave it empty and do not invest in stubs. | ||
| 4. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? One minor release is the default; confirm. |
There was a problem hiding this comment.
From @psschwei in Slack:
on deprecation window: assume one minor release == 4-6 weeks, I think that's enough time to aim for (can always extend if necessary)
The TL;DR table at the top of Part I records this lean ("Lean: 1 minor release (~4–6 weeks)"). The §5 Q4 body itself still reads "default; confirm".
There was a problem hiding this comment.
@psschwei — Q4 body updated in 2d570f87: now carries your 4–6 week framing as one of two received positions, plus a sub-question on whether breakage can be avoided entirely (under the current Q5 lean, IntrinsicAdapter could stay as a re-export of Adapter, removing the deprecation pressure). Resolve if the framing matches your intent, or push back on the sub-question.
| 2. **Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. **Position received (Jacob):** auto-load yes, auto-unload no — once activated, leave the adapter loaded; let the caller or session teardown trigger explicit `release()`. The multi-tenancy concern is reduced for `LocalHFBackend`, which is primarily a single-user/local backend (see §10). | ||
| 3. **Reality C target shape.** The aLoRA-on-vLLM path ([#27](https://github.com/generative-computing/mellea/issues/27)) is currently blocked: vLLM has declined to upstream aLoRA support (see §8.3 for history). **Position received (Paul):** leave non-switch vLLM adapters alone; no near-term path there. Recommendation: design the `ServerMediatedBinding` slot so the interface is clean when/if the upstream situation changes, but leave it empty and do not invest in stubs. | ||
| 4. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? One minor release is the default; confirm. | ||
| 5. **Terminology rename scope.** Feedback received challenges the framing in this proposal. Three competing positions: |
There was a problem hiding this comment.
From @psschwei in Slack:
on renaming: my understanding is "adapters" is the new name and that's what we should migrate to. Rather than renaming, can always create new and deprecate the old.
Captured here as one of three competing positions in Q5 (alongside the proposal's rename and Jake's "keep Intrinsic" stance). cc @jakelorocco.
There was a problem hiding this comment.
@psschwei — Q5 labels updated in 2d570f87: "This proposal's model" → "Original proposal model"; "Jacob's model" → "Current lean (Jacob's framing)". Your "create new + deprecate old" position is still listed as the third option. The proposal now leans toward Jake's split (intrinsic = user-facing, adapter = backend artefact); your alternative remains open. Worth your view on whether Jake's split with deprecation shims for old types covers your "create new, deprecate old" intent — they may converge.
There was a problem hiding this comment.
Heads-up: Q5 has been restructured from three positions to two endpoints (commit 7bfbbbba). Your "create new alongside, deprecate old" position is preserved as a named mechanism variant under "Adapter as the user-facing term going forward" — alongside the original proposal's "rename with shims" as the other mechanism. The fork now reads cleanly as: same endpoint as the original proposal (mechanism is a sub-decision) vs Jake's split (different endpoint). Nothing dropped from your input.
|
|
||
| # Part I — Summary for agreement | ||
|
|
||
| ## Proposal at a glance |
There was a problem hiding this comment.
From @psschwei in Slack:
also, just a general comment: for me at least, this doc was very difficult to follow. I think putting the actual proposal up front (it was buried pretty far down in the doc) and cutting out some of the filler would make it much easier to understand.
The "Proposal at a glance" summary at the top of Part I addresses the "buried" complaint.
There was a problem hiding this comment.
@psschwei — the "Proposal at a glance" summary at the top of Part I (5-row decision-status table + 3-sentence framing) was added specifically to address the "buried" complaint. Filler-trim hasn't happened beyond that. If sections still feel padded, flag and they'll get cut.
|
|
||
| Full glossary (identity, I/O contract, weights binding, role, qualified name, catalog, io.yaml) is in §7 — needed only when you descend into the detail. | ||
|
|
||
| ## 4. Rough end result |
There was a problem hiding this comment.
@jakelorocco's commit-comment review — the §4 portion (originally on 8d9de33#r184732700):
I think one key component about the Adapter definition is that adapters may update their weights and/or their expected output schemas. We need to be able to respond to that with our top level helper functions like check_answerability.
I would also like to make sure we have sane defaults for the Adapter class. Identity is closely tied to the io_contract which is closely tied to the weights. We should make sure that when you download an adapter for a model from HuggingFace Hub, by default you get the correct io_contract (ie the io.yaml from the same repo). In the identity part of the adapter, what is schema versioning? This is almost certainly directly tied to the output parsing of the io_contract.
I don't like the
io_contractname. I would maybe sayio_config.
Status in the current draft:
io_contract→io_confignaming: flagged in §7 glossary as under discussion.- Schema versioning definition: not yet glossed in §7.
- Weight/schema update handling: not yet captured.
- Sane HF defaults (download → matching
io.yaml): not yet captured.
There was a problem hiding this comment.
@jakelorocco — §4 cluster mostly addressed by recent commits:
io_contract→io_confignaming: still flagged in §7 glossary as under discussion (no change).- Schema versioning: defined in §7 (
93aa780e), then narrowed in19f99cc9— see "Second thoughts" comment on the §7 row (#discussion_r3247762196). HF commit SHA is now framed as the version mechanism;schema_versiononly earns its keep for breaking schema changes. - Weight/schema updates: §2.2 + §9 + §17 Q9 in
19f99cc9cover this around HF commits. - Sane HF defaults: §4 + §13 in
067e31af.
Two granite-common questions still open in §17 Q8 — worth your or the granite team's input.
…enerative-computing#929) Five contradictions / drift fixes plus the matching §13 comment cleanup, all from feedback round 3 (Jake commit comment + Paul Slack): - Status callout (line 5): drop "not a PR candidate" framing now that the Draft PR is the review channel; minimal "proposal" wording. - Terminology preamble (line 9): proposal's working position is now Jake's split — Intrinsic = user-facing term, Adapter = backend artefact. Drops the rename framing. - §3 Key terms: split single Adapter glossary entry into Intrinsic + Adapter, matching the new lean. - §5 Q5 labels: "This proposal's model" → "Original proposal model"; "Jacob's model" → "Current lean (Jacob's framing)". Positions and ordering unchanged. The PR thread continues to resolve Q5. - §5 Q4 body: add Working position (Jake's at-least-one-release framing with Paul's 4-6 week unit) and a breakage sub-question coupled to Q5. - §13 example comments: distinguish the adapter (artefact) from the intrinsic (capability) cleanly; code unchanged. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…ative-computing#929) Jake req 1 from his commit-comment review (8d9de33#r184732700): - §7 glossary: add Schema version row, flagged as a proposed addition to io.yaml that requires granite-common / granite-formatters team agreement (they own the io.yaml format). - §17 open questions: new Q8 surfacing the cross-team negotiation — worth suggesting to that team, or do they have another approach? Plus a bundled cleanup of two §7 entries the previous round of contradiction fixes missed: - §7 Adapter row: rewrite to match the new Q5 lean (adapter = backend artefact, not user-facing term). - §7 Intrinsic row: rewrite as user-facing capability, no longer "legacy" terminology. - Order swapped so §7 matches §3 (Intrinsic first, Adapter second). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…#929) Jake req 2 from his commit-comment review: identity, io_contract, and weights are tightly coupled, so when an adapter's weights come from a HuggingFace repo, the io_contract should default to the io.yaml in that same repo — callers shouldn't have to pass io_contract= explicitly in the common case. - §4: add "Sane defaults" callout after the Adapter composition diagram. - §13: update the custom-intrinsic example to show the default path (no explicit io_contract); the override path is shown as a commented-out variant. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…n narrowed (generative-computing#929) Jake req 3 from his commit-comment review: adapters may update their weights and/or output schemas; helpers must respond. Reframed around HF commit SHA as the existing version mechanism, rather than inventing a Mellea-side concept: - §2.2 (Safe evolution): weights versioned by HF commit SHA; output schemas stable in the common case; rare breaking schema change handled by pinning, schema_version dispatch, or helpers raising on mismatch. - §7 Schema version glossary: narrowed to "parser-dispatch field for breaking schema changes only"; flags that granite-common may already have a versioning mechanism we should reuse. - §9 callout: weight updates use HF commit SHA; prepare() resolves configured revision and refreshes cache; refresh policy open. - §17 Q9: weight-refresh policy as a focused open question (per-session refresh + long-running-process exception). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
| | **Intrinsic** | The user-facing capability: helper functions (`check_answerability`, `requirement_check`, the Guardian helpers), the `Intrinsic` AST component, input/output parsing. Backed by an adapter. The name is kept as IBM's terminology — current Q5 lean (see Part I §5). | | ||
| | **Adapter** | The backend artefact: the weights loaded by a backend (LoRA / aLoRA / embedded), with its identity, I/O contract, and weights binding. The user-facing **Intrinsic** wraps an adapter to provide helpers and parsing. In the redesign, the class hierarchy collapses from four (`IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` + abstract base) to one `Adapter` + a pluggable binding. | | ||
| | **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. | | ||
| | **Schema version** | *Proposed parser-dispatch field for breaking schema changes only.* For routine weight updates the HF commit SHA is the version (no new field needed). `schema_version` would only earn its keep if the granite team ships a *breaking* output-schema change (different keys, nesting, or types) and unpinned callers need graceful v1↔v2 parser dispatch. **Open** (§17 Q8) — granite-common may already have a versioning mechanism we should reuse instead of inventing this. | |
There was a problem hiding this comment.
Second thoughts on schema_version. Picking up on @jakelorocco's "what is schema versioning?" question from his commit-comment review.
Stepping back: HF commits already version everything in the repo (weights + io.yaml together), so for the common case (new weights, same output schema) the version mechanism exists today. schema_version would only earn its keep when the granite team ships a breaking schema change (different output keys, nesting, or types) and unpinned Mellea callers want graceful parser dispatch instead of a hard fail.
Commit 19f99cc narrows the §7 entry and §17 Q8 to that scope.
Two questions for the granite team:
- Does granite-common already have a versioning mechanism for
io.yamlwe should reuse, rather than Mellea inventingschema_version? - How often do you expect breaking schema changes? If they're rare, "fail loud and let the user pin to a known revision" (Jake req 4 — raise on schema mismatch) may be a simpler safety net than a parser-dispatch mechanism.
Jake req 4 from his commit-comment review: top-level helpers (check_answerability etc.) should raise an exception when the adapter's output structure doesn't match the helper's expectations. - §13: add "Validation on parse" paragraph naming AdapterSchemaMismatchError as the exception, with name + observed keys + expected keys in the message. - §14.1 silent-schema-drift bullet: tie the existing parse_failures counter to the new exception — counter is the dashboard signal, exception is the runtime signal. Closes the silent-drift loophole that PR generative-computing#1008's requirement_check_to_bool regression illustrated. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
) Jake req 5 from his commit-comment review: auto-loaded adapters should pin to a known-good revision rather than blindly tracking upstream's default branch. - §17 Q10: pinning policy as a new open question; recommendation is pin by default via catalogue entries' recorded SHAs, with revision="main" as explicit opt-in to track latest. - §13: catalogue example comment now spells out that the catalogue entry carries a pinned HF commit SHA, with the override pattern. Closes the loop on auto-loading reproducibility — couples cleanly with Q9 weight-refresh policy (pinned adapters skip the refresh check). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…erative-computing#929) Six drift fixes from a coherence review of the doc + one banned-word correction. All flow from the Jake req 1 schema_version reframing (commit 19f99cc) which left several places still asserting schema_version dispatch as if it were a settled mechanism. - §4 composition diagram: schema version field now qualified as "(proposed; see §7 / §17 Q8)". - §12 row 4c: rewritten to layered framing — HF commit SHA is the version; breaking schema changes handled by pinning + Jake req 4; schema_version dispatch optional layer. - §14.2 metrics text: "without a version bump" rephrased as "at a new HF revision that the local parser doesn't yet handle"; ties parse_failures counter to AdapterSchemaMismatchError. - §6 Risk row: rewritten to layered framing matching §12 row 4c; worked example reframed around AdapterSchemaMismatchError. - §14.2 spans diagram: parse span attributes now show "revision, schema_version*" with asterisk denoting proposed. - §15 tutorial title: "Shipping a new schema version" → "Handling a breaking schema change without breaking users"; body lists the three layers. - §5 Q4 body: pre-existing "can this land" → "can this ship" (banned-word cleanup). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
generative-computing#929) Targeted patches to bring §16 in line with the work added in commits 93aa780 through 3bd7dd5: - Phase 0: type list flagged as Q5-conditional (user-facing Intrinsic class only if Jake's split is adopted); catalogue entries gain pinned HF revision SHAs (Jake req 5; Q10). - Phase 1: helpers gain output validation raising AdapterSchemaMismatchError (Jake req 4). - Phase 2: bindings implement the four-verb set; LocalFileBinding.prepare resolves HF revision per Q9 weight-refresh policy. Phase 3 and Phase 4 unchanged. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…ng#929) Final-review pass caught residual "version" ambiguity from the schema_version reframe. Under the layered model (HF commit SHA = the unambiguous version mechanism; schema_version = proposed parser-dispatch field for breaking schema changes only; AdapterSchemaMismatchError = loud safety net), the doc now uses "revision" consistently for the HF commit SHA and reserves "schema_version" for its narrowed proposed purpose. Touched in this commit: - §12 row 3: identity tuple uses `revision` not `version`. - §12 row 4a: parse() validates per Jake req 4; schema_version dispatch reframed as optional (Q8-conditional) layer. - §12 row 6: catalog cleanup row reframed around the v1→v2 PR generative-computing#1008 worked example using Jake req 4 + optional schema_version dispatch. - §14.1 silent-schema-drift bullet: parse_failures counter labelled by (name, revision), matching §14.2. - §14.2 root span attributes: revision (not version). - §14.2 standard attributes list: intrinsic.revision (not intrinsic.version). - §14.2 invocations counter: labelled by name, revision, ... Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…rative-computing#929) After this commit, only 3 questions are genuinely open: Part I §5 Q1 (shape agreement), Part I §5 Q5 (terminology rename scope), and §17 Q4 (schema_version cross-team with granite-common). Everything else has either explicit reviewer agreement or is a working position reviewers can push back on. Part I §5: - Q2 Lifecycle: promoted "Lean" → "Resolved (Jake)" — auto-load yes, auto-unload no. Body rewritten to lead with the resolution. - Q3 Reality C: promoted "Lean" → "Resolved (Paul)" — design slot, leave empty; vLLM blocked confirmed. - Q4 Deprecation: promoted "Lean" → "Resolved (Paul, Jake)" — 1 minor release ≈ 4–6 weeks, longer if user impact warrants. Sub-question on breakage-free shipping kept open. - TL;DR table updated to match. §17 (now "Open questions and implementation positions"): - Removed redundant pointer rows (old Q2, Q4, Q6, Q7 — all back-refs to Part I §5). - Old Q1 (WeightsBinding name) → new Q1 [Position]. - Old Q3 (role vs name) → new Q2 [Position] — free-form string with advisory known-roles registry; pure enum rejected. - Old Q5 (rewind interaction) → new Q3 [Resolved] — Jake confirmed on PR generative-computing#1080 the helpers placement is reasonable. - Old Q8 (schema_version cross-team) → new Q4 [Open] — only genuinely open §17 item. - Old Q9 (weight-refresh policy) → new Q5 [Position] — per-session-start + explicit refresh() for long-running. - Old Q10 (pinning default) → new Q6 [Position] — pin by default. - Body cross-refs throughout updated for renumbering. Each Resolved/Position entry now cites who agreed (where applicable) so reviewers see what's been discussed vs unilaterally asserted. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
) §6 and §16 Phase 4 previously asserted "shimmed for one release" / "shim removal after one minor release" unconditionally. But under Q5's current lean (Jake's split — Intrinsic stays as the user-facing class, Adapter is the backend artefact), the old types can stay as re-exports indefinitely with no deprecation needed at all. Two small additions surface this coupling: - §6 API surface: the deprecated-but-shimmed bullet now notes it applies only if Q5 settles on rename or new-API-alongside. - §16 Phase 4: shim removal now flagged as skipped if Q5 settles on Jake's split. Q4 itself remains Resolved (≥1 minor release if there is a deprecation); the new framing just makes clear that whether there's deprecation at all is Q5-conditional. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
The original three Q5 positions (Original proposal / Jake's split / Paul's preference) effectively split into two endpoints: - Original and Paul share the same end-state (Adapter as the user-facing term). They differ only on transition mechanism — rename with deprecation shims (Original) vs create-new-alongside with gradual deprecation (Paul). - Jake's split is the only structurally different endpoint (both names survive permanently with distinct meanings). Q5 now reads as a binary endpoint choice with a sub-decision on transition mechanism if the first endpoint wins. This makes the actual fork visible to reviewers instead of asking them to weigh in on three near-equivalents. - TL;DR row Q5 status: "two endpoints; mechanism sub-decision" - §5 Q5 body: restructured from three bullets to two endpoints + an inline mechanism sub-decision under endpoint 1. No reviewer feedback dropped — Original's framing and Paul's framing are both preserved as named mechanism variants. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…#1028 work into this epic (generative-computing#929) PR generative-computing#1028 (akihikokuroda — implementing issue generative-computing#1003) was closed 2026-05-15 in favour of folding the work into this epic. What's preserved from generative-computing#1003 / PR generative-computing#1028: - Phase 1 absorbs the helper signature work: model_options= on all top-level helpers; documents= keyword-only on factuality_detection / factuality_correction. - The contested portion of PR generative-computing#1028 (manual-Message document detection via _resolve_response scanning Message._docs when documents=None) is now an explicit Phase 1 design call as §17 Q3 sub-question (b) — "scan context for Messages with _docs" vs (a) "require explicit documents=". Doc edits: - §4, §6 (twice), §13: reframe model_options= as folded in from generative-computing#1003, not arriving on top from a separate PR. Add documents= alongside. - §16 Phase 1: explicitly name the generative-computing#1003 helper signature work. - §17 Q3: restructure into two sub-parts — rewind placement (Resolved per Jake on PR generative-computing#1080) and manual-Message detection (Open). - Appendix: update generative-computing#1003 row, add a row for closed PR generative-computing#1028. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
jakelorocco
left a comment
There was a problem hiding this comment.
@planetf1, I believe I've addressed all the open questions. If I haven't please let me know, and I can work towards unblocking you. Thank you!
| - **"Adapter" as the user-facing term going forward.** "Adapter" replaces "Intrinsic" in user-facing code. *Transition mechanism is a sub-decision:* | ||
| - *Rename with shims* (original proposal): rename `Intrinsic` AST class and module path to `Adapter` (or similar); old names aliased as deprecation shims for one release. | ||
| - *Create new alongside* (Paul's preference): stand up a new `Adapter` API beside the existing intrinsic API; old API gradually deprecated. Don't rename. | ||
| - **Current lean (Jacob's framing):** keep "Intrinsic" — it is IBM's term and must survive. The semantic split is: "adapter" = the backend artefact (weights loaded by the backend); "intrinsic" = the user-facing abstraction (helper functions, input/output parsing, classes). Both names stay, with distinct meanings. |
There was a problem hiding this comment.
I was dealing with out-dated information when I made this comment. We are apparently no longer using the Intrinsic terminology but I haven't heard what it should be replaced with. In Mellea, we can probably use AdapterBasedComponent as a placeholder or leave Intrinsic until we get the final say on what to use.
I stand by my comments that we do need a split between the adapter and the thing that triggers / utilizes the adapter.
There was a problem hiding this comment.
Captured — Q5, preamble, §3, and glossary all updated to use AdapterBasedComponent as the agreed name for now. Rather than leaving the proposal open-ended on terminology, we're going with this and will update in a follow-up PR once the IBM name is settled. Is there anything we need to do to drive that decision, or is that conversation already happening elsewhere?
| with backend.adapter_scope(adapter): | ||
| raw = backend.generate(adapter.io_contract.build_prompt(...)) |
There was a problem hiding this comment.
I don't know if I agree with this change to calling some .build_prompt function. But as long as it returns an Intrinsic / Component, I guess it's fine.
There was a problem hiding this comment.
Noted — added a clarification that build_prompt() returns a Component-compatible object rather than a raw string, consistent with the rest of the pipeline.
| weights=EmbeddedBinding.from_base_model(backend)) | ||
| ``` | ||
|
|
||
| **Backend authors** keep `AdapterMixin` as the backend surface, but it exposes only the verbs a backend naturally has: `load_peft_adapter`, `unload_peft_adapter`, `render_controls`, `set_request_adapter`. Bindings call into these verbs. Adding a new reality = adding a new verb + new binding. |
There was a problem hiding this comment.
I believe these verbs are slightly different (ie more specific) than the ones listed above. But I agree with the sentiment.
There was a problem hiding this comment.
Agreed — noted. The binding verbs are the abstract interface; the AdapterMixin verbs are what bindings call into underneath.
| Adapter calls hide the complexity that matters most when something goes wrong (weight fetching, activation side-effects, schema contracts). Without per-phase instrumentation, four failure modes are hard or impossible to diagnose — and Mellea has already hit the first two in production: | ||
|
|
||
| 1. **Masked errors.** The `obtain_lora`-always-called bug (#929 point 1b) showed users a misleading download error while the real cause (adapter-type mismatch) stayed invisible. A span at the `prepare` boundary recording the exception would have surfaced the actual cause on first run. | ||
| 2. **Silent schema drift.** When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, `requirement_check_to_bool` silently returned `False` for every call until someone noticed. Under Jake req 4 (helpers raise on schema mismatch), this would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change — the caller gets a named error instead of a silently wrong value. The `parse_failures` counter labelled by `(name, revision)` is the dashboard signal; the exception is the runtime signal. |
There was a problem hiding this comment.
I agree with this part. The code that gets called by the top-level helper functions like check_requirement needs no real versioning / understanding of what's going on. Only when we introduce these helpers does that matter. As a result, erroring here is the right spot to do it.
There was a problem hiding this comment.
Confirmed — already in §13 and §14.
| Two existing tools — already part of Mellea's broader LLM-unit-testing conversation rather than bespoke to this refactor — fit naturally here and should be preferred over ad-hoc harnesses: | ||
|
|
||
| - **`TestBasedEval`** (in-tree — `mellea/templates/prompts/default/TestBasedEval.jinja2`, documented at `docs.mellea.ai/how-to/unit-test-generative-code`) is Mellea's own LLM-as-judge component. Each adapter gets a JSON file of test cases (`{input, target, guidelines}`); a judge model returns `{"score": 0|1, "justification": "..."}`. Runnable from the CLI (`m eval run tests/eval_data/<adapter>.json --backend ollama --model granite4.1:3b`) so the same fixtures power both CI and interactive debugging. This is the default mechanism for per-adapter qualitative coverage. | ||
| - **BenchDrift** (`github.com/IBM/BenchDrift`) addresses a second failure mode: an adapter that works on its canonical phrasing but breaks on semantically-equivalent rephrasings. BenchDrift generates syntactic variations of each test case while preserving meaning, then scores consistency across variations. Worth applying to the adapters where phrasing-invariance is a real concern — answerability, context-relevance, requirement-check, and the Guardian family all qualify. Optional extension rather than baseline coverage, but enabling it per-adapter is a one-config-file step once the `TestBasedEval` fixtures exist. |
There was a problem hiding this comment.
I don't think we need to utilize these for our tests. These were things that were contributed to Mellea. I would start without these and then write up a general proposal for test improvement if we want to utilize them.
There was a problem hiding this comment.
Noted — recommendation removed. Goal stays; approach left open for a separate proposal if needed.
| - **`TestBasedEval`** (in-tree — `mellea/templates/prompts/default/TestBasedEval.jinja2`, documented at `docs.mellea.ai/how-to/unit-test-generative-code`) is Mellea's own LLM-as-judge component. Each adapter gets a JSON file of test cases (`{input, target, guidelines}`); a judge model returns `{"score": 0|1, "justification": "..."}`. Runnable from the CLI (`m eval run tests/eval_data/<adapter>.json --backend ollama --model granite4.1:3b`) so the same fixtures power both CI and interactive debugging. This is the default mechanism for per-adapter qualitative coverage. | ||
| - **BenchDrift** (`github.com/IBM/BenchDrift`) addresses a second failure mode: an adapter that works on its canonical phrasing but breaks on semantically-equivalent rephrasings. BenchDrift generates syntactic variations of each test case while preserving meaning, then scores consistency across variations. Worth applying to the adapters where phrasing-invariance is a real concern — answerability, context-relevance, requirement-check, and the Guardian family all qualify. Optional extension rather than baseline coverage, but enabling it per-adapter is a one-config-file step once the `TestBasedEval` fixtures exist. | ||
|
|
||
| Kept cheap (tens of test cases per adapter, not hundreds) so qualitative runs fit in a reasonable nightly-CI budget. |
There was a problem hiding this comment.
We should not be worried about a reasonable nightly CI budget. We should test everything that might be relevant. Or at least add a latch for doing so.
There was a problem hiding this comment.
Agreed — gated with a marker, not omitted.
| These gate decomposition; everything else can live in sub-issues once these are agreed. | ||
|
|
||
| 1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. | ||
| 2. **Adapter lifecycle — session-scoped, no auto-unload.** **Resolved (Jake):** auto-load yes, auto-unload no. Once activated, the adapter stays loaded; the caller or session teardown triggers explicit `release()`. The multi-tenancy concern is reduced because `LocalHFBackend` is primarily a single-user/local backend (see §10). Request-scoped lifecycle remains an opt-in for deployments that need per-call isolation. |
There was a problem hiding this comment.
I should clarify this. I think we auto-load the adapter but we still auto-activate / auto-deactivate it. But I agree with the rest of the decision.
There was a problem hiding this comment.
Captured — updated Q2 to reflect the two-level split: prepare/release session-scoped, activate/deactivate per-call. The sequence diagram in §9.3 was already correct; the Q2 text was the one that was wrong.
|
|
||
| These gate decomposition; everything else can live in sub-issues once these are agreed. | ||
|
|
||
| 1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. |
There was a problem hiding this comment.
Yes. I believe this holds. In most cases, these things will all be colocated / the same, but allowing for divergence is good (and enables us to separate out how the weights get downloaded from their functionality, etc...).
There was a problem hiding this comment.
Thanks — captured as Resolved, with your note about divergence recorded in the Q1 body.
- Q1: mark Resolved — shape holds; divergence between identity/io_contract/weights is the point - Q2: reword to reflect two-level lifecycle (session-scoped prepare/release; per-call auto-activate/deactivate) - Q5: IBM retiring "Intrinsic"; adopt AdapterBasedComponent as agreed placeholder name; update preamble, §3, §7, Q4 sub-question - §9.1: clarify build_prompt() returns a Component-compatible object - §15: remove TestBasedEval/BenchDrift recommendation; goal stays, approach left open Assisted-by: Claude Code
|
@jakelorocco — thanks, all eight points are in. Part I's all Resolved. Two open questions left in §17 but neither blocks Phase 0, so I'd like to close both out rather than leave them open: Q3b (manual-Message detection): go with (b) — scan context for Q4 ( If you're good with those, next steps:
Once sub-issues are filed and the broad shape is agreed, I'd like to consider merging this PR for the record rather than deleting the doc — keeps the design rationale alongside the implementation as a reference. Is that ok? |
Misc PR
Type of PR
Description
What this is
A long-form design proposal for Epic #929 — "Fix Intrinsic Adapter Lifecycle & Consistency in Mellea." Single file at
docs/dev/proposals/929-adapter-lifecycle.md.Path proposal
The file is at
docs/dev/proposals/929-adapter-lifecycle.md. Theproposals/subdir is a proposed convention for design docs under review — easy to relocate before any eventual merge if maintainers prefer a different home.How to review
Feedback already incorporated
Two rounds of feedback are already folded into the current commit:
8d9de33#r184732700on the preservation branchFive additional Jake requirements + three internal contradictions are pending; they will arrive as follow-up commits visible on this PR.
What I'd like from reviewers
cc @jakelorocco @nrfulton @psschwei
Testing
Attribution