docs: self-hosted model output-trimming issue + proposed fix #2
Blackfrost-AI wants to merge 1 commit into
… fix

Smaller/local models driving Claude Code as a harness read the cooler's compact summary as data loss and abandon ctx_execute, falling back to raw Bash and losing both the token savings and the guardrails. Document the root cause (hard-coded trim defaults in compactDefault / execute, plus no truncation signal in the response) and a proposed fix (env-configurable caps + a retrieval hint) to implement later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CodeRabbit review: no actionable comments (review profile: CHILL, default configuration). Walkthrough: adds self-hosted model trimming issue documentation (1 file). Estimated code review effort: 1 (Trivial), ~4 minutes. Pre-merge checks: 5 of 5 passed.
What
Adds docs/SELF-HOSTED-MODELS.md, a documented known issue + proposed fix. No code changes; this is a design note so we can come back and implement the fix.

Why
When Context Cooler drives the Claude Code TUI as a harness for a self-hosted / small model, the model can read the cooler's compact summary as data loss and abandon ctx_execute, falling back to raw Bash. That loses both the token savings and the cooler's guardrails. This was observed live with a quantized local model (huihui-q8).

Root cause (cited in the doc)
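- Trim limits are hard-coded: compactDefault cuts arrays to 5 items and objects to scalars (src/lib/filter.ts:197, 202-204); non-JSON text is cut to 5000 chars (src/tools/execute.ts:170).
- The response carries no truncation signal, so nothing tells the model that the full output is still retrievable via ctx_search. Frontier models infer this; smaller models don't.

Proposed fix (for a follow-up PR)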
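For reference, the sketch below roughly approximates the hard-coded trimming listed under Root cause. It is hypothetical and is not the code at src/lib/filter.ts or src/tools/execute.ts. compactSketch and trimTextSketch are illustrative names, and reading "objects to scalars" as "keep only scalar-valued fields" is an assumption. Neither function reports whether anything was dropped.

```typescript
// Hypothetical approximation of today's hard-coded trimming (not the real compactDefault).
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const MAX_ARRAY_ITEMS = 5; // hard-coded cap, per src/lib/filter.ts:197 as cited in the PR
const MAX_TEXT_CHARS = 5000; // hard-coded cap, per src/tools/execute.ts:170 as cited in the PR

function isScalar(v: Json): boolean {
  return v === null || typeof v !== "object";
}

// Assumption: "objects to scalars" means nested objects/arrays are dropped, scalar fields kept.
function compactSketch(value: Json): Json {
  if (Array.isArray(value)) {
    // Keep the first few items; the rest vanish without any marker.
    return value.slice(0, MAX_ARRAY_ITEMS).map(compactSketch);
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [k, v] of Object.entries(value)) {
      if (isScalar(v)) out[k] = v;
    }
    return out;
  }
  return value;
}

// Cuts non-JSON text; the caller gets no hint that a cut happened.
function trimTextSketch(text: string): string {
  return text.slice(0, MAX_TEXT_CHARS);
}
```

Given eight array items, compactSketch returns five, and nothing in the result says that three were dropped. A smaller model can misread that silence as data loss.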
- Make the trim caps env-configurable using the CTX_* convention (CTX_MAX_ARRAY_ITEMS, CTX_MAX_OBJECT_KEYS, CTX_MAX_TEXT_CHARS), with defaults = current values → zero behavior change unless opted in.
- Add a hint to the ctx_execute response when trimming occurs (truncated / retrieve_with) so any model knows the rest is recoverable.
- Add CTX_PROFILE=self-hosted to bump all three at once.

Interim mitigation
Harness-side only: the operating profile now tells the model that a trimmed summary means "the rest is on disk, use ctx_search." That doesn't help operators who haven't tuned their profile, hence the code-side fix proposed here.

🤖 Generated with Claude Code
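The sketch below is a minimal TypeScript version of the proposed fix. It covers three pieces: env-configurable caps whose defaults match today's values, a CTX_PROFILE=self-hosted shortcut, and a truncated / retrieve_with hint on trimmed output. Everything in it is hypothetical. resolveCaps, trimWithHint and the ExecuteResult shape are illustrative names. The self-hosted numbers are placeholders. Pointing retrieve_with at ctx_search is an assumption drawn from the interim mitigation.

```typescript
// Hypothetical sketch of the proposed fix; not code from the repository.

interface TrimCaps {
  maxArrayItems: number;
  maxObjectKeys: number | undefined; // undefined = keep today's object handling
  maxTextChars: number;
}

// Defaults mirror today's hard-coded values, so nothing changes unless opted in.
const DEFAULT_CAPS: TrimCaps = { maxArrayItems: 5, maxObjectKeys: undefined, maxTextChars: 5000 };

// Placeholder numbers for CTX_PROFILE=self-hosted; real values are still to be chosen.
const SELF_HOSTED_CAPS: TrimCaps = { maxArrayItems: 50, maxObjectKeys: 100, maxTextChars: 50000 };

// Returns a positive integer, or undefined for missing or malformed input.
function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const n = Number.parseInt(raw, 10);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

// Explicit CTX_MAX_* variables win over the profile, which wins over the defaults.
function resolveCaps(env: Record<string, string | undefined>): TrimCaps {
  const base = env["CTX_PROFILE"] === "self-hosted" ? SELF_HOSTED_CAPS : DEFAULT_CAPS;
  return {
    maxArrayItems: parsePositiveInt(env["CTX_MAX_ARRAY_ITEMS"]) ?? base.maxArrayItems,
    maxObjectKeys: parsePositiveInt(env["CTX_MAX_OBJECT_KEYS"]) ?? base.maxObjectKeys,
    maxTextChars: parsePositiveInt(env["CTX_MAX_TEXT_CHARS"]) ?? base.maxTextChars,
  };
}

// Assumed response shape; only the truncated / retrieve_with names come from the PR.
interface ExecuteResult {
  output: string;
  truncated?: boolean;
  retrieve_with?: string;
}

// Trims text to the cap and, only when something was cut, says how to get the rest.
function trimWithHint(full: string, caps: TrimCaps): ExecuteResult {
  if (full.length <= caps.maxTextChars) return { output: full };
  return {
    output: full.slice(0, caps.maxTextChars),
    truncated: true,
    retrieve_with: "ctx_search",
  };
}
```

For example, resolveCaps({ CTX_PROFILE: "self-hosted", CTX_MAX_TEXT_CHARS: "20000" }) keeps the profile's array and object caps but uses 20000 for text. Because explicit variables override the profile, CTX_PROFILE remains a convenience rather than a second source of truth. Malformed values fall back instead of disabling trimming.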