fix: only enforce token limit in direct LLM mode (#307)
Conversation
📝 Walkthrough

The token limit validation in the OpenAI router now runs only for direct LLM model requests: it moved from a universal pre-flight check to logic nested within the direct-model handling branch, so validation is skipped for RAG-partitioned models.
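A minimal sketch of what the relocated check could look like. Every name below (`is_direct_llm_model`, `count_tokens`, `MAX_TOKENS`, the `rag:` model-name prefix) is an illustrative stand-in, not the repository's actual API:

```python
# Illustrative sketch only: names and the "rag:" model-prefix
# convention are assumptions, not the repository's actual API.
MAX_TOKENS = 8192  # hypothetical context limit

def count_tokens(messages):
    # Stand-in tokenizer: rough whitespace word count.
    return sum(len(m["content"].split()) for m in messages)

def is_direct_llm_model(model):
    # Assumed convention for telling direct models from RAG ones.
    return not model.startswith("rag:")

def handle_chat(request):
    if is_direct_llm_model(request["model"]):
        # The check now lives inside this branch: direct mode sends
        # the full history, so the limit must be enforced here.
        if count_tokens(request["messages"]) > MAX_TOKENS:
            raise ValueError("token limit exceeded")
        return "call LLM directly"
    # RAG branch: no pre-flight check; the pipeline trims history
    # before the LLM ever sees it.
    return "run RAG pipeline"
```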
Move the token-limit check so it only runs in direct LLM mode. In RAG mode the pipeline trims chat history to `chat_history_depth` (default 4) before calling the LLM, so the pre-trim check was falsely rejecting long conversations that would fit once truncated. Direct LLM mode still enforces the limit, since no trimming happens there.
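To make the failure mode concrete, here is a small, runnable illustration of the trimming behavior described above; `CHAT_HISTORY_DEPTH` and `trim_history` are hypothetical names standing in for the pipeline's actual config key and logic:

```python
CHAT_HISTORY_DEPTH = 4  # default per the description above

def trim_history(messages, depth=CHAT_HISTORY_DEPTH):
    # Keep only the most recent `depth` messages, mirroring what
    # the RAG pipeline does before calling the LLM.
    return messages[-depth:]

# A 50-message conversation: a pre-trim token check sees all 50
# messages and may reject, even though only the last 4 ever reach
# the model in RAG mode.
messages = [{"role": "user", "content": f"message {i}"} for i in range(50)]
print(len(trim_history(messages)))  # -> 4
```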