Feat/phase 13 nemotron retrieval#14
…motron support
Create core/embedding_provider.py with a unified interface for
embedding generation, supporting two backends:
LocalEmbedder (default, 384-dim):
- Uses sentence-transformers all-MiniLM-L6-v2 via ChromaDB
- No API key required, runs offline after model download
- HF_HUB_OFFLINE=1 for offline operation
NemotronEmbedder (2048-dim, requires NVIDIA_API_KEY):
- Uses NVIDIA Nemotron embed API (nemotron-embed-4b-v1)
- Better semantic understanding for legal text
- API health check before use
- Graceful fallback to local on failure
NemotronChromaEmbedFn:
- Adapter wrapping NemotronEmbedder for ChromaDB's embedding API
- Enables drop-in replacement of SentenceTransformerEmbeddingFunction
Factory function:
- get_embedding_provider() reads HECTOR_EMBEDDING_PROVIDER env var
- Falls back to local if Nemotron unavailable
- Supports: 'local' | 'nemotron'
Configuration:
- HECTOR_EMBEDDING_PROVIDER: 'local' | 'nemotron'
- HECTOR_NEMOTRON_EMBED_MODEL: model ID
- HECTOR_NEMOTRON_API_KEY: NVIDIA API key
- HECTOR_EMBEDDING_DIM: 384 (local) | 2048 (Nemotron)
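The factory-with-fallback behaviour described in this commit could look roughly like the sketch below. It is not the code in core/embedding_provider.py: the stub classes, the `select_embedding_provider` name, the injectable `health_check` callable, and the choice to read the key from HECTOR_NEMOTRON_API_KEY are all assumptions made for illustration, and no real model or API is called.

```python
import os
from typing import Callable, List, Mapping, Optional


class StubLocalEmbedder:
    """Hypothetical stand-in for LocalEmbedder (384-dim)."""
    dim = 384

    def embed(self, texts: List[str]) -> List[List[float]]:
        # Placeholder vectors; a real implementation would run the local model.
        return [[0.0] * self.dim for _ in texts]


class StubNemotronEmbedder:
    """Hypothetical stand-in for NemotronEmbedder (2048-dim)."""
    dim = 2048

    def __init__(self, api_key: str, health_check: Callable[[], bool]):
        self.api_key = api_key
        self._health_check = health_check

    def is_available(self) -> bool:
        # Unavailable without a key, or if the health check fails or raises.
        if not self.api_key:
            return False
        try:
            return bool(self._health_check())
        except Exception:
            return False

    def embed(self, texts: List[str]) -> List[List[float]]:
        # Placeholder vectors; a real implementation would call the remote API.
        return [[0.0] * self.dim for _ in texts]


def select_embedding_provider(
    env: Optional[Mapping[str, str]] = None,
    health_check: Callable[[], bool] = lambda: False,
):
    """Pick a provider from HECTOR_EMBEDDING_PROVIDER, falling back to local."""
    env = os.environ if env is None else env
    choice = env.get("HECTOR_EMBEDDING_PROVIDER", "local").strip().lower()
    if choice == "nemotron":
        candidate = StubNemotronEmbedder(
            env.get("HECTOR_NEMOTRON_API_KEY", ""), health_check
        )
        if candidate.is_available():
            return candidate
    # Default path, and the fallback when Nemotron is not usable.
    return StubLocalEmbedder()
```

Passing `env` and `health_check` as arguments keeps the selection logic testable without touching the process environment or the network.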
…ron support
Create core/rerank_provider.py with a unified interface for
document reranking, supporting two backends:
LocalReranker (default):
- Uses cross-encoder/ms-marco-MiniLM-L-6-v2 via sentence-transformers
- Sigmoid normalization of raw scores to 0-1 range
- No API key required, runs offline after model download
NemotronReranker (requires NVIDIA_API_KEY):
- Uses NVIDIA Nemotron rerank API (nemotron-rerank-v1)
- Better semantic understanding for legal text ranking
- API health check before use
- Handles both rankings[] and scores[] response formats
- Graceful fallback to local on failure
Both providers:
- Accept (query, documents) pairs
- Add 'reranker_score' to each document dict
- Sort by score descending
- Append reason string ('cross-encoder-reranked' or 'nemotron-reranked')
Factory function:
- get_rerank_provider() reads HECTOR_RERANK_PROVIDER env var
- Falls back to local if Nemotron unavailable
- Supports: 'local' | 'nemotron'
Configuration:
- HECTOR_RERANK_PROVIDER: 'local' | 'nemotron'
- HECTOR_NEMOTRON_RERANK_MODEL: model ID
- HECTOR_NEMOTRON_API_KEY: NVIDIA API key
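The shared post-processing that both rerankers are described as doing (sigmoid normalization, a 'reranker_score' field, descending sort, a reason string) can be sketched as follows. This is an illustration only: the `rerank` function, the injected `score_fn`, and the 'text' and 'reasons' dictionary keys are hypothetical and are not taken from core/rerank_provider.py.

```python
import math
from typing import Callable, Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping raw scores into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rerank(
    query: str,
    documents: Sequence[Dict],
    score_fn: Callable[[str, List[str]], List[float]],
    reason: str = "cross-encoder-reranked",
) -> List[Dict]:
    """Score documents against the query and return them best-first."""
    if not documents:
        return []
    # score_fn stands in for the cross-encoder or the remote rerank call.
    raw_scores = score_fn(query, [doc["text"] for doc in documents])
    ranked = []
    for doc, raw in zip(documents, raw_scores):
        scored = dict(doc)  # copy so the caller's dicts are not mutated
        scored["reranker_score"] = sigmoid(raw)
        scored["reasons"] = list(doc.get("reasons", [])) + [reason]
        ranked.append(scored)
    ranked.sort(key=lambda d: d["reranker_score"], reverse=True)
    return ranked
```

Because the scorer is injected, the same function can be exercised with a fake scorer in tests, in the spirit of the mocked API calls the test commit mentions.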
…ver and ingestor
Modify data/hybrid_retriever.py:
- Import core.embedding_provider and core.rerank_provider
- _get_embedding_function(): check HECTOR_EMBEDDING_PROVIDER env var
  * If 'nemotron', use NemotronEmbedder via provider abstraction
  * Fall back to local SentenceTransformerEmbeddingFunction
- _rerank_with_cross_encoder(): check HECTOR_RERANK_PROVIDER env var
  * If 'nemotron', use NemotronReranker via provider abstraction
  * Fall back to local CrossEncoder
Modify utils/enhanced_ingestor.py:
- __init__(): try core.embedding_provider.get_embedding_provider()
  * Respects HECTOR_EMBEDDING_PROVIDER env var
  * Falls back to local SentenceTransformerEmbeddingFunction on error
Both components now seamlessly switch between local and Nemotron
backends based on environment variables, with automatic fallback to
local models when the Nemotron API is unavailable.
Tests for embedding and rerank provider abstractions validating:
Embedding provider:
- LocalEmbedder defaults to all-MiniLM-L6-v2 with 384d dimension
- NemotronEmbedder defaults to nemotron-embed-4b-v1 with 2048d
- Factory returns LocalEmbedder by default and on fallback
- Factory returns NemotronEmbedder when NVIDIA_API_KEY is set
- Factory falls back to local when Nemotron API is unreachable
- NemotronChromaEmbedFn adapter wraps NemotronEmbedder correctly
Rerank provider:
- LocalReranker defaults to ms-marco-MiniLM-L-6-v2
- Sigmoid normalization produces bounded [0,1] values
- Empty document list returns empty result for both providers
- Factory returns LocalReranker by default and on fallback
- Factory returns NemotronReranker when NVIDIA_API_KEY is set
- Factory falls back to local when Nemotron API is unreachable
Integration:
- get_embedding_provider and get_rerank_provider are callable
- hybrid_retriever and enhanced_ingestor modules load correctly
All tests use mocked API calls and explicit env var control to avoid
interference from concurrent test execution (933 tests total).
…afety
Problem:
Switching between local (384d) and Nemotron (2048d) embeddings on the
same ChromaDB collection causes dimension mismatch errors. ChromaDB
collections are bound to a specific embedding dimension at creation
time.
Solution:
Both the ingestor and hybrid retriever now append the provider name to
the collection name when a non-local provider is configured:
- local → indian_law_bns
- nemotron → indian_law_bns_nemotron
This ensures local and Nemotron embeddings never share a collection,
preventing dimension conflicts while remaining backward-compatible with
existing 384d databases.
Modified files:
- utils/enhanced_ingestor.py: collection name includes provider suffix
- data/hybrid_retriever.py: collection name includes provider suffix
Env vars:
- HECTOR_EMBEDDING_PROVIDER: 'local' (default) or 'nemotron'
- HECTOR_EMBEDDING_DIM: reserved for future explicit dimension control
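The naming rule in this commit amounts to a small pure function. The sketch below shows one way to express it; the `collection_name_for` helper and its handling of an empty or missing provider value are assumptions, not the code in the modified files.

```python
from typing import Optional


def collection_name_for(base: str, provider: Optional[str] = "local") -> str:
    """Return the collection name for a provider.

    'local' keeps the base name so existing 384d collections keep working;
    any other provider gets its name appended as a suffix.
    """
    normalized = (provider or "local").strip().lower()
    if normalized == "local":
        return base
    return f"{base}_{normalized}"
```

With this rule, `collection_name_for("indian_law_bns", "nemotron")` yields `indian_law_bns_nemotron`, while the default provider leaves `indian_law_bns` unchanged.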
No description provided.