Feat/phase 13 nemotron retrieval #14

Merged
DanielDeshmukh merged 5 commits into main from feat/phase-13-nemotron-retrieval on Jun 28, 2026

@DanielDeshmukh

Owner

No description provided.

…motron support

Create core/embedding_provider.py with a unified interface for
embedding generation, supporting two backends:

LocalEmbedder (default, 384-dim):
- Uses sentence-transformers all-MiniLM-L6-v2 via ChromaDB
- No API key required, runs offline after model download
- HF_HUB_OFFLINE=1 for offline operation

NemotronEmbedder (2048-dim, requires NVIDIA_API_KEY):
- Uses NVIDIA Nemotron embed API (nemotron-embed-4b-v1)
- Better semantic understanding for legal text
- API health check before use
- Graceful fallback to local on failure

NemotronChromaEmbedFn:
- Adapter wrapping NemotronEmbedder for ChromaDB's embedding API
- Enables drop-in replacement of SentenceTransformerEmbeddingFunction

Factory function:
- get_embedding_provider() reads HECTOR_EMBEDDING_PROVIDER env var
- Falls back to local if Nemotron unavailable
- Supports: 'local' | 'nemotron'

Configuration:
- HECTOR_EMBEDDING_PROVIDER: 'local' | 'nemotron'
- HECTOR_NEMOTRON_EMBED_MODEL: model ID
- HECTOR_NEMOTRON_API_KEY: NVIDIA API key
- HECTOR_EMBEDDING_DIM: 384 (local) | 2048 (Nemotron)
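
The factory-with-fallback behaviour described in this commit message could look roughly like the sketch below. It is not the code from core/embedding_provider.py: the class bodies are stand-ins, the `is_healthy` method and the `env` parameter are hypothetical names added so the example runs without network access, and reading the key from HECTOR_NEMOTRON_API_KEY is an assumption based on the configuration list.

```python
import os


class LocalEmbedder:
    """Stand-in for the local backend (384-dim per the commit message)."""
    name = "local"
    dim = 384


class NemotronEmbedder:
    """Stand-in for the API backend (2048-dim per the commit message)."""
    name = "nemotron"
    dim = 2048

    def __init__(self, api_key):
        self.api_key = api_key

    def is_healthy(self):
        # Hypothetical health check: a real one would call the remote API.
        return bool(self.api_key)


def get_embedding_provider(env=None):
    """Pick a backend from HECTOR_EMBEDDING_PROVIDER, falling back to local."""
    # The env parameter is an assumption for testability; default is os.environ.
    env = os.environ if env is None else env
    choice = env.get("HECTOR_EMBEDDING_PROVIDER", "local").strip().lower()
    if choice == "nemotron":
        try:
            provider = NemotronEmbedder(env.get("HECTOR_NEMOTRON_API_KEY", ""))
            if provider.is_healthy():
                return provider
        except Exception:
            pass  # Any failure falls through to the local backend.
    return LocalEmbedder()
```

Calling `get_embedding_provider({"HECTOR_EMBEDDING_PROVIDER": "nemotron"})` with no key set returns the local stand-in, which mirrors the "falls back to local if Nemotron unavailable" rule.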
…ron support

Create core/rerank_provider.py with a unified interface for
document reranking, supporting two backends:

LocalReranker (default):
- Uses cross-encoder/ms-marco-MiniLM-L-6-v2 via sentence-transformers
- Sigmoid normalization of raw scores to 0-1 range
- No API key required, runs offline after model download

NemotronReranker (requires NVIDIA_API_KEY):
- Uses NVIDIA Nemotron rerank API (nemotron-rerank-v1)
- Better semantic understanding for legal text ranking
- API health check before use
- Handles both rankings[] and scores[] response formats
- Graceful fallback to local on failure

Both providers:
- Accept (query, documents) pairs
- Add 'reranker_score' to each document dict
- Sort by score descending
- Append reason string ('cross-encoder-reranked' or 'nemotron-reranked')

Factory function:
- get_rerank_provider() reads HECTOR_RERANK_PROVIDER env var
- Falls back to local if Nemotron unavailable
- Supports: 'local' | 'nemotron'

Configuration:
- HECTOR_RERANK_PROVIDER: 'local' | 'nemotron'
- HECTOR_NEMOTRON_RERANK_MODEL: model ID
- HECTOR_NEMOTRON_API_KEY: NVIDIA API key
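
The shared post-processing described under "Both providers" (sigmoid normalization, a 'reranker_score' field, descending sort, and a reason string) can be illustrated with a minimal sketch. It is not taken from core/rerank_provider.py: the function names and the `reason` dictionary key are hypothetical, and the raw scores are passed in directly instead of coming from a cross-encoder or the API.

```python
import math


def sigmoid(x):
    """Map a raw logit into the range 0-1."""
    # Branch on sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def apply_rerank_scores(documents, raw_scores, reason="cross-encoder-reranked"):
    """Attach normalized scores, tag the reason, and sort best-first."""
    if len(documents) != len(raw_scores):
        raise ValueError("documents and raw_scores must have equal length")
    ranked = []
    for doc, raw in zip(documents, raw_scores):
        scored = dict(doc)  # Copy so the caller's dicts are not mutated.
        scored["reranker_score"] = sigmoid(raw)
        scored["reason"] = reason  # Hypothetical key for the reason string.
        ranked.append(scored)
    ranked.sort(key=lambda d: d["reranker_score"], reverse=True)
    return ranked
```

An empty document list yields an empty result, matching the behaviour the tests in this PR are said to check.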
…ver and ingestor

Modify data/hybrid_retriever.py:
- Import core.embedding_provider and core.rerank_provider
- _get_embedding_function(): check HECTOR_EMBEDDING_PROVIDER env var
  * If 'nemotron', use NemotronEmbedder via provider abstraction
  * Fall back to local SentenceTransformerEmbeddingFunction
- _rerank_with_cross_encoder(): check HECTOR_RERANK_PROVIDER env var
  * If 'nemotron', use NemotronReranker via provider abstraction
  * Fall back to local CrossEncoder

Modify utils/enhanced_ingestor.py:
- __init__(): try core.embedding_provider.get_embedding_provider()
  * Respects HECTOR_EMBEDDING_PROVIDER env var
  * Falls back to local SentenceTransformerEmbeddingFunction on error

Both components now seamlessly switch between local and Nemotron
backends based on environment variables, with automatic fallback
to local models when the Nemotron API is unavailable.
Tests for the embedding and rerank provider abstractions, validating:

Embedding provider:
- LocalEmbedder defaults to all-MiniLM-L6-v2 with 384 dimensions
- NemotronEmbedder defaults to nemotron-embed-4b-v1 with 2048 dimensions
- Factory returns LocalEmbedder by default and on fallback
- Factory returns NemotronEmbedder when NVIDIA_API_KEY is set
- Factory falls back to local when Nemotron API is unreachable
- NemotronChromaEmbedFn adapter wraps NemotronEmbedder correctly

Rerank provider:
- LocalReranker defaults to ms-marco-MiniLM-L-6-v2
- Sigmoid normalization produces bounded [0,1] values
- Empty document list returns empty result for both providers
- Factory returns LocalReranker by default and on fallback
- Factory returns NemotronReranker when NVIDIA_API_KEY is set
- Factory falls back to local when Nemotron API is unreachable

Integration:
- get_embedding_provider and get_rerank_provider are callable
- hybrid_retriever and enhanced_ingestor modules load correctly

All tests use mocked API calls and explicit env var control to avoid
interference from concurrent test execution (933 tests total).
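
One common way to get the "explicit env var control" mentioned above is `unittest.mock.patch.dict`, which restores `os.environ` when the block exits so settings do not leak between tests. The sketch below assumes nothing about tests/test_providers.py; `selected_provider` is a hypothetical helper standing in for the real factories.

```python
import os
from unittest import mock


def selected_provider():
    """Hypothetical helper: the provider name the env var asks for."""
    return os.environ.get("HECTOR_EMBEDDING_PROVIDER", "local").strip().lower()


def test_defaults_to_local():
    # clear=True hides every inherited variable for the duration of the block.
    with mock.patch.dict(os.environ, {}, clear=True):
        assert selected_provider() == "local"


def test_selects_nemotron_when_requested():
    with mock.patch.dict(os.environ, {"HECTOR_EMBEDDING_PROVIDER": "nemotron"}):
        assert selected_provider() == "nemotron"
```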
…afety

Problem:
Switching between local (384d) and Nemotron (2048d) embeddings on the
same ChromaDB collection causes dimension mismatch errors. ChromaDB
collections are bound to a specific embedding dimension at creation time.

Solution:
Both the ingestor and hybrid retriever now append the provider name to
the collection name when a non-local provider is configured:

  local    → indian_law_bns
  nemotron → indian_law_bns_nemotron

This ensures local and Nemotron embeddings never share a collection,
preventing dimension conflicts while remaining backward-compatible
with existing 384d databases.

Modified files:
- utils/enhanced_ingestor.py: collection name includes provider suffix
- data/hybrid_retriever.py: collection name includes provider suffix

Env vars:
- HECTOR_EMBEDDING_PROVIDER: 'local' (default) or 'nemotron'
- HECTOR_EMBEDDING_DIM: reserved for future explicit dimension control
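
The naming rule above can be sketched in a few lines. This is an illustration, not the code in utils/enhanced_ingestor.py or data/hybrid_retriever.py: the function name `collection_name` and its parameters are hypothetical.

```python
import os

BASE_COLLECTION = "indian_law_bns"


def collection_name(base=BASE_COLLECTION, provider=None):
    """Return the base name for local, or base + '_<provider>' otherwise."""
    if provider is None:
        provider = os.environ.get("HECTOR_EMBEDDING_PROVIDER", "local")
    provider = provider.strip().lower()
    if provider in ("", "local"):
        return base  # Unchanged name keeps existing 384d databases usable.
    return f"{base}_{provider}"
```

Because the ingestor and the retriever would derive the name from the same environment variable, they would agree on which collection to use as long as both run with the same setting.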
@coderabbitai

coderabbitai Bot commented Jun 28, 2026


Warning

Review limit reached

@DanielDeshmukh, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 32 minutes and 6 seconds. Learn how PR review limits work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aee761f5-d93b-4be6-9165-e0c716d4c868

📥 Commits

Reviewing files that changed from the base of the PR and between 87c0ed1 and 64c4c09.

📒 Files selected for processing (5)
  • core/embedding_provider.py
  • core/rerank_provider.py
  • data/hybrid_retriever.py
  • tests/test_providers.py
  • utils/enhanced_ingestor.py

DanielDeshmukh merged commit d4ec3c8 into main on Jun 28, 2026
3 of 4 checks passed