8 changes: 8 additions & 0 deletions .hydra_config/config.yaml
@@ -55,6 +55,13 @@ reranker:
top_k: ${oc.decode:${oc.env:RERANKER_TOP_K, 10}} # Number of documents to return after reranking. Increase for better results if your LLM has a wider context window.
base_url: ${oc.env:RERANKER_BASE_URL, http://reranker:${oc.env:RERANKER_PORT, 7997}}

file_reducer:
max_group_tokens: ${oc.decode:${oc.env:FILE_REDUCER_MAX_GROUP_TOKENS, 4096}}
min_group_tokens: ${oc.decode:${oc.env:FILE_REDUCER_MIN_GROUP_TOKENS, 2048}}
target_size_tokens: ${oc.decode:${oc.env:FILE_REDUCER_TARGET_SIZE_TOKENS, 1024}}
max_rounds: ${oc.decode:${oc.env:FILE_REDUCER_MAX_ROUNDS, 3}}
min_shrink_ratio: ${oc.decode:${oc.env:FILE_REDUCER_MIN_SHRINK_RATIO, 0.1}}

map_reduce:
# Number of documents to process in the initial mapping phase
initial_batch_size: ${oc.decode:${oc.env:MAP_REDUCE_INITIAL_BATCH_SIZE, 10}}
@@ -91,6 +98,7 @@ prompts:
chunk_contextualizer: chunk_contextualizer_tmpl.txt
image_describer: image_captioning_tmpl.txt
spoken_style_answer: spoken_style_answer_tmpl.txt
file_reducer: file_reducer_tmpl.txt

# query templates for different retriever types
hyde: hyde.txt
292 changes: 292 additions & 0 deletions AGENTS.md
@@ -0,0 +1,292 @@
# OpenRAG Agent Guide

## Build, Lint, and Test Commands

### Dependencies
```bash
# Install dependencies (uv package manager)
uv sync

# Install dev dependencies
uv sync --group dev

# Install lint dependencies
uv sync --group lint
```

### Development Server
```bash
# GPU deployment
docker compose up -d

# CPU deployment
docker compose --profile cpu up -d

# Rebuild and run
docker compose up --build -d
```

### Testing
```bash
# Run all unit tests
uv run pytest

# Run a single test file
uv run pytest openrag/components/indexer/chunker/test_chunking.py

# Run tests matching a pattern
uv run pytest -k "test_chunk"

# Run with verbose output
uv run pytest -v

# Run integration tests (requires running server)
uv run pytest -m integration

# Run tests with coverage
uv run pytest --cov=openrag
```

### Linting and Formatting
```bash
# Check code style
uv run ruff check openrag/ tests/

# Auto-fix linting issues
uv run ruff check --fix openrag/ tests/

# Format code
uv run ruff format openrag/ tests/

# Check formatting without modifying
uv run ruff format --check openrag/ tests/
```

### CI/CD
```bash
# Run API integration tests locally with act
act -j api-tests -W .github/workflows/api_tests.yml --bind
```

## Code Style Guidelines

### Imports
- Use **absolute imports** from the `openrag/` directory (Python path root)
- Group imports: standard library → third-party → first-party (`openrag.*`)
- Use `from X import Y` rooted at `openrag/`, not relative imports across packages
- Isort configuration: `known-first-party = ["openrag"]`

```python
# Correct
from components.ray_utils import call_ray_actor_with_timeout
from utils.logger import get_logger
from config import load_config

# Avoid
from ..ray_utils import ... # Only use within same package
```

### Formatting
- **Line length**: 120 characters (configured in `pyproject.toml`)
- **Target Python**: 3.12+
- Use **double quotes** for strings
- Use **4 spaces** for indentation (no tabs)
- Follow Black-compatible formatting (Ruff format)

### Type Hints
- Use **type hints** for function parameters and return values
- Use `|` for union types (Python 3.10+ syntax)
- Prefer `T | None` over `Optional[T]` for optional values
- Use `list[T]`, `dict[str, Any]` for collections

```python
def process_file(file_id: str, partition: str | None = None) -> dict[str, Any]:
    """Process a file and return metadata."""
    ...
```

### Naming Conventions
- **Functions/variables**: `snake_case`
- **Classes**: `PascalCase`
- **Constants**: `UPPER_CASE`
- **Private members**: `_leading_underscore`
- **Ray Actors**: `PascalCase` (e.g., `Indexer`, `TaskStateManager`)
- **Test functions**: `test_<description>`

### Error Handling
- Use **custom exceptions** from `openrag/utils/exceptions/`
- All exceptions inherit from `OpenRAGError`
- Include `code`, `message`, and optional `status_code`
- Use specific exception types: `VDBError`, `EmbeddingError`

```python
from utils.exceptions import OpenRAGError, VDBError

# Raise error with code and message
raise VDBError(message="Failed to connect", code="VDB_001", status_code=503)

# Custom exception with extra context
raise OpenRAGError(
    message="File not found",
    code="FILE_NOT_FOUND",
    status_code=404,
    file_id=file_id,
)
```

### Logging
- Use **Loguru** with structured logging via `get_logger()`
- Include contextual data using `.bind()`
- Never log secrets or sensitive data

```python
from utils.logger import get_logger

logger = get_logger()

# Log with context
logger.bind(file_id=file_id, partition=partition).info("Processing file")

# Error logging with exception
logger.bind(error=str(e)).error("Failed to process document")
```

### Async/Await
- Use `async def` for I/O operations (database, HTTP, Ray)
- Always `await` async calls
- Use `asyncio.gather()` for concurrent independent operations
- Use `call_ray_actor_with_timeout()` for Ray actor calls

```python
from components.ray_utils import call_ray_actor_with_timeout

# Concurrent operations
results = await asyncio.gather(
    task1(),
    task2(),
    task3(),
)

# Ray actor with timeout
result = await call_ray_actor_with_timeout(
    future=indexer.process.remote(data),
    timeout=30,
    task_description="Processing document",
)
```

### Ray Actors
- Ray Actors are initialized in `openrag/api.py`
- Access actors via `ray.get_actor(name, namespace="openrag")`
- All actor methods called with `.remote()`

```python
import ray

# Get actor reference
vectordb = ray.get_actor("Vectordb", namespace="openrag")
indexer = ray.get_actor("Indexer", namespace="openrag")

# Call methods
await vectordb.async_search.remote(query=query, partition=partition)
```

### Configuration
- Configuration via **Hydra** with YAML files in `.hydra_config/`
- Access config via `load_config()` from `config.py`
- Environment variables override config values

```python
from config import load_config

config = load_config()
chunk_size = config.chunker.size
```

### API Patterns
- FastAPI routers in `openrag/routers/`
- Use dependency injection for shared resources
- Return `JSONResponse` for custom error responses
- Use Pydantic models for request/response validation

```python
from fastapi import APIRouter, Depends
from pydantic import BaseModel

router = APIRouter()

class DocumentRequest(BaseModel):
    text: str
    partition: str | None = None

@router.post("/documents")
async def create_document(req: DocumentRequest, user: User = Depends(get_current_user)):
    ...
```

### Testing Guidelines
- Unit tests: `openrag/components/**/test_*.py` (pytest)
- Integration tests: `tests/api_tests/*.py`
- Use pytest fixtures from `conftest.py`
- Mark tests: `@pytest.mark.integration` or `@pytest.mark.unit`

```python
import pytest

@pytest.mark.unit
def test_chunking():
    assert result == expected

@pytest.mark.integration
async def test_api_endpoint():
    response = await client.post("/v1/chat/completions", json={...})
    assert response.status_code == 200
```

### Documentation
- Docstrings: **Google style** or **reStructuredText**
- Include type hints in docstrings if not obvious
- Document complex algorithms and business logic

```python
def process_chunk(chunk: Chunk) -> Embedding:
"""Process a document chunk and generate embedding.

Args:
chunk: The chunk to process

Returns:
Generated embedding vector

Raises:
EmbeddingError: If embedding generation fails
"""
...
```

## Key Files and Directories

```
openrag/
├── api.py # FastAPI app entry point, Ray initialization
├── routers/ # API route handlers
├── components/ # Core components (Indexer, Vectordb, Pipeline)
│ ├── indexer/ # Document ingestion, chunking, embedding
│ ├── pipeline.py # RAG pipeline orchestration
│ └── websearch/ # Web search integration
├── utils/ # Shared utilities
│ ├── exceptions/ # Custom exception classes
│ ├── logger.py # Logging configuration
│ └── config.py # Configuration loading
├── models/ # Pydantic models
└── prompts/ # LLM prompt templates
```

## Important Notes

- **Never commit secrets** - use `.env` files (not in repo)
- **Ray namespace** is always `"openrag"` for all actors
- **Milvus** is the vector database with hybrid search (dense + BM25)
- **Authentication** uses token-based auth with RBAC
- **Partition-based** multi-tenant document organization
- **OpenAI-compatible** API format for chat completions
1 change: 1 addition & 0 deletions docs/content/docs/documentation/API.mdx
@@ -409,6 +409,7 @@ OpenAI-compatible text completion endpoint.
| `websearch` | `bool` | `false` | Augments the RAG context with live web search results. When used with a partition (`openrag-{partition}`), document and web results are combined. When used without a partition (direct LLM mode), web results are the sole context. Requires `WEBSEARCH_API_TOKEN` to be configured. See [web search configuration](/openrag/documentation/env_vars/#web-search-configuration). |
| `spoken_style_answer` | `bool` | `false` | Generates a succinct spoken-style conversational answer based on the retrieved documents. |
| `use_map_reduce` | `bool` | `false` | Uses a map-reduce strategy to aggregate information from multiple documents. See [map-reduce configuration](/openrag/documentation/env_vars/#map--reduce-configuration). |
| `attachments` | `list[{id: string}]` | `null` | Pins specific files by ID for retrieval, bypassing semantic search entirely. Each file's chunks are compressed by the file reducer before being sent to the LLM. See [file reducer configuration](/openrag/documentation/env_vars/#file-reducer-configuration). |
| `llm_override` | `object` | `null` | Routes the request to a different LLM endpoint while still using OpenRAG's RAG pipeline (retrieval, reranking, prompt construction). Accepts: `base_url` (string), `api_key` (string), `model` (string). Any field not provided falls back to the default OpenRAG LLM configuration. |
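
For instance, a minimal request pinning a file through the new `attachments` field might look like the following (the partition name and file id are placeholders, not real values):

```shell
curl -X 'POST' 'http://localhost:8080/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openrag-my-partition",
    "messages": [{"role": "user", "content": "Summarize this file."}],
    "attachments": [{"id": "my-file-id"}]
  }'
```

Because `attachments` bypasses semantic search, the response is grounded only in the pinned files' (reduced) content plus whatever other context options are enabled.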

Examples:
16 changes: 16 additions & 0 deletions docs/content/docs/documentation/env_vars.md
@@ -257,6 +257,7 @@ The RAG pipeline comes with preconfigured prompts **`./prompts/example1`**. Here
| `image_captioning_tmpl.txt` | Template for generating image descriptions using the VLM |
| `hyde.txt` | Hypothetical Document Embeddings (HyDE) query expansion template |
| `multi_query_pmpt_tmpl.txt` | Template for generating multiple query variations |
| `file_reducer_tmpl.txt` | System prompt for the file reducer's chunk compression LLM calls |

To customize a prompt:
1. **Duplicate the example folder**: Copy the `example1` folder from `./prompts/`
@@ -455,6 +456,21 @@ curl -X 'POST' 'http://localhost:8080/v1/chat/completions' \
```
:::

### File Reducer Configuration

The file reducer compresses a file's chunks down to a size that fits within the LLM context window. It works iteratively: chunks are grouped, each group is summarized by the LLM, and the process repeats until the total content fits. Two safety mechanisms prevent it from running indefinitely:

- **`max_rounds`** — hard cap on the number of compression iterations.
- **`min_shrink_ratio`** — if a round shrinks the content by less than this fraction, the LLM is not compressing meaningfully and the loop stops early.

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `FILE_REDUCER_TARGET_SIZE_TOKENS` | `int` | 1024 | Token budget for the final output. Compression rounds continue until the total content fits within this limit |
| `FILE_REDUCER_MAX_GROUP_TOKENS` | `int` | 4096 | Maximum tokens per group fed to the LLM in a single summarization call |
| `FILE_REDUCER_MIN_GROUP_TOKENS` | `int` | 2048 | Groups smaller than this threshold are passed through without calling the LLM |
| `FILE_REDUCER_MAX_ROUNDS` | `int` | 3 | Maximum number of compression rounds before stopping regardless of output size |
| `FILE_REDUCER_MIN_SHRINK_RATIO` | `float` | 0.1 | Minimum fraction of tokens that must be removed in a round to continue iterating (e.g. `0.1` = at least 10% reduction required) |
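
Putting these knobs together, the round loop can be sketched roughly as follows. This is a simplified illustration, not OpenRAG's actual implementation: `count_tokens`, `make_groups`, and `summarize_group` are crude stand-ins for the real tokenizer, grouping logic, and the LLM call driven by `file_reducer_tmpl.txt`.

```python
def count_tokens(text: str) -> int:
    # crude stand-in: one token per whitespace-separated word
    return len(text.split())

def make_groups(texts: list[str], max_group_tokens: int) -> list[list[str]]:
    # greedy grouping: pack consecutive chunks until the group budget is hit
    groups, current, size = [], [], 0
    for t in texts:
        n = count_tokens(t)
        if current and size + n > max_group_tokens:
            groups.append(current)
            current, size = [], 0
        current.append(t)
        size += n
    if current:
        groups.append(current)
    return groups

def summarize_group(group: list[str]) -> str:
    # stand-in for the LLM summarization call: keep the first half of the words
    words = " ".join(group).split()
    return " ".join(words[: max(1, len(words) // 2)])

def reduce_file(chunks: list[str], target_size: int = 1024, max_group: int = 4096,
                min_group: int = 2048, max_rounds: int = 3,
                min_shrink: float = 0.1) -> list[str]:
    """Iteratively compress chunks until they fit the target token budget."""
    texts = list(chunks)
    for _ in range(max_rounds):                    # hard cap: FILE_REDUCER_MAX_ROUNDS
        total = sum(count_tokens(t) for t in texts)
        if total <= target_size:                   # fits the budget: done
            break
        reduced = []
        for group in make_groups(texts, max_group):
            if sum(count_tokens(t) for t in group) < min_group:
                reduced.extend(group)              # small group: pass through, no LLM call
            else:
                reduced.append(summarize_group(group))
        new_total = sum(count_tokens(t) for t in reduced)
        shrunk = (total - new_total) / total
        texts = reduced
        if shrunk < min_shrink:                    # diminishing returns: stop early
            break
    return texts
```

Note that the result can still exceed `target_size` when a round hits the `min_shrink` or `max_rounds` guard; the safety mechanisms trade exactness for bounded cost.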

### FastAPI & Access Control
:::info
By default, our API (FastAPI) uses **`uvicorn`** for deployment. One can opt in to use `Ray Serve` for scalability (see the [ray serve configuration](/openrag/documentation/env_vars/#ray-serve-configuration))