refactor(users): align regenerate_token with documented behavior by Matteovanypersele · Pull Request #309 · linagora/openrag

Matteovanypersele · 2026-04-17T10:44:29Z

We align regenerate_token with its documented behavior.

We add the admin-or-self check described in the endpoint documentation, following the same is_admin pattern already used in the other functions of this file.

We add two Robot test cases in tests/api/users.robot to cover the two expected behaviors (self → 200, cross-user → 403).

Summary by CodeRabbit

New Features
- Enhanced token regeneration API: non-admin users can now regenerate their own tokens.
Improvements
- Added configurable timeout and automatic retry mechanisms with exponential backoff for audio transcription, PDF document processing, and document marking operations to improve reliability and error recovery.
Tests
- Added authorization test cases for token regeneration endpoints.

The endpoint description states that token regeneration is allowed for admins and for the user themselves, but the implementation had no dependency enforcing this. Add the admin-or-self check. Cover the allowed (self) and denied (cross-user) paths with Robot tests.
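The admin-or-self rule described above can be sketched in plain Python. This is only an illustration of the check, not the code in openrag/routers/users.py: the `User` dataclass, the `Forbidden` exception, and both function names are hypothetical, and the real endpoint presumably signals the denial through its web framework rather than a custom exception.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Hypothetical stand-in for an HTTP 403 response."""

    status_code = 403


@dataclass
class User:
    """Hypothetical minimal user record; only the fields the check needs."""

    id: int
    is_admin: bool = False


def ensure_admin_or_self(current_user: User, target_user_id: int) -> None:
    """Allow admins to act on any user and non-admins only on themselves."""
    if current_user.is_admin or current_user.id == target_user_id:
        return
    raise Forbidden("only an admin or the user themselves may regenerate this token")


def regenerate_token_status(current_user: User, target_user_id: int) -> int:
    """Return the status code the two Robot cases would expect."""
    try:
        ensure_admin_or_self(current_user, target_user_id)
    except Forbidden as exc:
        return exc.status_code
    return 200
```

With this sketch, a non-admin with id 2 gets 200 for target 2 and 403 for target 3, which mirrors the self and cross-user cases the PR tests.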

coderabbitai · 2026-04-17T10:44:42Z

📝 Walkthrough

This PR adds timeout and exponential backoff retry configurations for three data loaders (Whisper, Marker, Docling). New configuration parameters define timeout values and retry behavior; they are wired through Pydantic models and applied via a new retry_with_backoff utility function that executes async operations with configurable delays between attempts. Additionally, the user token regeneration endpoint now enforces runtime authorization checks.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & Models: `conf/config.yaml`, `openrag/config/models.py` | Added timeout and retry-related fields for Whisper, Marker, and Docling loaders with defaults (timeouts: 1800–3600s, max retries: 1–3, base delay: 2.0s). |
| Ray Actor Utilities: `openrag/components/ray_utils.py` | Introduced `retry_with_backoff` function to centralize exponential backoff retry logic for async operations, re-raising `CancelledError` immediately without retry. |
| Loader Implementations: `openrag/components/indexer/loaders/audio/local_whisper.py`, `openrag/components/indexer/loaders/pdf_loaders/docling2.py`, `openrag/components/indexer/loaders/pdf_loaders/marker.py` | Applied timeout wrappers and `retry_with_backoff` to transcription and PDF processing workflows; each loader now reacquires/reinitializes workers on retry attempts with backoff delays. |
| User Authorization: `openrag/routers/users.py` | Updated `regenerate_user_token` endpoint to enforce runtime permission checks: admins may regenerate any user's token; non-admins may only regenerate their own. |
| Integration Tests: `tests/api/users.robot` | Added two test cases validating non-admin token regeneration: successful self-regeneration and forbidden cross-user regeneration. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Loader
    participant RetryHandler
    participant TimeoutWrapper
    participant RayActor
    
    Caller->>Loader: transcribe/process(path)
    Loader->>RetryHandler: retry_with_backoff(attempt_fn, max_retries, base_delay)
    
    loop Attempt (up to max_retries + 1)
        RetryHandler->>Loader: attempt_fn(attempt_index)
        Loader->>Loader: select_worker / acquire_actor
        Loader->>TimeoutWrapper: call_ray_actor_with_timeout(...)
        TimeoutWrapper->>RayActor: .remote(path)
        
        alt Success
            RayActor-->>TimeoutWrapper: result
            TimeoutWrapper-->>Loader: return result
            Loader-->>RetryHandler: return result
            RetryHandler-->>Caller: return result
        else Timeout / Exception
            RayActor-->>TimeoutWrapper: timeout/error
            TimeoutWrapper-->>Loader: raise exception
            Loader-->>Loader: cleanup (return_worker)
            Loader-->>RetryHandler: exception
            alt Final Attempt
                RetryHandler-->>Caller: re-raise
            else More Retries Available
                RetryHandler->>RetryHandler: await delay (base_delay * 2^attempt)
            end
        end
    end
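
The retry loop in the diagram can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in openrag/components/ray_utils.py: the signature is taken from the review diff in this thread, the delay formula `base_delay * 2^attempt` from the diagram, and `asyncio.wait_for` is used here only as a stand-in for `call_ray_actor_with_timeout`, whose real signature is not shown. The `demo` and `flaky` helpers are hypothetical.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


async def retry_with_backoff(
    attempt_fn: Callable[[int], Awaitable[Any]],
    max_retries: int,
    base_delay: float,
    task_description: str = "task",
) -> Any:
    """Await attempt_fn up to max_retries + 1 times with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        try:
            return await attempt_fn(attempt)
        except asyncio.CancelledError:
            # Cancellation is never retried; propagate it immediately.
            raise
        except Exception as exc:
            if attempt == max_retries:
                raise  # final attempt: re-raise to the caller
            delay = base_delay * (2 ** attempt)
            print(f"{task_description}: attempt {attempt + 1} failed ({exc!r}), retrying in {delay}s")
            await asyncio.sleep(delay)


async def demo() -> tuple[str, list[int]]:
    """Hypothetical usage: the first two attempts time out, the third succeeds."""
    seen: list[int] = []

    async def flaky(attempt: int) -> str:
        seen.append(attempt)

        async def work() -> str:
            if attempt < 2:
                await asyncio.sleep(1)  # longer than the timeout below
            return "done"

        # Stand-in for the timeout wrapper around the Ray actor call.
        return await asyncio.wait_for(work(), timeout=0.05)

    result = await retry_with_backoff(flaky, max_retries=3, base_delay=0.01, task_description="demo")
    return result, seen
```

Passing the attempt index to `attempt_fn` is what lets a loader pick a fresh worker on each retry, as the diagram's `select_worker / acquire_actor` step suggests.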

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

Feat/marker pdf chunking #297: Introduces timeout/retry wrapper mechanisms (retry_with_backoff and call_ray_actor_with_timeout) that are now applied across MarkerPool, DoclingPool, and WhisperPool for consistent failure handling.
Marker retries #304: Directly modifies the same loader files (Marker, Docling, Whisper) and adds the same configuration fields and retry/timeout wiring patterns across all three components.

Suggested Reviewers

EnjoyBacon7
Ahmath-Gadji

Poem

🐰 Three loaders now dance with patience and grace,
With timeouts and retries for every case,
Backoff delays smooth the retry waltz,
While admins and users claim their exalts,
A robust and resilient workflow's embrace! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.77% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ⚠️ Warning | The title refers to aligning regenerate_token with documented behavior, but the PR primarily adds timeout and retry mechanisms to three loaders (Whisper, Marker, Docling) and implements authorization checks for token regeneration. | Update the title to reflect the main changes, such as: 'feat: add timeout and retry mechanisms to loaders and token regeneration authorization' to accurately represent both major changes. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

🧹 Nitpick comments (2)

openrag/components/ray_utils.py (1)
61-66: Nit: type hint doesn't reflect that attempt_fn must be async.

attempt_fn is awaited at line 76, so the callable must return an awaitable. The current annotation Callable[[int], Any] would type-check a plain sync callable returning a non-awaitable, defeating the purpose of the hint.
♻️ Suggested change
-from collections.abc import Callable
+from collections.abc import Awaitable, Callable
 async def retry_with_backoff(
-    attempt_fn: Callable[[int], Any],
+    attempt_fn: Callable[[int], Awaitable[Any]],
     max_retries: int,
     base_delay: float,
     task_description: str = "task",
 ) -> Any:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openrag/components/ray_utils.py` around lines 61 - 66, The type hint for
retry_with_backoff's attempt_fn is incorrect: it must be an async function
returning an awaitable. Update the annotation for attempt_fn in
retry_with_backoff to reflect an awaitable return type (e.g., use
Callable[[int], Awaitable[Any]] or Coroutine[Any, Any, Any]) so static type
checkers know the callable is awaitable; keep the rest of the signature and
behavior unchanged.
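
A small sketch of why the annotation matters, assuming nothing beyond the standard library. The alias `AttemptFn` and the helper functions are hypothetical names used only for illustration; the point is that a plain sync callable satisfies `Callable[[int], Any]` but fails at runtime once its result is awaited.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical alias for the suggested annotation.
AttemptFn = Callable[[int], Awaitable[Any]]


async def async_attempt(attempt: int) -> str:
    """Matches AttemptFn: calling it returns an awaitable coroutine."""
    return f"attempt {attempt} ok"


def sync_attempt(attempt: int) -> str:
    """Does not match AttemptFn: calling it returns a plain str."""
    return f"attempt {attempt} ok"


async def run_once(attempt_fn: AttemptFn) -> Any:
    """Await the callable's result, as the retry helper does."""
    return await attempt_fn(0)


def demo() -> tuple[str, bool]:
    ok = asyncio.run(run_once(async_attempt))
    try:
        # A type checker would flag this call under the stricter annotation.
        asyncio.run(run_once(sync_attempt))  # type: ignore[arg-type]
        failed = False
    except TypeError:
        # Awaiting a str raises TypeError at runtime.
        failed = True
    return ok, failed
```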
openrag/config/models.py (1)
250-252: LGTM — field names, types, and defaults match conf/config.yaml.

whisper_timeout=1800, whisper_max_task_retry=1, whisper_retry_base_delay=2.0, marker_max_task_retry=3, marker_retry_base_delay=2.0, docling_timeout=3600, docling_max_task_retry=3, docling_retry_base_delay=2.0 all align with the YAML keys and defaults.

Optional: for consistency with the adjacent docling_* fields that use Field(default=..., ge=...) validators (lines 346-348), you could add non-negative validators (ge=0) on the new timeout, retry-count, and base-delay fields. Non-blocking.

Also applies to: 344-351
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openrag/config/models.py` around lines 250 - 252, Add non-negative validators
to the new whisper_* fields to match the pattern used for docling_*: update the
whisper_timeout, whisper_max_task_retry, and whisper_retry_base_delay
declarations to use Field(default=..., ge=0) so they enforce non-negative
values; apply the same change to the other mentioned fields at lines 344-351
(marker_max_task_retry, marker_retry_base_delay, docling_timeout,
docling_max_task_retry, docling_retry_base_delay) to keep validation consistent
with the existing docling_* Field(…ge=…) pattern.
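
As a rough sketch of the suggested validators, assuming Pydantic is installed: the field names and defaults come from the review comment above, while the class name `WhisperLoaderSettings`, the `int`/`float` field types, and the `is_valid` helper are assumptions made for illustration and may not match openrag/config/models.py.

```python
from pydantic import BaseModel, Field, ValidationError


class WhisperLoaderSettings(BaseModel):
    """Hypothetical model; the real fields live in a larger config class."""

    # ge=0 rejects negative values at validation time.
    whisper_timeout: int = Field(default=1800, ge=0)
    whisper_max_task_retry: int = Field(default=1, ge=0)
    whisper_retry_base_delay: float = Field(default=2.0, ge=0)


def is_valid(**overrides: float) -> bool:
    """Return True when the overrides pass validation, False otherwise."""
    try:
        WhisperLoaderSettings(**overrides)
    except ValidationError:
        return False
    return True
```

Here `is_valid()` accepts the defaults, while `is_valid(whisper_timeout=-1)` is rejected. Whether a zero timeout should also be rejected (gt=0 rather than ge=0) is a separate design decision the review does not settle.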


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c1c03a97-d0de-45f2-a79c-16935fd5509c

📥 Commits

Reviewing files that changed from the base of the PR and between a91ad6d and 6dfa5e3.

📒 Files selected for processing (8)

conf/config.yaml
openrag/components/indexer/loaders/audio/local_whisper.py
openrag/components/indexer/loaders/pdf_loaders/docling2.py
openrag/components/indexer/loaders/pdf_loaders/marker.py
openrag/components/ray_utils.py
openrag/config/models.py
openrag/routers/users.py
tests/api/users.robot

Ahmath-Gadji · 2026-04-20T12:36:03Z

Thanks a lot for this PR.

It turns out another contributor was working on the same feature in parallel, enabling both users and admins to regenerate user tokens. Additionally, IndexerUI has been updated to include a button for this action.
See #311

As a result, this PR will be closed.

Matteovanypersele changed the title ~~Fix/regenerate token permissions~~ refactor(users): align regenerate_token with documented behavior Apr 17, 2026

Matteovanypersele changed the base branch from main to dev April 17, 2026 10:48

coderabbitai Bot reviewed Apr 17, 2026

Ahmath-Gadji closed this Apr 20, 2026

Matteovanypersele wants to merge 1 commit into linagora:dev from Matteovanypersele:fix/regenerate-token-permissions-dev

2 participants
