
[WIP] Sglang Rollout Refactor #2267

Draft
xiuhu17 wants to merge 34 commits into NVIDIA-NeMo:main from xiuhu17:zhw/refactor_sglang

Conversation


@xiuhu17 xiuhu17 commented Apr 14, 2026

Refactor SGLang Rollout with the New Design

Overview

This PR refactors the SGLang rollout stack to support the new rollout architecture while preserving backward compatibility with the previous interface.

The new design focuses on improving scalability, reliability, and maintainability of the rollout system. In particular, it introduces support for:

  • A redesigned SGLang rollout core, replacing the old implementation
  • SGLang router support
  • Multi-node SGLang deployment
  • Onload and offload of weights, KV cache, and CUDA graphs
  • Weight checks (snapshot, reset, compare) to validate weight updates
  • async def generate_async(...) for asynchronous multi-turn rollout
  • Refactored colocated weight updates
  • Fault tolerance: restarting dead rollout servers
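
The snapshot/reset/compare check above can be illustrated with a minimal, framework-free sketch. The function names and the plain-dict weight representation here are illustrative assumptions, not this PR's actual API; the real check would operate on the engine's parameter tensors.

```python
import copy

def snapshot_weights(weights: dict) -> dict:
    """Take a snapshot of the current weights (deep copy so later
    in-place updates cannot alias the saved state)."""
    return copy.deepcopy(weights)

def compare_weights(weights: dict, snapshot: dict, tol: float = 1e-6) -> list:
    """Return the names of parameters that drifted from the snapshot
    by more than `tol` -- an empty list means the update round-tripped."""
    changed = []
    for name, values in weights.items():
        if any(abs(a - b) > tol for a, b in zip(values, snapshot[name])):
            changed.append(name)
    return changed

# Usage: snapshot -> apply an update -> compare to see what changed.
weights = {"layer0.weight": [0.1, 0.2, 0.3], "layer0.bias": [0.0]}
snap = snapshot_weights(weights)
weights["layer0.weight"] = [v + 1.0 for v in weights["layer0.weight"]]
assert compare_weights(weights, snap) == ["layer0.weight"]
```

Resetting is then just restoring the snapshot, after which a second compare should report no differences.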

The goal is to make the rollout layer production-ready for larger-scale serving and training integration, while minimizing disruption to existing callers.
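
To illustrate the asynchronous multi-turn shape of generate_async, here is a minimal self-contained sketch. The engine stub, its generate method, and the transcript handling are all hypothetical stand-ins; the PR's real generate_async talks to an SGLang server and its exact signature may differ.

```python
import asyncio

class FakeEngine:
    """Stand-in for a server-backed client assumed to expose an
    awaitable generate(prompt) -> str."""
    async def generate(self, prompt: str) -> str:
        await asyncio.sleep(0)          # yield to the event loop
        return f"reply({len(prompt)})"  # canned completion

async def generate_async(engine, messages: list, num_turns: int) -> list:
    """Multi-turn rollout: each turn feeds the running transcript
    back into the engine and appends the reply."""
    transcript = list(messages)
    for _ in range(num_turns):
        prompt = "\n".join(transcript)
        reply = await engine.generate(prompt)
        transcript.append(reply)
    return transcript

transcript = asyncio.run(generate_async(FakeEngine(), ["hello"], num_turns=2))
assert len(transcript) == 3  # initial message + one reply per turn
```

Because each turn awaits the engine, many such rollouts can run concurrently on one event loop, which is what makes the async interface useful for large-scale serving.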


Tests:

Thorough test coverage (around 75 tests).


@copy-pr-bot

copy-pr-bot bot commented Apr 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

