
[WIP] Sglang Rollout Refactor #2267

Draft
xiuhu17 wants to merge 34 commits into NVIDIA-NeMo:main from xiuhu17:zhw/refactor_sglang

Conversation


@xiuhu17 xiuhu17 commented Apr 14, 2026

Refactor SGLang Rollout with the New Design

Overview

This PR refactors the SGLang rollout stack to support the new rollout architecture while preserving backward compatibility with the previous interface.

The new design focuses on improving scalability, reliability, and maintainability of the rollout system. In particular, it introduces support for:

  • A redesigned SGLang rollout core, replacing the old implementation
  • SGLang router support
  • Multi-node SGLang deployment
  • Onload and offload of weights, KV cache, and CUDA graphs
  • Weight checks (snapshot, reset, compare) to validate weight updates
  • async def generate_async(...) for asynchronous multi-turn rollout
  • Refactored colocated weight updates
  • Fault tolerance: restarting dead rollout servers
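
The snapshot/reset/compare check above can be illustrated with a minimal, framework-free sketch. The function names and the plain-dict weight representation here are illustrative assumptions, not this PR's actual API; the real check would operate on the engine's parameter tensors.

```python
import copy

def snapshot_weights(weights: dict) -> dict:
    """Take a snapshot of the current weights (deep copy so later
    in-place updates cannot alias the saved state)."""
    return copy.deepcopy(weights)

def compare_weights(weights: dict, snapshot: dict, tol: float = 1e-6) -> list:
    """Return the names of parameters that drifted from the snapshot
    by more than `tol` -- an empty list means the update round-tripped."""
    changed = []
    for name, values in weights.items():
        if any(abs(a - b) > tol for a, b in zip(values, snapshot[name])):
            changed.append(name)
    return changed

# Usage: snapshot -> apply an update -> compare to see what changed.
weights = {"layer0.weight": [0.1, 0.2, 0.3], "layer0.bias": [0.0]}
snap = snapshot_weights(weights)
weights["layer0.weight"] = [v + 1.0 for v in weights["layer0.weight"]]
assert compare_weights(weights, snap) == ["layer0.weight"]
```

Resetting is then just restoring the snapshot, after which a second compare should report no differences.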

The goal is to make the rollout layer production-ready for larger-scale serving and training integration, while minimizing disruption to existing callers.
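
To illustrate the asynchronous multi-turn shape of generate_async, here is a minimal self-contained sketch. The engine stub, its generate method, and the transcript handling are all hypothetical stand-ins; the PR's real generate_async talks to an SGLang server and its exact signature may differ.

```python
import asyncio

class FakeEngine:
    """Stand-in for a server-backed client assumed to expose an
    awaitable generate(prompt) -> str."""
    async def generate(self, prompt: str) -> str:
        await asyncio.sleep(0)          # yield to the event loop
        return f"reply({len(prompt)})"  # canned completion

async def generate_async(engine, messages: list, num_turns: int) -> list:
    """Multi-turn rollout: each turn feeds the running transcript
    back into the engine and appends the reply."""
    transcript = list(messages)
    for _ in range(num_turns):
        prompt = "\n".join(transcript)
        reply = await engine.generate(prompt)
        transcript.append(reply)
    return transcript

transcript = asyncio.run(generate_async(FakeEngine(), ["hello"], num_turns=2))
assert len(transcript) == 3  # initial message + one reply per turn
```

Because each turn awaits the engine, many such rollouts can run concurrently on one event loop, which is what makes the async interface useful for large-scale serving.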


Tests:

Thorough test coverage (around 75 tests).


@copy-pr-bot

copy-pr-bot bot commented Apr 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

