Merged
Changes from 250 commits
Commits
416 commits
5b520e3
Validate native vLLM LoRA for Qwen3 dense
FurtherAI May 5, 2026
d70ab2c
Promote dense Qwen models to validated support
FurtherAI May 5, 2026
3d77ba3
Avoid eager model support workflow imports
FurtherAI May 5, 2026
3663266
Use compact packed GDN kernels for local buckets
FurtherAI May 5, 2026
5d32ac0
Use chunked FLA GDN kernel
FurtherAI May 6, 2026
697f392
Use fused Megatron cross entropy
FurtherAI May 6, 2026
632eefb
Remove legacy GDN executor path
FurtherAI May 6, 2026
4d60c94
Add harness CE fusion override worker
FurtherAI May 6, 2026
d57b48e
Add GDN timing hooks to harness wrapper
FurtherAI May 6, 2026
02f221b
Organize Megatron modules and integration tests
FurtherAI May 7, 2026
06814b0
Fix HF parity invariant handler call
FurtherAI May 7, 2026
df52d07
Port main dependency and lifecycle updates
FurtherAI May 8, 2026
4c1fde1
Update Qwen handler for newer bridge mappings
FurtherAI May 8, 2026
6c66d67
Validate Qwen3.5 vLLM LoRA layout
FurtherAI May 8, 2026
470f966
Remove flex attention compile tuning options
FurtherAI May 8, 2026
6b43ef0
Ignore train inference mismatch artifacts
FurtherAI May 8, 2026
5fe1f1b
Avoid assert bytecode in flex attention forward
FurtherAI May 8, 2026
70e9db4
Report flex attention bias type mismatches
FurtherAI May 8, 2026
f79e63e
Propagate Qwen3.5 MTP shared-prefix attention
FurtherAI May 8, 2026
1506236
Forward Qwen3.5 MTP attention bias to layers
FurtherAI May 8, 2026
dd16e0a
Avoid checkpointing Qwen3.5 MTP attention state
FurtherAI May 8, 2026
5bf2c87
Disable Qwen3.5 MTP in ART Megatron
FurtherAI May 8, 2026
e9b869d
Drop MTP diagnostic flex attention changes
FurtherAI May 8, 2026
d26ecb7
Assert Qwen3.5 ART training has no MTP
FurtherAI May 8, 2026
6b40e71
Clean PR artifacts and fix type checks
FurtherAI May 8, 2026
aafedae
Merge remote-tracking branch 'origin/main' into austin/vllm_separation
FurtherAI May 8, 2026
7edba06
Unify runtime process supervision
FurtherAI May 9, 2026
a31a581
Model asyncio subprocess contract in runtime tests
FurtherAI May 9, 2026
815d577
Defer supervised wait coroutine creation
FurtherAI May 9, 2026
f662370
Prune oracle topology artifacts by default
FurtherAI May 9, 2026
7434fdf
Handle vLLM EP dummy LoRA warmup
FurtherAI May 9, 2026
e84cc4c
Keep vLLM MoE LoRA stacking idempotent
FurtherAI May 9, 2026
ef2c7b9
Add train inference mismatch workflow stage
FurtherAI May 10, 2026
a0c071b
Update workflow test oracle artifact mocks
FurtherAI May 10, 2026
0608762
Preserve recent Unsloth training fixes
FurtherAI May 11, 2026
cee9112
Preserve recent Unsloth training fixes
FurtherAI May 11, 2026
7af2df4
Merge CP GDN into separated Megatron layout
FurtherAI May 11, 2026
dbc4fe5
Rebuild merged Megatron environment
FurtherAI May 11, 2026
1dc1914
Align merged Megatron validation expectations
FurtherAI May 12, 2026
595fe7b
Add train inference output parity probe
FurtherAI May 12, 2026
8cd811d
Restore Qwen3 MoE CP compile flags
FurtherAI May 12, 2026
9bb5d9d
Use vLLM separation Megatron bridge revision
FurtherAI May 12, 2026
ad43186
Use Triton flex backend for fp32 oracle tests
FurtherAI May 12, 2026
53bc198
Restore Qwen3.5 LoRA wrapping integration test
FurtherAI May 12, 2026
e94fb6a
Merge dense oracle coverage with CP validation
FurtherAI May 12, 2026
8e746b9
Complete vLLM separation topology merge
FurtherAI May 12, 2026
620572b
Assert Megatron topology stays fixed
FurtherAI May 12, 2026
88ad814
Keep merged startup on fixed Megatron topology
FurtherAI May 12, 2026
e7d196b
Remove unit test topology coverage
FurtherAI May 12, 2026
4f5f468
Patch vLLM LoRA duplicate aliases
FurtherAI May 12, 2026
9c0fab9
Fix model support validation harness drift
FurtherAI May 12, 2026
45a5bbb
Use test triton backend for HF parity
FurtherAI May 12, 2026
efeccf4
Fix EP MoE native LoRA TP slicing
FurtherAI May 12, 2026
a583c9d
Use aggregate correctness mean abs pct
FurtherAI May 12, 2026
f1df667
Convert Qwen3.5 q-gate LoRA layout
FurtherAI May 12, 2026
12457e1
Fix EP MoE LoRA align expert count
FurtherAI May 12, 2026
d542aab
Fix EP MoE dummy LoRA warmup
FurtherAI May 12, 2026
6a61e12
Add train-inf no shared expert LoRA ablation
FurtherAI May 12, 2026
fbfc093
Slice fused expert HF loads under EP
FurtherAI May 12, 2026
f1f10fb
Align qwen35 moe lora with vllm 3d layout
FurtherAI May 12, 2026
28ce863
Add train-inf LoRA target override
FurtherAI May 12, 2026
624c16d
Build CP block masks without dense token masks
FurtherAI May 12, 2026
a98794b
Avoid base grad buffers in parity worker
FurtherAI May 13, 2026
cf3f1df
Handle empty local GDN CP ranks
FurtherAI May 13, 2026
6263741
Pin NCCL and update merged weight sync
FurtherAI May 13, 2026
46b2b33
Update train inf mismatch metric gates
FurtherAI May 13, 2026
ceeec62
Use smaller train inf metric epsilon
FurtherAI May 13, 2026
080ce98
Use smaller metric denominator epsilon
FurtherAI May 13, 2026
03fbcdf
Run live train inf parity in workflow
FurtherAI May 13, 2026
3c12779
Apply train inf mismatch updates
FurtherAI May 13, 2026
bb046af
Require MoE layers in Qwen3.5 MoE LoRA handler
FurtherAI May 13, 2026
8f09e81
Optimize GDN CP planning for varied workloads
FurtherAI May 14, 2026
52e15e6
Avoid GDN CP runtime length synchronization
FurtherAI May 14, 2026
cbd3bce
Add streaming frozen weight offload
FurtherAI May 14, 2026
d9d746d
Lower streaming offload memory window
FurtherAI May 14, 2026
4fcb7b2
Use bounded pinned staging for streaming offload
FurtherAI May 14, 2026
f9cc1d9
Enable Megatron debug wrappers without compile
FurtherAI May 14, 2026
d81b060
Improve DeepEP debug timing payload
FurtherAI May 14, 2026
456ef10
Support streaming offload for sparse CP shards
FurtherAI May 14, 2026
8792c98
Remove DeepEP combine readiness runtime workaround
FurtherAI May 14, 2026
b37b8fa
Disable compiled MoE dispatch preprocess
FurtherAI May 15, 2026
319c5b3
Add windowed streaming weight prefetch
FurtherAI May 15, 2026
a889a30
Fail fast on Megatron job failures
FurtherAI May 15, 2026
a35be61
Fix CP exchange collective participation
FurtherAI May 15, 2026
6cc1e49
Fix CP empty-rank collective participation
FurtherAI May 15, 2026
b03853b
Add streaming weight offload validation hooks
FurtherAI May 16, 2026
83da657
Fix qwen35 gdn compile boundaries
FurtherAI May 16, 2026
0c527d8
Narrow expert lora compile boundary
FurtherAI May 16, 2026
c325859
Fix oracle routing trace for variable micros
FurtherAI May 16, 2026
646b748
Refine oracle LoRA reference controls
FurtherAI May 17, 2026
c872195
Use native Megatron MoE routing replay
FurtherAI May 18, 2026
f6a369f
Add production MoE routing replay plumbing
FurtherAI May 18, 2026
a5d6a26
Expose trajectory routing replay train flag
FurtherAI May 18, 2026
211b7e2
Make expert replay a backend setting
FurtherAI May 18, 2026
f3f619c
Add real-path train inf mismatch test
FurtherAI May 18, 2026
9ab0308
Disable async scheduling for expert replay
FurtherAI May 18, 2026
f5f1714
Forward false vLLM runtime flags
FurtherAI May 18, 2026
3b84202
Use nonzero advantages in real mismatch test
FurtherAI May 18, 2026
45627c8
Align real mismatch rollout chat template
FurtherAI May 18, 2026
200494c
Allow replay to omit terminal generated route
FurtherAI May 18, 2026
cde0316
Replay known routes and live-route terminal gaps
FurtherAI May 18, 2026
2d043df
Gather TP logits in mismatch extractor
FurtherAI May 18, 2026
cb815e4
Run real mismatch test without opt-in env
FurtherAI May 18, 2026
3f3cc5f
Make routing replay native and cp2 by default
FurtherAI May 18, 2026
3470a2b
Fix mismatch test topology world size
FurtherAI May 18, 2026
b72a01a
Restore tp2 ep2 mismatch defaults
FurtherAI May 18, 2026
8125f8a
Fix CP attention backward grad layout
FurtherAI May 18, 2026
d9dbdb6
Wire weight offload config into attention oracle
FurtherAI May 18, 2026
f61d43c
Document mismatch threshold diagnostics
FurtherAI May 18, 2026
bec322b
Fix CP flash grad handoff
FurtherAI May 19, 2026
85583fb
Default oracle validation to Qwen3.5
FurtherAI May 19, 2026
75a4abb
Allow streaming offload with compiled layers
FurtherAI May 20, 2026
9aff411
Tolerate job tensor cleanup races
FurtherAI May 20, 2026
bac0f1a
Revert job tensor cleanup retry
FurtherAI May 20, 2026
a6e1749
Raise train-inf mismatch bf16 gate
FurtherAI May 20, 2026
22aa60f
Fix oracle routing replay capture
FurtherAI May 20, 2026
7e44709
Tune streaming weight offload defaults
FurtherAI May 20, 2026
9a2abc0
Keep full-model streaming offload defaults
FurtherAI May 20, 2026
6a0a9c2
Optimize CP block mask refinement
FurtherAI May 20, 2026
0e688f8
Fix MoE replay topology parity
FurtherAI May 21, 2026
050d6cb
Spread synthetic replay routes
FurtherAI May 21, 2026
bf3ec9b
Clean up Megatron compile workarounds
FurtherAI May 21, 2026
ea92cb8
Remove temporary flex compile options
FurtherAI May 21, 2026
48cb055
Move routing replay trace bundle builder to tests
FurtherAI May 21, 2026
7c5548d
Fix flex attention compile defaults
FurtherAI May 21, 2026
ae9933b
Move model support validation APIs to tests
FurtherAI May 21, 2026
bbfe210
Clean up Qwen3.5 text bridge registration
FurtherAI May 21, 2026
9dba103
Merge branch 'main' into austin/train_inf_mismatch
FurtherAI May 21, 2026
2bef373
Clean up routing replay merge state
FurtherAI May 23, 2026
1a0adcd
Drop stale megatron core build config
FurtherAI May 23, 2026
2566b64
Clean up train inf mismatch real path gate
FurtherAI May 23, 2026
7b9a0c6
Restore explicit NCCL weight transfer contract
FurtherAI May 23, 2026
a8f07ea
Lower train-inf mismatch rollout temperature
FurtherAI May 23, 2026
bf99ef8
Seed train-inf mismatch rollouts
FurtherAI May 23, 2026
04ac948
Use lower train-inf rollout temperature without seeds
FurtherAI May 23, 2026
2d6de24
Restore train-inf rollout temperature
FurtherAI May 23, 2026
66451c0
Refactor Megatron provider runtime env handling
FurtherAI May 23, 2026
f761ead
Refactor Megatron train support helpers
FurtherAI May 23, 2026
8da254e
Move Megatron microbatch helpers out of train
FurtherAI May 23, 2026
9def1cf
Move Megatron runtime patches out of compile helpers
FurtherAI May 23, 2026
71882dd
Group Megatron flex attention helpers
FurtherAI May 23, 2026
b358a4a
Move provider helpers and Megatron backend into main module
FurtherAI May 23, 2026
1ce63a7
Use compact non-CP oracle topology matrix
FurtherAI May 23, 2026
b3f6f4b
Merge origin/main into vllm merge worktree
FurtherAI May 23, 2026
28fcde8
Fix Megatron type checking
FurtherAI May 23, 2026
98b1cd7
Add durable model support workflow CLI
FurtherAI May 23, 2026
850ce28
Remove native LoRA exclusion from workflow CLI
FurtherAI May 23, 2026
082d0aa
Add vLLM routed expert prefix sidecar
FurtherAI May 23, 2026
923f025
Make CP prepare keep planning metadata on CPU
FurtherAI May 24, 2026
c4aacbe
Fix empty-rank GDN CP autograd participation
FurtherAI May 24, 2026
137c4d3
Fix GDN CP oracle metadata paths
FurtherAI May 24, 2026
53cd24c
Fix routed expert prefix cache sidecar dependencies
FurtherAI May 24, 2026
fb5d442
Refresh native GDN CP packed test assertions
FurtherAI May 24, 2026
003b433
Tune train-inf mismatch gates
FurtherAI May 24, 2026
09937e0
Relax qwen3 train-inf gates
FurtherAI May 24, 2026
f12dd5a
Relax bf16 attention oracle thresholds
FurtherAI May 24, 2026
a9f79bd
Set bf16 attention oracle threshold to two percent
FurtherAI May 24, 2026
5cda0b2
Fix fused expert LoRA ETP sharding
FurtherAI May 24, 2026
54855ec
Recognize fused moe lora coverage
FurtherAI May 25, 2026
0f70173
Enable managed MoE routing replay
FurtherAI May 25, 2026
aedd9ea
Clean up oracle trace UID handling
FurtherAI May 25, 2026
fdeb42b
Release routing replay before job cleanup
FurtherAI May 25, 2026
456ee60
Update Qwen3.5 train-inf invariant gate
FurtherAI May 25, 2026
bdd6c0e
Support dense real-path train-inf topology
FurtherAI May 25, 2026
491ef59
Ignore token-only MoE routing metadata
FurtherAI May 25, 2026
7822790
Treat null route fields as absent
FurtherAI May 25, 2026
7192d07
Fix dense real-path score matching
FurtherAI May 25, 2026
4604b9e
Fix CP GDN forward trace canonicalization
FurtherAI May 25, 2026
6593840
Add real-path base mismatch diagnostics
FurtherAI May 26, 2026
d7a381c
Fix real-path base diagnostic scoring
FurtherAI May 26, 2026
db3cffb
Freeze base diagnostic Megatron worker
FurtherAI May 26, 2026
3084544
Add real-path base mismatch diagnostic
FurtherAI May 26, 2026
47991e1
Move GDN trace UID helpers to oracle tests
FurtherAI May 26, 2026
4ab349d
Add train-inf forward trace diagnostic
FurtherAI May 26, 2026
7931829
Lease scheduled eval adapters
FurtherAI May 26, 2026
5e940a1
Keep forward trace on default vLLM path
FurtherAI May 26, 2026
fd3c3d4
Limit vLLM forward trace tensor dumps
FurtherAI May 26, 2026
f6e07d9
Capture Megatron final hidden in trace
FurtherAI May 26, 2026
87cd3a4
Save Megatron logits in forward trace
FurtherAI May 26, 2026
19297a9
Capture Megatron trace submodules for train-inf diagnostics
FurtherAI May 26, 2026
0286d1e
Trace vLLM projection submodules for diagnostics
FurtherAI May 26, 2026
9b4e340
Add all-architectures model support workflow
FurtherAI May 26, 2026
9a9cd7a
Clean up Megatron weight offload status logging
FurtherAI May 26, 2026
c97dbd8
Clean train-inf adapter artifacts on pass
FurtherAI May 26, 2026
58464af
Share external vLLM runtime lifecycle
FurtherAI May 26, 2026
651b354
Rename CP token UID tracing flag
FurtherAI May 26, 2026
3281ac5
Clean up weight transfer communicator lifetime
FurtherAI May 26, 2026
c17cc4e
Deduplicate Megatron test artifact helpers
FurtherAI May 26, 2026
3ca6f94
Use main loss for context-parallel RL
FurtherAI May 27, 2026
512b48a
Keep context-parallel loss reductions isolated
FurtherAI May 27, 2026
f07e733
Route loss inputs through explicit alignment adapters
FurtherAI May 27, 2026
244fd56
Require group ids in aligned loss inputs
FurtherAI May 27, 2026
4f27a8e
Avoid mutating aligned loss advantages
FurtherAI May 27, 2026
4836a67
Merge origin/main into vllm merge worktree
FurtherAI May 27, 2026
17e387e
Fix packed tensor cleanup for CP lookahead
FurtherAI May 27, 2026
ebdc538
Optimize Megatron LoRA checkpoint publishing
FurtherAI May 27, 2026
76177d6
Merge remote-tracking branch 'origin/main' into austin/train_inf_mism…
FurtherAI May 27, 2026
1baa5eb
Batch Megatron LoRA publish transfers
FurtherAI May 27, 2026
2bebf03
Optimize Megatron LoRA publish metadata
FurtherAI May 28, 2026
ad3c368
Derive Megatron LoRA publish metadata locally
FurtherAI May 28, 2026
ecc4d86
Optimize packed expert LoRA publish
FurtherAI May 28, 2026
c1d8020
Lazy load Megatron model support handlers
FurtherAI May 28, 2026
7226e60
Merge train-inf mismatch workflow and validation fixes
FurtherAI May 29, 2026
393d682
Adjust Qwen3 MoE train-inf parity gates
FurtherAI May 29, 2026
be6bfab
Cover Qwen3 MoE train-inf route-conflict KL
FurtherAI May 29, 2026
50b5c30
Adjust Qwen3 dense train-inf parity gate
FurtherAI May 29, 2026
4376533
Fix Qwen3.5 validation regressions
FurtherAI May 29, 2026
877b725
Patch Qwen3.5 fp32 parity GDN reference
FurtherAI May 29, 2026
e55f598
Fix packed GDN parity reference slicing
FurtherAI May 29, 2026
6a51b57
Run Qwen3.5 HF parity in bf16
FurtherAI May 29, 2026
c826d91
Allow bf16 HF parity validation
FurtherAI May 29, 2026
be653e5
Set Qwen3.5 dense train-inf parity gates
FurtherAI May 29, 2026
8e037ba
Revert "Set Qwen3.5 dense train-inf parity gates"
FurtherAI May 29, 2026
8c4aaeb
Use CP-first Megatron default topology
FurtherAI May 29, 2026
1b8f93e
Revert diagnostic validation threshold changes
FurtherAI May 29, 2026
fc190ed
Restore Qwen3.5 fused expert LoRA export
FurtherAI May 29, 2026
d6fd491
Restore bf16 real GDN CP validation
FurtherAI May 29, 2026
da54952
Pin Megatron integration artifacts to commits
FurtherAI May 29, 2026
bfcebb2
test: add gdn fp32 oracle reference
FurtherAI May 30, 2026
5ddc79e
test: stabilize gdn output loss checks
FurtherAI May 30, 2026
d0ff976
Fix train inf mismatch CP scoring harness
FurtherAI May 30, 2026
d3e8807
Fix CP routing replay explicit uid targets
FurtherAI May 30, 2026
63eb044
Optimize CP routing replay UID handoff
FurtherAI May 30, 2026
f909c41
Cache routing replay target refreshes
FurtherAI May 30, 2026
d427295
Prestage routing replay targets before forward
FurtherAI May 30, 2026
f095db5
Test prestaged routing replay layout switches
FurtherAI May 30, 2026
5705a00
Keep workflow architecture inspection single-rank
FurtherAI May 30, 2026
6fcacdb
Stage routing replay targets in validation harnesses
FurtherAI May 31, 2026
aedb7ed
Remove branch-only assertion tests
FurtherAI May 31, 2026
88d4f15
Keep CP scoring token UIDs on CPU
FurtherAI May 31, 2026
8de63fd
Retry train inf mismatch workflow stage
FurtherAI Jun 1, 2026
d486c38
Fix CP routing replay trace token uids
FurtherAI Jun 1, 2026
139a64b
Relax router score oracle for CP replay
FurtherAI Jun 1, 2026
a0df118
Drop padded expert rows from forward traces
FurtherAI Jun 1, 2026
9f80b5c
Pack oracle LoRA snapshots before safetensors save
FurtherAI Jun 1, 2026
dcc25a8
Disable compiled qwen35 routed expert compute
FurtherAI Jun 1, 2026
ad055f8
Normalize Megatron identity LoRA through model support
FurtherAI Jun 1, 2026
899b917
Preserve GDN layout across checkpoint recompute
FurtherAI Jun 1, 2026
18dad24
Tighten router score oracle threshold
FurtherAI Jun 1, 2026
075031c
Narrow Qwen3.5 MoE compile workaround
FurtherAI Jun 1, 2026
23d32d4
Use GDN island boundary layout state
FurtherAI Jun 1, 2026
0264231
Remove GDN layout inference fallback
FurtherAI Jun 1, 2026
3470ce8
Patch weighted SwiGLU compile autograd
FurtherAI Jun 1, 2026
d8b2209
Remove no-op CP training guard
FurtherAI Jun 2, 2026
342b100
Remove CP timing from production training results
FurtherAI Jun 2, 2026
64144ca
Trim GDN shared-prefix PR test surface
FurtherAI Jun 2, 2026
81fc8b2
Drop GDN shared-prefix README from PR surface
FurtherAI Jun 2, 2026
a2b0ec8
Remove dead GDN production helpers
FurtherAI Jun 2, 2026
25e0a7f
Merge latest main into vllm merge worktree
FurtherAI Jun 4, 2026
8aac18f
Merge gql weave compatibility fix from main
FurtherAI Jun 5, 2026
32 changes: 22 additions & 10 deletions pyproject.toml
@@ -24,7 +24,7 @@ backend = [
"bitsandbytes>=0.45.2",
"unsloth==2026.3.3",
"unsloth-zoo==2026.3.1",
"torch==2.10.0",
"torch>=2.11.0",
"torchao==0.16.0",
"accelerate==1.7.0",
"awscli>=1.38.1",
@@ -43,8 +43,10 @@ backend = [
]
megatron = [
"numpy<2",
"torch==2.10.0",
"quack-kernels==0.2.5",
"torch>=2.11.0",
"flash-attn-4==4.0.0b5",
"ninja>=1.11.1",
"quack-kernels==0.3.7",
"apex",
"transformer-engine==2.11.0",
"transformer-engine-cu12==2.11.0",
@@ -53,6 +55,7 @@ megatron = [
"pybind11>=2.13.6",
"megatron-bridge==0.4.0rc0",
"deep-ep==1.2.1 ; sys_platform == 'linux'",
"tilelang==0.1.10 ; sys_platform == 'linux' and platform_machine == 'x86_64'",
"causal-conv1d==1.6.1 ; sys_platform == 'linux' and platform_machine == 'x86_64' and python_full_version < '3.12'",
"mamba-ssm==2.3.1 ; sys_platform == 'linux' and platform_machine == 'x86_64' and python_full_version < '3.12'",
"nvidia-ml-py==13.580.82",
@@ -76,7 +79,7 @@ tinker = [
"protobuf>=6.31.1",
"tinker-cookbook>=0.4.1,<0.5",
"tinker>=0.21.0,<0.22",
"torch==2.10.0",
"torch>=2.11.0",
"transformers==5.2.0",
"uvicorn>=0.35.0",
"datrie>=0.8.3",
@@ -152,17 +155,19 @@ override-dependencies = [
"megatron-core==0.17.0",
"numpy<2",
"nvidia-resiliency-ext<0.5",
"quack-kernels==0.2.5",
"quack-kernels==0.3.7",
"transformer-engine==2.11.0",
"transformers==5.2.0",
"torch==2.11.0",
]
exclude-dependencies = ["pynvml", "emerging-optimizers"]
no-build-isolation-package = ["apex", "transformer-engine", "transformer-engine-cu12", "transformer-engine-torch", "megatron-bridge", "deep-ep", "nv-grouped-gemm"]

[tool.uv.extra-build-dependencies]
apex = ["torch>=2.8.0"]
deep-ep = ["torch>=2.8.0"]
nv-grouped-gemm = ["torch>=2.8.0"]
transformer-engine-torch = ["torch>=2.8.0"]
apex = ["torch>=2.11.0"]
deep-ep = ["torch>=2.11.0"]
nv-grouped-gemm = ["torch>=2.11.0"]
transformer-engine-torch = ["torch>=2.11.0"]

[tool.uv.extra-build-variables]
apex = { APEX_CPP_EXT = "1", APEX_CUDA_EXT = "1", APEX_FAST_LAYER_NORM = "1", APEX_PARALLEL_BUILD = "16", NVCC_APPEND_FLAGS = "--threads 4" }
@@ -180,7 +185,7 @@ requires-dist = []

[[tool.uv.dependency-metadata]]
name = "transformer-engine-torch"
version = "0.5.18"
version = "2.11.0"
requires-dist = [
"einops",
"onnx",
@@ -266,8 +271,15 @@ dev = [
]

[tool.uv.sources]
torch = { index = "pytorch-cu128" }
apex = { git = "https://github.com/NVIDIA/apex.git", rev = "25.09" }
deep-ep = { git = "https://github.com/deepseek-ai/DeepEP.git", rev = "v1.2.1" }
flash-attn-4 = { url = "https://files.pythonhosted.org/packages/24/f7/01ee2576ce41f9884d291ee21861ef194afc0b2b1ce3bd175fc7a6e1b133/flash_attn_4-4.0.0b5-py3-none-any.whl" }
megatron-bridge = { git = "https://github.com/NVIDIA-NeMo/Megatron-Bridge.git", rev = "e049cc00c24d03e2ae45d2608c7a44e2d2364e3d" }
panza = { git = "https://github.com/corbt/panza.git" }
transformer-engine-torch = { git = "https://github.com/NVIDIA/TransformerEngine.git", rev = "v2.11", subdirectory = "transformer_engine/pytorch" }

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true
17 changes: 11 additions & 6 deletions scripts/bump_version.py
@@ -13,15 +13,19 @@
import subprocess
import sys

PROJECT_VERSION_RE = re.compile(
r'(?ms)^(\[project\]\s+.*?^version = ")(\d+\.\d+\.\d+)(")'
)


def get_current_version():
"""Extract current version from pyproject.toml."""
pyproject_path = Path(__file__).parent.parent / "pyproject.toml"
content = pyproject_path.read_text()
match = re.search(r'version = "(\d+\.\d+\.\d+)"', content)
match = PROJECT_VERSION_RE.search(content)
if not match:
raise ValueError("Could not find version in pyproject.toml")
return match.group(1)
raise ValueError("Could not find [project] version in pyproject.toml")
return match.group(2)


def bump_version(current_version, bump_type):
@@ -43,10 +47,11 @@ def update_version(new_version):
pyproject_path = Path(__file__).parent.parent / "pyproject.toml"
content = pyproject_path.read_text()

# Update version
new_content = re.sub(
r'version = "\d+\.\d+\.\d+"', f'version = "{new_version}"', content
new_content, count = PROJECT_VERSION_RE.subn(
rf"\g<1>{new_version}\3", content, count=1
)
if count != 1:
raise ValueError("Could not update [project] version in pyproject.toml")

pyproject_path.write_text(new_content)

8 changes: 7 additions & 1 deletion src/art/__init__.py
@@ -45,8 +45,12 @@
import transformers

try:
from .transformers.patches import patch_preprocess_mask_arguments
from .transformers.patches import (
disable_broken_torchvision_for_transformers,
patch_preprocess_mask_arguments,
)

disable_broken_torchvision_for_transformers()
patch_preprocess_mask_arguments()
except Exception:
pass
@@ -65,6 +69,7 @@
from .trajectories import Trajectory, TrajectoryGroup
from .types import (
LocalTrainResult,
MegatronTopologyConfig,
Messages,
MessagesAndChoices,
ServerlessTrainResult,
@@ -87,6 +92,7 @@
"LocalBackend",
"LocalTrainResult",
"LoRAConfig",
"MegatronTopologyConfig",
"ServerlessBackend",
"ServerlessTrainResult",
"Messages",
7 changes: 6 additions & 1 deletion src/art/_backend_training.py
@@ -9,7 +9,7 @@
summarize_trajectory_groups,
)
from .trajectories import TrajectoryGroup
from .types import TrainConfig
from .types import MegatronTopologyConfig, TrainConfig


def build_rl_train_configs(
@@ -34,6 +34,7 @@ def build_rl_train_configs(
scale_learning_rate_by_reward_std_dev: bool | None = None,
logprob_calculation_chunk_size: int | None = None,
packed_sequence_length: int | None = None,
megatron_topology: MegatronTopologyConfig | dict[str, int | None] | None = None,
num_trajectories_learning_rate_multiplier_power: float | None = None,
kl_ref_adapter_path: str | None = None,
) -> tuple[TrainConfig, dev.TrainConfig]:
@@ -65,6 +66,10 @@ def build_rl_train_configs(
dev_config["logprob_calculation_chunk_size"] = logprob_calculation_chunk_size
if packed_sequence_length is not None:
dev_config["packed_sequence_length"] = packed_sequence_length
if megatron_topology is not None:
dev_config["megatron_topology"] = MegatronTopologyConfig.model_validate(
megatron_topology
).model_dump(mode="json")
if num_trajectories_learning_rate_multiplier_power is not None:
dev_config["num_trajectories_learning_rate_multiplier_power"] = (
num_trajectories_learning_rate_multiplier_power
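
The hunk above accepts `megatron_topology` as either a config object or a plain dict and stores a JSON-ready dict in `dev_config`. The sketch below illustrates that normalization step with the standard library only. `TopologySketch` and `normalize_topology` are hypothetical stand-ins: the real `MegatronTopologyConfig` lives in `src/art/types.py`, is not shown in this diff, and is normalized through `model_validate(...).model_dump(mode="json")`. The axis names come from the `Literal` keys in `src/art/dev/train.py`; the "positive integer or None" rule is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Union

# Axis names taken from the Literal keys in src/art/dev/train.py.
AXES = ("tp", "cp", "ep", "pp", "vpp", "etp")


@dataclass(frozen=True)
class TopologySketch:
    """Hypothetical stand-in for MegatronTopologyConfig (real fields not shown)."""

    tp: Optional[int] = None
    cp: Optional[int] = None
    ep: Optional[int] = None
    pp: Optional[int] = None
    vpp: Optional[int] = None
    etp: Optional[int] = None


def normalize_topology(
    value: Union[TopologySketch, Dict[str, Optional[int]]],
) -> Dict[str, Optional[int]]:
    """Return a plain dict with every axis present, from an object or a dict."""
    if isinstance(value, TopologySketch):
        return asdict(value)
    unknown = set(value) - set(AXES)
    if unknown:
        raise ValueError(f"Unknown topology axes: {sorted(unknown)}")
    for axis, size in value.items():
        # Assumed rule: each axis is either unset (None) or a positive int.
        if size is not None and (not isinstance(size, int) or size < 1):
            raise ValueError(f"Axis {axis!r} must be a positive int or None")
    return asdict(TopologySketch(**value))
```

Under these assumptions, `normalize_topology({"tp": 2, "cp": 2})` and `normalize_topology(TopologySketch(tp=2, cp=2))` produce the same dict, so downstream code only has to handle one shape.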
2 changes: 2 additions & 0 deletions src/art/dev/get_model_config.py
@@ -95,4 +95,6 @@ def get_model_config(
result["trainer_gpu_ids"] = config["trainer_gpu_ids"]
if "inference_gpu_ids" in config:
result["inference_gpu_ids"] = config["inference_gpu_ids"]
if "megatron_topology" in config:
result["megatron_topology"] = config["megatron_topology"]
return result
7 changes: 6 additions & 1 deletion src/art/dev/model.py
@@ -1,10 +1,13 @@
from enum import Enum
from typing import Literal, NoReturn
from typing import TYPE_CHECKING, Literal, NoReturn

from typing_extensions import Required, TypedDict

from .engine import EngineArgs

if TYPE_CHECKING:
from ..types import MegatronTopologyConfig

RolloutWeightsMode = Literal["lora", "merged"]


Expand Down Expand Up @@ -135,6 +138,7 @@ class InternalModelConfig(TypedDict, total=False):
chat_template_content_format: vLLM chat template content format.
chat_template_tool_schema_format: Tool schema rendering format used for
local training tokenization.
megatron_topology: Fixed Megatron parallel topology for this model.
allow_unvalidated_arch: Permit model-support validation workflows to run
architectures that are not yet in the supported-model registry.
"""
@@ -152,6 +156,7 @@ class InternalModelConfig(TypedDict, total=False):
chat_template_path: str
chat_template_content_format: str
chat_template_tool_schema_format: Literal["default", "vllm_openai"]
megatron_topology: "MegatronTopologyConfig | dict[str, int | None]"
allow_unvalidated_arch: bool
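
This file imports `MegatronTopologyConfig` only under `TYPE_CHECKING` and refers to it through a quoted annotation; `src/art/local/backend.py` in this same diff likewise moves the `AutoImageProcessor` import inside the function that uses it. The sketch below shows both patterns on a standard-library module. `fractions.Fraction` and `parse_ratio` are stand-ins chosen for illustration, and the motivation (keeping a heavy or circular import out of module load) is presumed rather than stated in the diff.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by static type checkers only; skipped at runtime.
    from fractions import Fraction


def parse_ratio(text: str) -> "Fraction":
    """Parse a string such as "3/4" into a Fraction."""
    # Deferred import: the module is loaded on first call, not when this
    # file is imported. The quoted annotation above keeps the signature
    # valid even though Fraction is not bound at module level.
    from fractions import Fraction

    return Fraction(text)
```

The quoted return annotation is what lets the function signature mention a name that does not exist at runtime until the function body runs.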


4 changes: 4 additions & 0 deletions src/art/dev/train.py
@@ -25,6 +25,10 @@ class TrainConfig(TypedDict, total=False):
logprob_calculation_chunk_size: int
mask_prob_ratio: bool
max_negative_advantage_importance_sampling_weight: float
megatron_topology: dict[
Literal["tp", "cp", "ep", "pp", "vpp", "etp"],
int | None,
]
moe_routing_replay_bundle: "MoeRoutingReplayBundle | None"
moe_routing_replay_path: str | None
moe_routing_replay_strict: bool
35 changes: 31 additions & 4 deletions src/art/local/backend.py
@@ -1,3 +1,5 @@
from __future__ import annotations

from contextlib import asynccontextmanager
import gc
import hashlib
@@ -9,7 +11,7 @@
import socket
import time
from types import TracebackType
from typing import Any, AsyncIterator, Iterable, Literal, cast
from typing import TYPE_CHECKING, Any, AsyncIterator, Iterable, Literal, cast
import warnings

logger = logging.getLogger(__name__)
@@ -22,11 +24,13 @@
import polars as pl
import torch
from tqdm import auto as tqdm
from transformers import AutoImageProcessor, AutoTokenizer
from transformers.image_processing_utils import BaseImageProcessor
from transformers import AutoTokenizer
from transformers.tokenization_utils_base import PreTrainedTokenizerBase
from typing_extensions import Self

if TYPE_CHECKING:
from transformers.image_processing_utils import BaseImageProcessor

from art.utils.output_dirs import (
get_default_art_path,
get_model_dir,
@@ -66,7 +70,13 @@
tokenize_trajectory_groups,
)
from ..trajectories import Trajectory, TrajectoryGroup
from ..types import LocalTrainResult, Message, TrainConfig, TrainSFTConfig
from ..types import (
LocalTrainResult,
MegatronTopologyConfig,
Message,
TrainConfig,
TrainSFTConfig,
)
from ..utils import format_message, get_model_step
from .adapter_leases import (
AdapterLeaseManager,
@@ -410,6 +420,16 @@ async def adapter_lease(
async with pin_inference_step(model.name, step), manager.lease(step):
yield

@asynccontextmanager
async def adapter_retention_lease(
self,
model: AnyTrainableModel,
step: int,
) -> AsyncIterator[None]:
manager = self._adapter_lease_manager(model.name)
async with manager.lease(step):
yield

async def prune_model_adapters(
self,
model: AnyTrainableModel,
@@ -491,6 +511,8 @@ def _get_packed_tensors(
self._tokenizers[tokenizer_key] = tokenizer
if model.base_model not in self._image_processors:
try:
from transformers import AutoImageProcessor

self._image_processors[model.base_model] = (
AutoImageProcessor.from_pretrained(model.base_model, use_fast=True)
)
@@ -704,6 +726,7 @@ async def train( # type: ignore[override]
scale_learning_rate_by_reward_std_dev: bool = False,
logprob_calculation_chunk_size: int = 1024,
packed_sequence_length: int | None = None,
megatron_topology: MegatronTopologyConfig | None = None,
num_trajectories_learning_rate_multiplier_power: float = 0.0,
# Checkpoint behavior
save_checkpoint: bool = True,
@@ -764,6 +787,9 @@ async def train( # type: ignore[override]
packed_sequence_length: Packed sequence length to use for training.
When unset, Unsloth keeps the current max-length-rounded-to-2048
behavior. Required for Megatron.
megatron_topology: Parallel topology for Megatron training. When
provided, ART uses it to configure Megatron TP/CP/EP/PP/VPP/ETP
before launching the Megatron runtime.
num_trajectories_learning_rate_multiplier_power: Power for learning
rate multiplier based on number of trajectories.
save_checkpoint: Whether to save a checkpoint after training.
@@ -824,6 +850,7 @@ async def train( # type: ignore[override]
scale_learning_rate_by_reward_std_dev=scale_learning_rate_by_reward_std_dev,
logprob_calculation_chunk_size=logprob_calculation_chunk_size,
packed_sequence_length=packed_sequence_length,
megatron_topology=megatron_topology,
num_trajectories_learning_rate_multiplier_power=num_trajectories_learning_rate_multiplier_power,
kl_ref_adapter_path=resolved_kl_ref_adapter_path,
)
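
This file's diff adds `adapter_retention_lease`, which enters only `manager.lease(step)`, next to the existing `adapter_lease`, which also enters `pin_inference_step(model.name, step)`. The sketch below shows one way such a lease could work as an async context manager. `LeaseSketch`, its `prunable` method, and the ref-counting behavior are all hypothetical; ART's `AdapterLeaseManager` is not shown in this diff and may behave differently.

```python
import asyncio
from collections import Counter
from contextlib import asynccontextmanager
from typing import AsyncIterator, Iterable, List, Tuple


class LeaseSketch:
    """Hypothetical ref-counting lease manager; not ART's AdapterLeaseManager."""

    def __init__(self) -> None:
        self._held: Counter = Counter()

    @asynccontextmanager
    async def lease(self, step: int) -> AsyncIterator[None]:
        # Hold the step for the duration of the `async with` body.
        self._held[step] += 1
        try:
            yield
        finally:
            # Release even if the body raised.
            self._held[step] -= 1
            if self._held[step] == 0:
                del self._held[step]

    def prunable(self, steps: Iterable[int]) -> List[int]:
        """Assumed pruning rule: a step is prunable when no lease holds it."""
        return [s for s in steps if s not in self._held]


async def demo() -> Tuple[List[int], List[int]]:
    manager = LeaseSketch()
    async with manager.lease(3):
        during = manager.prunable([1, 2, 3])
    after = manager.prunable([1, 2, 3])
    return during, after
```

In this sketch step 3 is protected only while the lease is open; once the `async with` block exits, it becomes prunable again.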