Multinode NVL/NVLS supports #798
Open
Binyang2014 wants to merge 50 commits into main from binyli/mnnvl
50 commits
dd8b301 Scale native allreduce/allgather algos for MNNVL/MNNVLS (Binyang2014)
893a08e Enable MNNVL allreduce tuning (Binyang2014)
dded5e0 Improve MNNVL allreduce tuning performance (Binyang2014)
865c2bc Optimize MNNVL allreduce without symmetric memory (Binyang2014)
3bc00cb Enable NVLS zero-copy without symmetric memory flag (Binyang2014)
533f329 Tune no-sym MNNVL with RSAG zero-copy (Binyang2014)
45a651b Decouple IPC-domain hint from bootstrap nRanksPerNode (Binyang2014)
2a2fca8 Rename collective ctx/kernel param nRanksPerNode to ipcDomainNranks (Binyang2014)
2efda4d Restore compile-time templated NRanksPerNode for rsag_zero_copy (Binyang2014)
1c29817 Revert AllreduceRsAgZeroCopy non-symmetric ctx key tag back to ++tag (Binyang2014)
7bc5e04 Reset GPU tokens before reuse (Binyang2014)
9a36884 Rename gpuMemset wrapper and zero TokenPool slots in deleter (Binyang2014)
987f800 Merge remote-tracking branch 'origin/main' into binyli/mnnvl (Binyang2014)
6296803 Make NVLS non-zero-copy allreduce algorithms MNNVL-ready (Binyang2014)
9aeeaf0 Simplify torch-integration tuning example for MPI-only multi-node tes… (Binyang2014)
905b23d Drop non-MNNVL multi_node regime from torch-integration example (Binyang2014)
4a0d5b2 Simplify torch-integration tuning example (Binyang2014)
307a471 Shorten verbose comments and use THROW in validateIpcDomainSpansWorld (Binyang2014)
f0c6ac0 Fold validateIpcDomainSpansWorld into getIpcDomainNranks (Binyang2014)
bde23ce Revert verbose RSAG zero-copy comment; rename NRanksPerNode template … (Binyang2014)
095cfff Revert RSAG nBlocks default to 64 (Binyang2014)
639b80d Tie AllreduceAllpairPacket maxBlockNum_ to MAX_IPC_DOMAIN_NRANKS - 1 (Binyang2014)
e8caab7 Strip preflight validation blocks from NVLS pipeline allreduce kernels (Binyang2014)
7d80a33 Default torch example SYMMETRIC_MEMORY env to 1 (Binyang2014)
d1b04a3 NVLS zero-copy allreduce: support FP16 accumulator for FP8 inputs (Binyang2014)
113d859 fix (Binyang2014)
9ff7e1c update (Binyang2014)
654bcfa update (Binyang2014)
5516bdb fix (Binyang2014)
e208cc3 WIP (Binyang2014)
825fc12 address hang issue (Binyang2014)
224b3de Clean up completed communicator receives (Binyang2014)
0c09239 Merge branch 'main' into binyli/mnnvl (Binyang2014)
7724e49 Fix lint and ROCm error alias (Copilot)
24850ef Merge branch 'main' into binyli/mnnvl (Binyang2014)
ee82cc4 Merge branch 'main' into binyli/mnnvl (Binyang2014)
dbebde2 Configure IPC domain per communicator (Binyang2014)
93b4354 temp solution (Binyang2014)
0744e80 detect ipc domain automaticlly (Binyang2014)
94af88d Fix tuning example hang (Binyang2014)
f32cfb1 update (Binyang2014)
594dc79 Address NVLS review feedback (seagater)
18d3737 Tighten NVML IPC domain hash lookup (seagater)
4db71b9 Move barrier into setupNvlsChannels and clean up NVLS pipeline state (Binyang2014)
35331cf Fix collective topology sizing (Binyang2014)
ac44e98 update (Binyang2014)
7308c32 merge main (Binyang2014)
42ece40 Fix memory leak (Binyang2014)
641420d increase nvls memory size to 64 GB (Binyang2014)
ea73a1e WIP (Binyang2014)