[ROCm]: fix: reduce MoE temp memory — embedding cap, weight sum default, skip trivial specs (PR3)#4193

Open
cj401-amd wants to merge 4 commits into AI-Hypercomputer:main from cj401-amd:cj/tmem-fixes-clean-3-moe-tmem

fix: MoE tmem reduction — megablox 9-tuple tiling, float32_weight_sum…

0ed140e
Google CLA / cla/google succeeded Jun 18, 2026 in 7s

✅ All contributors are covered under a CLA with Google

See https://cla.developers.google.com/ for more info about Google's Contributor License Agreement (CLA).

ℹ️ Googlers: Go here to view more details and manage scans for this pull request.

Details

The following contributors were found for this pull request:

0ed140e Author: @cj401-amd <ch****in@amd.com>

(Only the first commit for a unique contributor is listed.)