
Stop overriding range_num_stages and range_unroll_factor for CUDA IMA workarounds#2394

Draft
choijon5 wants to merge 1 commit into main from choijon5/stack/60


@choijon5 choijon5 commented May 11, 2026

Stacked PRs:


With subprocess benchmarking, we no longer need the CUDA IMA (illegal memory access) workarounds that were added for #755 and #904.

With the workarounds removed, subprocess benchmarking still avoids CUDA IMA, and the JSD kernel that originally needed the workaround (#733) retains the same speedup as before:
image
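
For readers unfamiliar with the idea, here is a minimal sketch of why subprocess benchmarking makes such workarounds unnecessary: each candidate runs in a fresh interpreter, so a crash (such as a CUDA IMA) kills only the child and is reported as a failed candidate. This is not the project's actual autotuner code; `benchmark_in_subprocess` and its parameters are hypothetical names used only for illustration, and the example uses plain Python snippets rather than real kernels.

```python
import subprocess
import sys
from typing import Optional


def benchmark_in_subprocess(snippet: str, timeout_s: float = 60.0) -> Optional[float]:
    """Run `snippet` in a fresh Python interpreter (hypothetical helper).

    Returns the elapsed wall-clock seconds measured inside the child,
    or None if the child crashed, exited non-zero, or timed out.
    """
    # The child times the snippet itself and prints the result as its last line.
    harness = (
        "import time\n"
        "start = time.perf_counter()\n"
        f"exec({snippet!r})\n"
        "print(time.perf_counter() - start)\n"
    )
    try:
        proc = subprocess.run(
            [sys.executable, "-c", harness],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        # A crash in the child never touches the parent's state.
        return None
    return float(proc.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    ok = benchmark_in_subprocess("total = sum(range(100_000))")
    crashed = benchmark_in_subprocess("import os; os._exit(1)")  # simulated crash
    print("good candidate:", ok)
    print("crashing candidate:", crashed)
```

Because the failing candidate is simply skipped, the parent can keep searching the full configuration space instead of pinning options to known-safe values.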

… workarounds

stack-info: PR: #2394, branch: choijon5/stack/60
@choijon5 choijon5 force-pushed the choijon5/stack/60 branch from 3424ff6 to 3ba45d9 Compare May 11, 2026 03:27
@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 11, 2026
@choijon5 choijon5 marked this pull request as draft May 11, 2026 03:27
