[Roadmap] Strengthen CUDA kernel implementation #1227

@ryankert01

Our current compute kernel implementation is slower than the torch.compile() implementation. This roadmap aims to at least match the torch.compile() implementation.

Stage 1. Add a torch reference implementation

Stage 2. Strengthen our CUDA kernel

PRs in this stage should:

  1. strengthen the current numerical correctness tests with hard test cases
  2. strengthen the CUDA kernel implementation accordingly
  3. make sure the numerical correctness tests, including the hard test cases, pass
  4. make sure the kernel matches or beats torch.compile's speed
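
The workflow in the list above could be sketched roughly as follows. This is a framework-agnostic illustration in plain Python, not the project's test suite. The issue does not say which operation the kernel computes, so log-sum-exp is used purely as a stand-in, and every name here (`reference_logsumexp`, `naive_logsumexp`, `HARD_CASES`, `failing_cases`, `median_seconds`) is hypothetical. The idea is that a set of hard inputs exposes a numerically weak candidate that a typical input would let through, and that the same harness can time both implementations.

```python
import math
import statistics
import time


def reference_logsumexp(xs):
    # Stand-in reference (plays the role of the torch implementation).
    # Subtracting the max keeps exp() within range.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def naive_logsumexp(xs):
    # Stand-in candidate with a deliberate weakness: exp() overflows or
    # underflows for extreme inputs.
    return math.log(sum(math.exp(x) for x in xs))


# Hypothetical hard cases; real ones depend on the actual kernel.
HARD_CASES = {
    "typical": [0.1, -0.3, 2.0],
    "large_magnitude": [1000.0, 999.0, 998.0],
    "very_negative": [-1000.0, -1001.0],
    "wide_range": [-745.0, 0.0, 700.0],
    "single_element": [3.5],
}


def failing_cases(candidate, reference, cases, rel_tol=1e-9, abs_tol=1e-12):
    """Return the names of cases where candidate disagrees with reference."""
    failures = []
    for name, xs in cases.items():
        expected = reference(xs)
        try:
            actual = candidate(xs)
        except (ArithmeticError, ValueError):
            # Overflow or a math domain error counts as a failure.
            failures.append(name)
            continue
        if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append(name)
    return failures


def median_seconds(fn, xs, repeats=20):
    """Median wall-clock time of fn(xs) over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(xs)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    print("naive fails on:", failing_cases(naive_logsumexp, reference_logsumexp, HARD_CASES))
    print("reference median time (s):", median_seconds(reference_logsumexp, HARD_CASES["typical"]))
```

In a real setup the two callables would presumably wrap the CUDA kernel and the torch.compile version, tensors would be compared with something like `torch.testing.assert_close`, and GPU timings would need warm-up runs and explicit synchronization (for example `torch.cuda.synchronize()`) before reading the clock.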
