KDA MTP (Multi-Token Prediction) support #17

@icavan

Description

Add Multi-Token Prediction (MTP) support to cuLA's inference kernels.

Context

MTP is an inference optimization in which the model predicts several future tokens per decoding step instead of one, which improves throughput for autoregressive generation. Supporting MTP in cuLA's linear attention kernels would let models that use this technique decode faster.

Tasks

  • Design MTP integration for linear attention inference kernels
  • Implement MTP support in relevant kernels (KDA, Lightning Attention)
  • Add tests and benchmarks
  • Document usage
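
To make the first two tasks concrete, here is a minimal NumPy reference sketch of what a multi-token decode step could look like for a recurrent linear attention state. It is not cuLA's API or the KDA update rule: the function names `mtp_decode_step` and `commit_state`, the scalar `decay` parameter, and the idea of returning one state per draft token are all assumptions made for illustration. The recurrence used is plain decayed linear attention, `S_t = decay * S_{t-1} + k_t v_t^T`, chosen only because it is the simplest case.

```python
import numpy as np


def mtp_decode_step(state, q, k, v, decay):
    """Run T draft tokens through a recurrent linear-attention state.

    Hypothetical reference implementation, not a cuLA kernel.

    state: (d_k, d_v) running state before the first draft token
    q, k:  (T, d_k) queries and keys for the T draft tokens
    v:     (T, d_v) values for the T draft tokens
    decay: scalar decay factor (assumed; real kernels may gate per channel)

    Returns the outputs (T, d_v) and the state after each token (T, d_k, d_v).
    """
    num_tokens = q.shape[0]
    outputs = np.empty((num_tokens, v.shape[1]), dtype=state.dtype)
    states = np.empty((num_tokens,) + state.shape, dtype=state.dtype)

    s = state
    for t in range(num_tokens):
        # Decay the previous state, then add the new key/value outer product.
        s = decay * s + np.outer(k[t], v[t])
        # Read out with the query for this position.
        outputs[t] = q[t] @ s
        # Keep every intermediate state so a caller can roll back later.
        states[t] = s
    return outputs, states


def commit_state(prev_state, states, num_accepted):
    """Select the state to carry forward after num_accepted tokens are kept."""
    if num_accepted == 0:
        return prev_state
    return states[num_accepted - 1]
```

Keeping the per-token states is one possible design: if only some of the draft tokens are accepted, the caller can continue from the state at the last accepted position without recomputing. A reference like this could also serve as the ground truth for the kernel tests listed above, since a T-token call should match T consecutive single-token calls.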
