Description
Add Multi-Token Prediction (MTP) support to cuLA's inference kernels.
Context
MTP is an inference optimization in which the model predicts several tokens per decoding step instead of one, improving throughput for autoregressive generation. Supporting MTP in cuLA's linear attention kernels would enable faster inference for models that use this technique.
Tasks