feat: add MindIE-SD as optional NPU attention and compilation backend by blian6 · Pull Request #1004 · vipshop/cache-dit

blian6 · 2026-05-10T03:41:00Z

Summary

Add MindIE-SD as a high-priority optional attention backend for Ascend NPU devices,
with automatic device-aware backend selection and compilation support. MindIE-SD
uses laser_attention / fused_attn_score to accelerate attention operations on Ascend
hardware, and is auto-enabled when the mindiesd package is importable.

Key Design Decisions

Optional dependency: import mindiesd is wrapped in try/except Exception; if the import fails, attention falls back gracefully to _native_npu
Zero-parameter enable: compile is auto-enabled when mindiesd is detected on NPU; set CACHE_DIT_FORCE_DISABLE_MINDIESD_COMPILE_CONFIG=1 to disable it
Env var control: CACHE_DIT_ENABLE_MINDIESD_ATTN selects the attention backend ("1"=laser, "0"=native_npu); the default is "1" (laser)
Fits existing patterns: follows the same try/except + conditional-register pattern as FlashAttention and SageAttention; extends the enable_cache flow (attention → parallelism → quantization → compile)
NPU-only: all MindIE-SD logic is gated by a device-type check, so there is zero impact on CUDA/CPU
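
The selection rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `mindiesd_importable` and `pick_attention_backend` are hypothetical, and treating both "1" and "laser" as selecting the laser backend is an assumption based on the values listed under Verification.

```python
import importlib
import os
from typing import Mapping, Optional

ENV_ATTN = "CACHE_DIT_ENABLE_MINDIESD_ATTN"  # env var named in this PR


def mindiesd_importable() -> bool:
    """Probe the optional dependency without letting a broken install crash us."""
    try:
        importlib.import_module("mindiesd")
    except Exception:  # broad catch, mirroring the try/except Exception described above
        return False
    return True


def pick_attention_backend(
    device_type: str,
    has_mindiesd: bool,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: return a backend name, or None on non-NPU devices."""
    env = os.environ if env is None else env
    if device_type != "npu":
        return None  # NPU-only gate: CUDA/CPU selection is left untouched
    if not has_mindiesd:
        return "_native_npu"  # fallback when mindiesd is missing
    # Assumption: "1" and "laser" select the laser backend; anything else is native.
    value = env.get(ENV_ATTN, "1").strip().lower()
    return "_mindiesd_laser" if value in ("1", "laser") else "_native_npu"
```

A caller would pass `mindiesd_importable()` as `has_mindiesd`; keeping the probe separate from the decision makes the decision testable on machines without an NPU.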

Verification

Import/registration: _mindiesd_laser REGISTERED, KernelBackend.MINDIESD supported
Env vars: CACHE_DIT_ENABLE_MINDIESD_ATTN=0/1/laser, CACHE_DIT_FORCE_DISABLE_MINDIESD_COMPILE_CONFIG=0/1 all effective
FLUX.1-dev 1024×1024 DiT-only benchmark (Ascend 910B):
- _native_npu: 0.693s/step
- _mindiesd_laser: 0.687s/step (laser activated, S=4608)
- _mindiesd_laser + MindieSDBackend compile: 0.652s/step (about 6% faster than the _native_npu baseline)
4-card USP=4 HCCL communication + mindiesd attention: all 4 cards pass
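
For reference, the relative speedups implied by the step times above can be recomputed with a few lines of Python. The helper name `speedup_percent` is illustrative only; it defines speedup as baseline time divided by candidate time, minus one.

```python
# Step times (seconds per step) copied from the benchmark above.
BASELINE_S = 0.693  # _native_npu
CANDIDATES_S = {
    "_mindiesd_laser": 0.687,
    "_mindiesd_laser + MindieSDBackend compile": 0.652,
}


def speedup_percent(baseline_s: float, candidate_s: float) -> float:
    """Throughput gain of the candidate over the baseline, in percent."""
    return (baseline_s / candidate_s - 1.0) * 100.0


for name, seconds in CANDIDATES_S.items():
    print(f"{name}: {speedup_percent(BASELINE_S, seconds):.1f}% faster than _native_npu")
```

By this measure the laser backend alone is under 1% faster, and the compiled configuration is roughly 6% faster.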

Add MindIE-SD as a high-priority optional attention backend for Ascend NPU,
with automatic device-aware backend selection and compilation support.

Modified files (8):
- attention/backends/register.py: add _MINDIESD_LASER enum
- attention/backends/npu.py: _mindiesd_laser_attention backend, bridging to mindiesd.layers.attention_forward
- kernels/backend.py: KernelBackend.MINDIESD enum
- kernels/ops.py: MINDIESD kernel routing with PT fallback
- caching/cache_interface.py: auto-select attention backend in enable_cache(); auto-enable MindieSDBackend compile on NPU
- compile/utils.py: skip CUDA-specific inductor configs on NPU
- envs.py: CACHE_DIT_ENABLE_MINDIESD_ATTN and CACHE_DIT_FORCE_DISABLE_MINDIESD_COMPILE_CONFIG env vars
- _utils/utils.py: add _mindiesd_laser to CLI choices

New file (1):
- _utils/backend_selector.py: BackendSelector.auto_select()

Verified on Ascend 910B with FLUX.1-dev 512x512 and 1024x1024.
MindIE-SD backend matches native_npu performance with no regression,
and compile fusion provides +6% additional speedup at 1024x1024.

blian added 2 commits May 10, 2026 11:32

feat: add MindIE-SD as optional NPU attention and compilation backend

5003b91

blian6 wants to merge 2 commits into vipshop:main from blian6:main
