feat: add MindIE-SD as optional NPU attention and compilation backend #1004

Open: blian6 wants to merge 2 commits into vipshop:main from blian6:main

blian6 commented May 10, 2026

Summary

Add MindIE-SD as a high-priority optional attention backend for Ascend NPU devices,
with automatic device-aware backend selection and compilation support. MindIE-SD
uses laser_attention / fused_attn_score to accelerate attention operations on Ascend
hardware, and is auto-enabled when the mindiesd package is importable.

Key Design Decisions

  • Optional dependency: import mindiesd is wrapped in try/except Exception and gracefully falls back to _native_npu when the import fails
  • Zero-parameter enable: compilation is auto-enabled when mindiesd is detected on NPU; set CACHE_DIT_FORCE_DISABLE_MINDIESD_COMPILE_CONFIG=1 to disable it
  • Env var control: CACHE_DIT_ENABLE_MINDIESD_ATTN selects the attention backend ("1"=laser, "0"=native_npu); the default is "1" (laser)
  • Fits existing patterns: follows the same try/except + conditional-register pattern as FlashAttention and SageAttention; extends the enable_cache flow (attention → parallelism → quantization → compile)
  • NPU-only: all MindIE-SD logic is gated by a device-type check, with zero impact on CUDA/CPU
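
The decisions above can be sketched in a few lines of Python. This is a minimal illustration, not the code in this PR: the helper names (mindiesd_importable, select_attention_backend, should_enable_mindiesd_compile) are hypothetical, the "npu" device-type string is an assumption, and accepting "laser" as an alias for "1" is inferred from the values listed under Verification. Only the env var names, their defaults, and the backend names _mindiesd_laser / _native_npu come from the description.

```python
import os
from typing import Mapping, Optional


def mindiesd_importable() -> bool:
    """Return True if `import mindiesd` succeeds; any failure means 'not available'."""
    try:
        import mindiesd  # noqa: F401  (optional dependency)
        return True
    except Exception:
        return False


def select_attention_backend(
    device_type: str,
    has_mindiesd: bool,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick an attention backend name, or None to leave the existing choice alone."""
    if device_type != "npu":
        return None  # CUDA/CPU paths are not touched
    if not has_mindiesd:
        return "_native_npu"  # graceful fallback when the package is missing
    # Default is "1" (laser); "0" selects the native NPU backend.
    flag = env.get("CACHE_DIT_ENABLE_MINDIESD_ATTN", "1").strip().lower()
    return "_mindiesd_laser" if flag in ("1", "laser") else "_native_npu"


def should_enable_mindiesd_compile(
    device_type: str,
    has_mindiesd: bool,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Compile is on by default on NPU with mindiesd, unless force-disabled."""
    if device_type != "npu" or not has_mindiesd:
        return False
    return env.get("CACHE_DIT_FORCE_DISABLE_MINDIESD_COMPILE_CONFIG", "0") != "1"


if __name__ == "__main__":
    available = mindiesd_importable()
    print(select_attention_backend("npu", available))
    print(should_enable_mindiesd_compile("npu", available))
```

Passing the environment as a mapping keeps the selection logic testable without mutating os.environ; the real enable_cache flow may structure this differently.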

Verification

  • Import/registration: _mindiesd_laser REGISTERED, KernelBackend.MINDIESD supported
  • Env vars: CACHE_DIT_ENABLE_MINDIESD_ATTN=0/1/laser, CACHE_DIT_FORCE_DISABLE_MINDIESD_COMPILE_CONFIG=0/1 all effective
  • FLUX.1-dev 1024×1024 DiT-only benchmark (Ascend 910B):
    • _native_npu: 0.693s/step
    • _mindiesd_laser: 0.687s/step (laser activated, S=4608)
    • _mindiesd_laser + MindieSDBackend compile: 0.652s/step (about 6% faster than the _native_npu baseline)
  • 4-card USP=4 HCCL communication + mindiesd attention: all 4 cards pass
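
For reference, the percentages implied by the per-step timings above can be recomputed as follows. The helper names are hypothetical; the three timings are the ones reported in the benchmark, and treating _native_npu as the baseline is an assumption based on the "+6%" figure.

```python
def step_time_reduction_pct(baseline_s: float, new_s: float) -> float:
    """Percentage reduction in seconds per step relative to the baseline."""
    return (baseline_s - new_s) / baseline_s * 100.0


def throughput_speedup(baseline_s: float, new_s: float) -> float:
    """Ratio of steps per second (new / baseline); > 1.0 means faster."""
    return baseline_s / new_s


# Reported FLUX.1-dev 1024x1024 timings on Ascend 910B, in seconds per step.
NATIVE_NPU = 0.693
LASER = 0.687
LASER_COMPILE = 0.652

if __name__ == "__main__":
    for name, t in [("_mindiesd_laser", LASER), ("_mindiesd_laser + compile", LASER_COMPILE)]:
        print(
            f"{name}: {step_time_reduction_pct(NATIVE_NPU, t):.1f}% less time per step, "
            f"{throughput_speedup(NATIVE_NPU, t):.3f}x throughput"
        )
```

By this arithmetic the attention backend alone is within about 1% of the baseline, and adding compile reduces step time by roughly 5.9% (about 1.06x throughput), consistent with the reported figure of about 6%.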

blian added 2 commits May 10, 2026 11:32
>
> Add MindIE-SD as a high-priority optional attention backend for Ascend NPU,
> with automatic device-aware backend selection and compilation support.
>
> Modified files (8):
> - attention/backends/register.py: add _MINDIESD_LASER enum
> - attention/backends/npu.py: _mindiesd_laser_attention backend,
>   bridging to mindiesd.layers.attention_forward
> - kernels/backend.py: KernelBackend.MINDIESD enum
> - kernels/ops.py: MINDIESD kernel routing with PT fallback
> - caching/cache_interface.py: auto-select attention backend in
>   enable_cache(); auto-enable MindieSDBackend compile on NPU
> - compile/utils.py: skip CUDA-specific inductor configs on NPU
> - envs.py: CACHE_DIT_ENABLE_MINDIESD_ATTN and
>   CACHE_DIT_FORCE_DISABLE_MINDIESD_COMPILE_CONFIG env vars
> - _utils/utils.py: add _mindiesd_laser to CLI choices
>
> New file (1):
> - _utils/backend_selector.py: BackendSelector.auto_select()
>
> Verified on Ascend 910B with FLUX.1-dev 512x512 and 1024x1024.
> MindIE-SD backend matches native_npu performance with no regression,
> and compile fusion provides +6% additional speedup at 1024x1024.