Add: ChipBootstrapChannel for per-chip bootstrap handshake (L2) #608
Merged
ChaoWao merged 1 commit into hw-native-sys:main on Apr 20, 2026
Code Review
This pull request introduces the ChipBootstrapChannel class, a shared-memory mailbox for cross-process chip bootstrapping, including its C++ implementation, Python bindings, and unit tests. The review feedback highlights several safety improvements for handling shared memory, such as validating buffer capacities in the constructor and ensuring that counts and strings read from the mailbox are bounds-checked to prevent memory safety vulnerabilities.
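The hardening the review asks for can be illustrated with a minimal sketch. It is not the code from this PR: `MailboxReader`, its members, and the `k`-prefixed constants are hypothetical names, and only the sizes (1024-byte error region, 376 pointer slots) are taken from the commit message. The sketch shows the three checks: reject an oversized capacity at construction, clamp a count read from shared memory, and bound the string scan instead of trusting a terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string.h>  // strnlen (POSIX)
#include <vector>

// Hypothetical constants; the values mirror the sizes quoted in this PR.
constexpr std::size_t kErrorMsgSize = 1024;
constexpr std::size_t kPtrCapacity = 376;

// Hypothetical read-side view over an already-mapped mailbox region.
class MailboxReader {
public:
    MailboxReader(const std::uint64_t* ptrs, const char* error_msg,
                  std::size_t max_buffer_count)
        : ptrs_(ptrs), error_msg_(error_msg), max_buffer_count_(max_buffer_count) {
        assert(ptrs != nullptr && error_msg != nullptr);
        // Reject capacities the pointer region cannot hold, so the clamp
        // in buffer_ptrs() is always within bounds.
        if (max_buffer_count > kPtrCapacity) {
            throw std::invalid_argument("max_buffer_count exceeds pointer capacity");
        }
    }

    // raw_count is whatever was read from shared memory; it may be corrupted
    // or read too early, so it is clamped before any pointer is copied.
    std::vector<std::uint64_t> buffer_ptrs(std::uint64_t raw_count) const {
        const std::size_t n = raw_count < max_buffer_count_
                                  ? static_cast<std::size_t>(raw_count)
                                  : max_buffer_count_;
        return std::vector<std::uint64_t>(ptrs_, ptrs_ + n);
    }

    // Scan at most kErrorMsgSize bytes; a missing terminator cannot cause
    // a read past the error-message region.
    std::string error_message() const {
        return std::string(error_msg_, strnlen(error_msg_, kErrorMsgSize));
    }

private:
    const std::uint64_t* ptrs_;
    const char* error_msg_;
    std::size_t max_buffer_count_;
};
```

With this shape, a writer that crashes midway or a stray write into the mapping can at worst produce a truncated pointer list or message, not an out-of-bounds read in the parent.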
970225f to 8f2acbb
Introduce a one-shot cross-process mailbox class for parent-child bootstrap communication, independent of the task-mailbox protocol. Includes C++ implementation, nanobind Python bindings, and 7 UT cases covering in-process and fork-based cross-process scenarios.

Design decisions:
- Mailbox size: 4096 B (one page). HEADER_SIZE=64, ERROR_MSG_SIZE=1024, PTR_CAPACITY=376, which is sufficient for all foreseeable chip buffer counts.
- State machine: IDLE/SUCCESS/ERROR three states. Values 0/1/2 leave headroom for future intermediate states without serialization migration.
- Memory ordering: aarch64 ldar/stlr inline asm (first, per codestyle hw-native-sys#6), x86_64 compiler barrier, __atomic_load/store fallback. This is the same pattern as the WorkerThread mailbox in worker_manager.cpp.
- Error message: strncpy with explicit null termination at size-1, compatible with the L4 task-mailbox error message convention.

Cross-process read hardening:
- Ctor rejects max_buffer_count > CHIP_BOOTSTRAP_PTR_CAPACITY so the clamp invariant holds for every subsequent read.
- buffer_ptrs() clamps the shared-memory count against max_buffer_count_ so a corrupted or premature read cannot overrun the pointer region.
- error_message() uses strnlen(CHIP_BOOTSTRAP_ERROR_MSG_SIZE) instead of trusting the null-terminator in shared memory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
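The sizing and publication rules in this commit message can be checked with a small sketch. It makes several assumptions: the field order inside the 64-byte header is not shown in this PR, so the `Mailbox` struct below is a hypothetical layout, and `std::atomic` with release/acquire ordering is swapped in for the ldar/stlr inline asm the commit describes. The arithmetic is the part taken from the source: (4096 - 64 - 1024) / 8 = 376 pointer slots.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kMailboxSize = 4096;   // one page
constexpr std::size_t kHeaderSize = 64;
constexpr std::size_t kErrorMsgSize = 1024;
constexpr std::size_t kPtrCapacity =
    (kMailboxSize - kHeaderSize - kErrorMsgSize) / sizeof(std::uint64_t);
static_assert(kPtrCapacity == 376, "pointer region must hold 376 slots");

// Values 0/1/2 as in the commit message.
enum class State : std::uint32_t { Idle = 0, Success = 1, Error = 2 };

// Hypothetical layout: only the region sizes come from the PR.
struct alignas(64) Mailbox {
    std::atomic<std::uint32_t> state;
    std::uint32_t buffer_count;
    unsigned char reserved[kHeaderSize - 8];
    char error_msg[kErrorMsgSize];
    std::uint64_t ptrs[kPtrCapacity];
};
static_assert(sizeof(Mailbox) == kMailboxSize, "mailbox must fill one page");
static_assert(std::atomic<std::uint32_t>::is_always_lock_free,
              "cross-process use needs a lock-free state word");

// Child side: fill the payload first, then publish with a release store so a
// reader that observes Success also observes the pointers.
inline void write_success(Mailbox& mb, const std::uint64_t* ptrs, std::uint32_t count) {
    assert(count <= kPtrCapacity);
    std::memcpy(mb.ptrs, ptrs, count * sizeof(std::uint64_t));
    mb.buffer_count = count;
    mb.state.store(static_cast<std::uint32_t>(State::Success), std::memory_order_release);
}

// Child side: strncpy plus explicit termination at size-1, as described above.
inline void write_error(Mailbox& mb, const char* msg) {
    std::strncpy(mb.error_msg, msg, kErrorMsgSize - 1);
    mb.error_msg[kErrorMsgSize - 1] = '\0';
    mb.state.store(static_cast<std::uint32_t>(State::Error), std::memory_order_release);
}

// Parent side: acquire load pairs with the release stores above.
inline State read_state(const Mailbox& mb) {
    return static_cast<State>(mb.state.load(std::memory_order_acquire));
}
```

In a real cross-process setup the `Mailbox` would live in a shared mapping and the parent would poll `read_state` until it leaves `Idle`; the sketch leaves the mapping and polling loop out.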
8f2acbb to b0b0f28
ChaoWao approved these changes Apr 20, 2026
ChaoWao added a commit to PKUZHOU/simpler that referenced this pull request Apr 21, 2026
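Exercise end to end the distributed stack assembled from hw-native-sys#592 hw-native-sys#597 hw-native-sys#605 hw-native-sys#608 hw-native-sys#609 hw-native-sys#610 hw-native-sys#613. Using Worker(level=3, chip_bootstrap_configs=...), each of the two cards sums the inputs of all ranks via cross-rank MTE2 through CommRemotePtr, writes the result to its own output, and the output is read back with worker.copy_from for verification.

Files:
- kernels/aiv/allreduce_kernel.cpp: copied directly from hw-native-sys#307 (PKUZHOU / echo_stone), with a single include path changed ("common/comm_context.h" → "platform_comm/comm_context.h") to match the header location after the L1b move.
- kernels/orchestration/allreduce_orch.cpp: passes the 5 scalars in ChipStorageTaskArgs (input_ptr, output_ptr, nranks, root, device_ctx) through unchanged to the AIV task, bypassing the Tensor wrapper (the Tensor path would rewrite the pointers).
- main.py: 2-card harness. Per-rank input is delivered into the window during bootstrap via SharedMemory + HostBufferStaging, and the shm is unlinked after init; orch_fn calls add_scalar × 5 per chip and submits to submit_next_level; copy_from reads the output back for verification.
- tests/st/workers_l3/test_allreduce_distributed_hw.py: marked with device_count(2) + platforms(["a2a3"]) so that st-onboard-a2a3 launches main() automatically.

WIP: only static checks (AST parse + import name check) were done locally; the code has not been compiled or run. The next step is to bring it up on a 2-card a2a3 environment; the known points that need verification are listed in the PR body.

Co-authored-by: echo_stone <liulei281@huawei.com>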
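The read-back verification this commit describes amounts to comparing each card's output with the element-wise sum of all ranks' inputs. A host-side reference for that comparison might look like the sketch below. It is an illustration only: the function names are hypothetical, and the `float` element type and the tolerance are assumptions, since the commit does not state the data type.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: element-wise sum of every rank's input.
// Element type float is an assumption; the commit does not state it.
inline std::vector<float> reference_allreduce_sum(
    const std::vector<std::vector<float>>& inputs) {
    if (inputs.empty()) {
        return {};
    }
    std::vector<float> out(inputs[0].size(), 0.0f);
    for (const auto& in : inputs) {
        assert(in.size() == out.size());  // all ranks contribute equal lengths
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] += in[i];
        }
    }
    return out;
}

// Compare a read-back output against the reference within a tolerance.
inline bool outputs_match(const std::vector<float>& got,
                          const std::vector<float>& want,
                          float tol = 1e-5f) {
    if (got.size() != want.size()) {
        return false;
    }
    for (std::size_t i = 0; i < got.size(); ++i) {
        if (std::fabs(got[i] - want[i]) > tol) {
            return false;
        }
    }
    return true;
}
```

Because every rank writes the same sum to its own output, the same reference vector can be checked against each card's read-back buffer.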
Summary
ChipBootstrapChannel: a one-shot 4096 B cross-process mailbox for parent-child chip bootstrap handshake
… _task_interface

Design Decisions
… WorkerThread mailbox in worker_manager.cpp

Testing
… pytest tests/ut/py/test_worker/test_bootstrap_channel.py)

Part of PR #571 split plan (L2).