Align Arena allocations to 256 bytes #5278
WeiqunZhang merged 5 commits into AMReX-Codes:development
Conversation
|
Oh nice, maybe I can vector-align the CPU SIMD loads and stores then, too. I would otherwise have added alignas to the SoA arrays, but I guess this ends up being the same. |
|
With
template <class Simd>
constexpr bool needs_more_than_256B =
(std::memory_alignment_v<Simd> > 256);
static_assert(!needs_more_than_256B<SIMDReal>);
static_assert(!needs_more_than_256B<SIMDParticleReal>);
static_assert(!needs_more_than_256B<SIMDInt>);
static_assert(!needs_more_than_256B<SIMDIdCpu>); |
ax3l
left a comment
This is great, thank you!
If you like, you can update AMReX_SIMD.H as well for load_1d/store_1d.
|
Since those functions receive a general pointer, the data is not necessarily aligned, e.g., for field data or particle data that was allocated through std::allocator instead of the Arena. Aligned access would need additional functions such as load_1d_aligned, which I won't add in this PR. |
|
To be safe, we should add an assertion on the alignment after hipMalloc. |
|
Even if hipMalloc does not align to 256 bytes, things should work with this PR (besides the assert). |
|
This is what I found on ROCm's github repos. There are many layers, here is a line that calls malloc with required alignment. |
|
Maybe I will add an assert for all backends, then we can run HPSF CI, and finally I will take out both asserts so it doesn't crash for users. |
|
Thinking more about this: you are absolutely right that even if hipMalloc does not align to 256 bytes, things should work with this PR (besides the assert). So maybe we should remove the assert and then add a comment in Arena.H explaining that 256 is only forced in certain situations. For cudaMalloc, 256-byte alignment is guaranteed in their documentation. For hipMalloc, it's 256 in practice, etc. I am not sure about CUDA/HIP host memory. |
|
/run-hpsf-gitlab-ci |
|
GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1527178. |
|
GitLab CI 1527178 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1527178. |
|
Managed memory should be aligned to 4k pages; otherwise, an explicit prefetch to host or device would also move unrelated allocations that share a page. CUDA/HIP host memory is probably also aligned to 4k pages, though again this is poorly documented.
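The page-granularity concern can be made concrete with a little address arithmetic. The sketch below assumes that migration happens in whole pages and that the page size is 4096 bytes; both are assumptions for illustration, not documented guarantees. `PageRange` and `touched_pages` are hypothetical names and not part of AMReX, CUDA, or HIP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: the page-aligned range covering an allocation.
struct PageRange {
    std::uintptr_t begin; // start address of the first touched page
    std::uintptr_t end;   // one past the end of the last touched page
};

// Expand the byte range [addr, addr + size) outward to whole pages.
// page_size must be a power of two; 4096 is an assumed value.
inline PageRange touched_pages(std::uintptr_t addr, std::size_t size,
                               std::size_t page_size = 4096)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(page_size) - 1;
    const std::uintptr_t begin = addr & ~mask;               // round down
    const std::uintptr_t end = (addr + size + mask) & ~mask; // round up
    return PageRange{begin, end};
}
```

Under these assumptions, a 4096-byte allocation that starts on a page boundary touches exactly one page, while the same allocation starting 256 bytes into a page touches two, so moving it would also move whatever else lives in the remainder of those pages.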
Summary
This PR increases the alignment of Arena allocations from 16 to 256 bytes. This gives a small speedup when running on GPU.
In this test the alignment affects only the particle data, while the field elements are only 8-byte aligned because the 2D arrays have no padding.
This could also help with SIMD on CPU.
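To illustrate why an unpadded 2D array can end up with only 8-byte alignment for most of its rows even when the base pointer is 256-byte aligned, here is a minimal sketch of the arithmetic. The helper names `align_up` and `alignment_at_offset` are hypothetical and not AMReX API, and the row length of 33 doubles in the usage note is an invented example value, not a number from this PR.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of alignment (alignment: power of two).
inline std::size_t align_up(std::size_t n, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (n + alignment - 1) & ~(alignment - 1);
}

// Largest power-of-two alignment, capped at base_alignment, that is
// guaranteed for an address at byte `offset` from a base pointer that is
// itself aligned to base_alignment (a power of two).
inline std::size_t alignment_at_offset(std::size_t base_alignment,
                                       std::size_t offset)
{
    assert(base_alignment != 0 &&
           (base_alignment & (base_alignment - 1)) == 0);
    std::size_t a = base_alignment;
    while (a > 1 && offset % a != 0) {
        a /= 2; // try the next smaller power of two
    }
    return a;
}
```

For example, with rows of 33 doubles and no padding, the second row starts at byte offset 33 * 8 = 264, and `alignment_at_offset(256, 264)` gives 8. Padding the row pitch with `align_up(264, 256)`, which is 512 bytes, would keep every row start 256-byte aligned at the cost of extra memory.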
Additional background
It was not clear from the documentation how memory allocated by hipMalloc is aligned. I assume it is aligned to 256 bytes, the same as cudaMalloc.
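An assumption like this can be checked at runtime with a small pointer test, along the lines of the temporary assert discussed in the conversation. The following is only a sketch: `is_aligned` is a hypothetical helper, not the check used in this PR and not part of AMReX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Return true if p is a multiple of alignment (alignment: power of two).
inline bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return (addr & (alignment - 1)) == 0;
}
```

A debug build could call `assert(is_aligned(p, 256))` on the pointer returned by the backend allocator. As noted in the discussion, such a check is useful for CI but would crash for users if the backend ever returned a less strictly aligned pointer, even though the code would otherwise still work.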