
Align Arena allocations to 256 bytes#5278

Merged
WeiqunZhang merged 5 commits into AMReX-Codes:development from AlexanderSinn:Align_Arena_allocations_to_256_bytes
Apr 15, 2026

Conversation

@AlexanderSinn
Member

Summary

This PR increases the alignment of Arena allocations from 16 to 256 bytes, which gives a small speedup when running on GPU.
In this test the alignment affects only the particle data; the field elements remain only 8-byte aligned because the 2D arrays carry no padding.

alignment: runtime in seconds of 3 runs
512: 8.443, 8.461, 8.444
256: 8.449, 8.459, 8.446
128: 8.473, 8.466, 8.491
64:  8.506, 8.508, 8.511
32:  8.49,  8.491, 8.483
16:  8.491, 8.489, 8.486
8:   misaligned address !!!

This could also help with SIMD on CPU.
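To illustrate the mechanism, here is a minimal sketch (hypothetical code, not the PR's actual Arena implementation) of how an arena typically guarantees such an alignment: every allocation size is rounded up to a multiple of the alignment, so that consecutive allocations from an aligned base all start on aligned boundaries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not the AMReX Arena code: round an allocation
// size up to the next multiple of the arena alignment so the following
// allocation also starts on a 256-byte boundary.
constexpr std::size_t arena_align = 256;

constexpr std::size_t align_up (std::size_t nbytes)
{
    return ((nbytes + arena_align - 1) / arena_align) * arena_align;
}
```

With this rounding, an 8-byte request still consumes a full 256-byte slot, which is the space/alignment trade-off the benchmark above explores.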

Additional background

It was not clear from the documentation what alignment hipMalloc guarantees. I assume it is 256 bytes, the same as cudaMalloc.

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • change answers in the test suite by more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

@WeiqunZhang WeiqunZhang self-requested a review April 8, 2026 16:50
@ax3l
Member

ax3l commented Apr 8, 2026

Oh nice, maybe I can vector-align the CPU SIMD loads and stores then, too.

I would otherwise have added alignas to the SoA arrays, but I guess this ends up being the same.

@ax3l ax3l requested a review from atmyers April 8, 2026 21:01
@ax3l
Member

ax3l commented Apr 8, 2026

With AMReX_SIMD enabled, you can even do a compile-time check for future architectures, e.g.,

  // (assuming std::experimental::simd; adjust the namespace if
  // AMReX_SIMD.H aliases it differently)
  template <class Simd>
  constexpr bool needs_more_than_256B =
      (std::experimental::memory_alignment_v<Simd> > 256);

  static_assert(!needs_more_than_256B<SIMDReal>);
  static_assert(!needs_more_than_256B<SIMDParticleReal>);
  static_assert(!needs_more_than_256B<SIMDInt>);
  static_assert(!needs_more_than_256B<SIMDIdCpu>);

Member

@ax3l ax3l left a comment


This is great, thank you!

If you like, you can update AMReX_SIMD.H as well for load_1d/store_1d.

@AlexanderSinn
Member Author

Since that function receives a general pointer, it is not necessarily aligned, e.g., for field data or particle data that was allocated through std::allocator instead of the Arena. There would need to be additional load_1d_aligned etc. functions, which I won't add in this PR.
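For illustration, such an aligned variant would differ from the generic load only in the alignment tag passed to the underlying SIMD copy. This sketch uses std::experimental::simd; the names load_1d and load_1d_aligned here are assumptions for illustration, not the actual AMReX_SIMD.H API:

```cpp
#include <experimental/simd>

namespace stdx = std::experimental;
using SimdReal = stdx::native_simd<double>;

// Generic load: no alignment assumption on p (works for memory from
// std::allocator as well as from the Arena).
inline SimdReal load_1d (double const* p)
{
    SimdReal v;
    v.copy_from(p, stdx::element_aligned);
    return v;
}

// Hypothetical aligned variant: the caller guarantees that p is
// suitably aligned, e.g. because it is the start of an Arena
// allocation.
inline SimdReal load_1d_aligned (double const* p)
{
    SimdReal v;
    v.copy_from(p, stdx::vector_aligned);
    return v;
}
```

The aligned tag lets the compiler emit aligned vector loads, but passing an unaligned pointer to it is undefined behavior, which is exactly why a separate entry point would be needed.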

@WeiqunZhang
Member

To be safe, we should add an assertion on the alignment after hipMalloc.
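Such a check could look like the following sketch (a hypothetical helper; the assert the PR actually adds may differ):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is aligned to `align` bytes, where
// `align` is a power of two. After hipMalloc one could then assert,
// e.g., AMREX_ALWAYS_ASSERT(is_aligned(p, 256));
inline bool is_aligned (void const* p, std::size_t align)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}
```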

@AlexanderSinn
Member Author

Even if hipMalloc does not align to 256 bytes, things should still work with this PR (aside from the assert).
Unlike CUDA GPUs, gfx90a is happy to read fully unaligned memory. For example, to read 31 bytes, the compiler generates two 16-byte read instructions where the second one has an offset of 15 bytes (CUDA instead does 31 1-byte reads). There could still be a performance difference, however.
With this PR, memory allocated through AMReX has the same alignment guarantee as memory from the vendor API, up to 256 bytes, so the vendors determine the alignment that is good for their specific GPU.

@WeiqunZhang
Member

This is what I found in ROCm's GitHub repos. There are many layers; here is a line that calls malloc with the required alignment:
https://github.com/ROCm/rocm-systems/blob/c7d0197a5118a6006c2cb72366edeb2fc0ed8c77/projects/clr/hipamd/src/hip_vm.cpp#L114
dev_info.memBaseAddrAlign_ is the number of bits defined here:
https://github.com/ROCm/rocm-systems/blob/c7d0197a5118a6006c2cb72366edeb2fc0ed8c77/projects/clr/rocclr/device/rocm/rocdevice.cpp#L1333
It is at least 256 bytes, and even more for debug builds:
https://github.com/ROCm/clr/blob/41709ee843e3715ebec72db854e0be1119ef620a/rocclr/utils/flags.hpp#L37

@AlexanderSinn
Member Author

Maybe I will add an assert for all backends, then we can run the HPSF CI, and finally I will take out both asserts so they don't crash for users.

@WeiqunZhang
Member

Thinking more about this: you are absolutely right that even if hipMalloc does not align to 256 bytes, things should work with this PR (besides the assert). So maybe we should remove the assert and instead add a comment in Arena.H explaining that 256 bytes is only enforced in certain situations. For cudaMalloc, it is guaranteed in the documentation to be 256. For hipMalloc, it is 256 in practice, etc. I am not sure about CUDA/HIP host memory.

@WeiqunZhang
Member

/run-hpsf-gitlab-ci

@github-actions

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1527178.

@amrex-gitlab-ci-reporter

GitLab CI 1527178 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1527178.

Comment thread Src/Base/AMReX_Arena.cpp Outdated
Comment thread Src/Base/AMReX_Arena.H Outdated
@AlexanderSinn
Member Author

Managed memory should be aligned to the 4k page size; otherwise, an explicit prefetch to host or device would move unrelated allocations. CUDA/HIP host memory is probably also aligned to 4k pages, though again this is poorly documented.

@WeiqunZhang WeiqunZhang merged commit 3c7601f into AMReX-Codes:development Apr 15, 2026
74 checks passed