Add amrex::LaunchRaw #4926
Conversation
|
I don't understand why the HYPRE and SUNDIALS tests keep failing. Maybe it is a CUDA compiler bug? Edit: Fixed by changing the input type of the device lambda from |
|
|
Codex: • The new LaunchRaw API is not fully usable on SYCL for its advertised 2D/3D cases, and the newly added GNUmake test cannot be built from its checked-in path. Those issues make the patch incorrect as submitted. Full review comments:
|
|
For the SYCL issue handle above, Codex suggests, |
|
The work-group-size runtime issue still exists. Codex suggests the following. Note that it's the diff against your branch with all the previous Codex changes. |
|
Re: MT > 1 on CPU, could you add a message to static_assert? |
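A minimal sketch of what such a message could look like, assuming a CPU fallback that runs each block with a single thread. The function name `launch_cpu_sketch` and its signature are hypothetical and are not taken from the PR; only the idea of attaching a message to the `static_assert` comes from the comment above.

```cpp
#include <cassert>

// Hypothetical CPU fallback, not the AMReX implementation.
// MT is the compile-time number of threads per block.
template <int MT, typename F>
void launch_cpu_sketch (int nblocks, F&& f)
{
    // The message tells the user why the instantiation is rejected.
    static_assert(MT == 1,
        "launch_cpu_sketch: CPU builds run one thread per block, so MT must be 1");
    assert(nblocks >= 0);
    for (int b = 0; b < nblocks; ++b) {
        f(b); // serial loop over blocks; there is no intra-block parallelism
    }
}
```

With this in place, instantiating `launch_cpu_sketch<256>(...)` in a CPU-only build fails at compile time with the message above instead of a bare assertion failure.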
|
Added. For SYCL it is getting a bit more complicated than I expected. Maybe we could just use a 1D range and split the block index manually using FastDivmodU64? |
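The manual split mentioned above could look roughly like the following sketch. It uses plain `/` and `%` rather than FastDivmodU64, whose interface is not shown in this thread; the type `BlockIdx3` and the function `split_block_index` are hypothetical names, and the x-fastest ordering is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not part of AMReX.
struct BlockIdx3 { std::uint64_t x, y, z; };

// Split a linear block index into (x, y, z), assuming x varies fastest:
//   linear = x + nx * (y + ny * z)
inline BlockIdx3 split_block_index (std::uint64_t linear,
                                    std::uint64_t nx, std::uint64_t ny)
{
    assert(nx > 0 && ny > 0);
    const std::uint64_t q = linear / nx;     // q = y + ny * z
    return BlockIdx3{linear % nx, q % ny, q / ny};
}
```

A fast-divmod helper would replace the two divisions and two remainders with precomputed multiply-and-shift operations, which is presumably the motivation for using one inside a kernel.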
|
Okay |
|
Codex: • The patch introduces backend regressions: CPU-only builds now use an incomplete IntVectND type in LaunchHandler. Full review comments:
|
|
Can you start GPU CI again? |
|
/run-hpsf-gitlab-ci |
|
GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1476241. |
|
GitLab CI 1476241 finished with status: failed. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1476241. |
|
It's one of the known minor issues that we define gpuStream_t for CUDA/HIP in AMReX_GpuControl.H and for SYCL in AMReX_GpuTypes.H. We probably should move them to AMReX_GpuTypes.H (and add appropriate CUDA/HIP headers). Initially Arena only included GpuControl.H, which in turn included GpuTypes.H. But now GpuTypes.H is removed from GpuControl.H. If we move gpuStream_t to GpuTypes.H, Arena.H will no longer need to include GpuControl.H since you just added GpuTypes.H. There is another minor issue (found by AI recently) that can be fixed. GpuTypes.H uses macros defined in AMReX_Qualifiers.H. I was planning to fix it. But maybe you can just fix it in this PR. |
|
/run-hpsf-gitlab-ci |
|
GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1476793. |
|
GitLab CI 1476793 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1476793. |
Summary
This PR aims to provide a unified interface for writing kernels that use shared memory and __syncthreads on CUDA, HIP and SYCL without the need for ifdefs.
The number of threads per block is always a 1D value known at compile time, while the number of blocks can be 1D, 2D or 3D, using the built-in platform indices such as blockIdx.y.
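Because the thread index is 1D, a kernel that works on 2D tiles has to decompose it by hand, while the block counts per dimension are computed at run time. The sketch below illustrates that arithmetic on the host only; `num_blocks`, `tile_coord`, `rows_per_pass` and `TileCoord` are hypothetical names, and the tile layout is an assumption rather than the scheme used in the PR.

```cpp
#include <cassert>

// Hypothetical helpers, not part of AMReX.
struct TileCoord { int col, row; };

// Number of blocks needed to cover n cells with tiles of width `tile`.
constexpr int num_blocks (int n, int tile)
{
    return (n + tile - 1) / tile; // ceiling division
}

// Map a 1D thread index to a (col, row) position inside a tile
// that is `tile` cells wide; col varies fastest.
constexpr TileCoord tile_coord (int tid, int tile)
{
    return TileCoord{tid % tile, tid / tile};
}

// Rows of a TILE-wide tile covered by one pass of MT threads.
template <int MT, int TILE>
constexpr int rows_per_pass ()
{
    static_assert(MT % TILE == 0, "MT must be a multiple of TILE");
    return MT / TILE;
}
```

For example, with 256 threads and 32-wide tiles, each pass covers 8 rows, so a 32x32 tile takes 4 passes per block.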
Additional background
Example of an amrex::LaunchRaw kernel, which fuses a transpose operation in shared memory with data preprocessing and postprocessing stencils. (Only works on GPUs because threads_per_block > 1.)
Checklist
The proposed changes: