cmake: Add VOLK_STATIC_DISPATCH for compile time machine selection #858

Open
xerpi wants to merge 1 commit into gnuradio:main from xerpi:static-dispatch

@xerpi xerpi commented Apr 14, 2026

When VOLK_STATIC_DISPATCH is set to a machine name (e.g. neonv8, avx2_64_mmx_orc), the build generates a header-only static dispatch layer that maps generic kernel names directly to the best implementation via #define aliases and static inline wrappers. No runtime CPU detection, no function pointer indirection, no filesystem access.

This is useful for bare-metal/embedded targets where the CPU is known at compile time and the runtime dispatch infrastructure (cpu_features, volk_prefs, volk_rank_archs) is not available or not desirable.

The generated volk_dispatch.h contains one of the following:

  • Static dispatch: LV_HAVE_* defines, kernel header includes, #define aliases and static inline dispatchers.
  • Dynamic dispatch: extern function pointer declarations (unchanged).
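
To make the contrast between the two modes concrete, here is a minimal, self-contained sketch in plain C. It is not VOLK code: every `demo_*` name is hypothetical, and the resolver simply picks the generic implementation where a real library would probe the CPU. It only illustrates what a function pointer indirection looks like and what a compile-time alias replaces it with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel implementation (illustrative only, not a VOLK kernel). */
static void demo_scale_generic(float* out, const float* in, unsigned int n)
{
    assert(out != NULL && in != NULL);
    for (unsigned int i = 0; i < n; i++)
        out[i] = 2.0f * in[i];
}

/* Dynamic style: callers go through a pointer that is resolved at run time. */
typedef void (*demo_scale_fn)(float*, const float*, unsigned int);
static void demo_scale_resolve(float* out, const float* in, unsigned int n);
static demo_scale_fn demo_scale_dynamic = demo_scale_resolve;

/* Counts how often the resolver ran, so the one-time cost is observable. */
static int demo_resolve_count = 0;

static void demo_scale_resolve(float* out, const float* in, unsigned int n)
{
    /* A real library would detect the CPU here; this sketch always picks generic. */
    demo_resolve_count++;
    demo_scale_dynamic = demo_scale_generic;
    demo_scale_dynamic(out, in, n);
}

/* Static style: the name is bound at compile time, no pointer is involved. */
#define demo_scale_static demo_scale_generic
```

In the dynamic variant the first call pays for resolution and every later call still goes through the pointer; in the static variant the compiler sees the target directly and can inline it.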

The common parts (#includes, VOLK_OR_PTR) live in a static volk.h that includes the generated volk_dispatch.h.

When static dispatch is active, ENABLE_APPS, ENABLE_TESTING, ENABLE_PROFILING and ENABLE_MODTOOL are automatically disabled. cpu_features, fmt, ORC and dlfcn dependencies are all skipped.

For example, compiling with -DVOLK_STATIC_DISPATCH="avx2_64_mmx_orc" produces a libvolk whose only VOLK-exported symbols are volk_malloc and volk_free, and a volk_dispatch.h header (included by volk.h) that looks like:

//! Returns the name of the machine this instance will use
static inline const char *volk_get_machine(void)
{
    return "avx2_64_mmx_orc";
}

//! Get the machine alignment in bytes
static inline size_t volk_get_alignment(void)
{
    return 32;
}

//! Is the pointer on a machine alignment boundary?
static inline bool volk_is_aligned(const void *ptr)
{
    return ((intptr_t)(ptr) & (intptr_t)31) == 0;
}

#define LV_HAVE_GENERIC 1
#define LV_HAVE_64 1
#define LV_HAVE_MMX 1
#define LV_HAVE_SSE 1
#define LV_HAVE_SSE2 1
#define LV_HAVE_SSE3 1
#define LV_HAVE_SSSE3 1
#define LV_HAVE_SSE4_1 1
#define LV_HAVE_SSE4_2 1
#define LV_HAVE_POPCOUNT 1
#define LV_HAVE_AVX 1
#define LV_HAVE_FMA 1
#define LV_HAVE_AVX2 1
#define LV_HAVE_ORC 1

#include <volk/volk_16i_32fc_dot_prod_32fc.h>
#include <volk/volk_16i_branch_4_state_8.h>
#include <volk/volk_16i_convert_8i.h>
....

#define volk_16i_32fc_dot_prod_32fc_a volk_16i_32fc_dot_prod_32fc_a_avx2_fma
#define volk_16i_32fc_dot_prod_32fc_u volk_16i_32fc_dot_prod_32fc_u_avx2_fma

static inline void volk_16i_32fc_dot_prod_32fc(lv_32fc_t*  result, const short*  input, const lv_32fc_t*  taps, unsigned int  num_points)
{
    if (volk_is_aligned(VOLK_OR_PTR(result,VOLK_OR_PTR(input,VOLK_OR_PTR(taps,0)))))
        volk_16i_32fc_dot_prod_32fc_a(result, input, taps, num_points);
    else
        volk_16i_32fc_dot_prod_32fc_u(result, input, taps, num_points);
}

#define volk_16i_branch_4_state_8_a volk_16i_branch_4_state_8_a_ssse3
#define volk_16i_branch_4_state_8_u volk_16i_branch_4_state_8_generic

static inline void volk_16i_branch_4_state_8(short*  target, short*  src0, char**  permuters, short*  cntl2, short*  cntl3, short*  scalars)
{
    if (volk_is_aligned(VOLK_OR_PTR(target,VOLK_OR_PTR(src0,VOLK_OR_PTR(permuters,VOLK_OR_PTR(cntl2,VOLK_OR_PTR(cntl3,VOLK_OR_PTR(scalars,0))))))))
        volk_16i_branch_4_state_8_a(target, src0, permuters, cntl2, cntl3, scalars);
    else
        volk_16i_branch_4_state_8_u(target, src0, permuters, cntl2, cntl3, scalars);
}

#define volk_16i_convert_8i_a volk_16i_convert_8i_a_avx2
#define volk_16i_convert_8i_u volk_16i_convert_8i_u_avx2

static inline void volk_16i_convert_8i(int8_t*  outputVector, const int16_t*  inputVector, unsigned int  num_points)
{
    if (volk_is_aligned(VOLK_OR_PTR(outputVector,VOLK_OR_PTR(inputVector,0))))
        volk_16i_convert_8i_a(outputVector, inputVector, num_points);
    else
        volk_16i_convert_8i_u(outputVector, inputVector, num_points);
}
....

With VOLK_STATIC_DISPATCH="neonv8":

#define volk_16i_32fc_dot_prod_32fc_a volk_16i_32fc_dot_prod_32fc_neonv8
#define volk_16i_32fc_dot_prod_32fc_u volk_16i_32fc_dot_prod_32fc_neonv8

static inline void volk_16i_32fc_dot_prod_32fc(lv_32fc_t*  result, const short*  input, const lv_32fc_t*  taps, unsigned int  num_points)
{
    if (volk_is_aligned(VOLK_OR_PTR(result,VOLK_OR_PTR(input,VOLK_OR_PTR(taps,0)))))
        volk_16i_32fc_dot_prod_32fc_a(result, input, taps, num_points);
    else
        volk_16i_32fc_dot_prod_32fc_u(result, input, taps, num_points);
}
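
The generated dispatchers above OR all pointer arguments together and test the low bits once, so the aligned kernel is chosen only if every pointer is aligned. The following is a small stand-alone sketch of that pattern under stated assumptions: `DEMO_OR_PTR`, `demo_is_aligned`, `demo_copy` and the two `_impl` kernels are hypothetical stand-ins, and the 32-byte alignment simply mirrors the avx2 example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the generated helpers; names are illustrative. */
#define DEMO_ALIGNMENT 32
#define DEMO_OR_PTR(p0, p1) ((const void*)((uintptr_t)(p0) | (uintptr_t)(p1)))

static inline bool demo_is_aligned(const void* ptr)
{
    /* Any set bit below the alignment means at least one pointer is unaligned. */
    return ((uintptr_t)ptr & (uintptr_t)(DEMO_ALIGNMENT - 1)) == 0;
}

/* Records which kernel ran: 1 = aligned, 2 = unaligned. */
static int demo_last_path = 0;

static void demo_copy_a_impl(int8_t* out, const int8_t* in, unsigned int n)
{
    demo_last_path = 1;
    for (unsigned int i = 0; i < n; i++)
        out[i] = in[i];
}

static void demo_copy_u_impl(int8_t* out, const int8_t* in, unsigned int n)
{
    demo_last_path = 2;
    for (unsigned int i = 0; i < n; i++)
        out[i] = in[i];
}

/* Compile-time aliases, in the style of the generated #define lines. */
#define demo_copy_a demo_copy_a_impl
#define demo_copy_u demo_copy_u_impl

static inline void demo_copy(int8_t* out, const int8_t* in, unsigned int n)
{
    assert(out != NULL && in != NULL);
    if (demo_is_aligned(DEMO_OR_PTR(out, DEMO_OR_PTR(in, 0))))
        demo_copy_a(out, in, n);
    else
        demo_copy_u(out, in, n);
}
```

Because the OR keeps any low bit set by any argument, a single mask test covers all pointers; offsetting either buffer by one byte is enough to route the call to the unaligned kernel.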

xerpi commented Apr 14, 2026

Only Build on ubuntu22.04 armv7 g++ failed:

  99% tests passed, 1 tests failed out of 196
  
  Total Test time (real) = 229.54 sec
  
  The following tests FAILED:
  	 43 - qa_volk_32f_log2_32f (Failed)

Could be a flaky test.

EDIT: After running again now it's passing.

xerpi force-pushed the static-dispatch branch 2 times, most recently from 498ab7d to 706b774 (April 14, 2026 08:59)


xerpi commented Apr 14, 2026

Again, qa_volk_32f_log2_32f failed...
Also, kernels/volk/volk_32f_expfast_32f.h was polluting the global #define-space, so I added some #undefs.
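
For context, the leak matters more once every kernel header is pulled into user code through a header-only dispatch layer. Below is a rough sketch of the usual remedy, a prefix plus trailing #undefs. The `DEMO_EXPFAST_*` names, the constant values and the `demo_expfast` function are assumptions made for illustration (a Schraudolph-style bit-trick approximation), not the contents of the actual kernel header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* --- begin hypothetical header "demo_expfast.h" --- */
#define DEMO_EXPFAST_MLN2 0.6931471805f
#define DEMO_EXPFAST_A 8388608.0f    /* 2^23: shifts into the float exponent field */
#define DEMO_EXPFAST_B 1065353216.0f /* 127 * 2^23: the exponent bias */
#define DEMO_EXPFAST_C 60801.0f      /* small tuning offset (assumed value) */

/* Approximates exp(x) for moderate x by building the float bit pattern directly. */
static inline float demo_expfast(float x)
{
    float scaled = DEMO_EXPFAST_A / DEMO_EXPFAST_MLN2 * x +
                   (DEMO_EXPFAST_B - DEMO_EXPFAST_C);
    int32_t bits = (int32_t)scaled;
    float result;
    assert(sizeof result == sizeof bits);
    memcpy(&result, &bits, sizeof result);
    return result;
}

/* Drop the helpers so they do not leak into code that includes this header. */
#undef DEMO_EXPFAST_MLN2
#undef DEMO_EXPFAST_A
#undef DEMO_EXPFAST_B
#undef DEMO_EXPFAST_C
/* --- end hypothetical header --- */

/* User code is free to use short identifiers such as A, B and C again. */
static float demo_user_sum(float A, float B, float C)
{
    return A + B + C;
}
```

The macros are expanded where `demo_expfast` is defined, so removing them afterwards does not affect the function; unprefixed `A`, `B` or `C` macros, by contrast, would have broken `demo_user_sum`.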

@jdemel jdemel left a comment

So far looks good. I couldn't review it completely yet, though.

Comment thread kernels/volk/volk_32f_expfast_32f.h Outdated
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated

jdemel commented Apr 26, 2026

> Only Build on ubuntu22.04 armv7 g++ failed:
>
>   99% tests passed, 1 tests failed out of 196
>
>   Total Test time (real) = 229.54 sec
>
>   The following tests FAILED:
>   	 43 - qa_volk_32f_log2_32f (Failed)
>
> Could be a flaky test.
>
> EDIT: After running again now it's passing.

Yes, unfortunately it is.

jdemel commented Apr 26, 2026

I like those additions. Though I'm worried this will break the API in some way. If we break the API, we can only add this feature in a new major release "v4".

volk.h is the canonical entrypoint for everything VOLK, that's why this needs to be carefully changed. Can you go into further detail how your changes do not affect the current API?

Can you add a CI test for your case? From prior experience: everything that does not receive a CI test will break soon.

xerpi force-pushed the static-dispatch branch from 706b774 to a9aa3b2 (April 27, 2026 04:46)


xerpi commented Apr 27, 2026

> I like those additions. Though I'm worried this will break the API in some way. If we break the API, we can only add this feature in a new major release "v4".
>
> volk.h is the canonical entrypoint for everything VOLK, that's why this needs to be carefully changed. Can you go into further detail how your changes do not affect the current API?
>
> Can you add a CI test for your case? From prior experience: everything that does not receive a CI test will break soon.

Updates:

  • Renamed Mln2, A, B, C macros in volk_32f_expfast_32f.h to VOLK_EXPFAST_* (keeping the #undefs at the end).
  • constants.c.in is now compiled in both static and dynamic dispatch builds, using #cmakedefine VOLK_STATIC_DISPATCH to conditionally include the getenv call in volk_prefix() only for dynamic dispatch. constants.h is now always installed.
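
The second bullet relies on CMake's configure_file() turning a `#cmakedefine NAME` line into either `#define NAME` or a commented-out `#undef NAME`. A minimal sketch of what the configured result might look like in the static case follows; the `DEMO_*` names, the environment variable and the fallback path are hypothetical and not taken from constants.c.in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* In the template this would be "#cmakedefine DEMO_STATIC_DISPATCH".
 * This sketch hard-codes the outcome of a static dispatch configure run. */
#define DEMO_STATIC_DISPATCH

static const char* demo_prefix(void)
{
#ifndef DEMO_STATIC_DISPATCH
    /* Dynamic builds may let the environment override the install prefix. */
    const char* env = getenv("DEMO_PREFIX");
    if (env != NULL && env[0] != '\0')
        return env;
#endif
    /* Static builds always return the configured prefix (assumed value). */
    return "/usr/local";
}

/* Small self-check helper for this sketch. */
static int demo_prefix_matches(const char* expected)
{
    assert(expected != NULL);
    return strcmp(demo_prefix(), expected) == 0;
}
```

Keeping the function signature identical in both configurations is what lets the same public header be installed for static and dynamic builds; only the body differs.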

Regarding API compatibility:

  • The constants.h API is now identical in both static and dynamic dispatch builds.
  • However, volk_prefs.h and volk_cpu.h are still excluded from static dispatch installs, as those APIs depend on runtime machine detection which is not applicable in a static dispatch context.

I've added a build-ubuntu-static-dispatch CI job that builds with -DVOLK_STATIC_DISPATCH=generic.

xerpi force-pushed the static-dispatch branch 5 times, most recently from 4c9de05 to c2d8ff5 (April 27, 2026 05:54)

When VOLK_STATIC_DISPATCH is set to a machine name (e.g. neonv8,
avx2_64_mmx_orc), the build generates a header only dispatch layer
that maps generic kernel names directly to the best implementation
via #define and static inline wrappers. No runtime CPU detection,
no function pointer indirection, no filesystem access.

This is useful for baremetal/embedded targets where the CPU is known
at compile time and the runtime dispatch infrastructure (cpu_features,
volk_prefs, volk_rank_archs) is not available or desirable.

The generated volk_dispatch.h either contains:
  Static dispatch:  LV_HAVE_* defines, kernel header includes,
                    #define aliases and static inline dispatchers
  Dynamic dispatch: extern function pointer declarations (unchanged)

The common parts (includes, VOLK_OR_PTR) live in a static volk.h
that includes the generated volk_dispatch.h.

When static dispatch is active, ENABLE_APPS, ENABLE_TESTING,
ENABLE_PROFILING and ENABLE_MODTOOL are automatically disabled.
cpu_features, fmt, ORC and dlfcn dependencies are all skipped.

Signed-off-by: Sergi Granell Escalfet <xerpi.g.12@gmail.com>

xerpi force-pushed the static-dispatch branch from c2d8ff5 to d5a66f4 (April 27, 2026 05:56)


xerpi commented Apr 27, 2026

Actually, I realized that some of the options such as disabling apps and testing are not related to static dispatch but to cross-compiling. We can still generate a static dispatch build for the host architecture and run tests without any problem. So I switched some options to be gated on CMAKE_CROSSCOMPILING instead of VOLK_STATIC_DISPATCH.
