remove buggy and slow neonv8 kernel by marcusmueller · Pull Request #680 · gnuradio/volk

marcusmueller · 2023-10-23T17:11:33Z

as noticed by argilo when fixing the integer generation in #677, the neonv8 kernel was buggy. It seems compilers are better at building byte-swapping code than people writing SIMD intrinsics, so falling back on generic doesn't hurt.
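
For context on what "generic" means here, a minimal portable sketch of a 64-bit byteswap follows. It is not VOLK's actual generic kernel; the names `byteswap64` and `byteswap64_buffer` are made up for illustration. Optimizing compilers can often recognize shift-and-mask patterns like this one and emit a native byte-reverse instruction, which fits the observation above.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 64-bit value using only shifts and masks.
 * Hypothetical helper for illustration, not a VOLK function. */
static inline uint64_t byteswap64(uint64_t x)
{
    /* swap the two 32-bit halves */
    x = ((x & 0x00000000FFFFFFFFULL) << 32) | ((x & 0xFFFFFFFF00000000ULL) >> 32);
    /* swap the 16-bit halves inside each 32-bit half */
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x & 0xFFFF0000FFFF0000ULL) >> 16);
    /* swap the bytes inside each 16-bit half */
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) | ((x & 0xFF00FF00FF00FF00ULL) >> 8);
    return x;
}

/* Byteswap a whole buffer in place, one element at a time. */
static inline void byteswap64_buffer(uint64_t* data, size_t num_points)
{
    for (size_t i = 0; i < num_points; ++i) {
        data[i] = byteswap64(data[i]);
    }
}
```

Applying `byteswap64_buffer` twice to the same buffer restores the original contents, which makes for a cheap sanity check.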

argilo · 2023-10-23T17:20:06Z

Looks like this kernel has been slower than generic since it was added in #196. Good riddance.

argilo · 2023-10-23T17:59:41Z

For the record, the bug in the implementation was that the load (vld2q_u8) and store (vst2q_u8) it used are the interleaving variants:

VLD2 loads 2 vectors from memory. It performs a 2-way de-interleave from memory to the vectors.
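
To make the de-interleave concrete, here is a scalar model in plain C, so it runs without NEON hardware. It is only a sketch of the lane layout described in the quote above, not the removed kernel; `model_deinterleave2` and `model_interleave2` are hypothetical names. The point it illustrates is that after a 2-way de-interleaving load, the 8 bytes of one 64-bit word no longer sit next to each other in a single register.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 2-way de-interleaving load of 32 bytes:
 * even-indexed bytes land in reg0, odd-indexed bytes in reg1. */
static void model_deinterleave2(const uint8_t src[32], uint8_t reg0[16], uint8_t reg1[16])
{
    for (size_t i = 0; i < 16; ++i) {
        reg0[i] = src[2 * i];
        reg1[i] = src[2 * i + 1];
    }
}

/* Scalar model of the matching 2-way interleaving store. */
static void model_interleave2(const uint8_t reg0[16], const uint8_t reg1[16], uint8_t dst[32])
{
    for (size_t i = 0; i < 16; ++i) {
        dst[2 * i] = reg0[i];
        dst[2 * i + 1] = reg1[i];
    }
}
```

With source bytes 0, 1, 2, ..., 31, the model's `reg0` holds 0, 2, 4, ... and `reg1` holds 1, 3, 5, ..., so the first 64-bit word (bytes 0 to 7) is split across both registers.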


marcusmueller · 2023-10-23T18:02:52Z

this is a bit strange: it seems I need to figure out a way to avoid the #ifdef LV_HAVE_NEON when LV_HAVE_NEONV8 is defined, but I must not have an #if… LV_HAVE_NEONV8 line, otherwise the build system assumes there's a neonv8 kernel

argilo · 2023-10-23T18:11:20Z

Why do you need to avoid building the neon kernel if neonv8 is defined?

I didn't think that made sense in #668 so I removed the nested #ifdefs, which weren't working to begin with.

marcusmueller · 2023-10-23T18:13:15Z

Because on arm64 machines, the neon kernel malfunctions:
https://github.com/gnuradio/volk/actions/runs/6617156936/job/17972909894?pr=680#step:3:2492

argilo · 2023-10-23T18:14:35Z

Ah. That seems like a bug that should be fixed.

marcusmueller · 2023-10-23T18:15:28Z

https://github.com/gnuradio/volk/actions/runs/6617156936/job/17972917000?pr=680#step:3:2453 Seeing that on our armv7 machine it doesn't run at all, do we test the Neon non-v8 kernels on anything, @jdemel?

argilo · 2023-10-23T18:16:57Z

In #668 I found that nested ifdefs prevent the neon kernel from running even on 32-bit ARM, so it's possible it's broken on 32-bit ARM too.

argilo · 2023-10-23T18:41:33Z

Yep, it's broken. Change inputPtr += 4 to inputPtr += 8 and the kernel works fine.
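
A scalar model can show why the stride matters. This is a sketch under a stated assumption, not the removed kernel: it assumes each loop iteration handles four 64-bit words (32 bytes) through a cursor counted in `uint32_t` elements, so the cursor has to advance by 8 elements per iteration. `swap_one_word` and `blocked_byteswap_model` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the 8 bytes of one 64-bit word in place. */
static void swap_one_word(uint8_t* p)
{
    for (int i = 0; i < 4; ++i) {
        uint8_t tmp = p[i];
        p[i] = p[7 - i];
        p[7 - i] = tmp;
    }
}

/* Hypothetical model of a blocked in-place byteswap.
 * ASSUMPTION: every iteration swaps four 64-bit words (32 bytes), then
 * advances a cursor measured in uint32_t elements by `stride`.
 * stride = 8 matches the 32 bytes just processed; stride = 4 moves only
 * 16 bytes, so the next iteration overlaps the previous one. */
static void blocked_byteswap_model(uint64_t* data, size_t num_points, size_t stride)
{
    uint8_t* bytes = (uint8_t*)data;
    size_t cursor = 0; /* position in units of uint32_t */
    for (size_t block = 0; block < num_points / 4; ++block) {
        for (size_t w = 0; w < 4; ++w) {
            swap_one_word(bytes + cursor * sizeof(uint32_t) + w * 8);
        }
        cursor += stride;
    }
}
```

In this model, a stride of 4 makes the second iteration swap words 2 and 3 a second time, which puts them back in their original byte order, while words 0 and 1 stay correctly swapped.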

argilo · 2023-10-23T18:50:16Z

Seeing that on our armv7 machine it doesn't run at all, do we test the Neon non-v8 kernels on anything, @jdemel?

I too noticed that. It seems on the armv7 build, NEON is not detected at all. Perhaps a bug in platform detection?

argilo · 2023-10-23T18:53:47Z

By the way, neon is slower than generic on my Raspberry Pi, 2345.84 ms vs. 1977.6 ms.

marcusmueller · 2023-10-23T18:58:26Z

Since that Pi and maybe an E310 would be the main target for that kernel: should we maybe just eradicate both?

argilo · 2023-10-23T19:01:58Z

Yeah, I'd say get rid of it.

As far as I can tell, it's been broken since it was created in 2014 (158a6b2). It only correctly swaps the first two integers in the input vector, so I can't imagine it's ever done anyone any good.


jdemel · 2023-11-04T09:05:25Z

https://github.com/gnuradio/volk/actions/runs/6617156936/job/17972917000?pr=680#step:3:2453 Seeing that on our armv7 machine it doesn't run at all, do we test the Neon non-v8 kernels on anything, @jdemel?

Unfortunately, no. If you happen to find some CI infrastructure for this, please add it. I didn't. We used to test these when TravisCI was still working.

jdemel

LGTM. The discussion concluded that these kernels were always broken.

jdemel · 2023-11-04T09:10:25Z

I'll close #606 after this PR is merged because this PR supersedes it. Thanks for working through this issue.

marcusmueller force-pushed the 64u_byteswap_remove_neonv8 branch from e952f32 to 34f4575 on October 23, 2023 17:18

argilo mentioned this pull request Oct 23, 2023

Generate random integers with uniform_int_distribution #677

Merged

64u_byteswap: remove buggy Neonv8 protokernel

d49a8cd

Signed-off-by: Marcus Müller <mmueller@gnuradio.org>

marcusmueller force-pushed the 64u_byteswap_remove_neonv8 branch from 49d8822 to d49a8cd on October 23, 2023 18:01

64u_byteswap: remove buggy Neon protokernel

2caf086

This was hidden for 9 years due to being shadowed when NEONV8 was available.

Signed-off-by: Marcus Müller <mmueller@gnuradio.org>

marcusmueller mentioned this pull request Oct 23, 2023

granular parallel generic kernel for 64u_byteswap #679

Open

argilo mentioned this pull request Oct 31, 2023

Use a different instruction for armv8 neon loads. #606

Closed

jdemel approved these changes Nov 4, 2023


jdemel merged commit fd20770 into gnuradio:main Nov 4, 2023

argilo mentioned this pull request Nov 4, 2023

volk_64u_byteswap_neonv8 incorrect results #605

Closed
