More SIMD implementations #66

konsumlamm · 2023-03-28T02:04:59Z

See also #64.

- make operations work for bit vectors with offset
- support more operations
  - bitIndex (Implement bitIndex & nthBitIndex in C #81)
  - nthBitIndex (Implement bitIndex & nthBitIndex in C #81)
  - selectBits (Implement selectBits & excludeBits in C #82)
  - excludeBits (Implement selectBits & excludeBits in C #82)
  - reverseBits (Use SIMD intrinsics for reverseBits #71)
- support mutable variants


konsumlamm · 2023-04-07T02:19:16Z

I did some experiments on SIMD implementations for the other operations. However, I wasn't able to find any implementation that can be vectorized by OpenMP, so I resorted to using SIMD intrinsics (unfortunately, this is a lot harder to get right). For most operations, I wrote SSE and AVX versions. Here are my findings (the factors are speedups over the standard implementation):

- bitIndex: about the same up to 1024 bits, then up to 0.6x (SSE) and 0.4x (AVX)
- nthBitIndex: should be similar
- reverseBits: up to 0.25x (SSE) and 0.15x (AVX)
- selectBits: up to 0.36x (when using -mbmi2) and 0.06x (AVX)
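
For readers unfamiliar with the `-mbmi2` flag: it enables the BMI2 instruction set, whose `pext` instruction gathers the bits of one word at the positions where a mask is set and packs them into the low bits. The following is a minimal sketch of that per-word operation, assuming selectBits has these semantics on a single 64-bit word. The name `select_bits_word` is hypothetical and this is not the code from the PRs above.

```c
#include <stdint.h>

#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#endif

/* Hypothetical helper: keep the bits of `src` at positions where `mask`
 * is set and pack them contiguously into the low bits of the result. */
static uint64_t select_bits_word(uint64_t src, uint64_t mask) {
#if defined(__BMI2__) && defined(__x86_64__)
    /* Compiled with -mbmi2: a single pext instruction does the work. */
    return _pext_u64(src, mask);
#else
    /* Portable fallback: visit the set bits of the mask from low to high. */
    uint64_t out = 0;
    unsigned k = 0; /* next free position in the output */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set bit */
        if (src & lowest) {
            out |= (uint64_t)1 << k;
        }
        k++;
        mask &= mask - 1; /* clear lowest set bit */
    }
    return out;
#endif
}
```

The fallback loop runs once per set bit in the mask, which may be part of why a hardware `pext` helps for this operation.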

Are you fine with using intrinsics? If so, should there be separate flags for SSE and AVX implementations (both are x86-specific, but quite common)?

Bodigrim · 2023-04-07T08:47:37Z

I'm fine with intrinsics, but not with additional flags.

SSE2 is always available on any x86_64 CPU.
Anything beyond SSE2 requires runtime dispatch based on the CPU feature flags reported by __get_cpuid_count. Here are some examples:
- https://github.com/haskell/text/blob/master/cbits/measure_off.c
- https://github.com/haskell/bytestring/blob/master/cbits/is-valid-utf8.c
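
As a rough illustration of the runtime dispatch described above, here is a minimal sketch that picks between a scalar and an AVX2 routine on first use. It assumes GCC or Clang (for `<cpuid.h>` and the `target` attribute). The function names and the word-wise OR operation are hypothetical stand-ins, not taken from the linked files. It does not query XGETBV, so a production check of OS support for AVX state would need more than this.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Portable fallback: OR two word arrays element by element. */
static void or_words_scalar(uint64_t *dst, const uint64_t *a,
                            const uint64_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = a[i] | b[i];
}

#ifdef HAVE_X86_DISPATCH
/* AVX2 variant. The target attribute lets this one function use AVX2
 * intrinsics even though the file is compiled without -mavx2. */
static __attribute__((target("avx2")))
void or_words_avx2(uint64_t *dst, const uint64_t *a,
                   const uint64_t *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) { /* four 64-bit words per 256-bit register */
        __m256i x = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i y = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_or_si256(x, y));
    }
    for (; i < n; i++) dst[i] = a[i] | b[i]; /* leftover words */
}

/* Simplified feature check (assumption: OSXSAVE + AVX + AVX2 bits suffice). */
static int cpu_has_avx2(void) {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
    /* Leaf 1, ECX: bit 27 = OSXSAVE, bit 28 = AVX. */
    if ((ecx & (1u << 27)) == 0 || (ecx & (1u << 28)) == 0) return 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return 0;
    return (ebx & (1u << 5)) != 0; /* Leaf 7, subleaf 0, EBX bit 5 = AVX2 */
}
#endif

typedef void (*or_words_fn)(uint64_t *, const uint64_t *,
                            const uint64_t *, size_t);

/* Public entry point: selects an implementation on the first call. */
void or_words(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n) {
    static or_words_fn impl = NULL;
    if (impl == NULL) {
#ifdef HAVE_X86_DISPATCH
        impl = cpu_has_avx2() ? or_words_avx2 : or_words_scalar;
#else
        impl = or_words_scalar;
#endif
    }
    impl(dst, a, b, n);
}
```

Because the choice is made at run time, a single build works on CPUs with and without AVX2, which is what makes a separate Cabal flag unnecessary. The lazily initialised function pointer is not synchronised here. Concurrent first calls would each compute the same value, but strictly conforming code would use an atomic or a one-time initialiser.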

Bodigrim · 2023-08-11T20:47:19Z

@konsumlamm I'd like to make a release soon so that consumers could benefit from your work here. Shall I go ahead as is or do you have plans to work on excludeBits / selectBits soon?

konsumlamm · 2023-08-11T22:12:22Z

I have an implementation of selectBits/excludeBits lying around that I could make a PR for, but only for the immutable versions. I don't have much time currently, so I can't work on the other things right now.

Bodigrim · 2023-08-11T23:40:27Z

I'll gladly take immutable versions only.

konsumlamm mentioned this issue Apr 7, 2023: Enable -mbmi2 #68 (Closed)

konsumlamm mentioned this issue Apr 11, 2023: Use SIMD intrinsics for reverseBits #71 (Merged)
