More SIMD implementations #66
I did some experiments on SIMD implementations for the other operations. However, I wasn't able to find any implementation that can be vectorized by OpenMP, so I resorted to using SIMD intrinsics (unfortunately, this is a lot harder to get right). For most operations, I wrote SSE and AVX versions. Here are my findings (the factors are speedups over the standard implementation):
Are you fine with using intrinsics? If so, should there be separate flags for SSE and AVX implementations (both are x86-specific, but quite common)?
I'm fine with intrinsics, but not with additional flags.
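For illustration, here is a minimal sketch of how intrinsics can be used without any additional flags: the implementation is picked at compile time from the macros the compiler already predefines (`__AVX2__`, `__SSE2__` on GCC and Clang), with a scalar loop as the fallback and for the tail. The function name `and_words` and the choice of a bitwise AND are hypothetical and only serve as an example; this is not the code from the library.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from the library): dst[i] = a[i] & b[i]
 * for n 64-bit words. Pointers do not need to be aligned, because
 * only unaligned loads and stores are used. */
void and_words(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    size_t i = 0;

#if defined(__AVX2__)
    /* AVX2 path: four 64-bit words (256 bits) per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m256i x = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i y = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_and_si256(x, y));
    }
#elif defined(__SSE2__)
    /* SSE2 path: two 64-bit words (128 bits) per iteration. */
    for (; i + 2 <= n; i += 2) {
        __m128i x = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i y = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_and_si128(x, y));
    }
#endif

    /* Scalar path: handles the remaining words, or everything when
     * neither macro is defined (e.g. on non-x86 targets). */
    for (; i < n; i++)
        dst[i] = a[i] & b[i];
}
```

With this layout the SSE2 path is what a default x86-64 build would get, since SSE2 is part of the x86-64 baseline, while the AVX2 path is only compiled in when the compiler is already targeting AVX2 (for example via `-mavx2` or `-march=native`).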
@konsumlamm I'd like to make a release soon so that consumers could benefit from your work here. Shall I go ahead as is or do you have plans to work on |
I have an implementation of |
I'll gladly take immutable versions only.
See also #64.
- bitIndex (Implement bitIndex & nthBitIndex in C #81)
- nthBitIndex (Implement bitIndex & nthBitIndex in C #81)
- selectBits (Implement selectBits & excludeBits in C #82)
- excludeBits (Implement selectBits & excludeBits in C #82)
- reverseBits (Use SIMD intrinsics for reverseBits #71)