Faster floating point comparisons #814

Open: wants to merge 10 commits into base branch develop

mborland
Member

@mborland mborland commented Aug 5, 2022

No description provided.

@mborland
Member Author

mborland commented Aug 7, 2022

@NAThompson Since you got this to work previously for 128-bit float, do you see anything obvious I am missing? There is an incomplete-type error here.

@NAThompson
Collaborator

Since you got this to work previously for 128-bit float, do you see anything obvious I am missing?

IIRC, this was something I looked up in the quadmath manual. I couldn't find it though . . .

@mborland force-pushed the fast_next branch 4 times, most recently from 2c2b13a to 86b0dbd, on August 14, 2022 00:20

@mborland
Member Author

@NAThompson This is ready for review. I pulled 128-bit support into its own header; otherwise you would have to link quadmath any time you used next.hpp.

@AZero13
Contributor

AZero13 commented Oct 13, 2022

Any updates on this?

@mborland
Member Author

Here are the before and after benchmarks @NAThompson :

Original performance (Boost 1.80.0):

 Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
 This does not affect benchmark measurements, only the metadata output.
 2022-10-15T15:24:07-07:00
 Running ./new_next_performance
 Run on (10 X 24.0916 MHz CPU s)
 CPU Caches:
   L1 Data 64 KiB
   L1 Instruction 128 KiB
   L2 Unified 4096 KiB (x10)
 Load Average: 1.86, 2.53, 5.83
 ---------------------------------------------------------------------------------
 Benchmark                                       Time             CPU   Iterations
 ---------------------------------------------------------------------------------
 float_distance<float>/2/real_time            61.4 ns         61.4 ns      9074469
 float_distance<float>/4/real_time            61.7 ns         61.7 ns     11384150
 float_distance<float>/8/real_time            61.4 ns         61.4 ns     10814604
 float_distance<float>/16/real_time           61.7 ns         61.7 ns     11348376
 float_distance<float>/32/real_time           61.4 ns         61.4 ns     11387167
 float_distance<float>/64/real_time           61.6 ns         61.6 ns     11131932
 float_distance<float>/128/real_time          61.4 ns         61.4 ns     11382029
 float_distance<float>/256/real_time          61.4 ns         61.4 ns     11307649
 float_distance<float>/512/real_time          61.4 ns         61.4 ns     11376048
 float_distance<float>/1024/real_time         61.4 ns         61.4 ns     11355748
 float_distance<float>/2048/real_time         61.8 ns         61.8 ns     11373776
 float_distance<float>/4096/real_time         61.4 ns         61.4 ns     11382368
 float_distance<float>/8192/real_time         61.4 ns         61.4 ns     11353453
 float_distance<float>/16384/real_time        61.4 ns         61.4 ns     11378298
 float_distance<float>/real_time_BigO        61.48 (1)       61.47 (1)
 float_distance<float>/real_time_RMS             0 %             0 %
 float_distance<double>/2/real_time           55.6 ns         55.6 ns     12580218
 float_distance<double>/4/real_time           55.6 ns         55.6 ns     12577835
 float_distance<double>/8/real_time           55.6 ns         55.6 ns     12564909
 float_distance<double>/16/real_time          56.2 ns         56.2 ns     12554909
 float_distance<double>/32/real_time          56.0 ns         56.0 ns     12544381
 float_distance<double>/64/real_time          55.6 ns         55.6 ns     12566488
 float_distance<double>/128/real_time         55.6 ns         55.6 ns     12499581
 float_distance<double>/256/real_time         55.6 ns         55.6 ns     12565661
 float_distance<double>/512/real_time         56.1 ns         56.1 ns     12550023
 float_distance<double>/1024/real_time        55.8 ns         55.8 ns     12568603
 float_distance<double>/2048/real_time        55.6 ns         55.6 ns     12546049
 float_distance<double>/4096/real_time        55.6 ns         55.6 ns     12528525
 float_distance<double>/8192/real_time        55.9 ns         55.9 ns     12563030
 float_distance<double>/16384/real_time       56.0 ns         56.0 ns     12447644
 float_distance<double>/real_time_BigO       55.78 (1)       55.78 (1)
 float_distance<double>/real_time_RMS            0 %             0 %

New performance:

 Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
 This does not affect benchmark measurements, only the metadata output.
 2022-10-15T15:31:37-07:00
 Running ./new_next_performance
 Run on (10 X 24.122 MHz CPU s)
 CPU Caches:
   L1 Data 64 KiB
   L1 Instruction 128 KiB
   L2 Unified 4096 KiB (x10)
 Load Average: 2.12, 2.17, 4.26
 ---------------------------------------------------------------------------------
 Benchmark                                       Time             CPU   Iterations
 ---------------------------------------------------------------------------------
 float_distance<float>/2/real_time            15.8 ns         15.8 ns     42162717
 float_distance<float>/4/real_time            15.9 ns         15.9 ns     44213877
 float_distance<float>/8/real_time            15.8 ns         15.8 ns     43972542
 float_distance<float>/16/real_time           15.8 ns         15.8 ns     44209456
 float_distance<float>/32/real_time           15.8 ns         15.8 ns     44200244
 float_distance<float>/64/real_time           15.8 ns         15.8 ns     44239293
 float_distance<float>/128/real_time          15.8 ns         15.8 ns     44171202
 float_distance<float>/256/real_time          15.8 ns         15.8 ns     44241507
 float_distance<float>/512/real_time          15.9 ns         15.8 ns     44230034
 float_distance<float>/1024/real_time         15.8 ns         15.8 ns     44241554
 float_distance<float>/2048/real_time         15.8 ns         15.8 ns     44220802
 float_distance<float>/4096/real_time         15.8 ns         15.8 ns     44220441
 float_distance<float>/8192/real_time         15.9 ns         15.9 ns     44213994
 float_distance<float>/16384/real_time        15.8 ns         15.8 ns     44215413
 float_distance<float>/real_time_BigO        15.83 (1)       15.83 (1)
 float_distance<float>/real_time_RMS             0 %             0 %
 float_distance<double>/2/real_time           15.5 ns         15.5 ns     45098165
 float_distance<double>/4/real_time           15.6 ns         15.6 ns     45065465
 float_distance<double>/8/real_time           15.5 ns         15.5 ns     45058733
 float_distance<double>/16/real_time          15.8 ns         15.7 ns     45078404
 float_distance<double>/32/real_time          15.5 ns         15.5 ns     44832734
 float_distance<double>/64/real_time          15.5 ns         15.5 ns     45077303
 float_distance<double>/128/real_time         15.5 ns         15.5 ns     45067255
 float_distance<double>/256/real_time         15.5 ns         15.5 ns     45073844
 float_distance<double>/512/real_time         15.6 ns         15.6 ns     45109342
 float_distance<double>/1024/real_time        15.5 ns         15.5 ns     44845180
 float_distance<double>/2048/real_time        15.5 ns         15.5 ns     45051846
 float_distance<double>/4096/real_time        15.5 ns         15.5 ns     45064317
 float_distance<double>/8192/real_time        15.5 ns         15.5 ns     45115653
 float_distance<double>/16384/real_time       15.5 ns         15.5 ns     45067642
 float_distance<double>/real_time_BigO       15.54 (1)       15.54 (1)
 float_distance<double>/real_time_RMS            0 %             0 %
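
To put the two runs side by side, the speedup can be read off the `real_time_BigO` rows. The snippet below is only a small arithmetic sketch using the numbers printed above; the `speedup` helper is a hypothetical name, not part of the benchmark. It works out to roughly 3.9x for `float` and 3.6x for `double`.

```cpp
#include <cassert>

// Mean times in nanoseconds, copied from the real_time_BigO rows above.
constexpr double old_float_ns  = 61.48;
constexpr double new_float_ns  = 15.83;
constexpr double old_double_ns = 55.78;
constexpr double new_double_ns = 15.54;

// Hypothetical helper: how many times faster the new run is.
constexpr double speedup(double before_ns, double after_ns)
{
    return before_ns / after_ns;
}

constexpr double float_speedup  = speedup(old_float_ns, new_float_ns);   // about 3.9
constexpr double double_speedup = speedup(old_double_ns, new_double_ns); // about 3.6
```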


BOOST_MATH_ASSERT(fast_float_distance(float_advance(val, 4), val) == -4);
BOOST_MATH_ASSERT(fast_float_distance(float_advance(val, -4), val) == 4);
if(std::numeric_limits<T>::is_specialized && (std::numeric_limits<T>::has_denorm == std::denorm_present))

Collaborator

Doesn't the fast float distance unconditionally require denorm support?

Member Author

It's just type punning, so as long as the floating-point type and the fixed-width integer have the same number of bits, you should be fine. I don't know of a platform without denorm support to test on.
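
For readers unfamiliar with the trick, here is a minimal sketch of the type-punning idea for `float`. It is not the code in this PR: `ordered_bits` and `sketch_float_distance` are hypothetical names, and the sketch assumes a 32-bit IEEE 754 `float` and finite inputs. The sign convention follows the assertions in the diff above (advancing the first argument by 4 gives a distance of -4).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: map a finite float to a signed integer so that
// adjacent representable floats map to adjacent integers.
inline std::int64_t ordered_bits(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    static_assert(std::numeric_limits<float>::is_iec559,
                  "this sketch assumes IEEE 754 layout");

    std::uint32_t u;
    std::memcpy(&u, &x, sizeof(u)); // well-defined way to read the bit pattern

    const std::uint32_t sign_mask = 0x80000000u;
    // IEEE 754 is sign-magnitude: strip the sign bit, then negate if it was set.
    // Both +0 and -0 map to 0; subnormals are ordinary consecutive patterns.
    const std::int64_t magnitude = static_cast<std::int64_t>(u & ~sign_mask);
    return (u & sign_mask) ? -magnitude : magnitude;
}

// Number of representable floats from a to b; positive when b > a.
inline std::int64_t sketch_float_distance(float a, float b)
{
    assert(std::isfinite(a) && std::isfinite(b));
    return ordered_bits(b) - ordered_bits(a);
}
```

Under these assumptions, `sketch_float_distance(1.0f, std::nextafter(1.0f, 2.0f))` is 1, and the distance from the negative to the positive smallest subnormal is 2, since the two zeros share one integer.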

@NAThompson
Collaborator

@mborland : Does it need a fast_float_distance header? Could we not just improve the performance of the current implementation?

@jzmaddock : This is looking like it's about ready to go; might want to take a look.

@mborland
Member Author

@mborland : Does it need a fast_float_distance header? Could we not just improve the performance of the current implementation?

The float and double cases improve upon the current implementation. I could not get the __float128 case to work without forcing the user to link -lquadmath when using GCC, which would be an unwelcome breaking change.

@NAThompson
Collaborator

I could not get the __float128 case to work without forcing the user to link -lquadmath when using GCC, which would be an unwelcome breaking change.

Wait, I thought we had to link libquadmath to use __float128 . . .

@NAThompson
Collaborator

I could not get the __float128 case to work without forcing the user to link -lquadmath when using GCC, which would be an unwelcome breaking change.

Could we use a judicious __has_include to work around this?

Also, 99% of the value will be from float and double . . .

@mborland
Member Author

mborland commented Mar 2, 2023

Having the include isn't enough in this case. You have to link explicitly with -lquadmath.
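
To illustrate the limitation, here is a small sketch of what a `__has_include` check can and cannot establish. The macro `SKETCH_HAS_QUADMATH_HEADER` and the function `quadmath_header_visible` are hypothetical names, not anything in Boost.Math. The check only reports whether the header can be found at compile time; it says nothing about whether the program will be linked against libquadmath.

```cpp
#include <cassert>

// __has_include is standard since C++17; many compilers offer it earlier
// as an extension, so guard its use.
#if defined(__has_include)
#  if __has_include(<quadmath.h>)
#    define SKETCH_HAS_QUADMATH_HEADER 1
#  endif
#endif
#ifndef SKETCH_HAS_QUADMATH_HEADER
#  define SKETCH_HAS_QUADMATH_HEADER 0
#endif

// True when <quadmath.h> is visible to the preprocessor.
// This does NOT mean -lquadmath is on the link line: any call into a
// libquadmath function would still fail at link time without it.
constexpr bool quadmath_header_visible()
{
    return SKETCH_HAS_QUADMATH_HEADER != 0;
}
```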

@mborland
Member Author

@jzmaddock Can you please take a look at this one? @NAThompson hit me up about merging this.

Successfully merging this pull request may close these issues.

Floating-Point Comparison Performance
Can boost::math::float_distance be sped up?
4 participants