-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Real to complex #21
Real to complex #21
Conversation
@@ -138,11 +139,96 @@ | |||
std::size_t my_size{}; | |||
}; | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the bsl rfft backend needs to be implemented
template<class complex_value_type> | ||
void complex_dft_power2( | ||
void complex_dft_power2_dif( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the GSL documentation, the rfft can be deduced from the decimation-in-time (DIT) complex fft.
The default complex fft for powers of 2 is implemented as Decimation-in-frequency (DIF), i wrote another complex fft as DIT to test the algorithm before writing the rfft.
@@ -574,28 +574,14 @@ | |||
// -> size(out) >= N | |||
{ | |||
const std::size_t N = size(); | |||
vector_t<real_value_type> tmp(out,out+N); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The algorithm that we use for rfft for powers of 2, comes naturally in-place if we use the halfcomplex representation used by FFTW.
@@ -174,6 +174,34 @@ namespace fft { namespace detail { | |||
gsl_fft_halfcomplex_wavetable *halfcomplex_wtable; | |||
gsl_fft_real_workspace *real_wspace; | |||
|
|||
void pack_halfcomplex(real_value_type* out) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The algorithm that we use for rfft for powers of 2, comes naturally in-place if we use the halfcomplex representation used by FFTW. Hence we have to pack and unpack the halfcomplex returned by GSL to make it compatible with our representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, I appreciate your comments about what's going on in the code.
It's a bit unrelated to this comment, but I noticed that we could replace 2*boost::math::constants::pi<RealType>()
with boost::math::constants::two_pi<RealType>()
. This could eliminate 1 ULP of error.
} | ||
|
||
template <class T> | ||
void real_dft_power2(const T *in_first, const T *in_last, T* out, int sign) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the real-to-halfcomplex rfft for powers of 2. Changing the sign of the transform has the only effect of reversing the sign of the imaginary part in the result.
This algorithm uses half of the memory and computation than the complex fft of the same size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
real-to-halfcomplex rfft for powers of 2
Cool. I'll give it a try as soon as i get a chance. Thanks Eduardo!
Hi @cosurgi and @ckormanyos. The following error message printed by b2 appears in the CI test and my local runs:
Do you know what could it be? |
I think it's some missing linker properties in the build rules. John is a lot better with jam files and I'm really not very good at it. I will try and figure out what's going on to give the linker what it needs/expects. But you got green in CI nonetheless, right? Cc: @Lagrang3 |
Yes. I've got green. But those test are being skipped. |
composite sizes. | ||
*/ | ||
using allocator_type = allocator_t; | ||
using ComplexType = ::boost::multiprecision::complex<T>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice here that the real-to-halfcomplex routine for composite sizes is an adaptation of the complex Cooley-Tukey FFT decimation in time. In the algorithm we take advantage of the the complex arithmetic, hence we need to build a complex type from the real T
for inner calculations only. This is related to issue #16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good note. Thanks for pointing this out.
for (long i = 0; i < n; i += len) | ||
{ | ||
//std::cout << " i = " << i << "\n"; | ||
for(long k=0;2*k<=len_old;++k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here lies the optimization of real-to-halfcomplex. In the complex-to-complex routine this for loop would be for(long k=0;k<len_old;++k)
. Here instead we use half of the values of k
, the other half is redundant.
It remains to be done:
|
@@ -351,7 +351,7 @@ | |||
const T *in_first, | |||
const T *in_last, | |||
T* out, | |||
int sign, | |||
int /*sign*/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sign for real transforms is not important. The default behavior is that the real-to-complex is a forward transform.
@@ -487,6 +487,137 @@ | |||
real_dft_composite_outofplace(in_first,in_last,out,sign,alloc); | |||
} | |||
|
|||
|
|||
template <class T, class allocator_t> | |||
void real_inverse_dft_composite_outofplace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The halfcomplex-to-real is again the reverse operation of the real-to-halfcomplex.
} | ||
} | ||
|
||
std::vector<T,allocator_type> tmp(out,out+n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the input is const, we need scratch space to perform the "bit-reversal". In the real-to-halfcomplex function this was not necessary, because the reversal was performed starting with the input data.
} | ||
|
||
template<class T,class Allocator_t> | ||
void real_inverse_dft_composite_inplace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These helper functions appear very often, maybe we could wrap them into a single unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely. Maybe even in a separate header?
Hi @Lagrang3 just did a simple |
@ckormanyos concluding the CI - we now have full tests both for gcc, including quadmath, and clang ? Do we need msvc, or something else? |
Hi @cosurgi yes, definitely. Great point. I would like to put in some BSL-backend tests for MSVC 14.0 and 14.2 into CI. I believe the BSL stuff is the only kind of backend that MSVC is capable of compiling (and running). I will probably end up making 2 new MSVC-only jobs for 14.0 and 14.2, as these have different standards adherence for the test matrices. Is there/are thare/can you suggest any file(s) that you could recommend for BSL-only, capable of build and run in MSVC? Or do i need to configure a few of them newly myself? |
I think that |
Ahh, by BSL-only, you mean no float128, no MPFR ? EDIT: In such case I guess the solution is to add #ifdef MPFR_ALLOWED or something like that. And pass defines from the jamfile? |
Ughhh. I have to think a bit. On MSVC CI we will definitely not have:
On MSVC CI we might have:
On MSVC CI we definitely do have:
|
Ouch. Not fun. I guess that even getting |
Yes. I intend to look into this shortly. It shouldn't be that bad if we cut out the What remains, ... you may fairly enough ask? Pretty much plain I'll see how your new work(s) look in the world of MSVC... and report back with details. |
At the latest commit 0f7e425, we have already BSL real-to-complex and complex-to-real fully working for any size. The complexity is always O(n log n), even for prime numbers. The real optimization, ie. half of the computational cost, is valid only for composite sizes. I would like to think how to do the same kind of optimization for Rader's algorithm (prime sizes). Let me just push a test for correctness and then we can merge. |
Thank you Eduardo! This is an awesome achievement @Lagrang3
Perfect. Just after your push if you give me a brief chance to add some MSVC CI tests, I'll try. If that's too time-consuming, we will just stick with GCC/clan CI and I can work out a few selected MSVC tests later then |
in our convention halfcomplex-to-real is the inverse map of the real-to-halfcomplex. There's no good reason/advantage I can think of for neglecting the 1/N factor.
const std::size_t N = size(); | ||
const real_value_type inv_N = real_value_type{1.0}/N; | ||
for(unsigned int i=0;i<N;++i) | ||
out[i] *= inv_N; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when transforming halfcomplex-to-real, both GSL and FFTW, return an array which is a 1/N factor of the original real array. This could be in tune with the fact that the reverse FFT is the inverse of the forward FFT up to a 1/N factor. But the actual implementation of the halfcomplex-to-real (at least in BSL) is the reverse graph of the real-to-halfcomplex and the 1/N factor does not appear. I cannot see a good reason why the halfcomplex-to-real cannot be the inverse linear map of the real-to-halfcomplex. In the BSL this comes naturally. With the GSL and FFTW I need to multiply times 1/N.
UPD. I realized that the halfcomplex-to-real in BSL has multiplications by inverse factors of N all over. So it doesn't actually comes naturally like I previously said. @cosurgi @ckormanyos : if you agree I will change the halfcomplex-to-real to return the inverse of the real-to-halfcomplex times N, just to save some multiplications and be in tune with the FFTW and GSL conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lagrang3 yes, this is a good idea. Better to make sure that all backends produce the same kind of output. To avoid any kind of frustration for the users who have a nice working code, and only decide to switch to a different backend.
Having to put an if somewhere in the user code to multiply by 1/N, depending on the type of backend, would not be good.
{ | ||
test_r2c<fftw_rfft<double>>(n,8); | ||
test_r2c<gsl_rfft<double>>(n,16); | ||
test_r2c<bsl_rfft<double>>(n,4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that the BSL precision is better than GSL, and as good as the FFTW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to multiply times 1/N
I am not totally certain on this, but I believe this is simply (or can be considered) a matter of convention. We might eventually want to scale internally on BSL in order to have the same output behavior on real-tom-half-complex. But, again, I think this can all be documented convention.
BSL precision is better than GSL, and as good as the FFTW
Great job! That is a really difficult one to achieve since FFTW is also known for both its speed, robustness as well as precision. You have really created the foundation of a nice body of work/utilities for powerful FFT math Eduardo!
Nice!
@ckormanyos @cosurgi, this PR is ready to merge. |
{ | ||
test_r2c<fftw_rfft<double>>(n,8); | ||
test_r2c<gsl_rfft<double>>(n,16); | ||
test_r2c<bsl_rfft<double>>(n,4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to multiply times 1/N
I am not totally certain on this, but I believe this is simply (or can be considered) a matter of convention. We might eventually want to scale internally on BSL in order to have the same output behavior on real-tom-half-complex. But, again, I think this can all be documented convention.
BSL precision is better than GSL, and as good as the FFTW
Great job! That is a really difficult one to achieve since FFTW is also known for both its speed, robustness as well as precision. You have really created the foundation of a nice body of work/utilities for powerful FFT math Eduardo!
Nice!
I wonder what @cosurgi thinks about 1/N scaling? |
I will augment CI where possible/helpful in a branch of my own. |
I think that the optimal approach is user convenience: make sure that all backends behave in exactly the same way. If this choice makes some algorithms not-optimal (e.g. a wasted multiplication) then provide a bool template parameter for a particular backend, which would not waste a multiplication and then would not be in sync with the other backends. |
One single |
@ckormanyos great! I will try to add a couple more tests. |
@ckormanyos I opened #26 for experimenting. |
In this PR I will push the boost backend routines to perform real to complex transforms.