Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Real to complex #21

Merged
merged 13 commits into from
Aug 11, 2021
Merged

Real to complex #21

merged 13 commits into from
Aug 11, 2021

Conversation

Lagrang3
Copy link
Collaborator

@Lagrang3 Lagrang3 commented Aug 9, 2021

In this PR I will push the boost backend routines to perform real to complex transforms.

@@ -138,11 +139,96 @@
std::size_t my_size{};
};


Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the bsl rfft backend needs to be implemented

template<class complex_value_type>
void complex_dft_power2(
void complex_dft_power2_dif(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the GSL documentation, the rfft can be deduced from the decimation-in-time (DIT) complex fft.
The default complex fft for powers of 2 is implemented as Decimation-in-frequency (DIF), i wrote another complex fft as DIT to test the algorithm before writing the rfft.

@@ -574,28 +574,14 @@
// -> size(out) >= N
{
const std::size_t N = size();
vector_t<real_value_type> tmp(out,out+N);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm that we use for rfft for powers of 2, comes naturally in-place if we use the halfcomplex representation used by FFTW.

@@ -174,6 +174,34 @@ namespace fft { namespace detail {
gsl_fft_halfcomplex_wavetable *halfcomplex_wtable;
gsl_fft_real_workspace *real_wspace;

void pack_halfcomplex(real_value_type* out) const
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm that we use for rfft for powers of 2, comes naturally in-place if we use the halfcomplex representation used by FFTW. Hence we have to pack and unpack the halfcomplex returned by GSL to make it compatible with our representation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I appreciate your comments about what's going on in the code.

It's a bit unrelated to this comment, but I noticed that we could replace 2*boost::math::constants::pi<RealType>() with boost::math::constants::two_pi<RealType>(). This could eliminate 1 ULP of error.

}

template <class T>
void real_dft_power2(const T *in_first, const T *in_last, T* out, int sign)
Copy link
Collaborator Author

@Lagrang3 Lagrang3 Aug 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the real-to-halfcomplex rfft for powers of 2. Changing the sign of the transform has the only effect of reversing the sign of the imaginary part in the result.

This algorithm uses half of the memory and computation than the complex fft of the same size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

real-to-halfcomplex rfft for powers of 2

Cool. I'll give it a try as soon as i get a chance. Thanks Eduardo!

@Lagrang3
Copy link
Collaborator Author

Lagrang3 commented Aug 9, 2021

Hi @cosurgi and @ckormanyos. The following error message printed by b2 appears in the CI test and my local runs:

error: No best alternative for ./fft_compile
    next alternative: required properties: <link>static
        matched
    next alternative: required properties: <link>static
        matched
error: No best alternative for ./fft_allocator
    next alternative: required properties: <link>static
        matched
    next alternative: required properties: <link>static
        matched
error: No best alternative for ./fft_pmr_allocator
    next alternative: required properties: <link>static
        matched
    next alternative: required properties: <link>static
        matched
error: No best alternative for ./fft_iterators
    next alternative: required properties: <link>static
        matched
    next alternative: required properties: <link>static
        matched
error: No best alternative for ./fft_non-complex
    next alternative: required properties: <link>static
        matched
    next alternative: required properties: <link>static
        matched
error: No best alternative for ./fft_auxiliary_functions
    next alternative: required properties: <link>static
        matched
    next alternative: required properties: <link>static
        matched

Do you know what could it be?

@ckormanyos
Copy link
Member

know what could it be

I think it's some missing linker properties in the build rules. John is a lot better with jam files and I'm really not very good at it. I will try and figure out what's going on to give the linker what it needs/expects.

But you got green in CI nonetheless, right?

Cc: @Lagrang3

@Lagrang3
Copy link
Collaborator Author

Lagrang3 commented Aug 9, 2021

Yes. I've got green. But those test are being skipped.

composite sizes.
*/
using allocator_type = allocator_t;
using ComplexType = ::boost::multiprecision::complex<T>;
Copy link
Collaborator Author

@Lagrang3 Lagrang3 Aug 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice here that the real-to-halfcomplex routine for composite sizes is an adaptation of the complex Cooley-Tukey FFT decimation in time. In the algorithm we take advantage of the the complex arithmetic, hence we need to build a complex type from the real T for inner calculations only. This is related to issue #16.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good note. Thanks for pointing this out.

for (long i = 0; i < n; i += len)
{
//std::cout << " i = " << i << "\n";
for(long k=0;2*k<=len_old;++k)
Copy link
Collaborator Author

@Lagrang3 Lagrang3 Aug 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here lies the optimization of real-to-halfcomplex. In the complex-to-complex routine this for loop would be for(long k=0;k<len_old;++k). Here instead we use half of the values of k, the other half is redundant.

@Lagrang3
Copy link
Collaborator Author

Lagrang3 commented Aug 10, 2021

It remains to be done:

  1. (done) the halfcomplex-to-real for composite sizes,
  2. the real-to-halfcomplex for prime sizes,
  3. the halfcomplex-to-real for prime sizes,
  4. test the correctedness,
  5. (maybe for another PR) provide a halfcomplex class to hide the representation details.

@@ -351,7 +351,7 @@
const T *in_first,
const T *in_last,
T* out,
int sign,
int /*sign*/,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sign for real transforms is not important. The default behavior is that the real-to-complex is a forward transform.

@@ -487,6 +487,137 @@
real_dft_composite_outofplace(in_first,in_last,out,sign,alloc);
}


template <class T, class allocator_t>
void real_inverse_dft_composite_outofplace(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The halfcomplex-to-real is again the reverse operation of the real-to-halfcomplex.

}
}

std::vector<T,allocator_type> tmp(out,out+n);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the input is const, we need scratch space to perform the "bit-reversal". In the real-to-halfcomplex function this was not necessary, because the reversal was performed starting with the input data.

}

template<class T,class Allocator_t>
void real_inverse_dft_composite_inplace(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These helper functions appear very often, maybe we could wrap them into a single unit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely. Maybe even in a separate header?

@ckormanyos
Copy link
Member

Hi @Lagrang3 just did a simple ::template dot syntax change to get CI going green

@cosurgi
Copy link
Collaborator

cosurgi commented Aug 10, 2021

@ckormanyos concluding the CI - we now have full tests both for gcc, including quadmath, and clang ? Do we need msvc, or something else?

@ckormanyos
Copy link
Member

concluding the CI - we now have full tests both for gcc, including quadmath, and clang ? Do we need msvc, or something else?

Hi @cosurgi yes, definitely. Great point.

I would like to put in some BSL-backend tests for MSVC 14.0 and 14.2 into CI. I believe the BSL stuff is the only kind of backend that MSVC is capable of compiling (and running). I will probably end up making 2 new MSVC-only jobs for 14.0 and 14.2, as these have different standards adherence for the test matrices.

Is there/are thare/can you suggest any file(s) that you could recommend for BSL-only, capable of build and run in MSVC? Or do i need to configure a few of them newly myself?

@cosurgi
Copy link
Collaborator

cosurgi commented Aug 10, 2021

I think that test/fft_compile.cpp is a good starting point. It doesn't have (yet) the float128 or MPFR.
Then test/fft_correctedness.cpp once the first one works. Once you start it I will join you there trying to add the files example/fft*pp with MPFR etc. I would prefer that you start modifying the jamfile, it's still my weak point.

@cosurgi
Copy link
Collaborator

cosurgi commented Aug 10, 2021

Ahh, by BSL-only, you mean no float128, no MPFR ?

EDIT: In such case I guess the solution is to add #ifdef MPFR_ALLOWED or something like that. And pass defines from the jamfile?

@ckormanyos
Copy link
Member

ckormanyos commented Aug 10, 2021

Ahh, by BSL-only, you mean no float128, no MPFR

Ughhh. I have to think a bit.

On MSVC CI we will definitely not have:

  • GSL, FFTW
  • MPFR, GMP
  • quadmath (i.e., boost::muptiprecision::float128)

On MSVC CI we might have:

  • an MPIR (Win* version of GMP) configuration available

On MSVC CI we definitely do have:

  • the three built-in is_floating_point types
  • cpp_dec_float, cpp_bin_float

Cc: @cosurgi and @Lagrang3

@cosurgi
Copy link
Collaborator

cosurgi commented Aug 10, 2021

Ouch. Not fun. I guess that even getting test/fft_compile.cpp to work will be quite an achievement. First we could comment out MPFR and GSL there and only fundamental types with BSL-only are left.

@ckormanyos
Copy link
Member

getting test/fft_compile.cpp to work

Yes. I intend to look into this shortly. It shouldn't be that bad if we cut out the fftw_- and gsl_-backends and configure the float128 out of the relevant header files.

What remains, ... you may fairly enough ask? Pretty much plain bsl_-backend stuff for built-in floating-point types and Boost's header-only wide-floats such as cpp_dec_float and cpp_bin_float.

I'll see how your new work(s) look in the world of MSVC... and report back with details.

Cc: @cosurgi and @Lagrang3

@ckormanyos
Copy link
Member

Hi @Lagrang3 and @cosurgi aside from MSVC potential configuration(s), is this PR (looking very good now) ready to merge to develop in our fork?

@Lagrang3
Copy link
Collaborator Author

Lagrang3 commented Aug 11, 2021

Hi @Lagrang3 and @cosurgi aside from MSVC potential configuration(s), is this PR (looking very good now) ready to merge to develop in our fork?

At the latest commit 0f7e425, we have already BSL real-to-complex and complex-to-real fully working for any size. The complexity is always O(n log n), even for prime numbers. The real optimization, ie. half of the computational cost, is valid only for composite sizes. I would like to think how to do the same kind of optimization for Rader's algorithm (prime sizes).
I think anyways up to here we have a good achievement. As a side note, consider that the GSL transforms are O(n^2) for prime sizes.

Let me just push a test for correctness and then we can merge.

@ckormanyos
Copy link
Member

we have already BSL real-to-complex and complex-to-real fully working for any size. The complexity is always O(n log n), even for prime numbers. The real optimization, ie. half of the computational cost, is valid only for composite sizes. I would like to think how to do the same kind of optimization for Rader's algorithm (prime sizes).
I think anyways up to here we have a good achievement.

Thank you Eduardo! This is an awesome achievement @Lagrang3

Let me just push a test for correctness and then we can merge.

Perfect. Just after your push if you give me a brief chance to add some MSVC CI tests, I'll try. If that's too time-consuming, we will just stick with GCC/clan CI and I can work out a few selected MSVC tests later then

Lagrang3 added 2 commits August 11, 2021 08:04
in our convention halfcomplex-to-real is the inverse
map of the real-to-halfcomplex. There's no good reason/advantage
I can think of for neglecting the 1/N factor.
const std::size_t N = size();
const real_value_type inv_N = real_value_type{1.0}/N;
for(unsigned int i=0;i<N;++i)
out[i] *= inv_N;
Copy link
Collaborator Author

@Lagrang3 Lagrang3 Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when transforming halfcomplex-to-real, both GSL and FFTW, return an array which is a 1/N factor of the original real array. This could be in tune with the fact that the reverse FFT is the inverse of the forward FFT up to a 1/N factor. But the actual implementation of the halfcomplex-to-real (at least in BSL) is the reverse graph of the real-to-halfcomplex and the 1/N factor does not appear. I cannot see a good reason why the halfcomplex-to-real cannot be the inverse linear map of the real-to-halfcomplex. In the BSL this comes naturally. With the GSL and FFTW I need to multiply times 1/N.

UPD. I realized that the halfcomplex-to-real in BSL has multiplications by inverse factors of N all over. So it doesn't actually comes naturally like I previously said. @cosurgi @ckormanyos : if you agree I will change the halfcomplex-to-real to return the inverse of the real-to-halfcomplex times N, just to save some multiplications and be in tune with the FFTW and GSL conventions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lagrang3 yes, this is a good idea. Better to make sure that all backends produce the same kind of output. To avoid any kind of frustration for the users who have a nice working code, and only decide to switch to a different backend.

Having to put an if somewhere in the user code to multiply by 1/N, depending on the type of backend, would not be good.

{
test_r2c<fftw_rfft<double>>(n,8);
test_r2c<gsl_rfft<double>>(n,16);
test_r2c<bsl_rfft<double>>(n,4);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that the BSL precision is better than GSL, and as good as the FFTW.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to multiply times 1/N

I am not totally certain on this, but I believe this is simply (or can be considered) a matter of convention. We might eventually want to scale internally on BSL in order to have the same output behavior on real-tom-half-complex. But, again, I think this can all be documented convention.

BSL precision is better than GSL, and as good as the FFTW

Great job! That is a really difficult one to achieve since FFTW is also known for both its speed, robustness as well as precision. You have really created the foundation of a nice body of work/utilities for powerful FFT math Eduardo!

Nice!

@Lagrang3
Copy link
Collaborator Author

@ckormanyos @cosurgi, this PR is ready to merge.

{
test_r2c<fftw_rfft<double>>(n,8);
test_r2c<gsl_rfft<double>>(n,16);
test_r2c<bsl_rfft<double>>(n,4);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to multiply times 1/N

I am not totally certain on this, but I believe this is simply (or can be considered) a matter of convention. We might eventually want to scale internally on BSL in order to have the same output behavior on real-tom-half-complex. But, again, I think this can all be documented convention.

BSL precision is better than GSL, and as good as the FFTW

Great job! That is a really difficult one to achieve since FFTW is also known for both its speed, robustness as well as precision. You have really created the foundation of a nice body of work/utilities for powerful FFT math Eduardo!

Nice!

@ckormanyos
Copy link
Member

I wonder what @cosurgi thinks about 1/N scaling?

@ckormanyos ckormanyos merged commit 118d059 into develop Aug 11, 2021
@ckormanyos
Copy link
Member

this PR is ready to merge
Done. This is a really nice job @Lagrang3. Complements again Eduardo.

I will augment CI where possible/helpful in a branch of my own.

@cosurgi
Copy link
Collaborator

cosurgi commented Aug 11, 2021

I wonder what @cosurgi thinks about 1/N scaling?

I think that the optimal approach is user convenience: make sure that all backends behave in exactly the same way.

If this choice makes some algorithms not-optimal (e.g. a wasted multiplication) then provide a bool template parameter for a particular backend, which would not waste a multiplication and then would not be in sync with the other backends.

@ckormanyos
Copy link
Member

augment CI where possible/helpful

One single fft_compile.cpp configured for MSVC only as described above and now in develop is running in CI on MSVC 14.0 and 14.2. Gives us at least some sanity checks in MSVC world.

@cosurgi

@cosurgi
Copy link
Collaborator

cosurgi commented Aug 11, 2021

@ckormanyos great! I will try to add a couple more tests.

@cosurgi
Copy link
Collaborator

cosurgi commented Aug 11, 2021

@ckormanyos I opened #26 for experimenting.

@Lagrang3 Lagrang3 deleted the real-to-complex branch August 14, 2021 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants