Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster routines for Fourier transform (e.g., by using external libraries like FFTW) #213

Open
zinphi opened this issue Sep 21, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@zinphi
Copy link

zinphi commented Sep 21, 2023

Description

I'm experiencing a very poor performance when trying to generate longer noise time series from a PSD. For a month with 5s sampling (about 500k elements), I need to wait already several minutes for the generator to complete. Basically, generating noise from a PSD just requiers a fft transform of the noise frequency series into the time domain. Using FFTW (e.g., through Matlab), transforming a 500k vector takes about 4ms:
vec=randn(5e5,1); tic;vect=fft(vec);toc
I looked into the specific code and saw that GROOPS uses its own fft implementation. Without further investigation, I would assume that this function is the bottleneck. My suggestion is to replace the Fourier transforms (dft, dct) with fftw routines both for speed and numerical accuracy (since fftw is heavily used and tested all over the world).
The current performance is really painful when trying to generate longer contiguous noise time series for simulation purpose. Generating just the noise for all instruments for a monthly SST solution takes several hours on my machine. I did not yet dare to repeat it with a 1s sampling since I fear that the runtime scaling is far from beeing linear...

@zinphi zinphi added the enhancement New feature or request label Sep 21, 2023
@tmayerguerr
Copy link
Member

I assume you used Instrument2PowerSpectralDensity. As stated in the documentation it is is based on Lomb's method, which is based on least squares adjustment and also works with unevenly distributed data. So FFT is not used here and you must not compare the performance with it. Its intentional usage is to divide the data into short segments (arcs) with InstrumentSynchronize and compute am mean PSD over all arcs. In this case the program is fast enough for our applications.

The use of external software packages is always a trade-off and there are a lot of things to consider: Does the package exist and work on all platforms? Is the licence compatible so that we can use and publish it? Is it reasonably certain that the package will be available and maintained in the future? In GROOPS we try to keep external dependencies as low as possible.

In this specific case, we decided against using FFTW because the FFT routine is rarley used within GROOPS and FFTW is huge. We think this is really shooting with cannons at sparrows. To make it clear: The complete GROOPS consists about 180,000 lines of code only. The FFTW package would add about 300,000 lines in the .c/.h files.

regards
Torsten

@zinphi
Copy link
Author

zinphi commented Sep 22, 2023

Hi Torsten,

Many thanks again for your quick response. First of all, sorry for not beeing concise enough: the programs in question are basically all programs which begin with Noise* (such as NoiseTimeSeries) and use the NoiseGeneratorExpressionPSD class. There, in the function noise(), calls to Fourier::fft() and Fourier::synthesis() are made which do both not consider any non-uniform sampling and do internally call a recursive Fft algorithm.
As you might know, we currently try to adapt GROOPS to perfom SST simulations. For this purpose, the ability of generating longer noise time series is important since one might want to investigate the influence of the arc length while keeping the noise identical. Thus, generating the noise on arc segments would be more like a compromise and not a final solution for us.

Regarding code dependency, I can understand your reasoning. However, you might consider two additional points: firstly, GROOPS already depends indirectly on FFTW since you depend on GMT which has FFTW (among many others) on its dependency list (I needed to manually build GMT and it depends on nearly everything you can imagine). Secondly, the size of the FFTW codebase should be reason enough why you should not consider implementing something similar by your own. You probably also don't want to implement LAPACK/BLAS.

For FFTW, the answers to your questions to consider are pretty simple: Yes, FFTW basically exists on all platforms for which you can find BLAS/LAPACK (Unix/non-Unix). Yes, FFTW uses GPLv3 too. Yes, FFTW is probably one of the most used software libraries in the world.

Best regards,
Philipp

@tmayerguerr
Copy link
Member

Hi Philipp,

I would support you in implementing FFTW. The best way to do this is via pull request.

Use of the fftw library should be optional. Please define a constant in base/portable.h

// #define GROOPS_DISABLE_FFTW   // compile without FFTW library

And in base/fourier.cpp:

std::vector<std::complex<Double>> Fourier::fft(const Vector &data)
{
  try
  {
#ifdef GROOPS_DISABLE_FFTW
    // original GROOPS code
#else
    // FFTW call
#endif
  }
  catch(std::exception &e)
  {
    GROOPS_RETHROW(e)
  }
}

Similar for Fourier::synthesis() and maybe cosTransformation().

It would be very nice if you could also update INSTALL.md directly.

If you need help, we can arrange a zoom meeting.

best regards
Torsten

@zinphi
Copy link
Author

zinphi commented Sep 26, 2023

Hi Torsten,

Thanks for providing your support :)

I‘ll look into it once I have a little bit spare time. It is probably necessary to adjust the cmake configuration too to search and optionally include the fftw library. I‘ll contact you if I encounter any problems.

Best regards,
Philipp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants