Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

64-bit FFTW #1617

Closed
ViralBShah opened this issue Nov 27, 2012 · 9 comments
Closed

64-bit FFTW #1617

ViralBShah opened this issue Nov 27, 2012 · 9 comments

Comments

@ViralBShah
Copy link
Member

FFTW also has a 32-bit interface. As per the documentation, it seems that we have to use the guru interface to get 64-bit.

http://www.fftw.org/doc/64_002dbit-Guru-Interface.html#64_002dbit-Guru-Interface

@ViralBShah
Copy link
Member Author

@stevengj Does FFTW have a 64-bit API? I found the link above in the documentation, which suggests that one has to use the Guru interface for 64-bit.

This is in the context of #1527 and #1532.

@StefanKarpinski
Copy link
Member

You don't really need a 64-bit FFTW interface since you can just use the 32-bit interface in a loop.

@ViralBShah
Copy link
Member Author

How? It would need non-trivial work. I'd just rather use the 64-bit interface if such a thing was available.

@StefanKarpinski
Copy link
Member

Sorry, I was thinking of DSFMT rather than FFTW. You're right, of course, for FFTW you need to have a 64-bit interface.

@ViralBShah
Copy link
Member Author

Ok. DSFMT is already patched up - that was easy.

@stevengj
Copy link
Member

Yes, the 64-bit API in FFTW is the guru64 interface (http://www.fftw.org/fftw3_doc/64_002dbit-Guru-Interface.html). So, you can't use the basic interface if you want 64-bit support, but since you are creating the plans internally the messiness of calling the guru interface for simple things is hidden anyway.

PS. Would be nice to have a simple way to plan with FFTW_PATIENT. Also, there is the issue of data alignment (you need 16-byte aligned arrays if you want to get the benefit of SSE2, AVX, etc., and 32-byte aligned is better).

@ViralBShah
Copy link
Member Author

All arrays in julia are all 16-byte aligned. Is 32-byte alignment required by AVX?

Could you say a bit more about planning with FFTW_PATIENT? Do you mean exposing some of the FFTW plan APIs in julia to provide more flexibility? Nobody had asked for it, until now.

@stevengj
Copy link
Member

32-byte alignment is not required by AVX, it is just a bit faster. The good news is that 16-byte alignment is all that is required by FFTW to re-use a plan from one array to another of the same size.

I'm not surprised that no one has asked for it, since users rarely dig into performance issues or do careful benchmark comparisons in my experience. Even forgetting about FFTW_PATIENT, you are sacrificing a lot of performance by creating and destroying the plan for every FFT (which not only re-creates the plan but also recomputes the trigonometric tables etcetera each time).

For example, for a complex double-precision FFT of size 1024, @time fft(x) in Julia reports a running time of about 60us, whereas FFTW's tests/bench -oestimate 1024 reports a running time of 16us. Using FFTW_PATIENT (tests/bench -opatient 1024) results in a running time of 9us. For larger sizes, the overhead of re-creating and destroying the plan each time is less, but the advantage of FFTW_PATIENT is greater. e.g. for size 2^18, Julia takes 29ms while tests/bench -oestimate gives 25ms, but tests/bench -opatient gives a runtime of only 8ms (albeit after 5min to create the plan).

It would be nice to have something like:

  julia> p = plan_fft(x)
  julia> y = p(x);  # computes fft(x) efficiently
  julia> z = p(x.^2); # re-use plan for a different array of same dimensions/strides

That is, plan_fft would just return an optimized fft function (technically, a wrapper around a plan and fftw_execute_dft). (Re-using the plan in this way, at least for arrays of the same strides, is safe since Julia guarantees 16-byte alignment.) plan_fft(x) could default to FFTW_ESTIMATE, but you could also support plan_fft(x, FFTW_PATIENT) etcetera, and even plan_fft(x, FFTW_PATIENT, 10sec) and similar to use the fftw_timelimit mechanism to specify a time limit on the plan creation. And of course plan_fftn, plan_bfft, and similar variants.

Seems like this should be fairly easy to implement (I guess you can use finalizer to make the gc call fftw_destroy_plan when p is garbage-collected).

You'd also want to export the import/export_wisdom functions so that people could save plans.

@ViralBShah
Copy link
Member Author

Fixed in #1856

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants