64-bit FFTW #1617
You don't really need a 64-bit FFTW interface, since you can just use the 32-bit interface in a loop.

How? That would need non-trivial work. I'd much rather use the 64-bit interface if such a thing were available.

Sorry, I was thinking of DSFMT rather than FFTW. You're right, of course: for FFTW you need to have a 64-bit interface.

Ok. DSFMT is already patched up - that was easy.
Yes, the 64-bit API in FFTW is the guru64 interface (http://www.fftw.org/fftw3_doc/64_002dbit-Guru-Interface.html). So you can't use the basic interface if you want 64-bit support, but since you are creating the plans internally, the messiness of calling the guru interface for simple things is hidden anyway.

P.S. It would be nice to have a simple way to plan with FFTW_PATIENT. There is also the issue of data alignment: you need 16-byte-aligned arrays to get the benefit of SSE2, AVX, etc., and 32-byte alignment is better.
All arrays in Julia are 16-byte aligned. Is 32-byte alignment required by AVX? Could you say a bit more about planning with FFTW_PATIENT? Do you mean exposing some of the FFTW plan APIs in Julia to provide more flexibility? Nobody had asked for it, until now.
32-byte alignment is not required by AVX; it is just a bit faster. The good news is that 16-byte alignment is all that FFTW requires to re-use a plan from one array to another of the same size. I'm not surprised that no one has asked for it, since in my experience users rarely dig into performance issues or do careful benchmark comparisons.

Even forgetting about FFTW_PATIENT, you are sacrificing a lot of performance by creating and destroying the plan for every FFT (which not only re-creates the plan but also recomputes the trigonometric tables etcetera each time). For example, for a complex double-precision FFT of size 1024, @time fft(x) in Julia reports a running time of about 60µs, whereas FFTW's tests/bench -oestimate 1024 reports a running time of 16µs. Using FFTW_PATIENT (tests/bench -opatient 1024) results in a running time of 9µs. For larger sizes, the overhead of re-creating and destroying the plan each time is smaller, but the advantage of FFTW_PATIENT is greater: e.g. for size 2^18, Julia takes 29ms while tests/bench -oestimate gives 25ms, but tests/bench -opatient gives a runtime of only 8ms (albeit after 5 minutes to create the plan). It would be nice to have something like:
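Something along these lines, sketched with a hypothetical plan_fft that returns a callable wrapping the FFTW plan (plan_fft did not exist at this point; the names here are illustrative):

```julia
# Hypothetical API sketch -- plan_fft as proposed, not an existing function.
p = plan_fft(x)          # create the FFTW plan once (e.g. with FFTW_ESTIMATE)
for i = 1:nloops
    y = p(x)             # execute the pre-created plan; no re-planning cost
end
```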
That is, plan_fft would just return an optimized fft function (technically, a wrapper around a plan and fftw_execute_dft). Re-using the plan in this way, at least for arrays with the same strides, is safe since Julia guarantees 16-byte alignment. plan_fft(x) could default to FFTW_ESTIMATE, but you could also support plan_fft(x, FFTW_PATIENT) etcetera, and even plan_fft(x, FFTW_PATIENT, 10sec) and similar, using the fftw_timelimit mechanism to put a time limit on plan creation. And of course plan_fftn, plan_bfft, and similar variants. This seems fairly easy to implement (I guess you can use finalizer to make the gc call fftw_destroy_plan when p is garbage-collected). You'd also want to export the import/export_wisdom functions so that people could save plans.
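A minimal sketch of such a plan wrapper with a GC finalizer, in modern Julia syntax (the type name is illustrative, and this assumes libfftw3 is loadable by that name):

```julia
# Illustrative sketch only -- not Julia's actual implementation.
mutable struct FFTWPlan
    ptr::Ptr{Cvoid}   # the opaque fftw_plan
    function FFTWPlan(ptr::Ptr{Cvoid})
        p = new(ptr)
        # Destroy the underlying plan when the wrapper is garbage-collected.
        finalizer(p) do p
            ccall((:fftw_destroy_plan, :libfftw3), Cvoid, (Ptr{Cvoid},), p.ptr)
        end
        return p
    end
end

# Make the plan callable: re-execute it on a new array of the same
# size/strides (safe given Julia's 16-byte alignment guarantee).
function (p::FFTWPlan)(x::Vector{ComplexF64})
    y = similar(x)
    ccall((:fftw_execute_dft, :libfftw3), Cvoid,
          (Ptr{Cvoid}, Ptr{ComplexF64}, Ptr{ComplexF64}), p.ptr, x, y)
    return y
end
```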
Fixed in #1856
FFTW also has a 32-bit interface. As per the documentation, it seems that we have to use the guru interface to get 64-bit support:
http://www.fftw.org/doc/64_002dbit-Guru-Interface.html#64_002dbit-Guru-Interface
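As a rough illustration (not the actual implementation), a direct ccall to the guru64 planner for a 1-d in-place transform might look like this in modern Julia syntax, assuming libfftw3 is loadable by that name and that C's ptrdiff_t matches Julia's Int on this platform:

```julia
# Sketch only: mirrors FFTW's fftw_iodim64 { ptrdiff_t n, is, os; }.
struct fftw_iodim64
    n::Int    # transform length along this dimension
    is::Int   # input stride, in units of fftw_complex
    os::Int   # output stride
end

function guru64_fft!(x::Vector{ComplexF64})
    dims = [fftw_iodim64(length(x), 1, 1)]
    plan = ccall((:fftw_plan_guru64_dft, :libfftw3), Ptr{Cvoid},
                 (Cint, Ptr{fftw_iodim64}, Cint, Ptr{fftw_iodim64},
                  Ptr{ComplexF64}, Ptr{ComplexF64}, Cint, Cuint),
                 1, dims, 0, C_NULL, x, x,
                 -1, 64)   # -1 = FFTW_FORWARD, 64 = FFTW_ESTIMATE
    ccall((:fftw_execute, :libfftw3), Cvoid, (Ptr{Cvoid},), plan)
    ccall((:fftw_destroy_plan, :libfftw3), Cvoid, (Ptr{Cvoid},), plan)
    return x
end
```

The point is that all sizes and strides are 64-bit (ptrdiff_t) in the guru64 interface, unlike the basic interface, whose dimensions are plain C ints.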