Initial implementation of butterfly FFT with various ways of indexing #6
Conversation
Might be worthwhile to test some simplified version of my pure-Julia FFT from JuliaLang/julia#6193 (@yuyichao has an updated version of this somewhere, I forget where), too; we need to have benchmarks of optimized code too. |
The updated version is on the yyc/dftnew_rebase branch and I've just rebased it on the current master. |
The current test here simply measures differences between different ways of indexing, not between ways of computing the FFT. What would be an appropriate benchmark for the optimized FFT (which I'm looking forward to using btw)? Of course it is worthwhile in any case to benchmark the Julia implementation against the external library, but I'm not sure what that says about indexing performance. Another use case for views I've been wondering about is tensor-product generalizations. I see the multidimensional Julia FFT is based on StridedArray's and it seems to use some explicit calculations with strides. I guess one could compare this to an implementation using 1D fft's applied to views/slices, as well as to an implementation that copies each row/column/mode? I use n-dimensional ffts quite a lot in FrameFuns.jl and currently we are using the copy approach. Would that make sense as an indexing benchmark? |
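A hypothetical sketch of the comparison suggested above (none of these names come from the PR): apply a 1D transform to each column of a matrix, once by copying the column out and back, and once through a view. `dft` here is a deliberately naive O(n²) stand-in for whatever 1D FFT is actually used; `A` is assumed to be a complex matrix.

```julia
# Naive 1D DFT as a stand-in for a real FFT routine (illustrative only).
dft(x) = [sum(x[j] * exp(-2im * pi * (j - 1) * (k - 1) / length(x))
              for j in eachindex(x)) for k in eachindex(x)]

# Variant 1: `A[:, j]` on the right-hand side copies the column.
function transform_columns_by_copy!(A::AbstractMatrix{<:Complex})
    for j in axes(A, 2)
        A[:, j] = dft(A[:, j])
    end
    return A
end

# Variant 2: index through a view; no copy of the input column is made
# (the `dft` call still allocates its result vector).
function transform_columns_by_view!(A::AbstractMatrix{<:Complex})
    for j in axes(A, 2)
        col = view(A, :, j)
        col .= dft(col)
    end
    return A
end
```

Benchmarking these two against each other (and against a strided multidimensional implementation) would isolate the cost of the copy-per-slice approach.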
@daanhb, my general concern is that benchmarking a slow and unrealistic way to compute something is not always meaningful. An ideal benchmark, to me, is an algorithm for a realistic problem that is (or should be) competitive with "serious" production implementations. (Radix-2 FFTs to begin with have not been very competitive for decades, and making lots of subarray copies and computing trig functions in inner loops only makes things worse.) (In FrameFuns, why aren't you just using the built-in (FFTW-based) multidimensional FFTs?) |
That being said, I agree that we need benchmarks of indexing etcetera. ~~But in that case, why bother to compute an FFT? Why not just do a bunch of indexing in a tight loop, e.g. just summing different subarrays chosen arbitrarily? That way you get a cleaner result that only measures indexing and nothing else.~~ Oh, I see, the whole purpose of this repo is to collect quasi-realistic user codes that test array-view performance. Still, you might consider pre-computing the trigonometric factors so that you aren't benchmarking complex exponentials along with the indexing. |
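A minimal sketch of the pre-computation being suggested (the function names are illustrative, not from the PR's code): the twiddle factors exp(-2πik/n) are tabulated once up front, and the recursion indexes into the shared table with a growing stride, so no trig function is evaluated inside the butterfly loop.

```julia
# Twiddle factors exp(-2πik/n) for k = 0 .. n/2-1, computed once.
twiddle_table(n) = [exp(-2im * pi * k / n) for k in 0:(n ÷ 2 - 1)]

# Recursive radix-2 butterfly over a power-of-two length.
# `step` maps sub-transform indices into the full-length table.
function fft_precomputed(x::AbstractVector{<:Complex},
                         w::AbstractVector{<:Complex}, step::Int=1)
    n = length(x)
    n == 1 && return copy(x)
    even = fft_precomputed(x[1:2:end], w, 2step)
    odd  = fft_precomputed(x[2:2:end], w, 2step)
    half = n ÷ 2
    y = similar(x)
    for k in 1:half
        t = w[(k - 1) * step + 1] * odd[k]   # table lookup, no exp() here
        y[k]        = even[k] + t
        y[k + half] = even[k] - t
    end
    return y
end
```

Usage would be `fft_precomputed(x, twiddle_table(length(x)))` for a power-of-two-length complex vector; the `x[1:2:end]` slices could then be swapped for views or manual strides to isolate the indexing cost.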
It's true this example is synthetic, and it remains so (even more so) after precomputing the trigonometric factors. I did this example recently simply to try out views and I'm not using it in any code. Perhaps this repository is not the place for it :-) For completeness, after precomputing all trigonometric factors, the test mainly benchmarks the overhead of using views. |
@stevengj Regarding FrameFuns, currently the 1D fft's are hidden from each other behind a few layers of abstractions, so it was a practical choice. It will take some well-placed multiple dispatch magic to call a multidimensional fft. I was initially not unhappy with the performance of copying slices, but that was for small problems. We will do some more systematic tests (including with the Julia fft) in the next couple of weeks. With algorithmic improvements elsewhere, the fft's became our bottleneck, so there is a clear motivation. |
Would be great to have this in for exactly these reasons, with the precomputed trig factors. |
In any case it was also good to revisit the experiment, because the initial BigFloat implementation was wrong (I must always remember that `2pi` is evaluated in Float64 by default). |
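The pitfall mentioned above, in isolation: `2pi` is multiplied out in Float64 before BigFloat ever sees the result, so converting afterwards just widens an already-rounded value.

```julia
setprecision(256) do
    a = BigFloat(2pi)      # 2π rounded to Float64 first, then widened exactly
    b = 2 * BigFloat(pi)   # π evaluated at BigFloat precision, then doubled
    a == b                 # false: a carries the Float64 rounding error
    abs(a - b)             # on the order of eps(Float64), not eps(BigFloat)
end
```

The same applies to any literal arithmetic with `pi` inside a BigFloat algorithm: the conversion has to happen before the arithmetic, not after.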
@stevengj Closing this old pull request again, but just for the record: using multidimensional FFT's was (of course) way way faster than the iterated 1d version with copying we had before. |
This test is a comparison of several naive implementations of the recursive butterfly algorithm for the FFT. Large numbers of views are created in the recursion. Currently, the cost of the actual computations dominates the overhead of creating views. Whether or not data is copied does not seem to matter much.
The example does show that views allocate memory (quite a lot, in this case). The test compares implementations where all data is being copied, where views/subs are created, or where views are simulated manually by passing around either strides or ranges as extra parameters. The latter approach does not allocate memory.
Perhaps by optimizing the implementation (precomputing twiddle factors comes to mind), the difference between the various ways of indexing can be made more pronounced.
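A hypothetical sketch of the stride-passing variant described above (the function name and signature are illustrative, not the PR's actual code): instead of creating a `SubArray` at each recursion level, the call carries `start` and `stride` explicitly, so indexing into the input allocates nothing. The twiddle factor is computed inline here for brevity; it could be replaced by a precomputed table as discussed.

```julia
# Radix-2 butterfly where "views" of the input are simulated by passing
# a start index and a stride, avoiding SubArray creation entirely.
function fft_strided(x::AbstractVector{<:Complex},
                     start::Int=1, stride::Int=1, n::Int=length(x))
    n == 1 && return [x[start]]
    half = n ÷ 2
    even = fft_strided(x, start,          2stride, half)
    odd  = fft_strided(x, start + stride, 2stride, half)
    y = similar(even, n)
    for k in 1:half
        t = exp(-2im * pi * (k - 1) / n) * odd[k]
        y[k]        = even[k] + t
        y[k + half] = even[k] - t
    end
    return y
end
```

Only the output vectors are allocated per level; the input is always indexed directly through `start + k*stride` arithmetic, which is what makes this variant allocation-free on the indexing side.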