Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use call overload in FFTW.plan_* #11994

Closed
wants to merge 9 commits into from
Closed

Use call overload in FFTW.plan_* #11994

wants to merge 9 commits into from

Conversation

yuyichao
Copy link
Contributor

@yuyichao yuyichao commented Jul 2, 2015

I'm aware of #6193. However, I think it is still useful to get this into 0.4 for following reasons.

  1. This PR does not break the API

  2. The API in the PR does not conflict with the one in WIP: new DFT api #6193 (the commits themselves might)

  3. Unless we're going to fix the anonymous function performance issue, this PR has a non-negligible performance improvment.

    I have a time propagator which uses FFT to transform between position and momentum space and this PR boost the performance by 23% (2.6s to 2.0s) and reduce the allocation number by 10x and size by 130M (2720 k -> 300k times, 440M -> 310M). These are numbers that includes the time spend for other calculations. (cleaner benchmark: Use call overload in FFTW.plan_* #11994 (comment))

A few other notes about this PR:

  1. I breaks the PR into several commits mainly because the nature of this PR means there's a lot of code movement and having separate commits makes it easier to write, test and review.
  2. Is this missing a assert_applicable?

@stevengj

@yuyichao
Copy link
Contributor Author

yuyichao commented Jul 2, 2015

Also worth noting that the performance improvement is 6x the one I get by specifying FFTW.MEASURE. (2.14s -> 2.04s with MEASURE vs 2.66s -> 2.04s with this PR)

Edit: Update the number. I certainly did't know how to divide just now....

@yuyichao yuyichao added the performance Must go faster label Jul 2, 2015
@yuyichao yuyichao changed the title Use call overload in plan_* Use call overload in FFTW.plan_* Jul 2, 2015
@yuyichao
Copy link
Contributor Author

yuyichao commented Jul 2, 2015

Better benchmark. As expected, the speed up is the most visible with small input size (in my case the size is ~ 1000)

Tested with

function test_fft(ary)
    println(length(ary))
    plan = plan_fft!(copy(ary), 1:1, FFTW.MEASURE)
    iplan = plan_ifft!(copy(ary), 1:1, FFTW.MEASURE)
    plan(ary)
    gc()
    @time for i in 1:10_000
        plan(ary)
        iplan(ary)
    end
end

for s in (10, 100, 10000, 16, 128, 16384)
    test_fft(rand(Complex{Float64}, s))
end

Current master

10
  74.272 milliseconds (351 k allocations: 17652 KB)
100
  75.764 milliseconds (340 k allocations: 17188 KB)
10000
   1.572 seconds      (340 k allocations: 17188 KB)
16
  60.032 milliseconds (340 k allocations: 17188 KB)
128
  75.934 milliseconds (340 k allocations: 17188 KB)
16384
   2.661 seconds      (340 k allocations: 17188 KB)

This PR

10
  12.091 milliseconds (61495 allocations: 1250 KB)
100
   9.360 milliseconds (50000 allocations: 781 KB)
10000
   1.470 seconds      (50000 allocations: 781 KB)
16
   3.248 milliseconds (50000 allocations: 781 KB)
128
  11.136 milliseconds (50000 allocations: 781 KB)
16384
   2.485 seconds      (50000 allocations: 781 KB)

@stevengj
Copy link
Member

stevengj commented Jul 3, 2015

This will make #6193 significantly harder to merge because it will introduce lots of conflicts... Continual rebasing of 6193 is hard enough as it is.

@ScottPJones
Copy link
Contributor

I'd vote (if I had a vote!) for getting #6193 in first, and not have @stevengj spending time rebasing. He's got much more important things to spend time on, IMO.

@yuyichao
Copy link
Contributor Author

yuyichao commented Jul 3, 2015

@stevengj An easy (although possibly stupid) way would be to just revert this in #6193.

I would definitely prefer having #6193 if it was not in the 0.5 milestone. This is by all mean a temporary solution and I'm only bringing it up because it gives a significant enough speed up for small and medium size arrays.

I didn't read #6193 very carefully. Is the structure/main part of the current fftw.jl still there? If the plan_* functions are just changed to return a different type than I'll be happy to match the name in it. Otherwise, I would argue that getting this performance improvement in 0.4 with this PR and revert it as part of #6193 is the way to go.

P.S. can you also have a look at the place that I suspect missing a assert? (here

return Z::StridedArray{T} -> begin
)

Edit: add link to the code since it was not that trivial to find in my original post.

@ViralBShah
Copy link
Member

@ScottPJones I am not sure what you are voting on here and why you have a viewpoint on one developer's time is more important than another. It would be nice to restrict commenting where you are making a contribution or raising an issue. It is ok to upvote once in a while, and only if you are part of the implementation. I realize that you are excited and want to chime in on everything - but please think before chiming in, as it could end up being noise rather than adding value.

@yuyichao
Copy link
Contributor Author

yuyichao commented Jul 3, 2015

Actually this shouldn't introduce any bad conflict since the first commit in #6193 removes (move/split) this file and the conflict resolution should just be a reset.

It would indeed be better if this is changed to a non-breaking and no-function-added part of #6193 (basically moving code and returning functors in plan_*) but that might be too much work. I'm willing to help/do the splitting it if this plan sounds better though.

Other than that, I still believe it is nice to have this performance gain in for 0.4.

@ViralBShah
Copy link
Member

Are we likely to merge #6193 for 0.4? If not, it seems worthwhile to at least have these gains on 0.4, perhaps after we branch, and target #6193 for 0.5. Also, we could revert this when we want to merge #6193 for 0.5. It appears that this is a very nice performance gain and nice to have.

@yuyichao
Copy link
Contributor Author

yuyichao commented Jul 3, 2015

Are we likely to merge #6193 for 0.4?

It's on 0.5 milestone so my impression is unlikely.

If not, it seems worthwhile to at least have these gains on 0.4, perhaps after we branch, and target #6193 for 0.5. Also, we could revert this when we want to merge #6193 for 0.5. It appears that this is a very nice performance gain and nice to have.

Yep. That's exactly my plan. It shouldn't be too bad and I'm waiting to see if there's anything I can change in this PR to make @stevengj's work on #6193 easier.

@ViralBShah
Copy link
Member

bump @stevengj

@stevengj
Copy link
Member

stevengj commented Jul 8, 2015

I'd rather just rip out the pure-Julia FFT from #6193 if people prefer that route. That should be mainly a simple matter of just deleting a couple of files from that patch.

@stevengj stevengj mentioned this pull request Jul 9, 2015
@yuyichao
Copy link
Contributor Author

yuyichao commented Jul 9, 2015

Close in favor of #12087 .

@yuyichao yuyichao closed this Jul 9, 2015
@yuyichao yuyichao deleted the yyc/fftw branch July 19, 2015 02:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants