Use call overload in FFTW.plan_* #11994

yuyichao · 2015-07-02T23:11:32Z

I'm aware of #6193. However, I think it is still useful to get this into 0.4 for following reasons.

This PR does not break the API
The API in the PR does not conflict with the one in WIP: new DFT api #6193 (the commits themselves might)
Unless we're going to fix the anonymous function performance issue, this PR has a non-negligible performance improvment.

I have a time propagator which uses FFT to transform between position and momentum space and this PR boost the performance by 23% (2.6s to 2.0s) and reduce the allocation number by 10x and size by 130M (2720 k -> 300k times, 440M -> 310M). These are numbers that includes the time spend for other calculations. (cleaner benchmark: Use call overload in FFTW.plan_* #11994 (comment))

A few other notes about this PR:

I breaks the PR into several commits mainly because the nature of this PR means there's a lot of code movement and having separate commits makes it easier to write, test and review.
Is this missing a assert_applicable?

yuyichao · 2015-07-02T23:15:18Z

Also worth noting that the performance improvement is 6x the one I get by specifying FFTW.MEASURE. (2.14s -> 2.04s with MEASURE vs 2.66s -> 2.04s with this PR)

Edit: Update the number. I certainly did't know how to divide just now....

yuyichao · 2015-07-02T23:35:34Z

Better benchmark. As expected, the speed up is the most visible with small input size (in my case the size is ~ 1000)

Tested with

function test_fft(ary)
    println(length(ary))
    plan = plan_fft!(copy(ary), 1:1, FFTW.MEASURE)
    iplan = plan_ifft!(copy(ary), 1:1, FFTW.MEASURE)
    plan(ary)
    gc()
    @time for i in 1:10_000
        plan(ary)
        iplan(ary)
    end
end

for s in (10, 100, 10000, 16, 128, 16384)
    test_fft(rand(Complex{Float64}, s))
end

Current master

10
  74.272 milliseconds (351 k allocations: 17652 KB)
100
  75.764 milliseconds (340 k allocations: 17188 KB)
10000
   1.572 seconds      (340 k allocations: 17188 KB)
16
  60.032 milliseconds (340 k allocations: 17188 KB)
128
  75.934 milliseconds (340 k allocations: 17188 KB)
16384
   2.661 seconds      (340 k allocations: 17188 KB)

This PR

10
  12.091 milliseconds (61495 allocations: 1250 KB)
100
   9.360 milliseconds (50000 allocations: 781 KB)
10000
   1.470 seconds      (50000 allocations: 781 KB)
16
   3.248 milliseconds (50000 allocations: 781 KB)
128
  11.136 milliseconds (50000 allocations: 781 KB)
16384
   2.485 seconds      (50000 allocations: 781 KB)

stevengj · 2015-07-03T01:55:51Z

This will make #6193 significantly harder to merge because it will introduce lots of conflicts... Continual rebasing of 6193 is hard enough as it is.

ScottPJones · 2015-07-03T02:02:12Z

I'd vote (if I had a vote!) for getting #6193 in first, and not have @stevengj spending time rebasing. He's got much more important things to spend time on, IMO.

yuyichao · 2015-07-03T02:13:50Z

@stevengj An easy (although possibly stupid) way would be to just revert this in #6193.

I would definitely prefer having #6193 if it was not in the 0.5 milestone. This is by all mean a temporary solution and I'm only bringing it up because it gives a significant enough speed up for small and medium size arrays.

I didn't read #6193 very carefully. Is the structure/main part of the current fftw.jl still there? If the plan_* functions are just changed to return a different type than I'll be happy to match the name in it. Otherwise, I would argue that getting this performance improvement in 0.4 with this PR and revert it as part of #6193 is the way to go.

P.S. can you also have a look at the place that I suspect missing a assert? (here

julia/base/fftw.jl

Line 768 in 8ad0c1c

return Z::StridedArray{T} -> begin

)

Edit: add link to the code since it was not that trivial to find in my original post.

ViralBShah · 2015-07-03T22:31:39Z

@ScottPJones I am not sure what you are voting on here and why you have a viewpoint on one developer's time is more important than another. It would be nice to restrict commenting where you are making a contribution or raising an issue. It is ok to upvote once in a while, and only if you are part of the implementation. I realize that you are excited and want to chime in on everything - but please think before chiming in, as it could end up being noise rather than adding value.

yuyichao · 2015-07-03T22:54:40Z

Actually this shouldn't introduce any bad conflict since the first commit in #6193 removes (move/split) this file and the conflict resolution should just be a reset.

It would indeed be better if this is changed to a non-breaking and no-function-added part of #6193 (basically moving code and returning functors in plan_*) but that might be too much work. I'm willing to help/do the splitting it if this plan sounds better though.

Other than that, I still believe it is nice to have this performance gain in for 0.4.

ViralBShah · 2015-07-03T23:21:42Z

Are we likely to merge #6193 for 0.4? If not, it seems worthwhile to at least have these gains on 0.4, perhaps after we branch, and target #6193 for 0.5. Also, we could revert this when we want to merge #6193 for 0.5. It appears that this is a very nice performance gain and nice to have.

yuyichao · 2015-07-03T23:25:40Z

Are we likely to merge #6193 for 0.4?

It's on 0.5 milestone so my impression is unlikely.

If not, it seems worthwhile to at least have these gains on 0.4, perhaps after we branch, and target #6193 for 0.5. Also, we could revert this when we want to merge #6193 for 0.5. It appears that this is a very nice performance gain and nice to have.

Yep. That's exactly my plan. It shouldn't be too bad and I'm waiting to see if there's anything I can change in this PR to make @stevengj's work on #6193 easier.

ViralBShah · 2015-07-08T12:02:23Z

bump @stevengj

stevengj · 2015-07-08T12:52:32Z

I'd rather just rip out the pure-Julia FFT from #6193 if people prefer that route. That should be mainly a simple matter of just deleting a couple of files from that patch.

yuyichao · 2015-07-09T19:09:33Z

Close in favor of #12087 .

yuyichao added 9 commits July 2, 2015 18:05

Replace trivial plan_* with call overload

f9a6dad

Use call overload for plan_{b,}fft

c16c520

Use call overload for plan_{b,}fft!

f665fb0

Use call overload of ScaleWrap

7158b7f

Use call overload for plan_{b,}rfft

732803a

Copying version of PlanWrapR

3751a38

Use call overload for plan_r2r

d23fad0

PlanWrapR2R for Number

fb07bd0

Use call overload for plan_r2r!

89f7891

yuyichao added the performance Must go faster label Jul 2, 2015

yuyichao changed the title Use call overload in plan_* Use call overload in FFTW.plan_* Jul 2, 2015

stevengj mentioned this pull request Jul 9, 2015

new DFT api #12087

Closed

yuyichao closed this Jul 9, 2015

yuyichao deleted the yyc/fftw branch July 19, 2015 02:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use call overload in FFTW.plan_* #11994

Use call overload in FFTW.plan_* #11994

yuyichao commented Jul 2, 2015

yuyichao commented Jul 2, 2015

yuyichao commented Jul 2, 2015

stevengj commented Jul 3, 2015

ScottPJones commented Jul 3, 2015

yuyichao commented Jul 3, 2015

ViralBShah commented Jul 3, 2015

yuyichao commented Jul 3, 2015

ViralBShah commented Jul 3, 2015

yuyichao commented Jul 3, 2015

ViralBShah commented Jul 8, 2015

stevengj commented Jul 8, 2015

yuyichao commented Jul 9, 2015

Use call overload in FFTW.plan_* #11994

Use call overload in FFTW.plan_* #11994

Conversation

yuyichao commented Jul 2, 2015

yuyichao commented Jul 2, 2015

yuyichao commented Jul 2, 2015

stevengj commented Jul 3, 2015

ScottPJones commented Jul 3, 2015

yuyichao commented Jul 3, 2015

ViralBShah commented Jul 3, 2015

yuyichao commented Jul 3, 2015

ViralBShah commented Jul 3, 2015

yuyichao commented Jul 3, 2015

ViralBShah commented Jul 8, 2015

stevengj commented Jul 8, 2015

yuyichao commented Jul 9, 2015