in-place fftshift and in-place shifted ffts #20696

JKrehl · 2017-02-20T18:00:08Z

For various reasons I have large multidimensional complex datasets (sth like 512^3) which I need both in real space and Fourier transformed, both times centered around 0, i.e. separated by ifftshift(fft(fftshift(A))). Allocating a new array for every transform is a waste of good RAM.

I have not found any in-place fftshift so I wrote my own. The basic idea is to exchange four memory places over cross ( a,b = b,a ) which gives an atomically contained operation and the fftshift can be achieved using (almost) exclusively this operation.

Of the n-dimensional array a subset of the axes are to be shifted (named TS in the code) and one of these axes has to be iterated only half over (denoted HA). Actually half of the n-dimensional space has to be iterated over and if split along a not-to-be-shifted axis I cannot use the over-cross exchange operation.
The case of an axis of odd length has to be considered on its own. If it is not the half iterated-over axis it is sufficient to formulate the shift parameters correctly. It it is iterated over (indicated by HAEV and that is sometimes unavoidable) one element has to be separately saved before the iteration, a lop-sided cross-exchange be used ( a,b-1 = b,a ) and the saved element copied back in. In order that only one element has to be saved in this case the half-length-iteration is pulled into the innemost loop while otherwise the loops are nested as the elements lie in memory (assuming Fortran order).

It's called swaphs2! not to interfere with the builtin fftshift and swaphs! is a doctored fftshift based on circshift! which cannot operate in-place.

using Base.Cartesian

##############################################################################
# an n-dimensional array is shifted along a given subset of its axes for a
# distance of half of the axes length (rounded downwards) as is done by 
# common implementations of fftshift


@generated function _swaphs2!{T, N, TS, HA, HAEV}(dest::AbstractArray{T,N}, src::AbstractArray{T,N}, sizes::NTuple{N, Int}, ::Type{Val{TS}}, ::Type{Val{HA}}, ::Type{Val{HAEV}})
    shift_syms = ((Symbol("sh_", i) for i in 1:N)...)
    wshift_syms = ((Symbol("wsh_", i) for i in 1:N)...)
    
    x_syms = ((Symbol("x_", i) for i in 1:N)...)
    sx_syms = ((TS[i] ? Symbol("sx_", i) : x_syms[i] for i in 1:N)...)
    bsx_syms = ((!HAEV && HA==i ? Symbol("bsx_", i) : sx_syms[i] for i in 1:N)...)
    
    ex = :(($(Expr(:ref, :dest, x_syms...)), $(Expr(:ref, :dest, bsx_syms...))) = ($(Expr(:ref, :src, sx_syms...)), $(Expr(:ref, :src, x_syms...))))
    
    if !HAEV
        ex = quote
            $(x_syms[HA]) = $(shift_syms[HA])+1
            tmp = $(Expr(:ref, :src, x_syms...))
            for $(x_syms[HA]) in 1:sizes[$HA]-$(shift_syms[HA])-1
                $(sx_syms[HA]) = $(x_syms[HA]) + $(shift_syms[HA])+1
                $(bsx_syms[HA]) = $(sx_syms[HA])-1
                $ex
            end
            $(x_syms[HA]) = sizes[$HA]
            $(Expr(:ref, :dest, x_syms...)) = tmp
        end
    end
    
    for i in 1:N
        if i==HA 
            if HAEV
                ex = quote
                    for $(x_syms[HA]) in 1:sizes[$HA]-$(shift_syms[HA])
                        $(sx_syms[HA]) = $(x_syms[HA]) + $(shift_syms[HA])
                        $ex
                    end
                end
            else
                nothing
            end
        else
            if TS[i]
                ex = quote
                    for $(x_syms[i]) in 1:sizes[$i]-$(shift_syms[i])
                        $(sx_syms[i]) = $(x_syms[i]) + $(shift_syms[i])
                        $ex
                    end
                    for $(x_syms[i]) in sizes[$i]-$(shift_syms[i])+1:sizes[$i]
                        $(sx_syms[i]) = $(x_syms[i]) + $(wshift_syms[i])
                        $ex
                    end
                end
            else
                ex = quote
                    for $(x_syms[i]) in 1:sizes[$i]
                        $ex
                    end
                end
            end
        end
    end
    
    quote
        $(Expr(:inbounds, true))
        @nexprs $N i->(sh_i = div(sizes[i], 2))
        @nexprs $N i->(wsh_i = div(sizes[i], 2)-sizes[i])
        
        $ex
        
        $(Expr(:inbounds, :pop))
        dest
    end
end

function swaphs2!{T, N}(dest::AbstractArray{T,N}, src::AbstractArray{T,N}, dim)
    @assert size(dest) == size(src)
    
    sizes = size(src)
    toshift = ((i in dim for i in 1:N)...)
    evensize = map(iseven, sizes)
    halfaxis = findlast(map(&, toshift, evensize))
    if halfaxis==0
        halfaxis = findfirst(toshift)
    end
    
    _swaphs2!(dest, src, sizes, Val{toshift}, Val{halfaxis}, Val{evensize[halfaxis]})
end

Of course, based on this shifted ffts that operate in-place can be formulated quite easily.

Is there any interest in this functionality at all?
And in such a (quite unreadable) approach?

If so I could tinker together a testing suite (I have tested it so far against fftshift in quite a number of situations and in benchmarking it reliably beat fftshift, even with fftshifts array allocation overhead removed). Also comment the code and maybe structure it more (this current state is optimized for brevity).

As a bye the bye question: Should I make such a question/contribution a PR from the start?

The text was updated successfully, but these errors were encountered:

StefanKarpinski · 2017-02-20T20:58:46Z

cc @stevengj

stevengj · 2017-02-20T22:37:46Z

Seems like a dup of #19923.

JKrehl · 2017-02-20T22:42:46Z

The function circshift! is not in-place, it does not accept the same array as both arguments. As it is based on copy operations (without intermediate caching) it cannot facilitate in-place shifting.

Maybe I should clarify what I mean with in-place here: for a given array A the function fftshift! should be invokable fftshift!(A,A) and no allocate any meaningful amounts of memory on the way.

stevengj · 2017-02-21T15:24:54Z

@JKrehl, as discussed in that issue, there is some desire to have a completely in-place circshift!. of which fftshift! would then be a special case. (It's no harder to write a general in-place circshift! than it is to write an in-place fftshift!, by the algorithm I suggested in #16032, so if one is to do this it should be via the general case. Indeed the general circshift-by-reverse algorithm is considerably shorter and simpler than the fftshift! code proposed above.)

JKrehl · 2017-02-21T16:07:37Z

Why does fftshift! need to be a special case? Since it swaps the half-spaces an algorithm is possible that only modifies each memory cell once and has very benign memory access patterns (it has achieved higher performance than a fftshift! based on the current circshift! by having a higher L2 hit ratio all the while having a lower IPC and not parallelising or vectoring) in reasonable cases (odd-axis length looks worse but still faster than copy-based circshift).

Is there no place for a bespoke half-space swap (or shift because that's the same here) in the light that this problem can be solved more efficiently?

What's the current status of the truly in-place circshift!?

stevengj · 2017-02-21T17:34:35Z

Yes, the fftshift! special case can be solved slightly more efficiently than circshift! because it avoids an extra pass over the array, but both of them have good spatial locality for cache utilization. (Again, I'm comparing to the true in-place circshift! ala the one in suggested in #16032, not the current one that uses a buffer.) But I'm skeptical that the performance difference matters (in cases where you are doing fftshift, you are also presumably doing fft, and the latter is far more expensive).

I'd rather see the implementation effort go into a more general-purpose function (circshift!) than into a more specialized one. Especially if the general-purpose code is no longer than the specialized code, as seems to be the case here. The more code we have, the higher the maintenance cost we incur in the long run. We especially want to avoid complicated @generated functions except where it is really essential.

stevengj · 2017-02-21T17:53:14Z

(By the way, if you really care about performance, a bunch of individual swap loops along each dimension, which is what it looks like your @generated function produces, is actually far from optimal along the dimensions that are not contiguous in memory. See e.g. the section "2d optimized methods (same shift for every row)" in this homework solutions notebook for performance discussions in a related problem — the case of fftshift along only the 2nd dimension of a 2d array is a special case of what is considered in that notebook. But again, in this application I'm not convinced that such constant-factor performance improvements are worth the effort to pursue.)

JKrehl · 2017-02-21T21:25:49Z

So is it just by lack of manhours that this hasn't been implemented yet?
Is it worth getting into?

There is one swap expression around which loops in Fortran order (except for the odd-axis-length case) are nested (in all the ex = quote expressions). When shifting only the second axis, the inner loop is just an iteration.
I considered reshaping arrays to reduce the number of nested loops, as well as using flat indexes, but one has to stop somewhere,

stevengj · 2017-02-21T22:12:54Z

@JKrehl, yes, mostly, although there is also not a complete consensus on what set of functions should be included in Base for this. See also #19923.

JKrehl · 2017-02-22T10:31:02Z

@stevengj
Something like that?
Shall I do a testset/some benchmarks/a PR?
Or maybe explain what I did exactly? :)

using Base.Cartesian
function _splitreverse_loop_nester(ex, half, i, x, rx, hex)
    quote
        $rx = split[$i]
        for $x in 1:$(half ? :(div(split[$i], 2)) : :(split[$i]))
            $ex
            $rx -= 1
        end
        $(half ? :(if isodd(split[$i]); $x=$rx=div(split[$i]+1, 2); $hex; end) : nothing)
        $rx = size(src, $i)
        for $x in split[$i]+1:$(half ? :(div(size(src, $i)+split[$i], 2)) : :(size(src, $i)))
            $ex
            $rx -= 1
        end
        $(half ? :(if isodd(size(src, $i)-split[$i]); $x=$rx=div(size(src, $i)+split[$i]+1, 2); $hex; end) : nothing)
    end
end

@generated function splitreverse!{T,N,I<:Integer}(dest::AbstractArray{T,N}, src::AbstractArray{T,N}, split::NTuple{N, I})
    xs = ((Symbol("x_", i) for i in 1:N)...)
    rxs = ((Symbol("rx_", i) for i in 1:N)...)
    
    ex = :((dest[$(xs...)], dest[$(rxs...)]) = (src[$(rxs...)], src[$(xs...)]))
    hex = copy(ex)
    
    for i in 1:N
        hex = _splitreverse_loop_nester(ex, true, i, xs[i], rxs[i], hex)
        ex = _splitreverse_loop_nester(ex, false, i, xs[i], rxs[i], nothing)
    end
    
    quote
        @nexprs $N i->(size_i = size(src, i))
        @inbounds $hex
        dest
    end
end

function circshift!{T,N,I<:Integer}(dest::AbstractArray{T,N}, src::AbstractArray{T,N}, shifts::NTuple{N, I})
    @assert size(dest) == size(src)
    
    splits = map(mod, map(-, shifts), size(src))

    splitreverse!(dest, src, splits)
    splitreverse!(dest, dest, size(src))
end

circshift!{I<:Integer}(dest::AbstractVector, src::AbstractVector, shifts::I) = circshift!(dest, src, (shifts,))

Now with correct handling of odd-length outermost axis. The mechanism complicates the code code generation a bit.

Some tests, this thing passes:

@testset begin
    @test begin A = collect(1:6); circshift!(A, A, 0) == Int[1, 2, 3, 4, 5, 6] end
    @test begin A = collect(1:5); circshift!(A, A, 2) == Int[4, 5, 1, 2, 3] end
    @test begin A = collect(1:5); circshift!(A, A, 7) == Int[4, 5, 1, 2, 3] end 
    @test begin A = collect(1:7); circshift!(A, A, -4) == Int[5, 6, 7, 1, 2, 3, 4] end
    begin
        A = reshape(collect(1:30), (5,6))
        circshift!(A, A, (2,4))
        @test A== [14 19 24 29 4 9; 
                           15 20 25 30 5 10; 
                            11 16 21 26 1 6; 
                            12 17 22 27 2 7; 
                            13 18 23 28 3 8]
        A = reshape(collect(1:90), (5,6,3))
        @test circshift!(copy(A), A, (1,2,2)) == circshift!(A, A, (1,2,2))
        A = reshape(collect(1:90), (5,6,3))
        B = copy(A)
        @test circshift!(B, A, (3,3,1)) == circshift!(A, A, (3,3,1))
        B = copy(A)
        circshift!(A, A, (0,0,1))
        circshift!(A, A, (4,0,0))
        circshift!(A, A, (0,1,-1))
        @test A == circshift!(B, B, (4,1,0))
    end
end;

KristofferC · 2018-04-05T12:10:02Z

Seems like there is nothing concrete here to do. Also https://github.com/JuliaMath/FFTW.jl is moved so fft specific things could be reopened at that repo.

ararslan added the maths Mathematical functions label Feb 20, 2017

This was referenced Feb 26, 2017

consolidate ror/rol and circshift #19923

Closed

proposal for multidimensional in-place circshift! #20849

Open

KristofferC closed this as completed Apr 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

in-place fftshift and in-place shifted ffts #20696

in-place fftshift and in-place shifted ffts #20696

JKrehl commented Feb 20, 2017 •

edited

Loading

StefanKarpinski commented Feb 20, 2017

stevengj commented Feb 20, 2017

JKrehl commented Feb 20, 2017 •

edited

Loading

stevengj commented Feb 21, 2017 •

edited

Loading

JKrehl commented Feb 21, 2017 •

edited

Loading

stevengj commented Feb 21, 2017 •

edited

Loading

stevengj commented Feb 21, 2017

JKrehl commented Feb 21, 2017

stevengj commented Feb 21, 2017

JKrehl commented Feb 22, 2017 •

edited

Loading

KristofferC commented Apr 5, 2018 •

edited

Loading

in-place fftshift and in-place shifted ffts #20696

in-place fftshift and in-place shifted ffts #20696

Comments

JKrehl commented Feb 20, 2017 • edited Loading

StefanKarpinski commented Feb 20, 2017

stevengj commented Feb 20, 2017

JKrehl commented Feb 20, 2017 • edited Loading

stevengj commented Feb 21, 2017 • edited Loading

JKrehl commented Feb 21, 2017 • edited Loading

stevengj commented Feb 21, 2017 • edited Loading

stevengj commented Feb 21, 2017

JKrehl commented Feb 21, 2017

stevengj commented Feb 21, 2017

JKrehl commented Feb 22, 2017 • edited Loading

KristofferC commented Apr 5, 2018 • edited Loading

JKrehl commented Feb 20, 2017 •

edited

Loading

JKrehl commented Feb 20, 2017 •

edited

Loading

stevengj commented Feb 21, 2017 •

edited

Loading

JKrehl commented Feb 21, 2017 •

edited

Loading

stevengj commented Feb 21, 2017 •

edited

Loading

JKrehl commented Feb 22, 2017 •

edited

Loading

KristofferC commented Apr 5, 2018 •

edited

Loading