Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory allocation increases unexpectedly in sparse getindex at a threshold #18051

Closed
ViralBShah opened this issue Aug 16, 2016 · 11 comments
Closed
Labels
performance Must go faster sparse Sparse arrays
Milestone

Comments

@ViralBShah
Copy link
Member

ViralBShah commented Aug 16, 2016

@pranavbhat and I were looking into sparse vector indexing performance issues, and discovered this. Basically, as the size of the input becomes larger than 512, more allocations happen, increasing by exactly 4 bytes with every increment in the input size:

foo(x) = for i=1:length(x); getindex(x, i); end

julia> @time foo(sparse(ones(100)));
  0.000044 seconds (16 allocations: 7.469 KB)

julia> @time foo(sparse(ones(512)));
  0.000066 seconds (16 allocations: 19.938 KB)

julia> @time foo(sparse(ones(513)));
  0.000068 seconds (20 allocations: 36.156 KB)

julia> @time foo(sparse(ones(520)));
  0.000070 seconds (32 allocations: 20.500 KB)
@ViralBShah ViralBShah added the sparse Sparse arrays label Aug 16, 2016
@ViralBShah
Copy link
Member Author

Perhaps it points to some type instability, but the code for sparsevector getindex looks very straightforward.

@ViralBShah
Copy link
Member Author

ViralBShah commented Aug 16, 2016

FWIW, it appears that the extra allocation is linked to the loop iteration going above 512, and that is when allocation increases. It is not linked to the size of the input sparse vector as such. I am using v"0.6.0-dev.221"

@ViralBShah ViralBShah added the performance Must go faster label Aug 16, 2016
@ViralBShah ViralBShah added this to the 0.5.x milestone Aug 16, 2016
@KristofferC
Copy link
Member

KristofferC commented Aug 16, 2016

If the length to run getindex from is a constant then the allocation dissapears.

const _100 = sparse(ones(100));
const _512 = sparse(ones(512));
const _513 = sparse(ones(513));
const _520 = sparse(ones(520));

foo(x) = for i = 1:length(x); x[i]; end
goo(x) = for i = 1:100; x[i]; end


function measure()
    @time foo(_100);
    @time foo(_512);
    @time foo(_513);
    @time foo(_520);

    @time goo(_100);
    @time goo(_512);
    @time goo(_513);
    @time goo(_520);
end
julia> measure()
  0.000008 seconds
  0.000032 seconds (2 allocations: 32 bytes)
  0.000029 seconds (4 allocations: 64 bytes)
  0.000030 seconds (18 allocations: 288 bytes)
  0.000006 seconds
  0.000006 seconds
  0.000005 seconds
  0.000005 seconds

@KristofferC
Copy link
Member

KristofferC commented Aug 16, 2016

The difference in LLVM output between foo and goo is:

goo:

define void @julia_goo_68804(%jl_value_t*) #0 {
top:
  br label %if

L2:                                               ; preds = %if
  ret void

if:                                               ; preds = %top, %if
  %"#temp#.03" = phi i64 [ 1, %top ], [ %1, %if ]
  %1 = add nuw nsw i64 %"#temp#.03", 1
  %2 = call double @julia_getindex_68796(%jl_value_t* %0, i64 %"#temp#.03") #0
  %3 = icmp eq i64 %1, 101
  br i1 %3, label %L2, label %if
}

foo:

define void @julia_foo_68795(%jl_value_t*) #0 {
top:
  %1 = bitcast %jl_value_t* %0 to i64*
  %2 = load i64, i64* %1, align 16
  %3 = icmp slt i64 %2, 1
  br i1 %3, label %L2, label %if.preheader

if.preheader:                                     ; preds = %top
  br label %if

L2.loopexit:                                      ; preds = %if
  br label %L2

L2:                                               ; preds = %L2.loopexit, %top
  ret void

if:                                               ; preds = %if.preheader, %if
  %"#temp#.03" = phi i64 [ %4, %if ], [ 1, %if.preheader ]
  %4 = add i64 %"#temp#.03", 1
  %5 = call double @julia_getindex_68796(%jl_value_t* %0, i64 %"#temp#.03") #0
  %6 = icmp eq i64 %"#temp#.03", %2
  br i1 %6, label %L2.loopexit, label %if
}

@ViralBShah ViralBShah added the compiler:codegen Generation of LLVM IR and native code label Aug 16, 2016
@JeffBezanson
Copy link
Member

Note: the runtime has a cache of pre-allocated boxes for integers in the range -512 to 512.

@ViralBShah
Copy link
Member Author

There was no impact in 0.4, and in 0.5, the impact was roughly 30-50% depending on the exact type of problem.

@JeffBezanson
Copy link
Member

The source of the allocation seems to be the index argument to searchsortedfirst getting boxed (inside _spgetindex).

@JeffBezanson
Copy link
Member

Due I believe to the sort order argument complexity (calling ord(lt,by,rev,order)). The sparse code in 0.4 worked around this by directly calling searchsortedfirst with 5 arguments.

@ViralBShah
Copy link
Member Author

Should we tweak the sparse code, or is this something that should be tweaked in the compiler? Happy to wait in case the compiler can handle it.

@JeffBezanson
Copy link
Member

I think this patch is good for now:

diff --git a/base/sort.jl b/base/sort.jl
index d388e55..91e7522 100644
--- a/base/sort.jl
+++ b/base/sort.jl
@@ -184,6 +184,7 @@ for s in [:searchsortedfirst, :searchsortedlast, :searchsorted]
         $s(v::AbstractVector, x;
            lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) =
             $s(v,x,ord(lt,by,rev,order))
+        $s(v::AbstractVector, x) = $s(v, x, Forward)
     end
 end

That fixes 2-argument calls to searchsorted*. Later we can probably pull out most of the Ordering stuff.

@ViralBShah
Copy link
Member Author

I assume you will push it.

@tkelman tkelman added the potential benchmark Could make a good benchmark in BaseBenchmarks label Aug 16, 2016
@JeffBezanson JeffBezanson removed the compiler:codegen Generation of LLVM IR and native code label Aug 16, 2016
JeffBezanson added a commit that referenced this issue Aug 16, 2016
fix #18051, allocation in sparse vector getindex
tkelman pushed a commit that referenced this issue Aug 20, 2016
Caused by the return type of `ord()` being unknown.

(cherry picked from commit b74bc6f)
ref #18060
mfasi pushed a commit to mfasi/julia that referenced this issue Sep 5, 2016
Caused by the return type of `ord()` being unknown.
@KristofferC KristofferC removed the potential benchmark Could make a good benchmark in BaseBenchmarks label Oct 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

4 participants