Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cat overflows on >= 10000 inputs #50473

Closed
GiggleLiu opened this issue Jul 8, 2023 · 16 comments
Closed

cat overflows on >= 10000 inputs #50473

GiggleLiu opened this issue Jul 8, 2023 · 16 comments

Comments

@GiggleLiu
Copy link
Contributor

GiggleLiu commented Jul 8, 2023

I want to concatenate multiple arrays. The hcat function works well when the number of input arguments is large, however for cat, it errors on 10000 input argument. The error message as well as a MWE is as bellow:

julia> hcat([[1,2] for i=1:10000]...)
2×10000 Matrix{Int64}:
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2     2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2

julia> cat([[1,2] for i=1:10000]...; dims=2)
ERROR: StackOverflowError:
Stacktrace:
 [1] __cat_offset!(::Matrix{Int64}, ::Tuple{Int64, Int64}, ::Tuple{Bool, Bool}, ::Tuple{Int64, Int64}, ::Vector{Int64}, ::Vector{Int64}, ::Vararg{Vector{Int64}}) (repeats 9787 times)
   @ Base ./abstractarray.jl:1802
 [2] __cat(::Matrix{Int64}, ::Tuple{Int64, Int64}, ::Tuple{Bool, Bool}, ::Vector{Int64}, ::Vararg{Vector{Int64}})
   @ Base ./abstractarray.jl:1797
 [3] _cat_t(::Int64, ::Type{Int64}, ::Vector{Int64}, ::Vararg{Vector{Int64}})
   @ Base ./abstractarray.jl:1790
 [4] _cat(::Int64, ::Vector{Int64}, ::Vararg{Vector{Int64}})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.9.2+0.x64.linux.gnu/share/julia/stdlib/v1.9/LinearAlgebra/src/special.jl:415
 [5] cat(::Vector{Int64}, ::Vararg{Vector{Int64}}; dims::Int64)
   @ Base ./abstractarray.jl:1979
 [6] top-level scope
   @ REPL[13]:1

julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 1 on 32 virtual cores
Environment:
  JULIA_PKG_SERVER = http://cn-southeast.pkg.juliacn.com/
@Seelengrab
Copy link
Contributor

Splatting arbitrarily large collections is bound to overflow at some point.

@jakobnissen
Copy link
Contributor

Having it stack overflow is a pretty bad error message for something that's essentially a compiler/language limitation.
Could we instead manually throw an error when a function is being called with e.g. more than 200 arguments?

Is there an issue tracking this problem? It's a fairly common newbie problem in Julia, and it requires specialized knowledge of the language internals to understand why it shouldn't just work.

@Seelengrab
Copy link
Contributor

It's pretty difficult to detect this kind of thing (semi) accurately, as it's not the splatting or slurping that fails here, but the recursion over that and detecting "this will throw before it finishes due to ressource constraints" is a highly irregular and hard to estimate heuristic due to being machine, ressource use AND function specific.

If you want to capture the failure mode during dispatch of the method, you're going to run into much the same issue in regards to estimating how much you're going to need - and it can't capture the allocation of that tuple either, because that needs to happen first.

@jakobnissen
Copy link
Contributor

Yes, but practically speaking, the way Julians handle this is to internalise the rule "never splat a large number of elements", so we might as well make that emit a warning.

@Seelengrab
Copy link
Contributor

Seelengrab commented Jul 9, 2023

Where do we emit that warning though? Just attach it to any method taking more than N arguments..? I imagine that's not super simple to do, as most slurping functions end up taking Vararg (they don't generally know the number of splatter arguments after all), and unconditionally attaching IO to any slurping method seems bad.

@jakobnissen
Copy link
Contributor

I imagine it could be inserted at the language level, since it's a language level limitation, and be compiled away for all function calls where the number of arguments is known at compile time

@Seelengrab
Copy link
Contributor

That's the same as attaching IO effects to any Vararg method though. If you only want to attach them conditionally depending on the length, you'll have to know how many arguments are in the collection at compile time, which won't in general be possible for the array case shown in the OP. The array case relies on constant propagation.

@GiggleLiu
Copy link
Contributor Author

GiggleLiu commented Jul 10, 2023

For funcitons with unknown number of arguments, recursion should be avoided. Such type of error in cat is very unpredictable!

By checking the code, there is another possible way to improve.

function __cat_offset!(A, shape, catdims, offsets, x, X...)
    # splitting the "work" on x from X... may reduce latency (fewer costly specializations)
    newoffsets = __cat_offset1!(A, shape, catdims, offsets, x)
    return __cat_offset!(A, shape, catdims, newoffsets, X...)
end

Is it possible to optimize out tail recursion with for loops using Julia compiler?

@KristofferC
Copy link
Member

Dup of #30796

@GiggleLiu
Copy link
Contributor Author

Shouldn't we at least improve the implementation of cat before closing both issues?
Implementing cat with recursion does not make sense, we should at least try replacing it with for loops.

@KristofferC
Copy link
Member

KristofferC commented Jul 10, 2023

Why would you not use reduce(hcat, [[1,2] for i=1:10000]) in situations like this?

Cf the doctstring for hcat:

For a large vector of arrays, reduce(hcat, A) calls an efficient method when A isa AbstractVector{<:AbstractVecOrMat}. For a vector of vectors, this can also be written stack(A).

@GiggleLiu
Copy link
Contributor Author

GiggleLiu commented Jul 10, 2023

Why would you not use reduce(hcat, [[1,2] for i=1:10000]) in situations like this?

I have a program that plays with high order tensors. Given a sample set with with 10000 configurations, for each configuration, I need to slice some tensors along certain dimensions (which is not possible to know beforehand). With the obtained 10000 sliced tensors, I need to concatenate them into a large tensor to feed them into the batched matrix multiplication routines.

Maybe the following code explains better. This is the current tensor slicing program, which is not needed if cat work for concatenating 10000 slices.

# `x` is the tensor,
# `slicedim` is the dimensions to be sliced
# `configs` is a matrix of size `(nsubindices, nsample)`, each column specifies a subindex for slicing.
function get_slice(x::AbstractArray{T}, slicedim, configs::AbstractMatrix) where T
    outdim = setdiff(1:ndims(x), slicedim)
    res = similar(x, [size(x, d) for d in outdim]..., size(configs, 2))
    return get_slice!(res, x, outdim, slicedim, configs)
end

function get_slice!(res, x::AbstractArray{T}, outdim, slicedim, configs::AbstractMatrix) where T
    xstrides = strides(x)
    @inbounds for ci in CartesianIndices(res)
        idx = 1
        # the output dimensions
        for (dim, k) in zip(outdim, ci.I)
            idx += (k-1) * xstrides[dim]
        end
        # the batch dimension
        batchidx = ci.I[end]
        for (dim, k) in zip(slicedim, view(configs, :, batchidx))
            idx += k * xstrides[dim]
        end
        res[ci] = x[idx]
    end
    return res
end

@mikmoore
Copy link
Contributor

mikmoore commented Jul 11, 2023

For a bit now I've been advocating for concat or concatenate or <bikeshed> that generalizes cat to collections like maximum generalizes max. We recently got stack, but that can only be used while inserting dimensions.

I really dislike the reduce(hcat,itr) specialization. First, it's not discoverable as intuitively it's the worst possible way to perform concatenation. reduce does pairwise reduction for every predicate except for hcat/vcat/merge on very specific input types (see methods(reduce)). The only reason to think it's anything other than a horrible idea is to know that somebody explicitly specialized that particular method. Second, it's fragile in that it's extremely easy to miss. For example,

julia> using BenchmarkTools

julia> x = [zeros(2,2) for _ in 1:100];

julia> @btime reduce(hcat, $x);
  929.167 ns (1 allocation: 3.25 KiB)

julia> @btime mapreduce(vec, hcat, $x);
  20.900 μs (299 allocations: 177.42 KiB)

julia> @btime reduce(hcat, Iterators.map(vec,$x));
  20.400 μs (299 allocations: 177.42 KiB)


julia> t = ntuple(_->zeros(2,2),100);

julia> @btime reduce(hcat, $t);
  18.900 μs (100 allocations: 170.41 KiB)

julia> @btime hcat($t...); # now hcat is better
  4.300 μs (4 allocations: 5.64 KiB)

Predicate specializations only help for the cases we can imagine beforehand. They're performance minefields.

@GiggleLiu
Copy link
Contributor Author

GiggleLiu commented Jul 11, 2023

After checking the implementation of cat, I noticed that this function is designed to be generic and low overhead. Then I agree @mikmoore 's proposal of implementing a new function concatenate makes more sense. I also agree with the point that reduce(hcat, x) is not intuitive. When @KristofferC proposed using this one, I was confused because by definition, it should allocate a lot of times. But, very unexpectedly, it is not slow. If it is not from @KristofferC 's comment, I would not even try it out.

@StefanKarpinski
Copy link
Member

How about we just cap the number of function arguments that are allowed at 1k? Anyone calling a function with that many arguments is doing something bad and should be chastised. That would allow us to give a better error message instead of crashing.

@oscardssmith
Copy link
Member

This would cause problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants