Change README rule of thumb #506

Open
ChrisRackauckas opened this issue Sep 24, 2018 · 7 comments
Labels: documentation (documentation improvements)

Comments

@ChrisRackauckas (Member)

The README says:

> Note that in the current implementation, working with large StaticArrays puts a lot of stress on the compiler, and becomes slower than Base.Array as the size increases. A very rough rule of thumb is that you should consider using a normal Array for arrays larger than 100 elements. For example, the performance crossover point for a matrix multiply microbenchmark seems to be about 11x11 in julia 0.5 with default optimizations.

In Julia v1.0 there were some pretty big changes to tuple optimizations, for example JuliaLang/julia#27398. Since SVectors are backed by tuples, this seems to have had a pretty big effect on large static arrays. For example, 11x11 used to be over the tuple inference limit, which was counteracted by some of the nice generated functions in StaticArrays.jl, and that made the two seem to match. But without that inference limit, 11x11 seems to be much better than before.
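For reference, a rough sketch of how one could re-check that crossover on a current Julia (assuming BenchmarkTools is available; edit the `11`s to probe other sizes):

```julia
using StaticArrays, BenchmarkTools

# 11×11 was the old crossover size quoted in the README.
A  = rand(SMatrix{11,11,Float64})
B  = rand(SMatrix{11,11,Float64})
Ad = Matrix(A)
Bd = Matrix(B)

@btime $A * $B     # StaticArrays generated/unrolled multiply
@btime $Ad * $Bd   # Base.Array multiply (BLAS)
```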

@dpsanders mentioned in Slack

> Chris Rackauckas [6:43 PM]
>
> > For example, the performance crossover point for a matrix multiply microbenchmark seems to be about 11x11 in julia 0.5 with default optimizations.
>
> Is probably much higher now.
>
> David Sanders [6:47 PM]
>
> Yes, my global optimization code in N dimensions now is super fast
> (N-dimensional `IntervalBox`es just wrap an `SVector` of `Interval`s)
> Something like a factor of 10 faster than with Julia 0.6

I think it might be worth re-evaluating the rule of thumb in the README, since the definition of "small" may have shifted. In fact, "small" seems to have shifted high enough that the trade-off is now runtime vs. compilation time, since compilation time can grow quite quickly for much larger static arrays. Once again quoting @dpsanders:

> If I use SVectors of big dimension (e.g. 100, 200) there seems to be a huge compile-time penalty, but the calculations themselves are blindingly fast. Is this normal?

So a more nuanced discussion is probably needed in Julia v1.0 land, and I think this is important since that 11x11 number is something I and others copy around as a handy heuristic which may now be out of date!
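For concreteness, a rough sketch of how to see that compile-time/run-time split at the REPL (assuming only StaticArrays; the dimension 200 just mirrors the quote above):

```julia
using StaticArrays

v = rand(SVector{200, Float64})
f(x) = sum(x .* x)

@time f(v)   # first call: dominated by compiling the size-200 methods
@time f(v)   # second call: run time only, and very fast
```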

@andyferris (Member)

Yes... it's hard to condense down all the nuances into a sentence or two for new users. I'm definitely open to suggestions!

Out of curiosity, how high-dimensional are these `IntervalBox`es? (I've been playing with my own interval boxes with a similar-sounding implementation, for implementing R-trees for 2D and 3D geometries, but that seems orthogonal to large static arrays.)

@dpsanders (Contributor)

An `IntervalBox` wraps an `SVector{N, Interval{Float64}}` to represent a Cartesian product of intervals living in N dimensions, e.g.

```julia
julia> using IntervalArithmetic

julia> X = IntervalBox(-10..10, 20)
[-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10] × [-10, 10]
```

so that `X` is a subset of R^20.

I can do global optimization of a function R^N -> R with, say, N=200, except that compiling the code takes ages. I'm not sure up to which value of N the calculation would run in a reasonable time if the compile-time penalty were less significant. I'm happy to give more details.

@andyferris (Member)

OK, yep.

Also note #494. The long-term goal here has always been to use loops and so on for larger arrays, which would solve the compilation-time issue. For N = 200 I would start to wonder whether it makes sense to keep this data on the stack rather than the heap.

PS: random thought, what about the syntax `-10..10 ⊗ -10..10 ⊗ -10..10 ⊗ ...`?

@dpsanders (Contributor)

```julia
julia> (-10..10) × (-10..10) × (-10..10)
[-10, 10] × [-10, 10] × [-10, 10]
```

already works (but you don't want to do that for 50 of them) ;) Slight abuse of the `×` operator, maybe.

I also wonder about heap vs stack, but it doesn't seem to be something that's easy to play with?

@andyferris (Member)

> I also wonder about heap vs stack, but it doesn't seem to be something that's easy to play with?

Try comparing `MVector` vs `SVector`.

@ChrisRackauckas (Member, Author)

`MVector` is still backed by a tuple? Its storage is still stack-allocated, and it's just the wrapper type that gets allocated each time, though?

> I also wonder about heap vs stack, but it doesn't seem to be something that's easy to play with?

DiffEq has both in-place and out-of-place routes, each optimized in its own right. That would probably give a good "full program" perspective on where the cutoff should be. We could make a test that is a PDE discretization with N points and find the switching point N.
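Something along these lines, maybe (just a sketch of the kind of test I mean, using a 1D heat-equation stencil; the names `f_static`/`f_inplace!` and the grid size `N` are placeholders, and DifferentialEquations + BenchmarkTools are assumed):

```julia
using DifferentialEquations, StaticArrays, BenchmarkTools

const N = 8  # grid size; sweep this to find the switching point

# Out-of-place RHS returning an SVector (1D heat equation, zero Dirichlet ends)
function f_static(u, p, t)
    SVector{N}(ntuple(i -> begin
        left  = i == 1 ? 0.0 : u[i-1]
        right = i == N ? 0.0 : u[i+1]
        left - 2u[i] + right
    end, N))
end

# In-place RHS mutating du, same discretization on a plain Vector
function f_inplace!(du, u, p, t)
    for i in 1:N
        left  = i == 1 ? 0.0 : u[i-1]
        right = i == N ? 0.0 : u[i+1]
        du[i] = left - 2u[i] + right
    end
    return nothing
end

u0s   = rand(SVector{N, Float64})
u0v   = Vector(u0s)
tspan = (0.0, 1.0)

prob_static  = ODEProblem(f_static,   u0s, tspan)  # out-of-place problem
prob_inplace = ODEProblem(f_inplace!, u0v, tspan)  # in-place problem

@btime solve($prob_static,  Tsit5(), save_everystep = false)
@btime solve($prob_inplace, Tsit5(), save_everystep = false)
```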

@KristofferC (Contributor)

`MVector` is stored on the heap (that's why you can mutate it).
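For anyone who wants to check this at the REPL, a rough sketch (assuming BenchmarkTools; exact allocation counts depend on the Julia version and on whether the compiler can elide the wrapper):

```julia
using StaticArrays, BenchmarkTools

sv = rand(SVector{8, Float64})
mv = MVector{8, Float64}(sv)

@btime $sv + $sv                 # SVector arithmetic: expect 0 allocations
@btime MVector{8, Float64}($sv)  # building a fresh MVector: expect a heap allocation
@btime $mv .= $sv .+ $sv         # mutating an existing MVector in place: expect 0 allocations
```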
