Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extensible bounds checking removal #7799

Closed
ivarne opened this issue Jul 31, 2014 · 56 comments
Closed

extensible bounds checking removal #7799

ivarne opened this issue Jul 31, 2014 · 56 comments
Milestone

Comments

@ivarne
Copy link
Member

ivarne commented Jul 31, 2014

For user written short functions that have a huge overhead from bounds checking, it would be nice if it was possible to hook into the boundscheck system with @inbounds in order to let the user be able to disable manual bounds checking inside the function.

This might be beneficial for SubString, SubArray, and many wrapping types that have to check bounds before forwarding to the internal storage.

One suggestion is to have a @bounds macro that you can wrap the optional checking code in, but that seems somewhat brittle.

@JeffBezanson JeffBezanson changed the title use [@]inbounds in user code extensible bounds checking Jul 31, 2014
@timholy
Copy link
Member

timholy commented Aug 1, 2014

Yep, for me this is a priority with 0.4. My presumption is that the right way forward is to add the Expr(:withmeta, block, metadata) functionality #3796 (comment), which would provide a general mechanism for annotating blocks of code for the compiler.

@ivarne ivarne changed the title extensible bounds checking extensible bounds checking removal Aug 1, 2014
@timholy timholy mentioned this issue Aug 10, 2014
15 tasks
@timholy
Copy link
Member

timholy commented Aug 10, 2014

I'd recommend we rename the current Expr(:boundscheck, false) to Expr(:checkbounds, false), because it's an imperative statement. Then we can use

@boundscheck 1 <= i <= length(a) || throw(BoundsError())

as a declarative statement.

@kmsquire
Copy link
Member

I'm wondering if having both of those combined defined might be a little confusing...?

@kmsquire
Copy link
Member

Whoops! Combined => defined

@timholy
Copy link
Member

timholy commented Aug 10, 2014

Users won't see both. Expr(:boundscheck, false) is what's generated by @inbounds now. I'm only thinking in terms of internal cleanliness.

@kmsquire
Copy link
Member

Okay, thanks for the clarification. Cheers!

@simonster
Copy link
Member

Another strategy would be:

  • Implement unsafe_getindex and unsafe_setindex! to call the arrayref/arrayset intrinsics without a bounds check
  • Make the @inbounds macro just rewrite getindex/setindex! to unsafe_getindex/unsafe_setindex!, rather than hooking into the codegen as it does now

This would be extensible, and also stop @inbounds from affecting inlined expressions, which would fix the segfault case in #7427 (comment). There's probably some reason we're not already doing this, though.

@timholy
Copy link
Member

timholy commented Aug 10, 2014

That seems like an interesting strategy.

@JeffBezanson
Copy link
Member

That does sound like a good approach. The existing design comes from (1) not wanting to add names, (2) considering the behavior with inlining a feature. The idea was to automatically catch getindex calls that just wrap other getindex calls.

To really break this into orthogonal components, you'd define unsafe_getindex, and checkbounds, and have a single getindex that calls both. unsafe_getindex would generally be implemented by making other unsafe_getindex calls. Technically I think this design is good, but I also don't like the idea of lots of ugly library code that has to call unsafe_getindex everywhere instead of using [], and that you can't define getindex directly for new types.

@timholy
Copy link
Member

timholy commented Aug 10, 2014

Yeah, I too worry that it's a bit ugly.

@timholy
Copy link
Member

timholy commented Aug 10, 2014

You also may want to create a type that can't have its bounds-checks disabled. For example, there are times where you really don't want to check bounds upon construction (see examples in #4044). In such cases, you might want to implement a getindex for the type that is never unsafe. I guess you could still call it unsafe_getindex even if it's not true. But having a getindex that doesn't change its behavior under @inbounds seems cleaner.

@simonster
Copy link
Member

My original thought was that we'd define a unsafe_getindex(x, i...) = getindex(x, i...) fallback, so you could define getindex directly for new types if you don't care about bounds check removal, but to get @inbounds to do anything you'd have to opt into bounds check removal by defining unsafe_getindex. This may be a good or bad thing compared to the current @inbounds design depending on whether you privilege DRY or being able to use @inbounds in generic code without concern of segfaulting if someone passes you a type with a bad getindex method.

The orthogonal design is clearly less verbose. Ideally if you define getindex, it would be called for both x[i] and @inbounds x[i], or if you define unsafe_getindex, x[i] would call checkbounds(x, i); unsafe_getindex(x, i) and @inbounds x[i] would call unsafe_getindex(x, i). But that seems to imply fallbacks for both methods that call the other, which would lead to a stack overflow if neither are defined.

As far as ugliness of library code, if @inbounds is implemented as described, you could write:

unsafe_getindex(x::MyType, i) = @inbounds x.a[i]

which doesn't seem so ugly.

@timholy
Copy link
Member

timholy commented Aug 11, 2014

Yep, fair enough. I think this could work.

Perhaps the best argument for the other approach is that it's part of a whole new suite of capabilities that stem from being able to talk directly to the compiler. @inline, @please_run_LLVM's_SLPvectorizer_on_this_block, etc. But I think your approach is more selective in how it works with inlining, so it warrants serious consideration. As you say, I guess the fundamental issue is what we want to have happen: should @inbounds effectively propagate across layers of inlining? I'm not convinced it should.

@simonster
Copy link
Member

Returning to this issue, I think there are cases where we don't want @inbounds to propagate across inlining, and the current behavior is a bit unsafe. For example, in reduce.jl, we have:

    @inbounds fx1 = evaluate(f, A[ifirst])
    @inbounds fx2 = evaluate(f, A[ifirst+=1])

This is only okay because at present we can't inline function-valued arguments, so we can never inline evaluate(f, x). If evaluate(f, x) gets inlined and contains a getindex operation, we don't want to elide the bounds checks, because the user didn't ask for that. This code should really be written as:

    @inbounds v1 = A[ifirst]
    fx1 = evaluate(f, v1)
    @inbounds v2 = A[ifirst+=1]
    fx2 = evaluate(f, v2)

which would be safe, but is awkward. In general the effects of @inbounds seem difficult to reason about since they depend on what the compiler chooses to inline. There is also a tendency to put @inbounds around blocks of code in library functions. This is similarly unsafe unless the code only calls functions that we know don't involve any unsafe getindex operations, which can never really be the case for functions that accept user-defined types. With my proposal, @inbounds would really just claim that all indexing in the block is within the bounds of the respective arrays, and the only way adding @inbounds can cause code to segfault is if there is a bug in the code wrapped in @inbounds or a bug in unsafe_getindex. If I've convinced you, I may take a stab at implementing this soon 😄.

@timholy
Copy link
Member

timholy commented Aug 27, 2014

I've had the same thoughts. It would be a bit of a pity to not have a way to @inbounds "all the way down," but I agree that we probably don't want that behavior casually, and the current hybrid is unfortunate. I have wondered whether a reasonable solution might be to cause @inbounds to propagate all the way down, but for types like in #4044 define getindex with bounds-checking hardwired (no way to disable).

@mbauman
Copy link
Member

mbauman commented Mar 25, 2015

Another syntax that this should ideally be able to support is iteration loops like:

@inbounds for elt in A
    
end

This is a tough case because we've generally talked about @inbounds not propagating through methods, but here we want it to affect the getindex call within next. Ideally, I think next could be written in such a way to opt-in to @inbounds propagation:

function next(itr, state)
    @boundscheck checkbounds(itr,state)
    @inbounds x = itr[state]
    return (x, state+1)
end

This is effectively #10614 (comment) with per-method compilation settings… and unfortunately #8227 would not be able to support this sort of thing.

@simonster
Copy link
Member

That's a good point. I think the Unsafe/Unchecked module proposal from #8227 could do this:

Base.next(itr, state) = (itr[state], state+1)
Base.Unchecked.next(itr, state) = @inbounds (itr[state], state+1)

We could have a @propagateinbounds macro that automatically defines the Unsafe version.

If @inbounds only has an effect for types that opt in, it seems we could safely lower for elt in A to call @inbounds next(...) instead of next(...) and eliminate the need for @inbounds in the code above. If next and done are correct, for elt in A can't go out of bounds.

@mbauman
Copy link
Member

mbauman commented Mar 25, 2015

I'm still very hesitant about having separate functions (be they unsafe_* or Unchecked.*). Would it be possible for users to add their own functions to their own Unchecked module? Otherwise, I don't like that some Base functions are deemed special… and that list keeps getting longer: getindex, setindex!, next, sub2ind, and more (maybe even arithmetic?).

It also just seems like having two functions is asking for trouble. It's the kind of thing that would drive me batty during debugging if one happened to have an error while the other didn't. I know the typical way to write these functions will be f(…) = (checkbounds(…); Unchecked.f(…)), but it feels backwards to me: I think the re-direction should occur around checkbounds and not the code for the method itself.

@simonster
Copy link
Member

I think you are right that two generic functions is not really a viable solution if both getindex and its unchecked cousin can be defined for abstract types. We'd end up with cases where we'd call a less specialized unchecked getindex where we should be calling the more specialized ordinary getindex.

The problem is that, at the codegen level, either we make @inbounds only apply to inlined functions (which is fine if the code is correct, but makes things less predictable), or we need to compile and choose between two functions at the native code level, which would potentially introduce additional complexity.

I'm left wondering how hard it would be to implement range analysis and automatic bounds check elimination to the point where we can get good performance in the majority of the cases where it matters, and sidestep this whole issue.

@carnaval
Copy link
Contributor

carnaval commented Apr 8, 2015

On Wed, Apr 8, 2015 at 6:00 PM, Simon Kornblith [email protected]
wrote:

I'm left wondering how hard it would be to implement range analysis and
automatic bounds check elimination to the point where we can get good
performance in the majority of the cases where it matters, and sidestep
this whole issue.

You'd be surprised at how quick the analysis would have to give up. Given
that arrays are mutable, as soon as a reference to the array escapes
anywhere we don't have any guarantees on its size anymore without resorting
to heavier tools (interprocedural, alias, ...).
An easier project IMO would be to hoist the bounds check out of loops, then
you only have to ensure that the array's dimension don't get mutated inside
the loop (or only increasing). I don't think that eliminating the checks is
really useful once they are out of the hot path since those are quite cheap.

@mbauman
Copy link
Member

mbauman commented Apr 8, 2015

Hoisting bounds checks already works fairly well… just so long as everything inlines properly (checkbounds doesn't inline its checks currently, whereas I believe that arrayref does, cf. 700a69b). (Not a bounds check hoist; see subsequent comments)

@ivarne
Copy link
Member Author

ivarne commented Jun 30, 2015

I would expect SubArray to check bounds on construction (unless they are constructed in a @inbounds block). That way it is obvious that an @inbounds declaration on indexing of a SubArray only checks bounds on the "outer" array view.

I have trouble seeing when you would want to override checkbounds. Usually I see @inbounds used as an assertion that i1 in size(A,1) && i2 in size(A,2), .... Is that wrong insufficient sometims? Wouldn't we need a different macro to dissable other sanity checks?

@prcastro
Copy link
Contributor

We seriously need to make Traits part of the language (#5). They are starting to appear everywhere on base, and this would make them much easier to work with.

@timholy
Copy link
Member

timholy commented Jun 30, 2015

@ivarne, SubArrays do (now) check on construction. But users may want to construct a similar type that doesn't check upon construction (see #4044), and that's what I was using as an example motivating the need to override checkbounds.

@ivarne
Copy link
Member Author

ivarne commented Jun 30, 2015

@timholy I'd say such objects doesn't follow (what I expect to be) the @inbounds interface contract (i in 1:size(A, 1)), so you'd need a different approach to dissable the safety checks for such objects.

When I apply @inbounds to some piece of code, I see that as a promise that I will only access valid indexes. Previously I have assumed that meant i in 1:size(A, 1), and if we make that logic more complicated, @inbounds will be much harder to use safely.

@timholy
Copy link
Member

timholy commented Jun 30, 2015

I don't understand your concern; with this proposal, you can have your cake and eat it too. The point is that users who want to do fancy things in their own code trees can add extra safety checks to checkbounds, depending on the specific AbstractArray type. But none of the types currently in Base will need such shenanigans.

@mbauman
Copy link
Member

mbauman commented Jun 30, 2015

Exactly, the whole point behind this idea is that you almost always want to check bounds before doing indexing. And checking bounds is almost always asserting that each i or each element of the array I is in 1:size(A, d) (or trailingsize). So let's just do that all by default.

The bonuses to this proposal are that it breaks these things down into orthogonal components:

  • it's simple to override getindex if you want to do something completely different, but you must be aware that library code may transform an indexing expression into unsafe_getindex.
  • it's simple to override checkbounds if you simply want to alter how bounds are checked when indexing into your custom array type. You're right - Julia has long held the assumption that indices are Int, start at 1, and are listed in order. But I think it'd be really powerful (and not too much extra work) to allow custom array types to break those conventions. E.g., Fortran allows specifying the start index, or AxisArrays.jl will allow you to specify named dimensions in any order.
  • it's simple to override in_bounds if you have a custom array type you'd like to support indexing with and have a different or more efficient way of checking bounds for it

As I was thinking about this last night, I realized that we'll have to make this scheme support Cartesian indices, which may be tough to do without duplicating functionality (and work). Edit: actually, it's not really any worse than the status quo, I think — it just requires a little extra indirection. Mixed cartesian/integer indexing breaks the assumption that each index corresponds to exactly one dimension… which means that the end keyword lowers incorrectly. But that's a whole 'nother can of worms.

@ivarne
Copy link
Member Author

ivarne commented Jun 30, 2015

The trouble is that if you have an array type that break the i in 1:size(A,1), most (or all) the existing code that uses @inbounds is wrong, and will cause segfaults (or other undefined behaviour).

Can you give an example on a correct piece of generic code that uses inbounds and works on both Julia (1 based) and C (0 based) arrays (without resorting to eachindex)?

@simonster
Copy link
Member

Right now it would cause segfaults. With the proposed scheme, it would only cause a segfault if unsafe_getindex itself uses @inbounds. Otherwise it will just throw BoundsErrors even if used from code that uses @inbounds.

The easiest solution is to say that an array with 0-based indexing should not subtype AbstractArray. This seems pretty reasonable to me, since I suspect there's not much that accepts AbstractArrays but not ordinary iterators that will actually work with 0-based indexing.

@mbauman
Copy link
Member

mbauman commented Jun 30, 2015

Ok, but this seems like a discussion for another day. We certainly don't need to document the fact that checkbounds happens to be written in an orthogonal manner that would allow specialization without tons of ambiguity warnings or encourage its overloading. Perhaps I named #11895 wrong: the point there is more about giving in_bounds a better name and allowing it to be overloaded, which is very necessary for efficient indexing by nullable arrays. The checkbounds changes simply fall out of that.

@mbauman
Copy link
Member

mbauman commented Jun 30, 2015

Ah, right, and I totally forgot - we do need to encourage the overloading of checkbounds for non-abstractarray indexables.

@JeffBezanson
Copy link
Member

I still have some serious doubts about splitting getindex into two functions. It's quite ugly to have to tell people to define unsafe_getindex.

Here's a possible alternative. First, we mark code that performs bounds checks with @boundscheck( ... ). All code marked this way is subject to removal by @inbounds. At that point, we have all the needed metadata in the code and the rest is up to compiler optimizations, where this belongs.

@StefanKarpinski
Copy link
Member

+1 that was something like what I was thinking of. What's a little weird though is that you seem to need to have machinery to compile two different versions of functions that have @boundscheck in them – it's sort of implicitly two different methods.

@mbauman
Copy link
Member

mbauman commented Jun 30, 2015

Yes, please! That was definitely my first choice (#10525 (comment)), but as I said there, "that just punts the complexity up the chain to a system I don't know and only a few could implement."

This proposal was my compromise that was implementable by peons. :)

@JeffBezanson
Copy link
Member

As a first cut, we might try having @inbounds only affect code up to 1 level of inlining. Eventually we might indeed need to have the compiler generate different versions of methods. But that's way better than shoving this in people's faces. All the unsafe_ stuff can be handled internally.

I haven't read all of the discussion on this, but is it thought to be important to remove bounds checks even if getindex isn't inlined?

@JeffBezanson
Copy link
Member

Also, this needn't tax method tables or dispatch. A benefit of leaving this to the compiler is that it can do any dirty trick it wants. We will eventually have some kind of __secret__ submodule of every module, where the compiler can stash things it generates. It could handle this by renaming, basically just inserting the unsafe_ prefix itself, and putting those definitions in __secret__ (a good name for this is sought).

@mbauman
Copy link
Member

mbauman commented Jun 30, 2015

The consensus seems to be in strong favor of decoupling @inbounds from inlining. The issue isn't just if it doesn't inline, but also if too much inlines. That means that a "safe" base method effectively sticks @inbounds annotations all throughout user code (which may have been expecting bounds checking). None of us like that behavior.

@ScottPJones
Copy link
Contributor

The big thing I care about, for correctness sake, is that @inbounds doesn't act like a dynamic scope, as it apparently does now. If you can also do some magic at some point in the future to cut down the explicit @inbounds I've had to put in for performance sake, that would be delightful!

@JeffBezanson
Copy link
Member

To be safer, @inbounds and @boundscheck could specify an array, and then we'd have to match them up. The check is only removed if the compiler can prove the two macros are talking about the same array.

As I said, we could also limit it to 1 level of inlining. That's a bit arbitrary, but seems pretty safe to me.

@mbauman
Copy link
Member

mbauman commented Jun 30, 2015

Ah, whoops, I missed the "up to 1 level." I like that. I don't think it's all that arbitrary - I think that each indexing level should need to explicitly opt-in to @inbounds if they know it'll be safe.

I also like the idea of specifying which array the indexing is inbounds for, but that might be somewhat redundant with the above. I'll have to think through that a bit more.

@carnaval
Copy link
Contributor

I don't like the idea of specifying the array : if the array gets stored into temporaries or something then it becomes harder, and if you already went to the trouble of adding @inbounds I don't think you'd like to have to fiddle with the code to get it to actually work.

@simonster
Copy link
Member

I've thought about this solution before. I don't think it would matter for performance if you're not able to eliminate bounds checks from functions that aren't inlined, but here is the worst case scenario:

function getindex(x, i)
    @checkbounds begin
        ...
        i += 1 # oops, meant to put this outside the @checkbounds block
    end
    @inbounds x.x[i]
end

Now your Travis build with --inline=no --code-coverage=user might pass even though your code doesn't work. Or you might not realize there was a bug in your code until someone changes the inlining threshold and bounds checks that were previously enabled get disabled. I'm not sure if this is a show stopper for this approach, just pointing it out.

@StefanKarpinski
Copy link
Member

Jeff, is the idea that you would generate both the getindex and unsafe_getindex methods from the single definition that contains @checkbounds and then on the calling side, the @inbounds macro would rewrite getindex calls to unsafe_getindex calls? If so this seems to be a special case of a macro that can rewrite the entire top-level expression that it appears in. Maybe a candidate for @@checkbounds? Or maybe you just put a macro around the method definition too.

@blakejohnson
Copy link
Contributor

This issue has been quiet for 4.5 months... what's necessary to actually make progress on @JeffBezanson's proposal? Are the required changes buried within codegen?

@mbauman
Copy link
Member

mbauman commented Feb 7, 2016

Closed by #14474. 🍰

@mbauman mbauman closed this as completed Feb 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests