RFC: Macro for expression noalias hints #19658

Keno · 2016-12-19T22:37:48Z

We are missing significant optimization opportunity for small constant-size loops. For such loops it is not profitable to generate run-time alias checks, but we still loose significant performance due to lack of vectorization. I'm proposing introducing a macro to annotate variables that are known not to alias, similar to restrict in C. My initial thought is something like this.

@assert_nolias a::Array # Implies that `a` does not share memory with any other variable in the current scope

For generic structs:

@assert_noalias b::Foo # morally equivalent to
for i in nfields(typeof(b))
@assert_noalias getfield(b, i)
end

I think these semantics are pretty strong and may be weakened over time, but it seems like an ok place to start. Eventually we may want to introduce more first class support for non-aliasing arrays (which array almost are, were it not for sharing by reshaping etc).

Motivating benchmark is jeff-regier/Celeste.jl#483

The text was updated successfully, but these errors were encountered:

Keno · 2016-12-19T22:52:24Z

Motivating example that does not require loading Celeste:

function foo(a,b)
    @inbounds for i = 1:3, j=1:3
         a[i] += b[i,j]
    end
end
a = [1,2,3]
b = [1 2 3; 4 5 6; 7 8 9]

Looking at the IR:

define void @julia_foo_71319(%jl_value_t*, %jl_value_t*) #0 !dbg !5 {
top:
  %2 = bitcast %jl_value_t* %1 to i64**
  %3 = load i64*, i64** %2, align 8
  %4 = getelementptr inbounds %jl_value_t, %jl_value_t* %1, i64 3, i32 0
  %5 = bitcast %jl_value_t** %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = bitcast %jl_value_t* %0 to i64**
  %8 = load i64*, i64** %7, align 8
  %9 = load i64, i64* %8, align 8
  %10 = load i64, i64* %3, align 8
  %11 = add i64 %10, %9
  store i64 %11, i64* %8, align 8
  %12 = getelementptr i64, i64* %3, i64 %6
  %13 = load i64, i64* %12, align 8
  %14 = add i64 %13, %11
  store i64 %14, i64* %8, align 8
  %15 = shl i64 %6, 1
  %16 = getelementptr i64, i64* %3, i64 %15
  %17 = load i64, i64* %16, align 8
  %18 = add i64 %17, %14
  store i64 %18, i64* %8, align 8
  %19 = getelementptr i64, i64* %8, i64 1
  %20 = load i64, i64* %19, align 8
  %21 = getelementptr i64, i64* %3, i64 1
  %22 = load i64, i64* %21, align 8
  %23 = add i64 %22, %20
  store i64 %23, i64* %19, align 8
  %24 = add i64 %6, 1
  %25 = getelementptr i64, i64* %3, i64 %24
  %26 = load i64, i64* %25, align 8
  %27 = add i64 %26, %23
  store i64 %27, i64* %19, align 8
  %28 = or i64 %15, 1
  %29 = getelementptr i64, i64* %3, i64 %28
  %30 = load i64, i64* %29, align 8
  %31 = add i64 %30, %27
  store i64 %31, i64* %19, align 8
  %32 = getelementptr i64, i64* %8, i64 2
  %33 = load i64, i64* %32, align 8
  %34 = getelementptr i64, i64* %3, i64 2
  %35 = load i64, i64* %34, align 8
  %36 = add i64 %35, %33
  store i64 %36, i64* %32, align 8
  %37 = add i64 %6, 2
  %38 = getelementptr i64, i64* %3, i64 %37
  %39 = load i64, i64* %38, align 8
  %40 = add i64 %39, %36
  store i64 %40, i64* %32, align 8
  %41 = add i64 %15, 2
  %42 = getelementptr i64, i64* %3, i64 %41
  %43 = load i64, i64* %42, align 8
  %44 = add i64 %43, %40
  store i64 %44, i64* %32, align 8
  ret void
}

We see a number of redundant stores to %19, because LLVM doesn't know the arrays don't alias. Removing those stores and thus enabling vectorization is the goal.

Keno · 2016-12-19T23:01:11Z

I should clarify that I intend the assertion to apply to the entire scope of the local variable, not just after the assertion.

vtjnash · 2016-12-19T23:29:39Z

related issue: #8087

Keno · 2016-12-20T01:46:54Z

@vtjnash feels strongly that these semantics are too strong and that instead, the alias set should be more directly described and properly scoped wrt to normal control flow. Thus, let me suggest the following:

@assert_noalias a, b, c # a,b,c don't alias. If they have fields, no two fields alias, etc.

With the assertion being an actual assertion in that it throws an error. After this assertion the compiler can rely on regular alias analysis (assuming it can figure out the check ok if it should). I still feel like there should a command line switch to elide the check, but that feature could then be implemented using an llvm pass that looks at the generated alias check and transforms it into metadata.

Keno · 2016-12-20T01:59:14Z

The obvious problem with this is of course that the check is at least n log n time and probably an equal amount in both code and data space.

Keno · 2016-12-20T03:36:49Z

In particular, the problem I have with mandating this is the following:
It's a common pattern to have a type like:

immutable Preallocated
vec::NTuple{20,Vector{Float64}}
end

Now, I feel like it would be perfectly reasonable to want to assert none of those vectors alias, and then do things with the vectors without having to manually specify exactly which alias checks are needed. However, if we actually do make sure that there is no pairwise aliasing, we very quickly end up in a situation where the check dominates the actual work to be done. Consider for example the Celeste workload above. It has precisely this situation. The fastest we can do the alias check would be go generate a size 20 sort network and do the comparison. I haven't actually written it out, but I suspect that would take ~200 vector instructions. However, each loop in that example is only ~20 vector instructions and there are five of them, so even in that case, we're looking at a 2x overhead for the check.

Ideally the semantics would allow only emitting the check if the assumption was actually used, but I think that would be difficult, both semantically and from an implementation standpoint.

Keno · 2016-12-20T03:59:23Z

I suppose at least for the array case, we could assert that their buffers are not shared (since we do keep track of that). That would only cause a linear number of checks (though still has the problem of unnecessary checks).

yuyichao · 2016-12-20T04:22:52Z

I suppose at least for the array case, we could assert that their buffers are not shared

Doesn't prevent the array to be used in multiple places?

Keno · 2016-12-20T04:26:29Z

Yeah, I suppose that's true.

tkoolen · 2016-12-20T10:21:19Z

Eigen does this with a noalias() method, called on the destination variable, as follows:

C.noalias() = A * B; // where A, B and C are Eigen Matrices

Note that with Eigen, the result of A * B is just an unevaluated Eigen expression type. The evaluation of the expression happens in the assignment operator, so effectively the noalias() applies to the evaluation of the expression as a whole. By calling noalias(), the user provides a guarantee that evaluating the expression will not alias; whether that's actually true is not verified in any way.

Maybe this idea also makes sense in Julia. Instead of saying that several variables don't alias as in #19658 (comment) (including all their fields), one could instead tell the compiler that a certain expression doesn't alias. Applied to the example from #19658 (comment), perhaps something like @noalias @inbounds for i = 1:3, j=1:3 [...] would be nice.

Here's an example where annotating sets of variables with @assert_noalias, at least in the sense of #19658 (comment), wouldn't make sense, but annotating the expression would:

data = rand(3, 9)
A = view(data, :, 1:3)
B = view(data, :, 4:6)
C = view(data, :, 7:9)
A[:] = B * C # doesn't alias, but the 'parent' field of the SubArrays A, B, and C is the same Array

I'm also not sure whether having the compiler check that aliasing really doesn't occur is really necessary, or feasible. It feels very similar to @inbounds, so why treat them differently?

Keno · 2017-01-06T21:31:53Z

The problem I see with that is that's not entirely clear what such an annotation would mean when applied to an expression. There's not really an obvious semantic to apply here, because the operators can be arbitrary functions. Eigen can get away with it, because the operations and operands are limited, but that's not the case in Julia.

Keno · 2017-01-06T22:39:47Z

@vtjnash and I have discussed this some more and we discussed that perhaps a lexically scoped const annotation would be easier to reason about and implement. The options are pretty much as follows:

Option 1

function foo(a,b)
    @inbounds @const b for i = 1:3, j=1:3
        a[i] += b[i,j]
    end
end

lowers to something like

function foo(a,b)
    @inbounds let b = Const(b)
        $(Expr(:aliasscope))
        for i = 1:3, j=1:3
            a[i] += b[i,j]
        end
        $(Expr(:popaliasscope))
    end
end

Inside an alias scope, all stores do not alias any loads from Const arrays (and it would be illegal to write to the underlying memory).

Option 2

Have functions implicitly be an alias scope, and you can manually create Const wrappers, which are valid with respect to their scoped:

function foo(a,b)
    let b = Const(b)
        @inbounds for i = 1:3, j=1:3
            a[i] += b[i,j]
        end
    end
end

Though I suppose this would mean the same thing.

function foo(a,b)
    @inbounds for i = 1:3, j=1:3
        a[i] += Const(b)[i,j]
    end
end

The second option seems somewhat nice, but I fear it will get confusing. The first option quite clearly delineates in what region the const applies, while the second option is rather implicit.

KristofferC · 2017-01-06T22:45:50Z

The first option seems to fit in with existing performance macro annotations very nicely.

Keno · 2017-01-06T22:47:13Z

It was implicit, but I should perhaps mention it explicitly. One major advantage of the Const approach is that it's enforceable in a linear amount of code if we want to do so in the future.

StefanKarpinski · 2017-01-08T16:42:18Z

Even though it's a departure from the way we've done a lot of performance annotations, I like the explicitness of the Const approach. Does it also mean that we could define methods for Const types explicitly and pass Const values through to functions that are called, or would the Const wrapper not be that first class?

tkoolen · 2017-01-08T20:13:05Z

I think the Const approach would cover a lot of use cases. Something like the numerical integration example in this blog post ('Non-aliased memory windows' section) would not be covered though. I suppose the Const approach does require something like #12205; otherwise I'd expect GC/allocation overhead to dominate.

Keno · 2017-01-09T04:41:10Z

Yes const would be relatively first class. Functions would be an implicit alias scope, but you could also declare alias scopes directly with a macro (which would cover the numerical integration example). I'm very much concerned about the nonlocality of the semantics though.

StefanKarpinski · 2017-01-11T05:43:38Z

I'm very much concerned about the nonlocality of the semantics though.

This is the real reason to favor the macro approach – to avoid exposing anything non-lexical.

Keno · 2017-04-07T18:30:34Z

Having played with this a whole bunch, I find the Const approach lacking. You do also want to express no-alias between different arrays you're writing to (e.g. if update them using +=).

vtjnash · 2018-09-04T02:12:55Z

I’m a bit concerned by this when I see Hal saying that LLVM doesn’t support doing this for C due to observed correctness and performance issues (i.e., it sounds like they currently get the wrong result (wrong answer, slower) in the presence of inlining or operator overloading occurring, and that fixing that gave uniformly even worse performance (because the correct annotation seems it must prohibit many more important optimizations). Ref http://llvm.1065342.n5.nabble.com/llvm-dev-alias-scope-and-local-restricted-C-pointers-tp121373p121385.html for the latest comments I’ve seen about this.)

He says there that they’re working on improving it, and interested in new ideas though. It sounds like he expects that it will never really work for C++ though, due to the ability to define constructors and operators—something that’s obviously very relevant to Julia!

Keno · 2018-09-04T17:07:02Z

I'm increasingly in favor of a model where you can freeze a mutable array into an immutable one. If the compiler can figure out that your mutable array doesn't escape, it can immediately allocate the immutable one, otherwise it'll make a copy. That would solve 99% of the cases where you want aliasing. Most of the time arrays are immutable after construction and the aliasing problems arise because the compiler doesn't know which those kinds of arrays are. Making that super performant would require more sophisticated escape analysis, but I do think we can do that along with type inference (since your code needs to be well-inferred to be fast anyway).

ChrisRackauckas · 2018-09-05T15:50:12Z

That would solve 99% of the cases where you want aliasing. Most of the time arrays are immutable after construction and the aliasing problems arise because the compiler doesn't know which those kinds of arrays are.

99% seems like a high over-estimate. It won't cover anything differential equation related. For example, with issue #28126 it wouldn't be applicable since the arrays are built iteratively: a is defined by b, c, d, and e, but then it's added as the next array for the linear combination to create f, and this triangle grows. In optimization you have something similar as well: you're updating mutable cache arrays using other caches you just updated. With (iterative) linear algebra you're working on a bunch of mutable cache arrays with linear combinations between them.

Maybe that thought holds for more computer science applications, but the general workflow in scientific computing seems to be:

Update cache array with newest f call (differential equation function, optimization gradient, etc.
Update internal arrays using said cache
Decide how to call f again using the internal arrays.

For step 2 to be correct we have to setup the cache arrays so that they don't alias. But the compiler doesn't know that, so what we would need is some way to tell the compiler that.

StefanKarpinski · 2018-09-10T19:20:52Z

@Keno: what you're talking about reminds me of John Rose's "larval objects" idea:

https://blogs.oracle.com/jrose/larval-objects-in-the-vm

Implementation-wise, if we could convert a mutable value to a corresponding immutable one in a usually-zero-copy fashion, that would be what we want. Another way to approach that might be to be able to mark the mutable version is "illegal to use" and reassign the memory to the immutable version. Unfortunately, I can't think of a way to do that without either forcing a flag check on each access or using memory protections and faults.

Keno · 2018-09-10T19:24:04Z

Hadn't seen that before, but yeah, it's the same concept. Implementation to be worked out, but I think something like it is promising. Note that I do think it makes sense to be able to go both ways (with the compiler eliding if possible, copies if not).

Keno mentioned this issue Jan 26, 2017

WIP: Implement alias scopes #20257

Closed

yuyichao mentioned this issue May 19, 2017

Performance issue (possible codegen bug?) #21966

Closed

vtjnash referenced this issue Jun 14, 2018

let's not

b68efd3

mbauman mentioned this issue Jun 18, 2018

Document @simd and remove invalid uses #27495

Closed

ChrisRackauckas mentioned this issue Jul 18, 2018

Broadcasting is much slower than a for loop #28126

Open

YingboMa mentioned this issue Sep 4, 2018

Suboptimal code generation because of the lack of TBAA on Array #29026

Open

YingboMa mentioned this issue Feb 9, 2019

Implement alias scopes #31018

Merged

vtjnash mentioned this issue Mar 30, 2021

aliasing in Julia #8087

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Macro for expression noalias hints #19658

RFC: Macro for expression noalias hints #19658

Keno commented Dec 19, 2016

Keno commented Dec 19, 2016 •

edited

Loading

Keno commented Dec 19, 2016

vtjnash commented Dec 19, 2016

Keno commented Dec 20, 2016

Keno commented Dec 20, 2016

Keno commented Dec 20, 2016

Keno commented Dec 20, 2016

yuyichao commented Dec 20, 2016

Keno commented Dec 20, 2016

tkoolen commented Dec 20, 2016 •

edited

Loading

Keno commented Jan 6, 2017

Keno commented Jan 6, 2017

KristofferC commented Jan 6, 2017

Keno commented Jan 6, 2017

StefanKarpinski commented Jan 8, 2017

tkoolen commented Jan 8, 2017

Keno commented Jan 9, 2017

StefanKarpinski commented Jan 11, 2017

Keno commented Apr 7, 2017

vtjnash commented Sep 4, 2018

Keno commented Sep 4, 2018

ChrisRackauckas commented Sep 5, 2018

StefanKarpinski commented Sep 10, 2018

Keno commented Sep 10, 2018

RFC: Macro for expression noalias hints #19658

RFC: Macro for expression noalias hints #19658

Comments

Keno commented Dec 19, 2016

Keno commented Dec 19, 2016 • edited Loading

Keno commented Dec 19, 2016

vtjnash commented Dec 19, 2016

Keno commented Dec 20, 2016

Keno commented Dec 20, 2016

Keno commented Dec 20, 2016

Keno commented Dec 20, 2016

yuyichao commented Dec 20, 2016

Keno commented Dec 20, 2016

tkoolen commented Dec 20, 2016 • edited Loading

Keno commented Jan 6, 2017

Keno commented Jan 6, 2017

Option 1

Option 2

KristofferC commented Jan 6, 2017

Keno commented Jan 6, 2017

StefanKarpinski commented Jan 8, 2017

tkoolen commented Jan 8, 2017

Keno commented Jan 9, 2017

StefanKarpinski commented Jan 11, 2017

Keno commented Apr 7, 2017

vtjnash commented Sep 4, 2018

Keno commented Sep 4, 2018

ChrisRackauckas commented Sep 5, 2018

StefanKarpinski commented Sep 10, 2018

Keno commented Sep 10, 2018

Keno commented Dec 19, 2016 •

edited

Loading

tkoolen commented Dec 20, 2016 •

edited

Loading