Document `@simd` and remove invalid uses (#27495)
Fixes #27482. Copies the information from the performance tips into a docstring; makes it a bit scarier.
base/simdloop.jl (Outdated)
> This feature is experimental and could change or disappear in future versions of Julia.
> Incorrect use of the `@simd` macro may cause unexpected results.
>
> By using `@simd`, you are are asserting several properties of the loop:
are are
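As a point of reference, here is a minimal sketch (my illustration, not code from the PR) of a loop for which the assertions described in the quoted docstring hold, assuming the two arrays do not alias:

```julia
# Every iteration is independent of every other, so the iterations may be
# safely reordered or vectorized.
function double!(y::Vector{Float64}, x::Vector{Float64})
    @inbounds @simd for i in eachindex(y, x)
        y[i] = 2 * x[i]
    end
    return y
end

double!(zeros(4), [1.0, 2.0, 3.0, 4.0])  # returns [2.0, 4.0, 6.0, 8.0]
```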
I started with the HTML docs here, assuming they were in sync with the source. They weren't, so I'll have to go back and incorporate those changes. Perhaps I should also just pare down the …
Ok, updated to use PerformanceTips.md as more of a pointer to … Any further comments?
doc/src/manual/performance-tips.md (Outdated)
> like allowing floating-point re-associativity and ignoring dependent memory accesses. Again, be very
> careful when asserting `@simd` as erroneously annotating a loop with dependent iterations may result
> in unexpected results. In particular, note that `setindex!` on some `AbstractArray` subtypes is
> enherently dependent upon iteration order. **This feature is experimental**
typo: enherently -> inherently
Will this slow down a lot of operations, e.g. …?
Would be good to check Nanosoldier just to see the eventual damage.
@nanosoldier
Things should be marginally better now: @nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
I've added a few workarounds for common cases. I know we'll still have some broadcast regressions but it's unavoidable — we simply cannot tell if it's safe to perform …
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Oh interesting — looks like Nanosoldier has two different race conditions in what it compares. This is an obvious one both ways: this PR got the effects of #27331 but the "master" comparison didn't. At the same time it was triggered on a horribly broken commit here, but by the time the queue cleared it picked up the latest commit. In any case: we can still see the effects. Looks like I should add some …
I knew I wasn't crazy!
This is ready to go by my account. AppVeyor failed a download with a 404; I don't think the Travis failures are related, but I've not seen them before. Mac failed with a …
I am not sure that the notion of a `simdable` function is viable. I agree that we are currently a bit too aggressive, but I do not think the problem is impure functions. From my understanding the problem is arrays that alias each other, since that could lead to results that differ between a SIMD loop and a non-SIMD loop.
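A hedged sketch of that aliasing concern (names are illustrative, not from the PR): the two views below overlap, so each iteration reads a location the previous iteration wrote, and annotating the loop with `@simd` may change the result.

```julia
function shifted_fill!(dest, src)
    @inbounds @simd for i in eachindex(dest, src)
        dest[i] = src[i] + 1   # invalid under @simd if dest and src overlap
    end
    return dest
end

x = collect(1.0:10.0)
# dest aliases x[2:10] and src aliases x[1:9]; iteration i reads what
# iteration i-1 wrote, so SIMD and non-SIMD execution can disagree.
shifted_fill!(view(x, 2:10), view(x, 1:9))
```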
```julia
# The @simd transformation is only valid in limited situations
struct SIMDableFunction{f}; end
(::SIMDableFunction{f})(args...) where {f} = f(args...)
simdable(f::Union{map(typeof, (+, *, &, |, add_sum, mul_prod, -, /, ^, identity))...}) = SIMDableFunction{f}()
```
I am not sure that this is meaningful. The question is: when is a function not valid to be executed as SIMD? When it has side-effects? `@simd` is advice to LLVM that gives it permission to perform some limited transformations and encourages it to be more lenient, but LLVM still has to prove that the SIMD transformation is valid and beneficial.
Yes, this is a poor name. I don't really want to be building a trait-system here — I wanted to just use a hard-coded `Union{}` — but due to bootstrapping issues not all the functions I wanted to whitelist were available. Of course we cannot guarantee that these functions won't have an invalid method on them.

Really, the problem is that `@simd` is non-local. It affects any function that gets inlined — even passed user functions. We have no idea if the particular method is going to have observable side-effects that break the SIMD guarantees. This is an attempt at a whitelist to recoup some of the performance in common cases — but perhaps we shouldn't even try to do that.

> LLVM still has to prove that the SIMD transformation is valid and beneficial

My understanding is that `@simd` is promising that the transformation is valid — allowing LLVM to do things that it cannot prove.
> allowing LLVM to do things that it cannot prove.

Essentially true, but the list above is largely redundant with things that LLVM should be able to prove (since it's approximately just a subset of known pure functions), and, beyond that, is incorrect in general (there's no guarantee – or expectation – that my implementation of any of those functions is actually pure).
> My understanding is that `@simd` is promising that the transformation is valid

It defines that certain transforms are valid, but we implement that list. Currently it contains:
- primitive ops that contribute towards the computation of a floating-point reduction may be reordered (reassociated / commuted)
- all memory writes are independent of all other memory writes and reads performed inside the loop (e.g. all memory that is read from is read-only, and no location is written to twice)
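A sketch of the first point (my example, not from the PR): the additions into `s` form a floating-point reduction, so `@simd` allows them to be reassociated, and the result may differ in the last bits from strict left-to-right summation.

```julia
function simd_sum(x::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        s += x[i]   # reduction: @simd permits reordering these additions
    end
    return s
end
```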
Ah, great, that is extremely helpful. So if we disable the memory independence assertion (b68efd3), is it correct that one could safely annotate `@simd` over unknown generic functions? My read is yes: this simply adds extra handling for reduction variables — which, by definition, must be lexically visible to the `@simd` author.

Memory dependencies, on the other hand, might not be lexically visible to the `@simd` author. Doing something like #19658 could perhaps allow us to recoup some of the lost perf in the future in a much safer and more precise way.
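A hypothetical sketch of such a hidden dependence (all names here are my own illustration, not from the PR): the read-after-write dependence lives inside `bump!`, so it is not lexically visible at the `@simd` annotation site.

```julia
const buf = zeros(Float64, 16)
bump!(i) = (buf[i] = buf[max(i - 1, 1)] + 1.0)   # reads the previous iteration's write

function run_bumps!(n)
    @simd for i in 1:n   # the dependence through `buf` is invisible here
        bump!(i)
    end
    return buf
end

run_bumps!(16)
```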
That's my understanding too. Although there's some non-locality in the definition of what constitutes "computation of the reduction", since we just examine the use/def graph after optimization. It is possible to make a closure which obscures the reduction variable lexically. For example:

```julia
# The reduction variable is lexically visible:
result = Ref(0.0)
for i = 1:10
    result[] += i
end

# The reduction is obscured behind a closure:
result = Ref(0.0)
update(i) = (result[] += i)
for i = 1:10
    update(i)
end
```

This pattern appears, for example, in the `Stateful` iterator.
I'd be okay dropping the last two commits here if we want to err more conservatively. It's really sad, but I think we should also seriously consider deleting …
The more I think about it the more I agree with this... I suspect that we need something that does a scope-local annotation allowing operations to be reordered, plus an annotation on memory loads marking them as safe to execute in parallel and non-aliasing. But the current form is not composable.
Let's merge this for now and we can begin a broader discussion about the future of SIMD in a new issue.
Maybe we should redo the Nanosoldier run, so we can get a real picture of the results. We could also instead merge b68efd3#commitcomment-29355867, which just completely removes the offending optimization – and appears to affect fewer tests.
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Even if we do merge b68efd3#commitcomment-29355867, I think this patch is still needed due to the non-local effects of …
Failures all look unrelated: Sockets and SharedArrays failures with Travis (…)
> currently needed for all array accesses. The compiler can sometimes turn
> short `&&`, `||`, and `?:` expressions into straight-line code if it is safe
> to evaluate all operands unconditionally. Consider using the [`ifelse`](@ref)
> function instead of `?:` in the loop if it is safe to do so.
In general, this would be bad advice, since it's harder to generate fast code for `ifelse` than for `?:` (it's generally easy for the compiler to optimize `?:` into `ifelse` if it would be profitable, but the reverse is not true). The better advice (often) is to avoid calling functions in the branches and instead just select between two values. (This is also essentially equivalent to defining `ifelse(x, y, z) = (x ? y : z)`, although we wouldn't currently infer that definition any better than the existing `ifelse`, because we don't do IPO conditional-value propagation.)
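A hedged sketch of that value-selection pattern (function and names are my own, not from the PR): both arms of the ternary are plain values, so the compiler can lower the selection to a branch-free select inside the loop.

```julia
function positive_sum(x::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        v = x[i]
        s += v > 0 ? v : 0.0   # selects between two values; no calls in either branch
    end
    return s
end
```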
Superseded by #27670