
Document @simd and remove invalid uses #27495

Closed · wants to merge 4 commits

Conversation

@mbauman (Member) commented Jun 8, 2018

Fixes #27482.

Copies the information from the performance tips into a docstring; makes it a bit scarier.

@mbauman added the compiler:simd (instruction-level vectorization) label on Jun 8, 2018
base/simdloop.jl Outdated
This feature is experimental and could change or disappear in future versions of Julia.
Incorrect use of the `@simd` macro may cause unexpected results.

By using `@simd`, you are are asserting several properties of the loop:
A Member commented:

are are
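
For context, a minimal sketch (not from the PR; the function name mysum is hypothetical) of the kind of loop the docstring describes, with a lexically visible reduction variable and independent iterations:

function mysum(a::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(a)
        s += a[i]   # reassociating these additions is exactly what @simd permits
    end
    return s
end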

@mbauman (Member, Author) commented Jun 8, 2018

I started with the HTML docs here, assuming they were in sync with the source. They weren't, so I'll have to go back and incorporate those changes.

Perhaps I should also pare down the @simd section in the perf tips and just point to its docs?

@mbauman (Member, Author) commented Jun 11, 2018

OK, updated to use PerformanceTips.md as more of a pointer to @inbounds/@fastmath/@simd and a place for the combined use cases.

Any further comments?

like allowing floating-point re-associativity and ignoring dependent memory accesses. Again, be very
careful when asserting `@simd` as erroneously annotating a loop with dependent iterations may result
in unexpected results. In particular, note that `setindex!` on some `AbstractArray` subtypes is
enherently dependent upon iteration order. **This feature is experimental**
A Contributor commented:

typo: enherently -> inherently
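
And a sketch of the hazard this passage warns about (hypothetical, not from the PR): a loop whose iterations are dependent, so annotating it with @simd violates the assertion and may give wrong answers.

# WRONG: each iteration reads the value written by the previous one.
function prefix!(a::Vector{Float64})
    @simd for i in 2:length(a)
        a[i] += a[i-1]   # loop-carried dependence: a[i] depends on the updated a[i-1]
    end
    return a
end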

@jebej (Contributor) commented Jun 11, 2018

Will this slow down a lot of operations, e.g. sum?

@KristofferC (Member) commented Jun 11, 2018

Would be good to check Nanosoldier just to see the eventual damage.

@mbauman (Member, Author) commented Jun 11, 2018

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator) commented:

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

@ararslan (Member) commented:

Things should be marginally better now: @nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@mbauman (Member, Author) commented Jun 12, 2018

I've added a few workarounds for common cases. I know we'll still have some broadcast regressions but it's unavoidable — we simply cannot tell if it's safe to perform @simd over an arbitrary function. It's uncommon, but that function might just be mutating in a manner that violates @simd.
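
As a hypothetical illustration (not code from the PR) of such a function, broadcast can be handed a closure whose hidden state makes reordering unsafe:

acc = Ref(0.0)
g(x) = (acc[] += x; x)   # impure: mutates state shared across iterations
y = g.(rand(100))        # broadcasting g; reordering its calls under @simd changes acc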

@mbauman (Member, Author) commented Jun 12, 2018

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@mbauman (Member, Author) commented Jun 13, 2018

Oh interesting — looks like Nanosoldier has two different race conditions in what it compares. This is an obvious one both ways: this PR got the effects of #27331, but the "master" comparison didn't. At the same time, it was triggered on a horribly broken commit here, but by the time the queue cleared it picked up the latest commit. In any case, we can still see the effects. Looks like I should add some @inlines to the inner-loop functions in reductions. Beyond that, I think this is as good as we can get with the existing @simd semantics.

@KristofferC (Member) commented:

> Oh interesting — looks like Nanosoldier has two different race conditions in what it compares.

I knew I wasn't crazy!

@mbauman (Member, Author) commented Jun 13, 2018

This is ready to go by my account.

AppVeyor failed a download with a 404. I don't think the Travis failures are related, but I've not seen them before.

Mac failed with a LoadError: cannot obtain socket name: operation not permitted (EPERM) in SharedArrays and precompile; both Linux builds failed in Sockets/test/runtests.jl:228 with Got exception UDP send failed: address not available (EADDRNOTAVAIL) (edit: the Linux failures were seen previously in #27376 (comment)).

@vchuravy (Member) left a comment:

I am not sure that the notion of a simdable function is viable. I agree that we are currently a bit too aggressive, but I do not think the problem is impure functions.

From my understanding, the problem is arrays that alias each other, since that could lead to results that differ between a simd loop and a non-simd loop.
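
A small sketch (not from the PR) of that aliasing hazard, using overlapping views:

a = collect(1.0:10.0)
b = view(a, 2:10)    # b aliases a, shifted by one element
for i in 1:9
    b[i] = a[i] + 1  # writes a[i+1] while reading a[i]: a loop-carried dependence
end
# Annotating this loop with @simd could yield results that differ from the serial loop.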

# The @simd transformation is only valid in limited situations
struct SIMDableFunction{f}; end
(::SIMDableFunction{f})(args...) where {f} = f(args...)
simdable(f::Union{map(typeof, (+, *, &, |, add_sum, mul_prod, -, /, ^, identity))...}) = SIMDableFunction{f}()
A Member commented:

I am not sure that this is meaningful. The question is: when is a function not valid to be executed as simd? When it has side effects? @simd is advice to LLVM that gives it permission to perform some limited transformations and encourages it to be more lenient, but LLVM still has to prove that the simd transformation is valid and beneficial.

@mbauman (Member, Author) commented:

Yes, this is a poor name. I don't really want to be building a trait system here — I wanted to just use a hard-coded Union{} — but due to bootstrapping issues not all the functions I wanted to whitelist were available. Of course, we cannot guarantee that these functions won't have an invalid method on them.

Really, the problem is that @simd is non-local. It affects any function that gets inlined — even passed user functions. We have no idea if the particular method is going to have observable side effects that break the SIMD guarantees. This is an attempt at a whitelist to recoup some of the performance in common cases — but perhaps we shouldn't even try to do that.
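
To make that non-locality concrete, a sketch (hypothetical, not from the PR):

function apply!(f, dst, src)
    @inbounds @simd for i in eachindex(dst, src)
        dst[i] = f(src[i])   # whatever f does is inlined under the @simd annotation,
                             # even though f's author never opted in to its assertions
    end
    return dst
end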

> LLVM still has to prove that the simd transformation is valid and beneficial

My understanding is that @simd is promising that the transformation is valid — allowing LLVM to do things that it cannot prove.

A Member commented:

> allowing LLVM to do things that it cannot prove.

Essentially true, but the list above is largely redundant with things that LLVM should be able to prove (since it's approximately just a subset of the known pure functions), and, beyond that, is incorrect in general (there's no guarantee – or expectation – that my implementation of any of those functions is actually pure).

A Member commented:

> My understanding is that @simd is promising that the transformation is valid

It defines that certain transforms are valid, but we implement that list. Currently it contains:

  • primitive ops that contribute to the computation of a floating-point reduction may be reordered (reassociated / commuted); see the illustration below
  • all memory writes are independent of all other memory writes and reads performed inside the loop (e.g. all memory that is read from is read-only, and no location is written to twice)
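
As a one-line illustration (not from the PR) of why the reassociation permitted by the first bullet is observable in floating point:

julia> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
false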

@mbauman (Member, Author) commented:

Ah, great, that is extremely helpful. So if we disable the memory independence assertion (b68efd3), is it correct that one could safely annotate @simd over unknown generic functions? My read is yes: this simply adds extra handling for reduction variables — which, by definition, must be lexically visible to the @simd author.

Memory dependencies, on the other hand, might not be lexically visible to the @simd author. Doing something like #19658 could perhaps allow us to recoup some of the lost perf in the future in a much safer and more precise way.

A Member commented:

That's my understanding too. Although there's some non-locality in the definition of what constitutes "computation of the reduction", since we just examine the use/def graph after optimization. It is possible to make a closure which obscures the reduction variable lexically. For example:

# The reduction variable is lexically visible here:
result = Ref(0.0)
for i = 1:10
    result[] += i
end

# The same reduction, with the variable obscured behind a closure:
result = Ref(0.0)
update(i) = (result[] += i)
for i = 1:10
    update(i)
end

This pattern appears, for example, in the Stateful iterator.

@vchuravy requested review from vtjnash and Keno on June 13, 2018 21:23
@mbauman (Member, Author) commented Jun 13, 2018

I'd be okay dropping the last two commits here if we want to err more conservatively.

It's really sad, but I think we should also seriously consider deleting @simd entirely. It's just a huge trap that we've explicitly encouraged many folks to fall into.

@vchuravy (Member) commented:

> It's really sad, but I think we should also seriously consider deleting @simd entirely. It's just a huge trap that we've explicitly encouraged many folks to fall into.

The more I think about it, the more I agree with this... I suspect that we need something that does a scope-local annotation of being able to reorder operations, and an annotation on memory loads for them to be allowed in parallel and non-aliasing. But the current form is not composable.

@mbauman added the triage (This should be discussed on a triage call) label on Jun 13, 2018
@StefanKarpinski removed the triage (This should be discussed on a triage call) label on Jun 14, 2018
@StefanKarpinski added this to the 0.7 milestone on Jun 14, 2018
@mbauman (Member, Author) commented Jun 14, 2018

Let's merge this for now and we can begin a broader discussion about the future of SIMD in a new issue.

@vtjnash (Member) commented Jun 14, 2018

Maybe we should redo the Nanosoldier run, so we can get a real picture of the results.

We could also instead merge b68efd3#commitcomment-29355867, which just completely removes the offending optimization – and appears to affect fewer tests.

@mbauman (Member, Author) commented Jun 14, 2018

@nanosoldier runbenchmarks("array" || "problem" || "random", vs=":master")

@nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@mbauman (Member, Author) commented Jun 15, 2018

Even if we do merge b68efd3#commitcomment-29355867, I think this patch is still needed due to the non-local effects of @simd on passed user functions. That's precisely why we see the broadcast fusion regressions from Nanosoldier — those tests are broadcasting user functions, and we don't know whether they're safe for re-association.

@mbauman (Member, Author) commented Jun 15, 2018

Failures all look unrelated: Sockets and SharedArrays failures with Travis (UDP send failed: address not available (EADDRNOTAVAIL) and cannot obtain socket name: operation not permitted (EPERM)); AppVeyor hit a 404 with DownloadFile.

currently needed for all array accesses. The compiler can sometimes turn
short `&&`, `||`, and `?:` expressions into straight-line code if it is safe
to evaluate all operands unconditionally. Consider using the [`ifelse`](@ref)
function instead of `?:` in the loop if it is safe to do so.
A Member commented:

In general, this would be bad advice, since it's harder to generate fast code for ifelse than for ?: (it's generally easy for the compiler to optimize ?: into ifelse if that would be profitable, but the reverse is not true). The better advice (often) is to avoid calling functions in the branches and instead just select between two values. That is essentially equivalent to defining ifelse(x, y, z) = (x ? y : z), although we wouldn't currently infer this definition any better than the existing ifelse, because we don't do IPO conditional-value propagation.
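
A sketch of the distinction being drawn (the names a, b, f, and g are hypothetical): selecting between two already-computed values is easy for the compiler to turn into branch-free code, while calling functions in the branches is not.

x = rand(100)
a, b = 1.0, -1.0
f(v) = sin(v); g(v) = cos(v)                 # hypothetical branch functions

y1 = [xi > 0.5 ? a : b for xi in x]          # value select: readily optimized to ifelse
y2 = [xi > 0.5 ? f(xi) : g(xi) for xi in x]  # calls in branches: harder to make branch-free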

mbauman added a commit that referenced this pull request Jun 20, 2018
@mbauman (Member, Author) commented Jun 20, 2018

Superseded by #27670

@mbauman closed this on Jun 20, 2018
@mbauman deleted the mb/simd branch on June 20, 2018 19:44