
Reland: Improve performance of global code by emitting fewer atomic barriers. #47636

Merged: 3 commits into master from tb/global_atomic_barrier on Apr 19, 2023

Conversation

maleadt (Member) commented on Nov 19, 2022

Re-land of #47578, which was reverted because of a GC issue on win32. I haven't debugged this yet, but I want an open PR so that I don't lose track of it.

Fixes #47561
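
For context, a minimal illustrative sketch (an editor's assumption, not the reproducer from #47561) of the kind of global-scope code affected: a timed top-level loop, where the compiled top-level thunk previously carried an atomic world-age load between its instructions.

```julia
# Hypothetical example of "global code": a loop executed directly in global
# scope rather than inside a function. Before this change, codegen emitted an
# atomic (acquire) barrier between the instructions of such top-level thunks.
s = 0
@time for i in 1:1_000_000
    global s += i   # global-scope work, timed at top level as in #47561
end
```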

maleadt added the performance (Must go faster), backport 1.8, and backport 1.9 labels on Nov 19, 2022
maleadt changed the title from "Improve performance of global code by emitting fewer atomic barriers." to "Reland: Improve performance of global code by emitting fewer atomic barriers." on Nov 19, 2022
vchuravy (Member)

The good news is that it also seems to happen on Linux i686 so we might have a hope to debug it :)

giordano added the re-land label on Nov 19, 2022
maleadt (Member, Author) commented on Dec 2, 2022

So this reproduces pretty consistently on 32-bit Linux just by running runtests.jl LinearAlgebra/special, and I even have an rr trace, but it's not very helpful. I'll probably need a debug build.
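
For anyone trying to reproduce this, a sketch of one way to invoke that test set from a Julia session (assuming a local source build, so that the test harness knows the stdlib test names):

```julia
# Runs the LinearAlgebra/special test set via the standard test harness;
# roughly equivalent to `julia test/runtests.jl LinearAlgebra/special` from a source tree.
Base.runtests(["LinearAlgebra/special"])
```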

maleadt force-pushed the tb/global_atomic_barrier branch from b368475 to 2623a10 on December 3, 2022
KristofferC mentioned this pull request on Dec 14, 2022
maleadt force-pushed the tb/global_atomic_barrier branch from 2623a10 to abf54fd on December 20, 2022
maleadt (Member, Author) commented on Dec 21, 2022

Finally caught this in a trace with debug info: https://julialang-dumps.s3.amazonaws.com/reports/2022-12-21T12-04-45-maleadt.tar.zst. Note that this is a 32-bit process, so you either need to replay this on a 32-bit system, or use a build of rr that supports replaying 32-bit processes (i.e. not the one from rr_jll).

maleadt (Member, Author) commented on Dec 21, 2022

Some digging in rr: the crash happens when accessing an invalid field of an object:

(rr) p *pnew_obj
$3 = (jl_value_t *) 0x13

(rr) p (jl_value_t*)parent
$14 = (jl_value_t *) 0xb1a97b5c

(rr) call jl_((jl_value_t*)parent)
LinearAlgebra.QRCompactWY{Float64, Array{Float64, 2}, Array{Float64, 2}}(factors=#<intrinsic #29 sle_int>, T=#<19>)

Note the invalid factors=#<intrinsic #29 sle_int> field, which should be a matrix instead. Setting a watchpoint on that field brings us to the place it gets overwritten:

Old value = 19
New value = -1359114816
0xe7da775e in julia__useref_setindex!_14279 () at compiler/ssair/ir.jl:466
466     @noinline function _useref_setindex!(@nospecialize(stmt), op::Int, @nospecialize(v))

(rr) call jl_((jl_value_t*)0xb1a97b5c)
Core.GotoIfNot(cond=SSAValue(51), dest=-1359114816)

Not only does the field get overwritten here, but the object is entirely different (i.e. the object's header is different). I'm not sure whether that means the QRCompactWY wasn't properly rooted, or how we end up with a mix of two objects. Breaking on the spot where the object's header gets set to the QRCompactWY type doesn't give me an interesting backtrace:

Old value = 2980674488
New value = 2926770304
0xb5a88c1c in ?? ()

# manual expansion of jl_typeof (this is what I set the watchpoint on)
(rr) call jl_((jl_astaggedvalue((jl_value_t*)0xb1a97b5c))->header  & ~(uintptr_t)15)
LinearAlgebra.QRCompactWY{Float64, Array{Float64, 2}, Array{Float64, 2}}

(rr) bt
#0  0xb5a88c1c in ?? ()
#1  0x00000000 in ?? ()

maleadt added the DO NOT MERGE label on Dec 21, 2022
maleadt force-pushed the tb/global_atomic_barrier branch 2 times, most recently from 847c633 to 707682e on December 21, 2022
maleadt (Member, Author) commented on Dec 21, 2022

So this consistently crashes during LinearAlgebra/special with the above GC corruption when the atomic barrier between every toplevel instruction is removed. To be sure, the latest commits put it back, showing that the issue isn't caused by switching to monotonic ordering or by late-inserting barriers before calls.

@vtjnash Do you have any thoughts on how this could be related? Could you explain the full purpose of these barriers? I thought they were only required to ensure a correct world age, so I'm not sure how removing those loads could result in GC corruption.
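
For reference, a minimal top-level sketch of the world-age behaviour those loads are meant to preserve (illustrative only; the loads themselves are emitted by codegen between top-level statements):

```julia
# Each top-level statement must observe method definitions made by the
# statements before it, which requires re-reading the world-age counter.
f() = 1
a = f()      # runs in world N and sees the first definition
f() = 2      # defining a method bumps the global world-age counter
b = f()      # must run in the new world to see the second definition
@assert (a, b) == (1, 2)
```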

maleadt added the help wanted label on Jan 5, 2023
KristofferC mentioned this pull request on Jan 17, 2023
KristofferC mentioned this pull request on Feb 6, 2023
KristofferC mentioned this pull request on Feb 20, 2023
KristofferC mentioned this pull request on Mar 7, 2023
giordano (Contributor)

Bump. It'd be a bit unfortunate to have another minor version with this bug 🙂

KristofferC mentioned this pull request on Apr 9, 2023
maleadt force-pushed the tb/global_atomic_barrier branch from ab65997 to a34b0ed on April 17, 2023
maleadt (Member, Author) commented on Apr 17, 2023

Rebased. It looks like the error doesn't happen anymore? I couldn't test locally, because this time even a non-debug build failed due to address space exhaustion.

EDIT: using the i686 Buildkite artifact, I'm running the LinearAlgebra/special testsuite a bunch of times right now, and it hasn't failed yet. Looking good?

vtjnash removed the help wanted label on Apr 17, 2023
vtjnash removed the DO NOT MERGE label on Apr 17, 2023
vtjnash marked this pull request as ready for review on April 17, 2023
vtjnash (Member) commented on Apr 17, 2023

Why are we relaxing this from acquire to monotonic?

maleadt (Member, Author) commented on Apr 17, 2023

I think you suggested that?

vtjnash (Member) commented on Apr 17, 2023

I don't remember

maleadt (Member, Author) commented on Apr 17, 2023

Do you want me to remove that part?
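
As an aside for readers of the exchange above, a minimal Julia-level sketch of the difference between the two orderings in question (illustrative only; the PR changes the ordering of loads emitted by codegen, not any user-facing API):

```julia
mutable struct Box
    @atomic flag::Bool
    data::Int
end

b = Box(false, 0)

# Acquire load: if it observes a value written by an `@atomic :release` store,
# ordinary writes made before that store (e.g. to b.data) are also visible here.
@atomic :acquire b.flag

# Monotonic load: the access itself is atomic, but it imposes no ordering
# on surrounding memory operations.
@atomic :monotonic b.flag
```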

maleadt force-pushed the tb/global_atomic_barrier branch from a34b0ed to 6cc1542 on April 18, 2023
(Review comment on src/codegen.cpp, now outdated and resolved)
maleadt force-pushed the tb/global_atomic_barrier branch from 6cc1542 to d2daaa5 on April 18, 2023
vtjnash added the merge me label on Apr 18, 2023
maleadt (Member, Author) commented on Apr 18, 2023

Backporting these changes on top of 1.9 still results in segfaults on i686 while running LinearAlgebra/special...

@vtjnash if you want, I can try generating an rr trace for you to take a look (as I'm slightly concerned that the underlying issue may still be present here; and it would also be useful to have this on 1.9).

vtjnash (Member) commented on Apr 18, 2023

Yes, that would be helpful

maleadt merged commit 1c6271f into master on Apr 19, 2023
maleadt deleted the tb/global_atomic_barrier branch on April 19, 2023
giordano removed the merge me label on Apr 19, 2023
KristofferC mentioned this pull request on May 8, 2023
KristofferC removed the backport 1.9 label on May 8, 2023
Closes: Spurious performance regression in Julia 1.8 vs 1.7 for @time in top-level-scope