Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace libosxunwind wth LLVM libunwind #39127

Merged
merged 3 commits into from
Feb 19, 2021
Merged

Replace libosxunwind wth LLVM libunwind #39127

merged 3 commits into from
Feb 19, 2021

Conversation

omus
Copy link
Member

@omus omus commented Jan 6, 2021

Replaces libosxunwind with LLVM's libunwind (#30154). It appears that we can also replace the nongnu libunwind as well with this library in another PR.

  • Ensure LLVM libunwind builds using BinaryBuilder locally
  • Ensure LLVM libunwind builds without BinaryBuilder locally
  • Ensure build works on CI
  • Validate special cases for LIBOSXUNWIND can be safely removed (still required otherwise segfaults can occur)
  • Update to use LLVM libunwind 11.0.1
  • Resolve issue with make -C deps distclean-llvmunwind when building from source
  • Use patched libunwind.h which includes unw_init_local_dwarf

@omus omus added building Build system, or building Julia or its dependencies system:mac Affects only macOS stdlib Julia's standard library labels Jan 6, 2021
@omus omus requested a review from staticfloat January 6, 2021 19:52
@DilumAluthge
Copy link
Member

@staticfloat Any idea why CI isn't starting?

@omus
Copy link
Member Author

omus commented Jan 6, 2021

I need to enable the LLVMLibUnwind stub only on macOS for now. The macOS failure is strange as it appears it's looking for libunwind.1.dylib where my local system required libunwind.1.0.dylib

@omus omus changed the title Replace libosxunwind wth LLVM libunwind WIP: Replace libosxunwind wth LLVM libunwind Jan 6, 2021
@omus
Copy link
Member Author

omus commented Jan 6, 2021

Dropping the version number from libunwind now worked locally

deps/Makefile Show resolved Hide resolved
@IanButterworth
Copy link
Member

I tried this locally on MacOS and got profiling working!

julia> function profile_test(n)
                         for i = 1:n
                             A = randn(100,100,20)
                             m = maximum(A)
                             Am = mapslices(sum, A; dims=2)
                             B = A[:,:,5]
                             Bsort = mapslices(sort, B; dims=1)
                             b = rand(100)
                             C = B.*b
                         end
                     end
profile_test (generic function with 1 method)

julia> profile_test(1)

julia> using Profile

julia> @profile profile_test(10)

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
   ╎37  @Base/client.jl:492; _start()
   ╎ 37  @Base/client.jl:309; exec_options(opts::Base.JLOptions)
   ╎  37  @Base/client.jl:379; run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
   ╎   37  @Base/essentials.jl:708; invokelatest
   ╎    37  @Base/essentials.jl:710; #invokelatest#2
   ╎     37  @Base/client.jl:394; (::Base.var"#890#892"{Bool, Bool, Bool})(REPL::Module)
   ╎    ╎ 37  @REPL/src/REPL.jl:305; run_repl(repl::REPL.AbstractREPL, consumer::Any)
   ╎    ╎  37  @REPL/src/REPL.jl:317; run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
   ╎    ╎   37  @REPL/src/REPL.jl:185; start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
   ╎    ╎    37  @REPL/src/REPL.jl:200; repl_backend_loop(backend::REPL.REPLBackend)
   ╎    ╎     37  @REPL/src/REPL.jl:139; eval_user_input(ast::Any, backend::REPL.REPLBackend)
  1╎    ╎    ╎ 37  @Base/boot.jl:369; eval(m::Module, e::Any)
   ╎    ╎    ╎  14  REPL[1]:3; profile_test(n::Int64)
   ╎    ╎    ╎   14  @Random/src/normal.jl:210; randn
   ╎    ╎    ╎    14  @Random/src/normal.jl:204; randn
   ╎    ╎    ╎     6   @Base/boot.jl:467; Array
  6╎    ╎    ╎    ╎ 6   @Base/boot.jl:455; Array
   ╎    ╎    ╎     4   @Random/src/normal.jl:190; randn!(rng::Random.MersenneTwister, A::Array{Float64, 3})

however, when using in conjunction with ProfileView there are some new warnings

julia> using ProfileView
[ Info: Precompiling ProfileView [c46f51b8-102a-5cf2-8d2c-8597cb0e0da7]

julia> @profview profile_test(10)
libunwind: malformed __unwind_info at 0x1045AC09C bad second level page
libunwind: malformed __unwind_info at 0x10D8E482C bad second level page
libunwind: malformed __unwind_info at 0x1073E75E0 bad second level page
Gtk.GtkWindowLeaf(name="", parent, width-request=-1, height-request=-1, visible=TRUE, sensitive=TRUE, app-paintable=FALSE, can-focus=FALSE, has-focus=FALSE, is-focus=FALSE, focus-on-click=TRUE, can-default=FALSE, has-default=FALSE, receives-default=FALSE, composite-child=FALSE, style, events=0, no-show-all=FALSE, has-tooltip=FALSE, tooltip-markup=NULL, tooltip-text=NULL, window, opacity=1.000000, double-buffered, halign=GTK_ALIGN_FILL, valign=GTK_ALIGN_FILL, margin-left, margin-right, margin-start=0, margin-end=0, margin-top=0, margin-bottom=0, margin=0, hexpand=FALSE, vexpand=FALSE, hexpand-set=FALSE, vexpand-set=FALSE, expand=FALSE, scale-factor=1, border-width=0, resize-mode, child, type=GTK_WINDOW_TOPLEVEL, title="Profile", role=NULL, resizable=TRUE, modal=FALSE, window-position=GTK_WIN_POS_NONE, default-width=800, default-height=600, destroy-with-parent=FALSE, hide-titlebar-when-maximized=FALSE, icon, icon-name=NULL, screen, type-hint=GDK_WINDOW_TYPE_HINT_NORMAL, skip-taskbar-hint=FALSE, skip-pager-hint=FALSE, urgency-hint=FALSE, accept-focus=TRUE, focus-on-map=TRUE, decorated=TRUE, deletable=TRUE, gravity=GDK_GRAVITY_NORTH_WEST, transient-for, attached-to, has-resize-grip, resize-grip-visible, application, is-active=FALSE, has-toplevel-focus=FALSE, startup-id, mnemonics-visible=FALSE, focus-visible=FALSE, is-maximized=FALSE)

Unfortunately when ProfileView does work, it doesn't immediately solve the dropped samples issue #38350 i.e. the profile view is still 80% blank
But I'm sure this is the right direction to go in to get that fixed.

Thanks!

src/signals-mach.c Outdated Show resolved Hide resolved
@omus
Copy link
Member Author

omus commented Jan 13, 2021

I'm fixing some issues with building Julia dependencies from source which should allow me to iterate faster on this PR.

@omus omus force-pushed the cv/llvm-libunwind branch from 9704c9e to 770b169 Compare January 17, 2021 20:08
@omus
Copy link
Member Author

omus commented Jan 17, 2021

All tests are passing but the "Profile" tests have the following warnings when using the LLVM libunwind dependency built from source and using JCPPFLAGS+=-DSUPPORT_UNWIND_FORCE_DWARF:

julia> Base.runtests("Profile")
Test  (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
Profile    (1) |        started at 2021-01-17T13:59:09.433
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x10B5EEC2C bad second level page
libunwind: malformed __unwind_info at 0x7FFF203B217C bad second level page
libunwind: malformed __unwind_info at 0x7FFF203338F8 bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x7FFF203338F8 bad second level page
libunwind: malformed __unwind_info at 0x115754D78 bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x10D048C2C bad second level page
libunwind: malformed __unwind_info at 0x7FFF203B217C bad second level page
libunwind: malformed __unwind_info at 0x10D048C2C bad second level page
libunwind: malformed __unwind_info at 0x7FFF203B217C bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x10D048C2C bad second level page
libunwind: malformed __unwind_info at 0x10D048C2C bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x7FFF203B217C bad second level page
libunwind: malformed __unwind_info at 0x10D048C2C bad second level page
libunwind: malformed __unwind_info at 0x7FFF203B217C bad second level page
libunwind: malformed __unwind_info at 0x7FFF203B217C bad second level page
libunwind: malformed __unwind_info at 0x7FFF203B217C bad second level page
libunwind: malformed __unwind_info at 0x7FFF203338F8 bad second level page
libunwind: malformed __unwind_info at 0x115754D78 bad second level page
libunwind: malformed __unwind_info at 0x7FFF203B217C bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x7FFF203B217C bad second level page
libunwind: malformed __unwind_info at 0x7FFF203338F8 bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
libunwind: malformed __unwind_info at 0x1101F22BC bad second level page
Profile    (1) |     6.39 |   0.13 |  2.0 |     392.90 |  1507.67

Test Summary: | Pass  Total
  Overall     |   24     24
    SUCCESS

The output above was generated with a patched version of LLVM libunwind which lets Julia force the use of DWARF during unwinding (adapted from JuliaLang/libosxunwind@bb95a5c). The BB version doesn't have this patch at the moment which why the SUPPORT_UNWIND_FORCE_DWARF is not defined by default.

I am uncertain what is required to change to address these warnings. Someone more familiar with libunwind should probably continue this work at this point.

@Keno
Copy link
Member

Keno commented Feb 16, 2021

Alright, to summarize the situation here:

  1. On Darwin, there are two kinds of unwind info, "compact unwind" and DWARF based unwind info. Compact unwind is an Apple invention, while DWARF is standard and used on all other non-Windows platforms as well.
  2. In various situations, for example the Profiler, we need to unwind while not at a call site. This is referred to as "asynchronous unwinding". To support this, we build all our code with -fasynchronous-unwind-tables which forces the compiler to emit DWARF unwind info that permits asynchronous unwinding.
  3. Compact unwind info does not support asynchronous unwinding, so we build both kinds of unwind info for all our Darwin code.
  4. Apple's system libraries are inconsistent about which (if any) unwind info they support
  5. Unwinding with compact unwind info while at a an asynchronous unwind location may result in a crash (as does incorrect unwind info in general). Because of the prevalence of incorrect or missing unwind info, we have a mechanism to detect these crashes an reset the unwinder.
  6. Thus, on Darwin, our algorithm for unwinding is:
    a. Try unwinding with Compact unwind info (hoping we're not at an async-unsafe unwind location)
    b. ... If that crashes, try DWARF based unwinding (hoping we're not in an Apple system function that doesn't have it) (Doing this requires the patch included in this PR)
    c. ... If that crashes again, give up (causing issues like Profiling on MacOS suffering a lot of detached samples #38350)
  7. On x86, we used to carry a few patches to increase the likelihood of success by heuristically profiling the assembly of the function being unwound (JuliaLang/libosxunwind@ca57a5b). This works ok, but not great (as evidenced by Profiling on MacOS suffering a lot of detached samples #38350).

In an ideal world, compact unwind info would be async safe, so we could stop building both kinds of unwind info. There was a discussion of that at https://lists.llvm.org/pipermail/llvm-dev/2018-January/120741.html, but I don't believe that ever went anywhere.

In the absence of that, and more achievable, we should be able to use DWARF unwind info everywhere. This is already true for code we built, but of course sometimes Apple-provided code is on the stack, which is generally not async-unwind safe. I'll follow up with Apple to see if something can be done here. I understand @IanButterworth filed FB8974841 to request this also.

I've looked into the warnings posted by @omus above and they are generally just caused by these issues. However, they are part of a debug print and should be disabled. There is probably some release flag missing somewhere to turn these off. I think we can proceed with this PR as is and just east the unwind imprecision on aarch64. If it's easy to do, we can consider carrying JuliaLang/libosxunwind@ca57a5b in addition to the patch already in this PR, but I'd be ok with dropping it if we can work with Apple on a more longterm solution.

@omus
Copy link
Member Author

omus commented Feb 16, 2021

I've looked into the warnings posted by @omus above and they are generally just caused by these issues. However, they are part of a debug print and should be disabled. There is probably some release flag missing somewhere to turn these off.

You are correct that these are debug messages and they were disabled in libosxunwind. I discovered that using -DCMAKE_BUILD_TYPE=Release leaves the debug messages enabled while RelWithDebInfo or MinSizeRel does disabled them. I suspect this may be a bug.

If it's easy to do, we can consider carrying JuliaLang/libosxunwind@ca57a5b in addition to the patch already in this PR, but I'd be ok with dropping it if we can work with Apple on a more longterm solution.

I've generated a patch that works with LLVM libunwind and will post it soon.

@omus
Copy link
Member Author

omus commented Feb 16, 2021

I've rebased these changes and have included the patch @Keno asked for. As I already need to update the BB build I'll update to use LLVM libunwind 11.0.1: JuliaPackaging/Yggdrasil#2557

@omus omus force-pushed the cv/llvm-libunwind branch from 829c5b5 to 8c1433b Compare February 16, 2021 22:34
deps/Makefile Outdated Show resolved Hide resolved
@Keno
Copy link
Member

Keno commented Feb 16, 2021

You are correct that these are debug messages and they were disabled in libosxunwind. I discovered that using -DCMAKE_BUILD_TYPE=Release leaves the debug messages enabled while RelWithDebInfo or MinSizeRel does disabled them. I suspect this may be a bug.

That does sound like a bug. RelWithDebInfo is more debug-y than Release, so this doesn't make any sense.

@omus
Copy link
Member Author

omus commented Feb 17, 2021

I found one more thing to take care of. After doing a make distcleanall I encountered:

/Users/omus/Development/Julia/x86/latest/src/stackwalk.c:555:9: error: implicit declaration of function 'unw_init_local_dwarf' is invalid in C99
      [-Werror,-Wimplicit-function-declaration]
    if (unw_init_local_dwarf(&cursor, context) != UNW_ESUCCESS)
        ^
/Users/omus/Development/Julia/x86/latest/src/stackwalk.c:555:9: note: did you mean 'unw_init_local'?
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/libunwind.h:104:12: note: 'unw_init_local' declared here
extern int unw_init_local(unw_cursor_t *, unw_context_t *) LIBUNWIND_AVAIL;
           ^
1 error generated.
make[1]: *** [stackwalk.o] Error 1

This is due to to not copying the the modified header files which include the unw_init_local_dwarf function. I previously fixed this with the source build but need to put in a solution when using BB.

@omus
Copy link
Member Author

omus commented Feb 17, 2021

I just need to update the build_tarballs.jl to copy the header files. Easy enough

name = "LibOSXUnwind_jll"
uuid = "a83860b7-747b-57cf-bf1f-3e79990d037f"
version = "0.0.6+1"
name = "LLVMLibUnwind_jll"
Copy link
Member

@fredrikekre fredrikekre Feb 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have platform dependent stdlibs now? Or is this installed on all platforms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like LibOSXUnwind_jll this is installed on all platforms but only used on macOS. I will mention that we should be able to use LLVMLibUnwind_jllon all platforms in the future but that's beyond the scope of this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right it's nothing that is changed in this PR. Who even uses/depends on this stdlib?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe only the Julia internals depend libunwind. I'm not sure but I suspect the BB infrastructure needs a stdlib in place for the build to work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, seems bad to expose things as official stdlib packages just because BB needs it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staticfloat would know

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All binary dependencies must be represented by JLLs for a few reasons:

  1. If client code wants to load the library, they should do so by expressing a dependency on a JLL. This is better than them blindly running dlopen(libname) as it won't get confused by LD_LIBRARY_PATH, etc....

  2. If a user package requires a particular version of a binary dependency, then this will prevent that package from loading on an older version of Julia if that binary dependency is locked to an incompatible versions by Julia itself. By having explicit JLLs that are stdlibs, we are able to inform the Pkg resolver about the versions of binaries that ship with Julia.

  3. For executables that ship with Julia (like 7z) we can provide the standard tools that users are being trained to use to avoid issues around environment variables, paths, etc...

@omus
Copy link
Member Author

omus commented Feb 17, 2021

I'm validating that these changes are working correctly on Apple Silicon which was the main motivation for me to get this switch done. Currently working through this issue when building LLVM libunwind from source on aarch64-apple-darwin:

ld: in '/Users/omus/Development/Julia/arm64/latest/usr/lib/libunwind.dylib', building for macOS-arm64 but attempting to link with file built for macOS-x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Update: I sorted this out. It was just an issue on my local system where the x86 brew's cmake was leaking into my aarch64 environment...

@omus
Copy link
Member Author

omus commented Feb 17, 2021

Stacktraces for Apple Silicon are kind of working:

julia> stacktrace()
Base.StackTraces.StackFrame[]

julia> stacktrace(backtrace())
13-element Vector{Base.StackTraces.StackFrame}:
 top-level scope at REPL[3]:1
 eval at boot.jl:369 [inlined]
 eval_user_input(ast::Any, backend::REPL.REPLBackend) at REPL.jl:139
 repl_backend_loop(backend::REPL.REPLBackend) at REPL.jl:200
 start_repl_backend(backend::REPL.REPLBackend, consumer::Any) at REPL.jl:185
 run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool) at REPL.jl:317
 run_repl(repl::REPL.AbstractREPL, consumer::Any) at REPL.jl:305
 (::Base.var"#913#915"{Bool, Bool, Bool})(REPL::Module) at client.jl:394
 #invokelatest#2 at essentials.jl:710 [inlined]
 invokelatest at essentials.jl:708 [inlined]
 run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool) at client.jl:379
 exec_options(opts::Base.JLOptions) at client.jl:309
 _start() at client.jl:492

@omus omus force-pushed the cv/llvm-libunwind branch from 9c514d7 to 5f51d69 Compare February 18, 2021 16:50
@omus omus changed the title WIP: Replace libosxunwind wth LLVM libunwind Replace libosxunwind wth LLVM libunwind Feb 18, 2021
@omus
Copy link
Member Author

omus commented Feb 18, 2021

Thinking about it more the original motivation for replacing libosxunwind was to get off a 7 year old version of libunwind. LLVM libunwind does support Apple silicon but I think adding that support into Julia is beyond the scope of this PR.

The PR is ready to merge from my end. I'll let someone else pull the trigger :)

@omus omus force-pushed the cv/llvm-libunwind branch from 5f51d69 to 47bc0e8 Compare February 18, 2021 17:05
@omus
Copy link
Member Author

omus commented Feb 18, 2021

Encountering a new failure after doing a rebase. I thought removing the StdlibArtifacts.toml maybe would have fixed it but there seems to be something else going on.

@omus omus changed the title Replace libosxunwind wth LLVM libunwind WIP: Replace libosxunwind wth LLVM libunwind Feb 18, 2021
@Keno
Copy link
Member

Keno commented Feb 18, 2021

I'm on board with merging this and I can look info the aarch64 unwind issue, but something broke here and this is now broken on mac CI.

@omus
Copy link
Member Author

omus commented Feb 18, 2021

I'm on board with merging this and I can look info the aarch64 unwind issue, but something broke here and this is now broken on mac CI.

Thanks @Keno. I'll be looking into the CI failure

@omus omus force-pushed the cv/llvm-libunwind branch from a4549e1 to 5da3e7e Compare February 19, 2021 18:38
@omus
Copy link
Member Author

omus commented Feb 19, 2021

I believe the CI issue had to do with some bad caching resulting from accidentally committing the StdlibArtifacts.toml. I made some superficial corrections as well and did a final rebase. Should see a green CI 🤞

@omus omus changed the title WIP: Replace libosxunwind wth LLVM libunwind Replace libosxunwind wth LLVM libunwind Feb 19, 2021
@omus
Copy link
Member Author

omus commented Feb 19, 2021

RTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
building Build system, or building Julia or its dependencies stdlib Julia's standard library system:mac Affects only macOS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants