Mono AOT full program analyses and interprocedural optimizations #80942

Closed
2 tasks
Tracked by #80938
kotlarmilos opened this issue Jan 20, 2023 · 11 comments

@kotlarmilos
Member

kotlarmilos commented Jan 20, 2023

Description

For target platforms which do not support dynamic code generation (e.g., iOS), programs are compiled in full AOT mode. Currently, the Mono AOT compiler compiles such programs one managed assembly at a time. One of the biggest advantages of this approach is that a change to a single assembly does not require recompiling the whole application.

On the other hand, the compiler has no knowledge of cross-assembly references and relies heavily on the passes performed by the ILLinker, the tool that performs full program analysis in the AOT pipeline. This prevents the compiler from doing a better job of removing unreachable code and performing inter-procedural optimizations.

This experiment checks whether inter-procedural optimizations can be achieved when all assemblies are compiled together.
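To make the limitation concrete, here is a minimal C sketch of the kind of whole-program reachability analysis that only becomes possible when every assembly is visible at once. The ToyMethod struct and mark_reachable() are hypothetical illustrations, not Mono AOT compiler code: starting from an entry point, the walk follows call edges across assembly boundaries, and any method left unmarked is unreachable and could be dropped. A compiler that sees one library assembly at a time cannot draw that conclusion, because it does not know which of the library's methods other assemblies call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a program's call graph; not Mono's data structures. */
#define MAX_METHODS 16
#define MAX_CALLEES 4

typedef struct {
    const char *assembly;      /* owning assembly, e.g. "App.dll" */
    const char *name;          /* method name */
    int callees[MAX_CALLEES];  /* indices of directly called methods */
    int n_callees;
} ToyMethod;

/* Marks every method reachable from `entry` by following call edges,
 * regardless of which assembly the callee lives in.
 * Returns the number of reachable methods. */
static int mark_reachable(const ToyMethod *methods, int n_methods,
                          int entry, bool *reachable)
{
    int worklist[MAX_METHODS]; /* each method is pushed at most once */
    int top = 0;
    int count = 0;

    assert(n_methods <= MAX_METHODS);
    for (int i = 0; i < n_methods; i++)
        reachable[i] = false;

    reachable[entry] = true;
    worklist[top++] = entry;
    count++;

    while (top > 0) {
        const ToyMethod *m = &methods[worklist[--top]];
        for (int c = 0; c < m->n_callees; c++) {
            int callee = m->callees[c];
            if (!reachable[callee]) {
                reachable[callee] = true;
                worklist[top++] = callee;
                count++;
            }
        }
    }
    return count;
}
```

In the real pipeline the ILLinker already trims at the IL level; the sketch only shows why the AOT compiler cannot repeat this analysis itself while it compiles assemblies one by one.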

Tasks

  • Conducting experiments and obtaining results
  • (optional) Integration with the main branch as an experimental feature

/cc: @SamMonoRT

@ghost

ghost commented Jan 20, 2023

Tagging subscribers to 'os-ios': @steveisok, @akoeplinger
See info in area-owners.md if you want to be subscribed.

Issue Details

Tasks

  • Mono AOT compiler updated to produce single library in full AOT mode
  • Conducting experiments and obtaining results
  • (optional) Integration with the main branch as an experimental feature

Author: kotlarmilos
Assignees: kotlarmilos, ivanpovazan, LeVladIonescu
Labels: area-Codegen-AOT-mono, os-ios

Milestone: Future

@EgorBo
Member

EgorBo commented Jan 20, 2023

I thought that in the end for the iOS case you mentioned it will be a set of static files (per assembly) which will be linked all together (presumably, with LTO?) into a single binary?

Also, I assume that Mono does inline methods from external assemblies if they're small, doesn't it?

@kotlarmilos
Member Author

> I thought that in the end for the iOS case you mentioned it will be a set of static files (per assembly) which will be linked all together (presumably, with LTO?) into a single binary?

@EgorBo good point! Currently, Mono AOT for iOS produces a set of static files (one per assembly) that are then linked together by a native linker into a single binary. This means that assemblies are passed to the AOT compiler separately.

By passing them to the AOT compiler together (somewhat similar to the dedup feature in #80419), we want to check whether, after the ILLinker "pre-pass", this enables inter-procedural optimizations.

> Also, I assume that Mono does inline methods from external assemblies if they're small, doesn't it?

Not sure about that; maybe @vargaz or @ivanpovazan can confirm.

Ideas or feedback on how it can be improved are more than welcome :)

@EgorBo
Member

EgorBo commented Jan 20, 2023

Well, I assume generally if you combine the whole thing in one piece of LLVM IR, LLVM will be able to perform cross-managed-assembly inlining on its own, so it should be good for perf anyway? 🙂

@AndyAyersMS
Member

You might find this prototype we did to add IPA to crossgen 2 interesting (note it was never merged): https://github.com/erozenfeld/runtime/commits/Crossgen2WPO

@kotlarmilos
Member Author

> Well, I assume generally if you combine the whole thing in one piece of LLVM IR, LLVM will be able to perform cross-managed-assembly inlining on its own, so it should be good for perf anyway? 🙂

Good, we will figure it out.

> You might find this prototype we did to add IPA to crossgen 2 interesting (note it was never merged): https://github.com/erozenfeld/runtime/commits/Crossgen2WPO

Thanks for sharing it. I see changes in ILCompiler that might be aligned with #80941.

@LeVladIonescu
Contributor

Assuming that during wasm's AOT compilation all the assemblies are passed to the compiler at the same time to produce a single output file, I started investigating how this works in order to do something similar for iOS. Here's the outcome:

Steps:

  • Enable AOT compilation by setting the _WasmShouldAOT and RunAOTCompilation properties to true
  • Build the browser sample using: ./../../../../../dotnet.sh build /p:TargetOS=browser /p:TargetArchitecture=wasm /p:Configuration=Debug /p:RunAOTCompilation=true /bl (from the project directory)

After inspecting the binlog, I noticed that all assemblies are passed to the AOT compiler separately, which means wasm follows the same principle as iOS.

For example:

  • /runtime/artifacts/bin/mono/browser.wasm.Debug/cross/browser-wasm/mono-aot-cross --debug --llvm "--aot=no-opt,static,direct-icalls,deterministic,dwarfdebug,llvm-path=/runtime/src/mono/wasm/emsdk/upstream/bin/,static,dedup-skip,llvmonly,interp,asmonly,llvm-outfile=/runtime/artifacts/obj/mono/Wasm.Browser.Sample/wasm/Debug/browser-wasm/wasm/for-build/System.Private.CoreLib.dll.bc.tmp" "System.Private.CoreLib.dll"
  • /runtime/artifacts/bin/mono/browser.wasm.Debug/cross/browser-wasm/mono-aot-cross --debug --llvm "--aot=no-opt,static,direct-icalls,deterministic,dwarfdebug,llvm-path=/runtime/src/mono/wasm/emsdk/upstream/bin/,static,dedup-skip,llvmonly,interp,asmonly,llvm-outfile=/runtime/artifacts/obj/mono/Wasm.Browser.Sample/wasm/Debug/browser-wasm/wasm/for-build/Wasm.Browser.Sample.dll.bc.tmp" "Wasm.Browser.Sample.dll"

Question: Is there another flag that can be set to enable passing all assemblies at the same time?

Proposal for next step:

Try changing how mono_aot_assemblies() works. Currently, this function calls aot_assembly() for every assembly, and each call also emits that assembly's AOT image. Instead, mono_aot_assemblies() could emit a single AOT image for all the assemblies once all of them have been compiled.

We can split the AOT compilation into three phases:

  • collect phase – aot_assembly() without compile_methods() and emit_aot_image()
  • compilation phase – compile_methods() with full program analysis and linkonce enabled; at this point the MonoAotCompile acfg variable should hold all the methods from all the assemblies
  • emit phase – emit_aot_image()
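A rough sketch of the proposed control flow, using hypothetical stand-ins (ToyAotCompile, ToyAssembly, collect_phase(), compile_phase(), emit_phase(), toy_aot_assemblies()) rather than the real MonoAotCompile, aot_assembly(), compile_methods() and emit_aot_image(), whose signatures and responsibilities are much larger. The point is only the ordering: every assembly is collected into one shared acfg first, compilation runs once over the combined method set, and a single image is emitted at the end.

```c
#include <assert.h>

/* Hypothetical stand-ins; the real Mono AOT structures look very different. */
#define TOY_MAX_COLLECTED 64

typedef struct {
    const char *methods[TOY_MAX_COLLECTED]; /* methods from all assemblies */
    int n_methods;
    int n_compiled;
    int n_images_emitted;
} ToyAotCompile;

typedef struct {
    const char *name;
    const char **methods;
    int n_methods;
} ToyAssembly;

/* Phase 1: only gather methods; nothing is compiled or emitted yet. */
static void collect_phase(ToyAotCompile *acfg, const ToyAssembly *assembly)
{
    for (int i = 0; i < assembly->n_methods; i++) {
        assert(acfg->n_methods < TOY_MAX_COLLECTED);
        acfg->methods[acfg->n_methods++] = assembly->methods[i];
    }
}

/* Phase 2: compile once, with every assembly's methods visible in one acfg.
 * A real implementation would run whole-program analysis here. */
static void compile_phase(ToyAotCompile *acfg)
{
    acfg->n_compiled = acfg->n_methods;
}

/* Phase 3: emit a single image for everything that was compiled. */
static void emit_phase(ToyAotCompile *acfg)
{
    acfg->n_images_emitted++;
}

/* Replacement for the per-assembly loop: collect all, compile once, emit once. */
static void toy_aot_assemblies(ToyAotCompile *acfg,
                               const ToyAssembly *assemblies, int n)
{
    for (int i = 0; i < n; i++)
        collect_phase(acfg, &assemblies[i]);
    compile_phase(acfg);
    emit_phase(acfg);
}
```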

One concern with this strategy is how duplicates are handled before the AOT image is emitted: will llvm-linkonce take care of them?
Another concern is the MonoAotCompile struct. To collect all the methods in the same MonoAotCompile acfg variable, we would need to append the data collected and processed by aot_assembly() for each assembly to that shared acfg variable. Would this be a good approach?
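To illustrate both concerns together, here is a hypothetical sketch of appending each assembly's methods to one shared table while keeping only one definition per symbol name. This is roughly what linkonce linkage gives at LLVM link time, where definitions with the same name are merged. ToyMergedTable and merge_assembly_methods() are invented for illustration; whether llvm-linkonce already takes care of the duplicates in Mono's case is exactly the open question above, and this sketch does not answer it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical merged method table, keyed by symbol name. */
#define MAX_SYMBOLS 64

typedef struct {
    const char *symbols[MAX_SYMBOLS];
    int n_symbols;
    int n_duplicates_dropped;
} ToyMergedTable;

static bool table_contains(const ToyMergedTable *t, const char *symbol)
{
    /* Linear scan keeps the sketch short; a real table would use a hash map. */
    for (int i = 0; i < t->n_symbols; i++)
        if (strcmp(t->symbols[i], symbol) == 0)
            return true;
    return false;
}

/* Appends one assembly's methods. A symbol that is already present
 * (e.g. the same generic instantiation produced by two assemblies)
 * is kept only once, similar in spirit to linkonce merging. */
static void merge_assembly_methods(ToyMergedTable *t,
                                   const char **symbols, int n)
{
    for (int i = 0; i < n; i++) {
        if (table_contains(t, symbols[i])) {
            t->n_duplicates_dropped++;
            continue;
        }
        assert(t->n_symbols < MAX_SYMBOLS);
        t->symbols[t->n_symbols++] = symbols[i];
    }
}
```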

@vargaz
Copy link
Contributor

vargaz commented Feb 2, 2023

This will require a large amount of changes. Both the AOT compiler and the AOT runtime expect a one-to-one mapping between assemblies and AOT images.

@kotlarmilos
Copy link
Member Author

@LeVladIonescu good progress!

> This will require a large amount of changes. Both the AOT compiler and the AOT runtime expect a one-to-one mapping between assemblies and AOT images.

I agree that it would require a large amount of changes and would be hard to test properly. What we want to achieve with a single output file is full program analysis by LLVM. That does not necessarily mean the code has to be emitted in a single AOT image. It might be possible to collect methods from all assemblies and provide them to LLVM during compilation, but emit only the subset belonging to the corresponding assembly.
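One way to read this proposal, sketched in C under stated assumptions: the optimizer is given the full method list, but each assembly's image only contains the methods that assembly owns, so the one-to-one mapping between assemblies and AOT images is preserved. ToyMethodRef and select_methods_for_image() are hypothetical; the sketch only shows the partitioning step, not the LLVM side.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: every collected method records the assembly it belongs to. */
typedef struct {
    const char *owner; /* owning assembly */
    const char *name;  /* method name */
} ToyMethodRef;

/* `all` is the full list the optimizer would see. Writes the indices of the
 * methods owned by `assembly` into `out` (capacity must be >= n_all) and
 * returns their count: the subset that assembly's own image would contain. */
static int select_methods_for_image(const ToyMethodRef *all, int n_all,
                                    const char *assembly, int *out)
{
    int n_out = 0;

    assert(all != NULL && assembly != NULL && out != NULL);
    for (int i = 0; i < n_all; i++) {
        if (strcmp(all[i].owner, assembly) == 0)
            out[n_out++] = i;
    }
    return n_out;
}
```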

@LeVladIonescu
Copy link
Contributor

LeVladIonescu commented Feb 3, 2023

After an offline sync, we agreed to change the strategy.

The next step is to try preventing the AOT compiler from adding methods to the method_id -> method table, so that LLVM can optimize those methods better.
We will target this for the HelloiOS app and will compare the results.
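The intuition behind this step, as we understand it (an assumption, not something confirmed in this thread): a method whose address is stored in the method_id -> method table can be looked up by the runtime at any time, so LLVM must keep it and cannot assume it knows all of its callers. A method left out of the table is only reachable through direct calls, so LLVM may inline or specialize it, and drop it entirely if nothing calls it. The hypothetical classify_method() and count_droppable() below encode that rule; the real decision in Mono involves many more cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of what the optimizer may do with a method. */
typedef enum {
    KEEP_IN_TABLE, /* address stored in the method table: must be kept */
    INTERNALIZE,   /* only direct callers: may be inlined or specialized */
    DROP           /* not in the table and never called: dead code */
} ToyLinkageDecision;

static ToyLinkageDecision classify_method(bool in_method_table,
                                          int n_direct_callers)
{
    assert(n_direct_callers >= 0);
    if (in_method_table)
        return KEEP_IN_TABLE;
    if (n_direct_callers > 0)
        return INTERNALIZE;
    return DROP;
}

/* Counts how many methods the optimizer would be free to remove entirely. */
static int count_droppable(const bool *in_table, const int *callers, int n)
{
    int dropped = 0;
    for (int i = 0; i < n; i++)
        if (classify_method(in_table[i], callers[i]) == DROP)
            dropped++;
    return dropped;
}
```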

@kotlarmilos kotlarmilos modified the milestones: Future, 8.0.0, 9.0.0 Jul 13, 2023
@ivanpovazan ivanpovazan modified the milestones: 9.0.0, Future Feb 9, 2024
@kotlarmilos
Copy link
Member Author

Obsolete, lower priority compared to other tasks.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 30, 2024
Projects: None yet
Development: No branches or pull requests
6 participants