Introduce `-Zvirtual-function-elimination` codegen flag #96285

flip1995 · 2022-04-21T13:44:08Z

This PR adds a codegen flag -Zvirtual-function-elimination to enable the VFE optimization in LLVM. To make this work, additonal information has to be added to vtables (!vcall_visibility metadata and a typeid of the trait). Furthermore, instead of just loading functions, the llvm.type.checked.load intrinsic has to be used to map functions to vtables.

For technical details of the changes, see the commit messages.

I also tested this flag on https://github.com/tock/tock on different boards to verify that this fixes the issue tock/tock#2594. This flag is able to improve the size of the resulting binary by about 8k-9k bytes by removing the unused debug print functions.

Rendered documentation update

rust-highfive · 2022-04-21T13:44:11Z

Some changes occured to rustc_codegen_gcc

cc @antoyo

rust-highfive · 2022-04-21T13:44:13Z

r? @nagisa

(rust-highfive has picked a reviewer for you, use r? to override)

tschuett · 2022-04-21T17:00:13Z

Did you mean -Z virtual-function-elimination?

flip1995 · 2022-04-21T17:29:04Z

I think this flag fits better as a -C flag because it influences codegen and is not a debugging/nightly only flag. I can change it to a -Z flag, if it should start out as a nightly only flag though.

Mark-Simulacrum · 2022-04-21T17:49:40Z

Is there a reason why we shouldn't (or LLVM shouldn't) enable this optimization by default with full LTO? Is there an expectation that it might miscompile something or otherwise cause us unexpected trouble?

tmiasko · 2022-04-21T18:55:08Z

compiler/rustc_codegen_llvm/src/debuginfo/metadata.rs

+    let trait_ref_self = trait_ref.with_self_ty(cx.tcx, ty);
+    let trait_ref_self = cx.tcx.erase_regions(trait_ref_self);
+    let trait_def_id = trait_ref_self.def_id();
+    let trait_vis = cx.tcx.visibility(trait_def_id);


I think that the closest existing concept similar to vcall_visibility would be reachability, rather than visibility.

Example with an incorrectly optimized vtables showing why visibility is insufficient

trait Foo { fn foo(&self) { println!("foo") } } trait Bar { fn bar(&self) { println!("bar") } } trait Baz { fn baz(&self) { println!("baz") } } impl Foo for usize {} impl Bar for usize {} impl Baz for usize {} pub struct FooBox(Box<dyn Foo>); pub struct BarBox(Box<dyn Bar>); pub struct BazBox(Box<dyn Baz>); pub fn make_foo() -> FooBox { FooBox(Box::new(0)) } pub fn make_bar() -> BarBox { BarBox(Box::new(0)) } pub fn make_baz() -> BazBox { BazBox(Box::new(0)) } #[inline] pub fn f(a: FooBox) { a.0.foo() } pub fn g<T>(b: BarBox) { b.0.bar() } #[inline] fn h(c: BazBox) { c.0.baz() } pub const H: fn(BazBox) = h;

Thanks for pointing this out. I don't really have a solution for this right now. I tried looking at the reachability of the trait_ref and the members of the traits with the privacy_access_levels query. But for all of the traits and the functions in the traits (queried with tcx.vtable_entries) this just returns None, meaning it is not reachable IIUC.

I guess I would somehow have to check the reachability for the dyn Trait types. But I couldn't figure out how to do that.

Do you have ideas on how to best address this issue?

nikic · 2022-04-21T21:08:14Z

I think this flag fits better as a -C flag because it influences codegen and is not a debugging/nightly only flag. I can change it to a -Z flag, if it should start out as a nightly only flag though.

Yes, new flags are always added as unstable -Z flags first, and may later be stabilized as a -C flag. The "debugging options" terminology is an anachronism, the actual distinction between -C and -Z is stability.

bjorn3 · 2022-04-22T18:52:03Z

compiler/rustc_codegen_llvm/src/back/lto.rs

+                module.module_llvm.llmod(),
+                llvm::LLVMModFlagBehavior::Error,
+                "LTOPostLink\0".as_ptr().cast(),
+                1,


Should this be 11?

I believe the HasModuleFlag should be checking for 1, but also adding this flag here seems super sketchy to me. This is a pretty internal thing to LTO AFAICT.

In that context, it would be great if this code had a comment with context as to why this is being done.

In the mono repo, I found the value 1.

The LTOGenerator only adds the flag.
https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/LTO/LTOCodeGenerator.cpp#L541

The code here is based on the file tschuett linked above. In the LLVM repo that code is responsible to set up LTO. Adding this flag was missing from the Rust code that should emulate the code in LTOCodeGenerator.cpp.

The 11 argument in the HasModuleFlag is the size of the string passed to this function, not the value of the flag. The 1 argument in the AddModuleFlag function is the value of the module flag. LLVM doesn't need a string size here, because it expects a \0 terminated string in this function.

compiler/rustc_symbol_mangling/src/v0.rs

compiler/rustc_codegen_llvm/src/debuginfo/metadata.rs

src/test/codegen/virtual-function-elimination.rs

compiler/rustc_session/src/session.rs

nagisa · 2022-04-24T17:54:25Z

compiler/rustc_codegen_llvm/src/back/lto.rs

+                module.module_llvm.llmod(),
+                llvm::LLVMModFlagBehavior::Error,
+                "LTOPostLink\0".as_ptr().cast(),
+                1,


I believe the HasModuleFlag should be checking for 1, but also adding this flag here seems super sketchy to me. This is a pretty internal thing to LTO AFAICT.

In that context, it would be great if this code had a comment with context as to why this is being done.

compiler/rustc_codegen_llvm/src/back/lto.rs

flip1995 · 2022-04-29T13:06:58Z

Thanks for all the feedback. I think I addressed all the comments (I will clean up / squash the commits before this PR gets merged), except for the reachability issue, which needs a little bit more work/thinking on how to address this.

Is there a reason why we shouldn't (or LLVM shouldn't) enable this optimization by default with full LTO? Is there an expectation that it might miscompile something or otherwise cause us unexpected trouble?

Clang/LLVM doesn't set this flag by default, so I figured rustc shouldn't set this flag by default either. I don't know if there are any known issues with this optimization in LLVM

nagisa · 2022-05-01T17:02:04Z

This seems reasonable to me for a first shot at the functionality. Since it is unstable, I'm not super concerned about this being partially broken in some ways as e.g. pointed out by @tmiasko above, although we should definitely make sure the information about this limitation/bug is preserved in e.g. the tracking issue.

Another thing I'm not seeing is documentation for this feature the unstable book. Please write something about this feature here, and especially please note the known limitations and/or issues as well.

flip1995 · 2022-05-02T10:26:58Z

I added the documentation, cleaned up the commits and rebased on top of latest master.

insilications · 2022-05-05T07:00:00Z

Very nice work.

nagisa · 2022-05-10T18:53:55Z

@bors r+ Thanks!

bors · 2022-05-10T18:53:57Z

📌 Commit 798a2d927f3796c0b21fc6ee2707ce21833bfbd0 has been approved by nagisa

bors · 2022-05-11T00:19:38Z

⌛ Testing commit 798a2d927f3796c0b21fc6ee2707ce21833bfbd0 with merge 2c9812125037f96b4ba035183f237dfacc0f8d4a...

Adds the virtual-function-elimination unstable compiler flag and a check that this flag is only used in combination with -Clto. LLVM can only apply this optimization with fat LTO.

To apply the optimization the `Virtual Function Elim` module flag has to be set. To apply this optimization post-link the `LTOPostLink` module flag has to be set.

This function computes a Itanium-like typeid for a trait_ref. This is required for the VFE optimization in LLVM. It is used to map `llvm.type.checked.load` invocations, that is loading the function from a vtable, to the vtables this function could be from. It is important to note that `typeid`s are not unique. So multiple vtables of the same trait can share `typeid`s.

Add the intrinsic declare {i8*, i1} @llvm.type.checked.load(i8* %ptr, i32 %offset, metadata %type) This is used in the VFE optimization when lowering loading functions from vtables to LLVM IR. The `metadata` is used to map the function to all vtables this function could belong to. This ensures that functions from vtables that might be used somewhere won't get removed.

This adds the typeid and `vcall_visibility` metadata to vtables when the -Cvirtual-function-elimination flag is set. The typeid is generated in the same way as for the `llvm.type.checked.load` intrinsic from the trait_ref. The offset that is added to the typeid is always 0. This is because LLVM assumes that vtables are constructed according to the definition in the Itanium ABI. This includes an "address point" of the vtable. In C++ this is the offset in the vtable where information for RTTI is placed. Since there is no RTTI information in Rust's vtables, this "address point" is always 0. This "address point" in combination with the offset passed to the `llvm.type.checked.load` intrinsic determines the final function that should be loaded from the vtable in the `WholeProgramDevirtualization` pass in LLVM. That's why the `llvm.type.checked.load` intrinsics are generated with the typeid of the trait, rather than with that of the function that is called. This matches what `clang` does for C++. The vcall_visibility metadata depends on three factors: 1. LTO level: Currently this is always fat LTO, because LLVM only supports this optimization with fat LTO. 2. Visibility of the trait: If the trait is publicly visible, VFE can only act on its vtables after linking. 3. Number of CGUs: if there is more than one CGU, also vtables with restricted visibility could be seen outside of the CGU, so VFE can only act on them after linking. To reflect this, there are three visibility levels: Public, LinkageUnit, and TranslationUnit.

The offset in the llvm.type.checked.load intrinsic differs on 32 bit platforms

flip1995 · 2022-06-14T13:12:03Z

I updated the 32-bit test with:

diff --git a/src/test/codegen/virtual-function-elimination-32bit.rs b/src/test/codegen/virtual-function-elimination-32bit.rs
index dd200278bb1..6f963363a99 100644
--- a/src/test/codegen/virtual-function-elimination-32bit.rs
+++ b/src/test/codegen/virtual-function-elimination-32bit.rs
@@ -31,5 +31,5 @@ pub fn main() {
     taking_t(&s);
 }

-// CHECK: ![[TYPE0]] = !{i64 0, !"[[MANGLED_TYPE0]]"}
+// CHECK: ![[TYPE0]] = !{i32 0, !"[[MANGLED_TYPE0]]"}
 // CHECK: ![[VCALL_VIS0]] = !{i64 2}

and rebased on latest master (to test locally again).

No other changes.

@bors r=nagisa

bors · 2022-06-14T13:12:05Z

📌 Commit 195f208 has been approved by nagisa

bors · 2022-06-14T21:37:14Z

⌛ Testing commit 195f208 with merge 2d1e075...

bors · 2022-06-15T00:17:59Z

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 2d1e075 to master...

rust-timer · 2022-06-15T01:35:09Z

Finished benchmarking commit (2d1e075): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	5.2%	5.2%	1
Regressions 😿 (secondary)	4.5%	4.5%	1
Improvements 🎉 (primary)	-2.0%	-2.0%	1
Improvements 🎉 (secondary)	-2.4%	-2.4%	1
All 😿🎉 (primary)	1.6%	5.2%	2

Cycles

This benchmark run did not return any relevant results for this metric.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

the arithmetic mean of the percent change ↩
number of relevant changes ↩

rust-highfive assigned nagisa Apr 21, 2022

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 21, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 21, 2022

tmiasko reviewed Apr 21, 2022

View reviewed changes

bjorn3 reviewed Apr 22, 2022

View reviewed changes

compiler/rustc_symbol_mangling/src/v0.rs Outdated Show resolved Hide resolved

bjorn3 reviewed Apr 22, 2022

View reviewed changes

compiler/rustc_symbol_mangling/src/v0.rs Outdated Show resolved Hide resolved

bjorn3 reviewed Apr 22, 2022

View reviewed changes

compiler/rustc_codegen_llvm/src/debuginfo/metadata.rs Outdated Show resolved Hide resolved

nagisa reviewed Apr 24, 2022

View reviewed changes

tschuett reviewed Apr 24, 2022

View reviewed changes

compiler/rustc_codegen_llvm/src/back/lto.rs Outdated Show resolved Hide resolved

flip1995 force-pushed the pk-vfe branch from abf304f to 34aac0b Compare May 2, 2022 10:26

This comment has been minimized.

Sign in to view

flip1995 force-pushed the pk-vfe branch from 34aac0b to 706c537 Compare May 2, 2022 14:34

This comment has been minimized.

Sign in to view

flip1995 force-pushed the pk-vfe branch from 706c537 to 798a2d9 Compare May 2, 2022 15:25

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 10, 2022

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jun 7, 2022

This comment has been minimized.

Sign in to view

flip1995 added 8 commits June 14, 2022 14:50

Add -Zvirtual-function-elimination flag

def3fd8

Adds the virtual-function-elimination unstable compiler flag and a check that this flag is only used in combination with -Clto. LLVM can only apply this optimization with fat LTO.

Add LLVM module flags required for the VFE opt

20f597f

To apply the optimization the `Virtual Function Elim` module flag has to be set. To apply this optimization post-link the `LTOPostLink` module flag has to be set.

Add test for VFE optimization

996c6b7

Add user documentation for -Zvirtual-function-elimination

a93ea7e

Add VFE test for 32 bit

195f208

The offset in the llvm.type.checked.load intrinsic differs on 32 bit platforms

flip1995 force-pushed the pk-vfe branch from 78e9608 to 195f208 Compare June 14, 2022 13:11

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 14, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 15, 2022

bors merged commit 2d1e075 into rust-lang:master Jun 15, 2022

rustbot added this to the 1.63.0 milestone Jun 15, 2022

alistair23 mentioned this pull request Jun 15, 2022

Process trait implementation is not optimised out tock/tock#2594

Closed

This was referenced Jun 15, 2022

Add fine-grained LLVM CFI support to the Rust compiler #95548

Merged

[RFC] Support .comment section like GCC/Clang (!llvm.ident) #97550

Merged

hudson-ayers mentioned this pull request Jun 15, 2022

Support whole program devirtualization #68262

Open

flip1995 deleted the pk-vfe branch June 15, 2022 07:42

matthiaskrgr mentioned this pull request Jun 15, 2022

-Zvirtual-function-elimination miscompiles simple hello world program if other optimizations are turned on #98132

Open

ealmloff mentioned this pull request Aug 28, 2023

Rework Event System DioxusLabs/dioxus#1402

Merged

matthiaskrgr mentioned this pull request Apr 20, 2024

ICE: expected principal trait object #124092

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce `-Zvirtual-function-elimination` codegen flag #96285

Introduce `-Zvirtual-function-elimination` codegen flag #96285

flip1995 commented Apr 21, 2022 •

edited

Loading

rust-highfive commented Apr 21, 2022

rust-highfive commented Apr 21, 2022

tschuett commented Apr 21, 2022

flip1995 commented Apr 21, 2022

Mark-Simulacrum commented Apr 21, 2022

tmiasko Apr 21, 2022 •

edited

Loading

flip1995 Apr 29, 2022

nikic commented Apr 21, 2022

bjorn3 Apr 22, 2022

nagisa Apr 24, 2022

tschuett Apr 24, 2022 •

edited

Loading

flip1995 Apr 29, 2022

nagisa Apr 24, 2022

flip1995 commented Apr 29, 2022

nagisa commented May 1, 2022

flip1995 commented May 2, 2022

This comment has been minimized.

This comment has been minimized.

insilications commented May 5, 2022

nagisa commented May 10, 2022

bors commented May 10, 2022

bors commented May 11, 2022

This comment has been minimized.

flip1995 commented Jun 14, 2022

bors commented Jun 14, 2022

bors commented Jun 14, 2022

bors commented Jun 15, 2022

rust-timer commented Jun 15, 2022

Introduce -Zvirtual-function-elimination codegen flag #96285

Introduce -Zvirtual-function-elimination codegen flag #96285

Conversation

flip1995 commented Apr 21, 2022 • edited Loading

rust-highfive commented Apr 21, 2022

rust-highfive commented Apr 21, 2022

tschuett commented Apr 21, 2022

flip1995 commented Apr 21, 2022

Mark-Simulacrum commented Apr 21, 2022

tmiasko Apr 21, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikic commented Apr 21, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tschuett Apr 24, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

flip1995 commented Apr 29, 2022

nagisa commented May 1, 2022

flip1995 commented May 2, 2022

This comment has been minimized.

This comment has been minimized.

insilications commented May 5, 2022

nagisa commented May 10, 2022

bors commented May 10, 2022

bors commented May 11, 2022

This comment has been minimized.

flip1995 commented Jun 14, 2022

bors commented Jun 14, 2022

bors commented Jun 14, 2022

bors commented Jun 15, 2022

rust-timer commented Jun 15, 2022

Footnotes

Introduce `-Zvirtual-function-elimination` codegen flag #96285

Introduce `-Zvirtual-function-elimination` codegen flag #96285

flip1995 commented Apr 21, 2022 •

edited

Loading

tmiasko Apr 21, 2022 •

edited

Loading

tschuett Apr 24, 2022 •

edited

Loading