Windows: slow rustc startup #8859

Closed
vadimcn opened this issue Aug 29, 2013 · 24 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. O-windows Operating system: Windows

Comments

@vadimcn
Contributor

vadimcn commented Aug 29, 2013

C:\Rust\build>timeit "i686-pc-mingw32\stage2\bin\rustc.exe -v"
i686-pc-mingw32\stage2\bin\rustc.exe 0.8-pre (6c548ce 2013-08-27 23:29:33 -0700)
host: i686-pc-mingw32

Elapsed time = 1.701 seconds

Okay, maybe Windows is slower, but not this slow.

Profiling shows that roughly 1.6 seconds of this time was spent in the function "_pei386_runtime_relocator", which is invoked when the Rust runtime libraries are loaded. librustc accounts for more than 90% of this time; the rest is mostly in rustllvm.

Apparently this function comes from mingw runtime and performs "runtime pseudo-relocations". Note that it calls VirtualQuery once and VirtualProtect twice per relocated address, so no wonder it's slow!

This is the first time I'm coming across pseudo-relocations. Does anybody here know what exactly they are, and why librustc has so many of them?
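
For a rough sense of why per-address protection changes add up, the call pattern described above can be modelled as a simple count. This is only a counting sketch: `naive_call_count` and `batched_call_count` are hypothetical names, the 4096-byte page size is an assumption, and the batched variant is just one conceivable alternative, not a description of what any mingw runtime actually does.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def naive_call_count(fixup_addresses):
    """Calls made when every fixup is handled on its own:
    one VirtualQuery plus two VirtualProtect calls per address."""
    n = len(fixup_addresses)
    return {"VirtualQuery": n, "VirtualProtect": 2 * n}


def batched_call_count(fixup_addresses):
    """Hypothetical alternative: unprotect each touched page once,
    apply all fixups on it, then restore protection once."""
    pages = {addr // PAGE_SIZE for addr in fixup_addresses}
    return {"VirtualQuery": len(pages), "VirtualProtect": 2 * len(pages)}


if __name__ == "__main__":
    # 1,000 made-up fixup addresses spaced 16 bytes apart.
    addresses = [0x10000 + 16 * i for i in range(1000)]
    print(naive_call_count(addresses))
    print(batched_call_count(addresses))
```

The point of the comparison is only that the naive count grows with the number of fixups, while a per-page scheme grows with the number of pages they touch.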

@alexcrichton
Member

This might explain the regression on Windows with the libuv upgrade/bindings to process spawning (http://huonw.github.io/isrustfastyet/buildbot/). I assume it wasn't this slow in the past, but could you check what the timing was right before the libuv upgrade (f22b4b1)?

I don't really know how libuv upgrading could be relevant to runtime relocation, I'm not even sure what that is, but perhaps it's semi-related?

@vadimcn
Contributor Author

vadimcn commented Aug 29, 2013

Actually, it's always been this slow on Windows.
Also, note that I'm starting it from the command line.

@alexcrichton
Member

Oh, nevermind then

@brson
Contributor

brson commented Aug 29, 2013

Nominating production ready.

@vadimcn
Contributor Author

vadimcn commented Sep 5, 2013

Ok, my findings so far:

  • On Windows, importing data symbols from dynamic libraries requires the compiler to generate code with an extra level of indirection via IAT, so all such variables need to be marked up with __declspec(dllimport) attribute.
  • Gnu ld plays tricks with COFF import tables that allow one to avoid the above requirement (in most cases). This feature is called "auto-import". Explained here.
  • In some cases this trick doesn't work, so ld falls back to "pseudo-relocations", which are a linker-generated table of fixups, processed just after DLL is loaded by code coming from mingw runtime library (that's the _pei386_runtime_relocator function I mentioned above).
  • Mingw's "relocator" code is pretty inefficient when it comes across fixups pointing into read-only image sections: it calls VirtualProtect() twice for every such fixup.
  • Originally, pseudo-relocations were used only as a fall-back, so there were very few of them and relocator performance didn't matter.
  • Some time later, pseudo-relocations v2 were introduced. For reasons not completely clear to me, v2 code uses pseudo-relocs for all data imports, not just for the "hard" cases.
  • Later yet, again for reasons not completely clear to me, pseudo-relocs v2 were made the default in ld.
  • Now we have a problem: ld generates lots of pseudo-relocs, and mingw runtime is very slow in processing them. In librustc alone I counted more than 10,000 pseudo-reloc entries.
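
To make the idea of a linker-generated fixup table concrete, here is a deliberately simplified model. It assumes each table entry is just an offset into the loaded image, and that the loader adds the difference between the real and the assumed address of the imported symbol to the 32-bit value stored there. The entry layout and the names (`apply_fixups`, `image`, `delta`) are illustrative assumptions, not the actual mingw data structures.

```python
import struct


def apply_fixups(image, fixup_offsets, delta):
    """Add `delta` to the little-endian 32-bit value at each offset.

    `image` is a bytearray standing in for a loaded module, and
    `fixup_offsets` stands in for the linker-generated table.
    """
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<I", image, offset)
        # Wrap around at 32 bits, as address arithmetic on i686 would.
        struct.pack_into("<I", image, offset, (value + delta) & 0xFFFFFFFF)
    return image


if __name__ == "__main__":
    # Two 32-bit slots holding addresses computed against an assumed base.
    image = bytearray(struct.pack("<II", 0x1000, 0x2000))
    apply_fixups(image, [0, 4], delta=0x400000)
    print([hex(v) for v in struct.unpack("<II", image)])
```

In this model the work is one small write per table entry, so the cost of the table is dominated by whatever has to happen around each write, such as changing page protection.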

What can be done:

  • The quickest fix is to pass --enable-runtime-pseudo-reloc-v1 switch to the linker, which enables the original ld behavior. Empirically, it appears to work just as well as -v2, but reduces the number of pseudo-relocs 200-fold, so now rustc starts up on my machine in 0.06 seconds, as compared to 1.70s before. "make check" now takes ~360s instead of ~900s (60% reduction).
  • However, the original author of the pseudo-relocs-v2 ld patch advises against doing this. I actually emailed Kai about this, but so far I could not understand from his responses what exactly might go wrong.
    Anyways, he recommends using mingw-w64 instead, where the slow relocator issue apparently has already been fixed.
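
The figures quoted above can be checked with a few lines of arithmetic. The helper names below are made up for this sketch; the inputs are the timings reported in this comment.

```python
def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after


def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # rustc -v startup: 1.70 s before, 0.06 s with v1 pseudo-relocs.
    print(f"startup speedup: {speedup(1.70, 0.06):.1f}x")
    # "make check": ~900 s before, ~360 s after.
    print(f"make check reduction: {percent_reduction(900, 360):.0f}%")
```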

@jdm
Contributor

jdm commented Sep 5, 2013

Thank you for the thorough explanation!

@emberian
Member

emberian commented Jan 6, 2014

Visiting for triage. The plan is to migrate to mingw-w64, which should mitigate this.

@vadimcn
Contributor Author

vadimcn commented Apr 21, 2014

Now that we've switched to mingw-w64 toolchain, rustc -v execution time went down to 0.52s.
80% of that time is still being spent in _pei386_runtime_relocator, though.

@alexcrichton
Member

Should we start using --enable-runtime-pseudo-reloc-v1?

@vadimcn
Contributor Author

vadimcn commented Apr 21, 2014

Unfortunately --enable-runtime-pseudo-reloc-v1 no longer works - the linker complains about "undefined reference to `_nm___get_output_format'". Perhaps this is the one case where v1 pseudo-relocs prove to be insufficient.

@Tobba
Contributor

Tobba commented Jun 25, 2014

It looks like this became much, much worse recently on a warm start (it was 3 seconds before I updated my 1-2 week old Rust build):

$ time rustc --version
rustc 0.11.0-pre
host: i686-pc-mingw32

real    0m8.725s
user    0m0.000s
sys     0m0.000s

Can someone confirm and maybe mark this as high priority? This causes rust to be very awful to use on Windows due to the build times
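
Since warm and cold starts can differ a lot, it helps to time several runs in a row when comparing numbers like these. A minimal sketch of such a harness, assuming Python is available; `time_command` is a hypothetical helper, and the command it times by default is only a placeholder.

```python
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run `argv` `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command; replace with e.g. ["rustc", "--version"].
    command = [sys.executable, "-c", "pass"]
    timings = time_command(command)
    # The first run may include cold-start costs such as disk reads.
    print("first run: %.3f s" % timings[0])
    print("best of the rest: %.3f s" % min(timings[1:]))
```

Reporting the first run separately from the rest keeps one-off cold-start costs from being mixed into the warm-start figure.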

@alexcrichton
Member

Are you using mingw or mingw-w64? For me rustc --version takes about 70ms

@Tobba
Contributor

Tobba commented Jun 26, 2014

Normal mingw, I tried to build it on mingw-w64 but was unsuccessful, trying again now

@Tobba
Contributor

Tobba commented Jun 26, 2014

MinGW64 looks to be pretty much entirely broken; LLVM won't compile against it at all.

@alexcrichton
Member

I would recommend perhaps reinstalling mingw-w64. All the Windows bots are running mingw-w64, so it's more likely a local problem than a mingw-w64 problem.

@Tobba
Contributor

Tobba commented Jun 26, 2014

I got it to work, but it really needs a guide. msys2 works fine, but the sourceforge project listed as mingw-w64 lacks a bunch of C99 exception stuff, causing LLVM not to build properly (if using the i686-w64-mingw target, which I assume you should, since I get no improvement otherwise).

@vadimcn
Contributor Author

vadimcn commented Aug 15, 2014

My latest result is ~250ms with pseudo-relocs / ~80ms without, i.e. about 66% of rustc startup time is spent doing memory fixups.

BTW, the largest offenders, in terms of generated pseudo-relocations, currently are:

  • FILE_LINE from fail!(),
  • vec::PTR_MARKER,
  • LOG_LEVEL from liblog, and
  • LOC from log!()

These account for 98% of the 17898 pseudo-relocations generated in rustc and its libraries.
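
As a quick sanity check on these numbers, the share of startup time works out to roughly two thirds, and the four sources above cover all but a few hundred of the relocations. The function name is made up for this sketch.

```python
def fixup_share(with_relocs_ms, without_relocs_ms):
    """Fraction of startup time attributable to pseudo-relocation fixups."""
    return (with_relocs_ms - without_relocs_ms) / with_relocs_ms


if __name__ == "__main__":
    # ~250 ms with pseudo-relocs, ~80 ms without.
    print(f"fixup share of startup: {fixup_share(250, 80):.0%}")
    # 98% of the 17898 pseudo-relocations come from the four sources listed.
    print(f"relocations from the top offenders: about {round(0.98 * 17898)}")
```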

@vadimcn
Contributor Author

vadimcn commented Aug 15, 2014

The other 33% of startup time is spent loading various dlls. Here's a procmon profile of a rustc startup. Delay-loading libraries might help here, though it isn't clear how many of these would be used in a typical compilation anyway.

@vadimcn
Contributor Author

vadimcn commented Sep 16, 2014

@thestinger, I wouldn't call this fixed just yet. vec::PTR_MARKER was just one of the sources of pseudo-relocs.

@thestinger
Contributor

@vadimcn: The ones you listed as causing 98% of the relocations are either in the scope of #17081 (debug logging) or what I took care of, so I closed it in favour of that.

@thestinger
Contributor

If it's not completely fixed by #17081 then we could re-open it. Removing line numbers from fail!() and relying on backtraces is another important optimization. It's not just a problem on Windows - just more of a problem due to this relocation issue at start-up for cross-crate globals.

@vadimcn
Contributor Author

vadimcn commented Sep 16, 2014

It isn't obvious to me that fail!() and log!() also are in scope of #17081...

@thestinger
Contributor

@vadimcn: The logging comes from the debug! macros which fall into the same group as assert!. I guess we can leave this open until I file another issue for fail!, but I'm not sure there's anything we can actually do here beyond waiting on other issues.

@vadimcn
Contributor Author

vadimcn commented Feb 20, 2015

Slow pseudo-relocs should be fixed now.

@vadimcn vadimcn closed this as completed Feb 20, 2015