Compiler: enable parallel codegen with MT #14748
This looks great. But it also seems to be a mix of different changes. Could we extract the independent refactorings (such as extracting
Force-pushed from 3a08d9e to ac91f7c.
Rebased on top of #14760.
Refactors `Crystal::Compiler`:
1. extracts `#sequential_codegen`, `#parallel_codegen` and `#fork_codegen` methods;
2. merges `#codegen_many_units` into `#codegen` directly;
3. stops collecting reused units: `#fork_codegen` now updates `CompilationUnit#reused_compilation_unit?` state as reported by the forked processes, and `#print_codegen_stats` now counts & filters the reused units.

Prerequisite for #14748 that will introduce `#mt_codegen`.
When compiled with `-Dpreview_mt`, the compiler will take advantage of the MT environment to codegen the compilation units in parallel, avoiding fork (which is not supported with MT) and allowing parallel codegen on Windows.
Force-pushed from 5303d59 to 96b6f77.
Rebased on master, which now includes #14760 (the prerequisite). Ready for review.
Isn't this addressed (using
@beta-ziliani this could be a compile-time ENV variable to change the default number of threads/schedulers. It's tangential to this pull request.
The RFC-2 link in the OP here points to a non-existing URL. I think it was supposed to be crystal-lang/rfcs#2?
This pull request has been mentioned on Crystal Forum. There might be relevant details there:
Hi, I don't understand: how can I use this new feature in 1.14.0? Thanks.
Hi, I ran some tests and there doesn't seem to be any performance improvement. Here is the reproduction:
The latter is even slower. Did I do something wrong? Thanks.
@zw963 From the OP:
Okay, I saw a small performance improvement when building one of my web projects. Old:
New:
Build time was reduced by about 10%, and almost all of the reduction comes from sys time. I guess the larger the project, the more noticeable the effect. BTW: I don't see multiple cores being used even with parallel codegen enabled. Maybe this stage is so quick that there's no chance to see it in htop?
Implements parallel codegen of object files when MT is enabled in the compiler (`-Dpreview_mt`).

It only impacts codegen for compilations with more than one compilation unit (module), that is when none of `--single-module`, `--release` or `--cross-compile` is specified. This behavior is identical to the fork based codegen.

Advantages:
The main points are increased portability and simpler logic, despite having to take care of LLVM thread safety quirks (see comments).
Issues:
The `threads` arg actually denotes the number of fibers, not threads, which is confusing and problematic: increasing `threads` but not `CRYSTAL_WORKERS` will lead to more fibers than threads, with fibers being scheduled on the same threads, which won't bring any improvement.

In fact `CRYSTAL_WORKERS` defaults to 4, whereas `threads` defaulted to 8. With this patch it defaults to `CRYSTAL_WORKERS`, so MT can end up being slower if we don't specify `CRYSTAL_WORKERS=8`.

This is still not as efficient as it could be. The main fiber (that feeds the worker fibers) can get blocked by a worker fiber doing codegen, leading the other workers to starve. This is easily noticeable when compiling with `-O1` for example.

Both issues will be fixable with RFC 2 where we can start an explicit context to run the worker fibers or start N isolated contexts (maybe a better idea). Until then, one should increase `CRYSTAL_WORKERS`.

Supersedes #14227 and doesn't segfault (so far) with LLVM 12 or LLVM 18.1 🤞
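To make the two issues above concrete, here is a minimal sketch of the general pattern they describe: a main fiber feeding worker fibers through a channel. It is not the code from this pull request. `FakeUnit`, `fake_codegen`, `worker_count` and `parallel_codegen_sketch` are hypothetical names invented for illustration, and the fallback of 4 simply mirrors the default mentioned above.

```crystal
# Hypothetical stand-in for a compilation unit (not the compiler's real type).
record FakeUnit, name : String

# Hypothetical stand-in for the LLVM codegen of one unit.
def fake_codegen(unit : FakeUnit) : String
  "#{unit.name}.o"
end

# Assumption: use CRYSTAL_WORKERS when it is set to a valid integer,
# otherwise fall back to 4. Never return less than one worker.
def worker_count : Int32
  Math.max(ENV["CRYSTAL_WORKERS"]?.try(&.to_i?) || 4, 1)
end

def parallel_codegen_sketch(units : Array(FakeUnit), n : Int32) : Array(String)
  jobs = Channel(FakeUnit).new(n * 2)       # small buffer between feeder and workers
  results = Channel(String).new(units.size) # large enough that workers never block on send

  # n worker fibers. They only run on distinct threads when the program is
  # built with -Dpreview_mt and enough worker threads are available.
  n.times do
    spawn do
      while unit = jobs.receive?
        results.send(fake_codegen(unit))
      end
    end
  end

  # The main fiber feeds the workers. If it is not scheduled promptly,
  # idle workers have nothing to pick up.
  units.each { |unit| jobs.send(unit) }
  jobs.close

  # Collect one result per unit; the order is not deterministic.
  Array(String).new(units.size) { results.receive }
end

units = [FakeUnit.new("a"), FakeUnit.new("b"), FakeUnit.new("c")]
objects = parallel_codegen_sketch(units, worker_count)
puts objects.sort.join(", ")
```

In this sketch, spawning more fibers than there are worker threads only multiplexes them onto the same threads, which is the first issue; the feeding loop running on an ordinary fiber is the second.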
TODO:
`mt_parallel(units, n_threads)`
`CRYSTAL_CONFIG_WORKERS` to configure the default number of workers at compile time instead of the hardcoded 4 (in a distinct PR)