[async] Support constant folding in async mode #1778
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1778      +/-  ##
==========================================
+ Coverage   42.58%   42.61%   +0.02%
==========================================
  Files          44       44
  Lines        6185     6183       -2
  Branches     1072     1071       -1
==========================================
+ Hits         2634     2635       +1
+ Misses       3397     3394       -3
  Partials      154      154
Continue to review full report at Codecov.
Just a random comment. I'll push my local commit later.
taichi/program/async_engine.cpp (outdated)
@@ -175,7 +175,7 @@ void ExecutionQueue::enqueue(KernelLaunchRecord &&ker) {
   auto config = kernel->program.config;
   auto ir = stmt;
   offload_to_executable(
-      ir, config, /*verbose=*/false,
+      ir, config, /*verbose=*/config.print_ir,
Unfortunately, setting this to true makes parallel IR printing crash :-( (see #1750). Let's stick to the original version for now. I'll revert this in my commit in a minute.
Force-pushed from a4db06d to 2bd534a.
Sorry about the force push... I fixed a small issue on CUDA. Please feel free to merge if my changes are reasonable. Thanks!
Constant folding (CF) was previously disabled in async mode because it caused a deadlock. The problem is that when a user kernel gets compiled, the CF pass adds an evaluator kernel to the queue, and that evaluator never gets scheduled because the queue is blocked waiting for the user kernel to finish...
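To make the deadlock concrete, here is a minimal standalone C++ sketch. Everything in it (SingleWorkerQueue, enqueue, the future-based completion) is made up for illustration and is not taichi's actual ExecutionQueue; it only mirrors the single-consumer shape that causes the starvation described above.

```cpp
// Hypothetical single-consumer task queue: one worker thread consumes
// launches, so a task that enqueues another task and waits on it starves.
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

class SingleWorkerQueue {
 public:
  SingleWorkerQueue() : worker_([this] { run(); }) {}

  ~SingleWorkerQueue() {
    {
      std::lock_guard<std::mutex> lock(mut_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Enqueue a task; the returned future completes once the worker ran it.
  std::future<void> enqueue(std::function<void()> task) {
    auto finished = std::make_shared<std::promise<void>>();
    auto fut = finished->get_future();
    {
      std::lock_guard<std::mutex> lock(mut_);
      tasks_.push([task = std::move(task), finished] {
        task();
        finished->set_value();
      });
    }
    cv_.notify_one();
    return fut;
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mut_);
        cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (done_ && tasks_.empty())
          return;
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // the lone worker is busy until this task returns
    }
  }

  std::mutex mut_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool done_ = false;
  std::thread worker_;
};

int main() {
  SingleWorkerQueue queue;

  // The async-mode CF deadlock has this shape (left commented out on purpose,
  // since running it would hang): the "user kernel" task enqueues the CF
  // "evaluator kernel" and blocks on it, but the only worker is still busy
  // executing the user kernel, so the evaluator never runs.
  //
  //   queue.enqueue([&] {                                     // user kernel
  //     queue.enqueue([] { /* evaluator kernel */ }).wait();  // deadlock
  //   }).wait();
  //
  // Running the evaluator synchronously (outside the queue) avoids this.
  queue.enqueue([] { /* a normal async launch */ }).wait();
  return 0;
}
```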
We can fix this by making the CF kernels always run synchronously (see Kernel::operator()). However, we still need two more changes (sketched after the snippet below):
1. Added Program::device_synchronize(), which is more primitive than synchronize() and always calls into the targeted GPU backend's device/stream sync method.
2. Removed synchronize() from Program::fetch_result. The implication of this is that whenever we fetch the result, we must make sure a proper sync is done first. The only exception to this that I could find is in Program::initialize_runtime_system:
taichi/taichi/program/program.cpp, Lines 292 to 298 in 5c39e84
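A rough sketch of the two changes above, under simplified assumptions (ProgramSketch, fetch_result_u64, and async_mode are placeholders, not the real taichi API): device_synchronize() goes straight to the backend's device/stream sync, synchronize() additionally flushes the async queue, and fetch_result no longer syncs by itself.

```cpp
#include <cstdint>

struct ProgramSketch {
  // Hypothetical result buffer written by the device (assumption for sketch).
  std::uint64_t result_buffer[64] = {};
  bool async_mode = false;

  // More primitive than synchronize(): always calls straight into the
  // targeted GPU backend's device/stream sync (backend dispatch omitted).
  void device_synchronize() {
  }

  // Full sync: flush the async task queue (if any), then sync the device.
  void synchronize() {
    if (async_mode) {
      // async_engine->synchronize();  // flush queued kernels first
    }
    device_synchronize();
  }

  // No implicit synchronize() here anymore: the caller is responsible for
  // making sure a proper sync has been done before reading result_buffer.
  std::uint64_t fetch_result_u64(int i) {
    return result_buffer[i];
  }
};

int main() {
  ProgramSketch prog;
  prog.synchronize();                   // caller syncs explicitly...
  auto ret = prog.fetch_result_u64(0);  // ...then fetches the result
  (void)ret;
  return 0;
}
```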
FYI, here are the fetch_result calls I could find (a sketch of the sync-before-fetch contract follows below):
- Kernel::get_ret_int, Kernel::get_ret_float (see taichi/taichi/ir/snode.cpp, Lines 159 to 160 in 5c39e84): a sync is done in kernel.py before we fetch the result.
- Program::check_runtime_error: sync is done in the beginning:
taichi/taichi/program/program.cpp, Lines 403 to 404 in 5c39e84
- Program::initialize_runtime_system: as explained above.
Related issue = #N/A
[Click here for the format server]
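To illustrate the call-site contract listed above, here is a hedged sketch with placeholder names (the real check_runtime_error lives in program.cpp and launches the runtime's error-retrieval routine, omitted here): each caller syncs before fetching.

```cpp
#include <cstdint>
#include <cstdio>

// Placeholder program type for this sketch; not the actual taichi classes.
struct ProgramSketch {
  std::uint64_t result_buffer[64] = {};

  void synchronize() {
    // flush the async queue (if any) + backend device/stream sync (omitted)
  }

  std::int64_t fetch_result_i64(int i) {
    // no implicit sync here anymore
    return static_cast<std::int64_t>(result_buffer[i]);
  }

  // Mirrors the contract of the check_runtime_error item above: sync first,
  // run the runtime's error-retrieval routine (omitted), then fetch.
  void check_runtime_error() {
    synchronize();
    auto error_code = fetch_result_i64(0);
    if (error_code != 0)
      std::printf("runtime error code: %lld\n",
                  static_cast<long long>(error_code));
  }
};

int main() {
  ProgramSketch prog;
  prog.check_runtime_error();
  return 0;
}
```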
Update 08/26/2020:
yuanming discovered that I missed the fetch_result in Program::runtime_query, which caused failures during GC on CUDA:
taichi/taichi/program/program.h, Lines 247 to 260 in 139ff5d
As a result, we added device_synchronize() back to fetch_result.
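A hedged sketch of the resulting shape, assuming simplified signatures (ProgramSketch, runtime_query_sketch, and the query key are placeholders, not the real program.h): fetch_result() now calls device_synchronize() before reading the result buffer, so runtime_query-style callers on the GC path can no longer race with the device write.

```cpp
#include <cstdint>
#include <cstring>

// Placeholder type for this sketch; simplified, not taichi's actual program.h.
struct ProgramSketch {
  std::uint64_t result_buffer[64] = {};

  void device_synchronize() {
    // backend device/stream sync, e.g. a CUDA stream sync (omitted)
  }

  template <typename T>
  T fetch_result(int i) {
    device_synchronize();  // added back: sync before reading the result buffer
    T t{};
    std::memcpy(&t, &result_buffer[i], sizeof(T));
    return t;
  }

  // A runtime_query-like helper: launch a runtime function (hypothetical key,
  // launch machinery omitted) that writes its answer into result_buffer,
  // then fetch the completed result.
  template <typename T>
  T runtime_query_sketch(const char *key) {
    (void)key;
    return fetch_result<T>(0);
  }
};

int main() {
  ProgramSketch prog;
  auto stat = prog.runtime_query_sketch<std::int64_t>("some_gc_statistic");
  (void)stat;
  return 0;
}
```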