
[async] Support constant folding in async mode #1778

Merged · 4 commits · Aug 26, 2020

Conversation

@k-ye (Member) commented on Aug 25, 2020

Constant folding (CF) was previously disabled in async mode because it caused deadlocks. The problem: when a user kernel gets compiled, the CF pass adds an evaluator kernel to the execution queue, but that evaluator never gets scheduled, because the queue is blocked waiting for the user kernel to finish...

We can fix this by making the CF kernels always run synchronously (see Kernel::operator()). However, we still need two more changes:

  1. I added Program::device_synchronize(), which is more primitive than synchronize() and always calls directly into the targeted GPU backend's device/stream sync method.
  2. We have to remove synchronize() from Program::fetch_result. The implication is that whenever we fetch a result, we must make sure a proper sync has already been done. The only exception I could find is in Program::initialize_runtime_system:

     ```cpp
     llvm_runtime = fetch_result<void *>(taichi_result_buffer_ret_value_id);
     TI_TRACE("LLVMRuntime pointer fetched");
     if (arch_use_host_memory(config.arch) || config.use_unified_memory) {
       runtime->call<void *>("runtime_get_mem_req_queue", llvm_runtime);
       auto mem_req_queue =
           fetch_result<void *>(taichi_result_buffer_ret_value_id);
     ```

     But I don't think we need a sync here anyway, @yuanming-hu?

FYI, here are the fetch_result calls I could find:

  • Kernel::get_ret_int, Kernel::get_ret_float.
    • If this is called from within an SNode reader kernel, then it's already synced in taichi/taichi/ir/snode.cpp, lines 159 to 160 at 5c39e84:

      ```cpp
      get_current_program().synchronize();
      auto ret = reader_kernel->get_ret_float(0);
      ```
    • Otherwise, this is called from a kernel that returns a value. I've reordered kernel.py so that a sync is done before we fetch the result.
  • Program::check_runtime_error: a sync is done at the beginning:

    ```cpp
    void Program::check_runtime_error() {
      synchronize();
    ```
  • Program::initialize_runtime_system: explained above.
  • The constant folding pass: fixed in this PR.

Related issue = #N/A



Update 08/26/2020:

@yuanming-hu discovered that I missed the fetch_result call in Program::runtime_query, which caused failures during GC on CUDA:

```cpp
T runtime_query(const std::string &key, Args... args) {
  TI_ASSERT(arch_uses_llvm(config.arch));
  TaichiLLVMContext *tlctx = nullptr;
  if (llvm_context_device) {
    tlctx = llvm_context_device.get();
  } else {
    tlctx = llvm_context_host.get();
  }
  auto runtime = tlctx->runtime_jit_module;
  runtime->call<void *, Args...>("runtime_" + key, llvm_runtime,
                                 std::forward<Args>(args)...);
  return fetch_result<T>(taichi_result_buffer_runtime_query_id);
}
```

As a result, we added device_synchronize() back into fetch_result.

codecov bot commented on Aug 25, 2020

Codecov Report

Merging #1778 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.


```
@@            Coverage Diff             @@
##           master    #1778      +/-   ##
==========================================
+ Coverage   42.58%   42.61%   +0.02%
==========================================
  Files          44       44
  Lines        6185     6183       -2
  Branches     1072     1071       -1
==========================================
+ Hits         2634     2635       +1
+ Misses       3397     3394       -3
  Partials      154      154
```
Impacted Files                 | Coverage Δ
python/taichi/lang/kernel.py   | 71.03% <100.00%> (+0.07%) ⬆️
python/taichi/code_format.py   | 12.37% <0.00%> (+0.37%) ⬆️

Last update 6937b1e...d17391f.

@yuanming-hu (Member) left a comment:

Just a random comment. I'll push my local commit later.

```diff
@@ -175,7 +175,7 @@ void ExecutionQueue::enqueue(KernelLaunchRecord &&ker) {
   auto config = kernel->program.config;
   auto ir = stmt;
   offload_to_executable(
-      ir, config, /*verbose=*/false,
+      ir, config, /*verbose=*/config.print_ir,
```
A Member commented on this diff line:

Unfortunately setting this to true makes parallel IR printing crash :-( #1750

Let's stick to the original version for now. I'll revert this in my commit in a minute.

@yuanming-hu (Member) left a comment:

Sorry about the force push... I fixed a small issue on CUDA. Please feel free to merge if my changes are reasonable. Thanks!

@k-ye k-ye merged commit d38e83c into taichi-dev:master Aug 26, 2020
@k-ye k-ye deleted the constant-fold branch August 26, 2020 10:31
@yuanming-hu yuanming-hu mentioned this pull request Sep 1, 2020