Implement parallel `for_range()` for easier multithreading (reverted) #72784

myaaaaaaaaa · 2023-02-06T05:34:04Z

This makes writing multithreaded code as easy as changing a for loop, without the need to refactor loop bodies into separate functions or to create intermediate structs for passing data. See the diffs of modules/raycast/raycast_occlusion_cull.cpp and tests/core/threads/test_worker_thread_pool.h for examples.

Technically, I've named it for_range() here since it accepts integer indices.

Calinou · 2023-02-10T15:11:58Z

This reminds me, is parallel for something that would be feasible to implement in GDScript?

myaaaaaaaaa · 2023-02-10T17:48:53Z

It's fairly straightforward to write a foreach-style wrapper around a threading system, see the defunct thread_process_array() as an example:

godot/core/os/threaded_array_processor.h

Lines 52 to 85 in cd015d9

    template <class T>
    void process_array_thread(void *ud) {
    	T &data = *(T *)ud;
    	while (true) {
    		uint32_t index = data.index.increment();
    		if (index >= data.elements) {
    			break;
    		}
    		data.process(index);
    	}
    }

    template <class C, class M, class U>
    void thread_process_array(uint32_t p_elements, C *p_instance, M p_method, U p_userdata) {
    	ThreadArrayProcessData<C, U> data;
    	data.method = p_method;
    	data.instance = p_instance;
    	data.userdata = p_userdata;
    	data.index.set(0);
    	data.elements = p_elements;
    	data.process(0); //process first, let threads increment for next
    	int thread_count = OS::get_singleton()->get_processor_count();
    	Thread *threads = memnew_arr(Thread, thread_count);
    	for (int i = 0; i < thread_count; i++) {
    		threads[i].start(process_array_thread<ThreadArrayProcessData<C, U>>, &data);
    	}
    	for (int i = 0; i < thread_count; i++) {
    		threads[i].wait_to_finish();
    	}
    	memdelete_arr(threads);
    }

However, like with the traditional function pointers that thread_process_array() accepts, GDScript closures don't seem to be able to reference variables outside their scope (they get copied instead), which is a gotcha that users would need to keep in mind.

It would still be possible; it just wouldn't be a simple drop-in replacement for `for` loops.
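For readers without the Godot sources at hand, the work-claiming loop in thread_process_array() can be approximated in standard C++. This is only a sketch: `claim_indices()` and `doubled()` are hypothetical names, `std::thread` and `std::atomic` stand in for Godot's `Thread` and atomic counter, and the worker count of 4 is arbitrary. It also shows a C++ lambda capturing by reference (`[&]`), which is what lets the loop body write to variables outside its own scope.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper, not a Godot API: runs body(i) for every i in [0, count)
// on worker_count threads. Each worker claims the next unprocessed index from
// a shared atomic counter, in the spirit of process_array_thread().
template <class Body>
void claim_indices(uint32_t count, unsigned worker_count, Body &&body) {
	assert(worker_count > 0);
	std::atomic<uint32_t> next{0};
	auto worker = [&]() {
		while (true) {
			// fetch_add returns the value before the increment, so every
			// index is handed out exactly once and index 0 needs no
			// special handling.
			uint32_t index = next.fetch_add(1);
			if (index >= count) {
				break;
			}
			body(index);
		}
	};
	std::vector<std::thread> workers;
	workers.reserve(worker_count);
	for (unsigned i = 0; i < worker_count; i++) {
		workers.emplace_back(worker);
	}
	for (std::thread &t : workers) {
		t.join();
	}
}

// Doubles every element. The lambda captures `values` by reference, so the
// writes made by the workers are visible once all threads have been joined.
std::vector<int> doubled(std::vector<int> values) {
	claim_indices(static_cast<uint32_t>(values.size()), 4, [&](uint32_t i) {
		values[i] *= 2;
	});
	return values;
}
```

Calling `doubled({1, 2, 3})` yields `{2, 4, 6}`. Each index is written by exactly one worker, so no lock is needed around the element writes themselves.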

Mickeon · 2023-02-12T12:16:50Z

Wow! Just wow. With this PR the doors could open to quite a lot of editor optimisations as well.

myaaaaaaaaa · 2023-02-28T20:53:49Z

Combined with #72716, the overhead for this should now be minimal even when used with tiny loop bodies like SAXPY:

for_range(0, n, true, String(), [&](int i) {
	y[i] = a * x[i] + y[i];
});

RandomShaper · 2023-06-09T11:20:32Z

I think this is great. I just wonder if we lose something important with the removal of if (vertices_size > 1024).

modules/raycast/raycast_occlusion_cull.cpp
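To make the concern concrete, here is a minimal sketch of what such a size guard typically does, assuming the removed check selected a serial path for small inputs. `scale_all()` and `PARALLEL_THRESHOLD` are hypothetical names, the work is split into just two halves for brevity, and `std::thread` stands in for the engine's thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical cutoff, mirroring the 1024 in the check under discussion.
constexpr std::size_t PARALLEL_THRESHOLD = 1024;

// Scales every value in place. Returns true if the threaded path was taken.
bool scale_all(std::vector<float> &values, float factor) {
	auto scale_span = [&](std::size_t begin, std::size_t end) {
		for (std::size_t i = begin; i < end; i++) {
			values[i] *= factor;
		}
	};
	if (values.size() <= PARALLEL_THRESHOLD) {
		// Small input: run the plain loop and avoid any thread start-up cost.
		scale_span(0, values.size());
		return false;
	}
	// Large input: one half runs on a helper thread, the other on the caller.
	std::size_t middle = values.size() / 2;
	std::thread helper(scale_span, std::size_t{0}, middle);
	scale_span(middle, values.size());
	helper.join();
	return true;
}
```

Without a guard of this kind, every call pays the dispatch cost regardless of input size, which may matter for small meshes.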

YuriSizov · 2023-07-14T17:12:24Z

Thanks!

reduz · 2023-07-27T10:44:43Z

I really, really think this should not have been merged. You don't need syntactic sugar for something so seldom used when all it does is make readability worse.

This is not a general-purpose function; it is meant for large tasks. If you use it on just any array, all you will achieve is code that is probably much slower than a regular for loop, and users will not understand why.

I strongly ask to reconsider this and revert the merge.

RandomShaper · 2023-07-27T11:26:03Z

After some discussion, the decision is to keep this with the appropriate warnings and a more explicit API:
#79952

reduz · 2023-07-27T11:33:11Z

As a note, this PR is not a simple refactor. The original code used only the number of available CPUs as the thread count; this one uses a thread per element, which is much slower.
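The difference between the two dispatch strategies can be illustrated with a small partitioning helper. This is a sketch under assumptions: `split_into_batches()` is a hypothetical name and is not taken from WorkerThreadPool. It produces one contiguous range per worker, instead of one unit of work per element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, count) into at most `workers` contiguous
// half-open ranges [first, second) whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>> split_into_batches(std::size_t count, std::size_t workers) {
	assert(workers > 0);
	// Never create more batches than there are elements.
	std::size_t batches = std::min(count, workers);
	std::vector<std::pair<std::size_t, std::size_t>> ranges;
	ranges.reserve(batches);
	for (std::size_t b = 0; b < batches; b++) {
		// Integer division spreads the remainder across the batches.
		ranges.emplace_back(count * b / batches, count * (b + 1) / batches);
	}
	return ranges;
}
```

With 1,000,000 elements and 8 workers this yields 8 ranges of 125,000 elements each, whereas per-element dispatch would create 1,000,000 separate units of work. In practice the worker count might come from `std::thread::hardware_concurrency()`, which can return 0 when the value is unknown, so a fallback is needed before calling the helper.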

RandomShaper · 2023-07-27T11:37:52Z

For the record, there's #72716 seemingly addressing that. However, the WTP changes will be reverted until the topic can be discussed in its entirety, with some possibly coming back when there's stronger agreement.

myaaaaaaaaa requested review from a team as code owners February 6, 2023 05:34

Calinou added enhancement topic:core labels Feb 6, 2023

Calinou added this to the 4.x milestone Feb 6, 2023

This was referenced Feb 28, 2023

Refactor _scene_cull() to use two stages #74118

Closed

Decrease overhead of WorkerThreadPool task processing #72716

Closed

myaaaaaaaaa mentioned this pull request May 12, 2023

Fix multiple issues in WorkerThreadPool #76945

Merged

myaaaaaaaaa mentioned this pull request Jun 8, 2023

Convert _scene_cull() to use parallel foreach() #78016

Closed

RandomShaper reviewed Jun 9, 2023

RandomShaper approved these changes Jun 9, 2023

Implement parallel foreach() for easier multithreading

e28868e

akien-mga modified the milestones: 4.x, 4.2 Jun 9, 2023

YuriSizov merged commit 2a595c2 into godotengine:master Jul 14, 2023

myaaaaaaaaa deleted the parallel-foreach branch July 14, 2023 17:10

clayjohn mentioned this pull request Jul 27, 2023

Convert simple uses of WorkerThreadPool to parallel for_range() #79490

Closed

YuriSizov changed the title ~~Implement parallel foreach() for easier multithreading~~ Implement parallel for_range() for easier multithreading Jul 27, 2023

YuriSizov mentioned this pull request Jul 27, 2023

Revert "Implement parallel foreach() for easier multithreading" #79953

Merged

YuriSizov changed the title ~~Implement parallel for_range() for easier multithreading~~ Implement parallel for_range() for easier multithreading (reverted) Jul 27, 2023
