Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement parallel for_range() for easier multithreading (reverted) #72784

Merged
merged 1 commit into from
Jul 14, 2023
Merged

Implement parallel for_range() for easier multithreading (reverted) #72784

merged 1 commit into from
Jul 14, 2023

Conversation

myaaaaaaaaa
Copy link
Contributor

@myaaaaaaaaa myaaaaaaaaa commented Feb 6, 2023

This makes writing multithreaded code as easy as changing a for loop, without the need to refactor loop bodies into separate functions or to create intermediate structs for passing data. See the diffs of modules/raycast/raycast_occlusion_cull.cpp and tests/core/threads/test_worker_thread_pool.h for examples.

Technically, I've named it for_range() here since it accepts integer indices.

@myaaaaaaaaa myaaaaaaaaa requested review from a team as code owners February 6, 2023 05:34
@Calinou Calinou added this to the 4.x milestone Feb 6, 2023
@Calinou
Copy link
Member

Calinou commented Feb 10, 2023

This reminds me, is parallel for something that would be feasible to implement in GDScript?

@myaaaaaaaaa
Copy link
Contributor Author

It's fairly straightforward to write a foreach-style wrapper around a threading system, see the defunct thread_process_array() as an example:

template <class T>
void process_array_thread(void *ud) {
T &data = *(T *)ud;
while (true) {
uint32_t index = data.index.increment();
if (index >= data.elements) {
break;
}
data.process(index);
}
}
template <class C, class M, class U>
void thread_process_array(uint32_t p_elements, C *p_instance, M p_method, U p_userdata) {
ThreadArrayProcessData<C, U> data;
data.method = p_method;
data.instance = p_instance;
data.userdata = p_userdata;
data.index.set(0);
data.elements = p_elements;
data.process(0); //process first, let threads increment for next
int thread_count = OS::get_singleton()->get_processor_count();
Thread *threads = memnew_arr(Thread, thread_count);
for (int i = 0; i < thread_count; i++) {
threads[i].start(process_array_thread<ThreadArrayProcessData<C, U>>, &data);
}
for (int i = 0; i < thread_count; i++) {
threads[i].wait_to_finish();
}
memdelete_arr(threads);
}

However, like with the traditional function pointers that thread_process_array() accepts, GDScript closures don't seem to be able to reference variables outside their scope (they get copied instead), which is a gotcha that users would need to keep in mind.

It would still be possible, it just wouldn't be a simple drop-in replacement for for loops

@Mickeon
Copy link
Contributor

Mickeon commented Feb 12, 2023

Wow! Just wow. With this PR the doors could open to quite a lot of editor optimisations as well.

@myaaaaaaaaa
Copy link
Contributor Author

Combined with #72716 , the overhead for this should now be minimal even when used with tiny loop bodies like SAXPY

for_range(0, n, true, String(), [&](int i) {
	y[i] = a * x[i] + y[i];
});

@RandomShaper
Copy link
Member

I think this is great. I just wonder if we lose something important with the removal of if (vertices_size > 1024).

@akien-mga akien-mga modified the milestones: 4.x, 4.2 Jun 9, 2023
@YuriSizov YuriSizov merged commit 2a595c2 into godotengine:master Jul 14, 2023
@myaaaaaaaaa myaaaaaaaaa deleted the parallel-foreach branch July 14, 2023 17:10
@YuriSizov
Copy link
Contributor

Thanks!

@reduz
Copy link
Member

reduz commented Jul 27, 2023

I really, really think this should not have been merged. You don't need the syntactic sugar for something very seldom used that all it does is make readability worse.

This is not a general purpose function, it is meant for large tasks. if you use it in any array all you will achieve is it probably being much slower than just doing a regular for loop and users will not understand why.

I strongly ask to reconsider this and revert the merge.

@YuriSizov YuriSizov changed the title Implement parallel foreach() for easier multithreading Implement parallel for_range() for easier multithreading Jul 27, 2023
@RandomShaper
Copy link
Member

After some discussion, the decision is to keep this with the appropriate warnings and a more explicit API:
#79952

@reduz
Copy link
Member

reduz commented Jul 27, 2023

As a note, this PR is not a simple refactor. The original code only used the amount of CPUs available as thread count, this one uses a thread per element, which is much slower.

@RandomShaper
Copy link
Member

For the records, there's #72716 seemigly addressing that. However, the WTP changes will be reverted until the topic can be discussed in its entirety, with some possibly coming back when there's a stronger agreement.

@YuriSizov YuriSizov changed the title Implement parallel for_range() for easier multithreading Implement parallel for_range() for easier multithreading (reverted) Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants