
Decrease overhead of WorkerThreadPool task processing #72716

Closed · wants to merge 1 commit

Conversation

@myaaaaaaaaa (Contributor) commented Feb 4, 2023

Changes threads to accept and process work in batches so that they synchronize less often.
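
The gist of the batching idea can be sketched as follows. This is an illustrative mock-up using a plain std::mutex-guarded queue, not the actual WorkerThreadPool internals; the names and the batch size are invented:

	#include <deque>
	#include <functional>
	#include <mutex>
	#include <utility>
	#include <vector>

	// Illustrative sketch only: workers drain up to BATCH_SIZE tasks per
	// mutex acquisition instead of paying the lock/unlock cost once per task.
	struct BatchedWorker {
		std::mutex queue_mutex;
		std::deque<std::function<void()>> task_queue;

		void drain_once() {
			constexpr size_t BATCH_SIZE = 32; // invented tuning knob
			std::vector<std::function<void()>> batch;
			{
				std::lock_guard<std::mutex> lock(queue_mutex);
				// One lock acquisition fetches a whole batch of tasks.
				while (!task_queue.empty() && batch.size() < BATCH_SIZE) {
					batch.push_back(std::move(task_queue.front()));
					task_queue.pop_front();
				}
			}
			for (auto &task : batch) {
				task(); // run outside the lock so other workers can refill
			}
		}
	};

Running the drained tasks outside the lock is what lets the other workers keep pulling from the queue concurrently.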

As a bonus, this also hoists the loop for add_template_group_task() into the header file so that the loop body can be inlined.
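
For illustration, the inlining benefit comes from the loop being template-instantiated at the call site; a hypothetical header-only helper (not the patch's actual signature) shows the shape:

	#include <cstdint>

	// Hypothetical header-only helper: because the template is instantiated
	// at the call site, the compiler knows the exact instance/method types
	// and can inline (p_instance->*p_method)(...) into the loop body,
	// instead of dispatching through an opaque function pointer per element.
	template <typename C, typename M, typename U>
	void run_group_range(C *p_instance, M p_method, U p_userdata, uint32_t p_from, uint32_t p_to) {
		for (uint32_t i = p_from; i < p_to; i++) {
			(p_instance->*p_method)(i, p_userdata);
		}
	}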

This way, add_template_group_task() and its derivatives (namely, parallel foreach()) can achieve optimal performance, and there is no longer any need for manual batching like that in RaycastOcclusionCull::Scenario::_transform_vertices_thread():

void RaycastOcclusionCull::Scenario::_update_dirty_instance(int p_idx, RID *p_instances) {
	OccluderInstance *occ_inst = instances.getptr(p_instances[p_idx]);
	if (!occ_inst) {
		return;
	}

	Occluder *occ = raycast_singleton->occluder_owner.get_or_null(occ_inst->occluder);
	if (!occ) {
		return;
	}

	int vertices_size = occ->vertices.size();

	// Embree requires the last element to be readable by a 16-byte SSE load instruction, so we add padding to be safe.
	occ_inst->xformed_vertices.resize(vertices_size + 1);

	const Vector3 *read_ptr = occ->vertices.ptr();
	Vector3 *write_ptr = occ_inst->xformed_vertices.ptr();

	if (vertices_size > 1024) {
		TransformThreadData td;
		td.xform = occ_inst->xform;
		td.read = read_ptr;
		td.write = write_ptr;
		td.vertex_count = vertices_size;
		td.thread_count = WorkerThreadPool::get_singleton()->get_thread_count();
		WorkerThreadPool::GroupID group_task = WorkerThreadPool::get_singleton()->add_template_group_task(this, &Scenario::_transform_vertices_thread, &td, td.thread_count, -1, true, SNAME("RaycastOcclusionCull"));
		WorkerThreadPool::get_singleton()->wait_for_group_task_completion(group_task);
	} else {
		_transform_vertices_range(read_ptr, write_ptr, occ_inst->xform, 0, vertices_size);
	}

	occ_inst->indices.resize(occ->indices.size());
	memcpy(occ_inst->indices.ptr(), occ->indices.ptr(), occ->indices.size() * sizeof(int32_t));
}

void RaycastOcclusionCull::Scenario::_transform_vertices_thread(uint32_t p_thread, TransformThreadData *p_data) {
	uint32_t vertex_total = p_data->vertex_count;
	uint32_t total_threads = p_data->thread_count;
	uint32_t from = p_thread * vertex_total / total_threads;
	uint32_t to = (p_thread + 1 == total_threads) ? vertex_total : ((p_thread + 1) * vertex_total / total_threads);
	_transform_vertices_range(p_data->read, p_data->write, p_data->xform, from, to);
}

void RaycastOcclusionCull::Scenario::_transform_vertices_range(const Vector3 *p_read, Vector3 *p_write, const Transform3D &p_xform, int p_from, int p_to) {
	for (int i = p_from; i < p_to; i++) {
		p_write[i] = p_xform.xform(p_read[i]);
	}
}
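
With pool-side batching, the caller above could plausibly shrink to a single fine-grained group task, dropping both the 1024-vertex threshold and the hand-rolled per-thread range split. A hypothetical sketch (_transform_vertex is an invented helper, and the add_template_group_task arguments mirror the call shown above; td's vertex_count/thread_count fields become unnecessary):

	// Hypothetical per-element worker: one vertex per group-task index.
	void RaycastOcclusionCull::Scenario::_transform_vertex(uint32_t p_idx, TransformThreadData *p_data) {
		p_data->write[p_idx] = p_data->xform.xform(p_data->read[p_idx]);
	}

	// Inside _update_dirty_instance(), replacing the vertices_size > 1024 branch:
	TransformThreadData td;
	td.xform = occ_inst->xform;
	td.read = read_ptr;
	td.write = write_ptr;
	WorkerThreadPool::GroupID group_task = WorkerThreadPool::get_singleton()->add_template_group_task(this, &Scenario::_transform_vertex, &td, vertices_size, -1, true, SNAME("RaycastOcclusionCull"));
	WorkerThreadPool::get_singleton()->wait_for_group_task_completion(group_task);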

@myaaaaaaaaa requested review from a team as code owners February 4, 2023 16:11
@Chaosus added this to the 4.x milestone Feb 6, 2023
@Chaosus requested review from RandomShaper and reduz February 6, 2023 04:40
@myaaaaaaaaa changed the title from "Decrease granularity of WorkerThreadPool task processing" to "Decrease overhead of WorkerThreadPool task processing" Feb 16, 2023
@RandomShaper (Member) left a comment

Looks great overall.

Three review threads on core/object/worker_thread_pool.cpp (all resolved; two marked outdated).