Performance tracking #105
I'm open to these optimizations, but I would recommend avoiding too much code complexity, especially as related to Thrust. I'd prefer to switch from Thrust to C++17 parallel algorithms, which will allow us to remove the…
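As a purely illustrative sketch (not code from this thread) of the kind of swap being suggested, here is a Thrust call next to its C++17 parallel-algorithms counterpart; the function and container names are hypothetical:

```cpp
// Hypothetical example: sorting a host container.
#include <algorithm>
#include <execution>
#include <vector>

void SortValues(std::vector<float>& values) {
  // Thrust version:
  //   thrust::sort(values.begin(), values.end());
  // C++17 parallel algorithms: the execution policy asks the standard
  // library (or a vendor backend) to parallelize the sort.
  std::sort(std::execution::par_unseq, values.begin(), values.end());
}
```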
The main performance improvement comes from these two patches: 168fd07 and a3dbe15, which are not Thrust-specific, so this should be fine. I will submit a PR soon. For the parallel STL, I've tried using it with clang, which uses TBB for parallel execution, and the performance seems pretty good (for the case that I've tested). It seems that NVIDIA also supports the parallel STL with their nvc++ compiler, but I'm not sure whether that ships with the CUDA Toolkit or something else.
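To illustrate the backend point (again a sketch, not code from this thread): the same parallel-STL source can be built against different runtimes. The compiler flags in the comments reflect my understanding of those toolchains and may vary by version:

```cpp
// Hypothetical parallel-STL example.
//   clang++/g++ with libstdc++: parallel algorithms dispatch to TBB, so link it,
//     e.g.  clang++ -std=c++17 sum.cpp -ltbb
//   nvc++ (NVIDIA HPC SDK): can offload parallel algorithms to the GPU,
//     e.g.  nvc++ -stdpar=gpu sum.cpp
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

double SumOfSquares(const std::vector<double>& v) {
  return std::transform_reduce(std::execution::par, v.begin(), v.end(), 0.0,
                               std::plus<>{},
                               [](double x) { return x * x; });
}
```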
Another interesting behavior I just found: compiling with…
Yeah, I quit using it because it seemed like OMP was not getting much love from Thrust. Maybe file an issue on their repo and see if they have any insights?
Seems like you've addressed most of this and more, and the lazy boolean has its own issue. Should we call this fixed?
Yes, I think we can call this fixed.
As this library is intended to be fast, I guess we should probably open an issue to track its performance?
I did some microbenchmarking using the perfTest binary and a unionPerfTest code that I wrote to test lazy union (a union of 100 spheres with diameter 2.5 at varying separation distances, i.e. they may or may not overlap; a rough sketch of that setup is included below). I tested the single-threaded CPP backend, the OpenMP backend, and the CUDA backend, all on the same laptop with an i5-8300H and a GTX 1050 Mobile. Here is my spreadsheet, and here is my branch with some optimizations that look quite effective for CUDA and small meshes :). I will open a separate PR for that branch after the build script PR is merged. From the results, we can see that:
… VecDH … and provide an explicit copy method.
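For reference, here is a minimal sketch of the kind of union benchmark described above (100 spheres of diameter 2.5 placed at a varying separation). This is not the actual unionPerfTest code; the header path and the Manifold calls used (Manifold::Sphere, Translate, operator+ for union, NumTri) are assumptions about the API and may need adjusting:

```cpp
// Hypothetical sketch of a lazy-union benchmark: union 100 spheres of
// diameter 2.5 (radius 1.25), spaced `separation` apart along x, and time it.
#include <chrono>
#include <cstdio>
#include "manifold.h"  // assumed header; path may differ by version

using namespace manifold;

void UnionBenchmark(float separation) {
  auto start = std::chrono::high_resolution_clock::now();
  Manifold result;
  for (int i = 0; i < 100; ++i) {
    // Depending on `separation`, the spheres may or may not overlap.
    result =
        result + Manifold::Sphere(1.25f).Translate({i * separation, 0.0f, 0.0f});
  }
  // Query the result so a lazily evaluated union is actually computed
  // inside the timed region.
  int triangles = result.NumTri();
  auto end = std::chrono::high_resolution_clock::now();
  std::printf("separation=%g: %d tris in %.3f s\n", separation, triangles,
              std::chrono::duration<double>(end - start).count());
}
```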