add utility to perform timings and some performance improvements #1237
Conversation
This is convenient but also means it is the same as HighResWallClockTimer
Thanks Kris. This looks super interesting. Unfortunately, I am currently busy relocating to Leuven and moving to a new house. @markus-jehl could you have a first look?
WARNING: This branch will be subject to rebases etc. and will be force-pushed occasionally to keep history clean.
TimedObject is not thread-safe, and timing results were incorrect. For now I just removed the calls. Work-around for UCL#1238.
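To illustrate the thread-safety problem (a minimal sketch of a shared timer object, not STIR's actual TimedObject code): if two threads share one timer, each `start()` overwrites the other's start point, so the accumulated time is wrong.

```cpp
#include <chrono>

// Minimal sketch of a non-thread-safe timer; NOT STIR's TimedObject.
class NaiveTimer
{
public:
  // If thread B calls start() between thread A's start() and stop(),
  // start_ is overwritten and A accumulates a bogus interval.
  void start() { start_ = clock::now(); }
  void stop() { total_ += clock::now() - start_; }
  double total_seconds() const { return std::chrono::duration<double>(total_).count(); }

private:
  using clock = std::chrono::high_resolution_clock;
  clock::time_point start_;
  clock::duration total_{0};
};
```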
The loop to construct xstart/end etc. is now multi-threaded (although a little bit uglier!). Testing shows a speed-up of about 2-3×. Using too many threads is counterproductive, so I limited it to 8 threads (not necessarily optimal!).
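A hedged sketch of that pattern, assuming OpenMP (the function and variable names here are illustrative, not the actual STIR code):

```cpp
#include <algorithm>
#include <omp.h>

// Hypothetical stand-in for the xstart/end construction loop described above.
void construct_xstart_xend(int num_segments)
{
  // Empirical cap of 8 threads, as more was counterproductive (not necessarily optimal).
  const int num_threads = std::min(omp_get_max_threads(), 8);
#pragma omp parallel for num_threads(num_threads) schedule(dynamic)
  for (int seg = 0; seg < num_segments; ++seg)
    {
      // ... fill the xstart/xend entries for this segment ...
    }
}
```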
Timers were stopped too early due to nested calls. This is now checked by asserts (by adding a HighResWallClockTimer), allowing me to catch these problems.
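A sketch of the assert-based checking (assuming a timer that tracks whether it is running; the actual HighResWallClockTimer interface may differ):

```cpp
#include <cassert>
#include <chrono>

// Illustrative wall-clock timer that asserts on mismatched start/stop,
// catching nested calls that stop a timer too early.
class CheckedWallClockTimer
{
public:
  void start()
  {
    assert(!running_ && "timer started while already running (nested call?)");
    running_ = true;
    start_ = clock::now();
  }
  void stop()
  {
    assert(running_ && "timer stopped while not running (stopped too early by a nested call?)");
    running_ = false;
    total_ += clock::now() - start_;
  }

private:
  using clock = std::chrono::high_resolution_clock;
  bool running_ = false;
  clock::time_point start_;
  clock::duration total_{0};
};
```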
Example timings that I'm currently getting on my desktop (AMD Ryzen 9 5900 12-Core Processor, 3001 MHz, 12 cores, 24 logical processors; 32GB RAM; GeForce RTX 3070; WSL2 with gcc 11.4.0 and nvcc 12.2) for a similar set-up to @gschramm https://arxiv.org/pdf/2212.12519v1.pdf, i.e. DMI 4-ring, span=1, but only 8 views, 215x215x71 image,
with the first column CPU time and the second wall-clock time, both in ms. For comparison, also with all 272 views:
Currently, #1236 doesn't make a lot of difference (PP_forward_file_first is slower, PP_back_file_first is faster; no idea why). Template files attached (I had to rename them to .txt for the GitHub upload). Running OSEM is still slow with subsets due to the GPU projector set-up. That needs some thought.
One factor slowing down the parallelproj projections is the call to `…`. In any case, loops in `…`
Very interesting comparison. Thanks a lot Kris! How do I interpret `…`?
I don't remember 100% why we added that. The projectors themselves shouldn't care about the FOV.
Sorry. (Note that it's the `…`.)
One good thing to add would be an OSEM update to the timings. This should be done, but it might be different from what @gschramm reports, as we normally use the "additive term" (I guess I could run without it).
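For context, the standard MLEM/OSEM update with an additive term $a$ (e.g. randoms plus scatter), with system matrix $A$ and measured data $y$ (notation mine, not from the PR):

$$
x^{\mathrm{new}} = \frac{x}{A^T \mathbf{1}} \odot A^T\!\left(\frac{y}{A x + a}\right)
$$

Running "without" the additive term corresponds to setting $a = 0$; OSEM applies the same update per subset of the data.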
Hi Kris, … Georg
Running without the additive term:
This is of course always going to be tricky. (Not sure if people ever report a "minimum wall-clock" time to avoid this.)
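A sketch of that "minimum wall-clock over repeats" idea (the helper name and signature are mine), which suppresses one-off effects such as first-run caching:

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <limits>

// Run `work` several times and report the fastest wall-clock time in ms.
double min_wall_clock_ms(const std::function<void()>& work, int repeats = 5)
{
  double best_ms = std::numeric_limits<double>::max();
  for (int i = 0; i < repeats; ++i)
    {
      const auto t0 = std::chrono::high_resolution_clock::now();
      work();
      const auto t1 = std::chrono::high_resolution_clock::now();
      best_ms = std::min(best_ms, std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
  return best_ms;
}
```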
Here are the timings on my machine (Intel Xeon CPU E5-2699 v3 @ 2.30GHz; 18 cores; 256GB RAM; NVIDIA Quadro M4000; WSL2 with clang 14.0.0-1ubuntu1 and nvcc V12.0.140) for the different templates. Unfortunately I still haven't found a solution for the extremely slow caching of the system matrix that happens in the first projection (most likely caused by WSL2/Docker memory allocation), and I don't have a GPU on the native Ubuntu system to compare timings there. Interestingly, though, it doesn't appear to be as bad for the DMI geometry!

DMI4_8v:
DMI4:
NeuroLF:
Thanks @markus-jehl. It seems that my system is about twice as fast as yours, also for parallelproj (could be that its performance is dominated by the CPU as well). Quite weird about your NeuroLF PMRT "first run" timings. Maybe you could compare memory usage. Aside from timing other things, I think we'll need some client code to be able to make some nice plots for different systems etc., as this will soon get unmanageable.
also added extra options for friendlier usage
This seems clean enough to merge now. We can always add some more later. I've added a log-likelihood run (the set-up is currently essentially the computation of the sensitivity; "grad_no_sens" is essentially the MLEM computation).
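My reading of the two timed quantities, assuming the usual Poisson log-likelihood with system matrix $A$, data $y$ and additive term $a$ (notation mine): the sensitivity is $A^T\mathbf{1}$, and "grad_no_sens" is the back-projection of the data ratio, which is the core of the MLEM computation:

$$
\nabla L(x) = A^T\!\left(\frac{y}{A x + a}\right) - A^T \mathbf{1}
$$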
Allow standardised timings. Could do with other tests of course. @gschramm @markus-jehl @NicoleJurjew want to have a look?