Performance Comparison of Optical Photon Simulation on CPU vs GPU #32

plexoos opened this issue Sep 30, 2024 · 34 comments


plexoos commented Sep 30, 2024

We need to benchmark the performance of optical photon simulations by comparing the computation times on CPU and GPU architectures. The key metric will be the simulation time as a function of the number of generated photons.

@buddhasystem

CPU and GPU, both under Opticks, or in different frameworks?

@ggalgoczi

There are essentially 3 options to run the optical photon simulation. We need to have the first two at least:
-- Running pure Geant4
-- Running Opticks on GPU
-- Running Opticks on CPU


plexoos commented Sep 30, 2024

Yes, we should focus on leveraging the existing Opticks code for GPU first. I did not get the impression that Mitsuba is easier to work with.

@buddhasystem

> Yes, we should focus on leveraging the existing Opticks code for GPU first. I did not get the impression that Mitsuba is easier to work with.

It's a bit different, i.e. we don't have a working interface from G4 to Mitsuba yet, so it's not just about ease of use but about feasibility. Hope to get to that in time. As to the previous comment, yes, plain G4 vs Opticks seems to be the most useful case to look at.


plexoos commented Oct 4, 2024

@ggalgoczi Do you have any tips on how to enable timing measurements in Opticks?

@ggalgoczi

For CUDA kernels, Nsight seems to be what Opticks uses. Specifically, bin/nsight.bash would produce a detailed report that includes the execution time of each GPU kernel and other system-wide performance metrics.

Also nsys is used:

nsys profile -o noprefetch --stats=true ./add_cuda

I did not test it yet. Should we take a look at it next week?

I also found the OpticksProfile class. It seems to profile time and memory usage, but it is unclear to me how at this point.

For the total simulation time including overhead I would do something like this:

    #include <chrono>
    #include <iostream>

    auto start = std::chrono::high_resolution_clock::now();
    // Opticks stuff
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "Total time: " << elapsed.count() << " s\n";


plexoos commented Oct 4, 2024

I've looked at these too and came to the same conclusion. OpticksProfile and stime seem to be in "dead" code, and I can't see how they were used to get any useful information.

> I did not test it yet. Should we take a look at it next week?

Yes, could you please take a look?
Do we need to install nsight-systems-cli to get nsys? Here is my attempt to install it in the container: BNLNPPS/esi-shell#121. I just followed the instructions at https://docs.nvidia.com/nsight-systems/InstallationGuide/index.html#package-manager-installation

ggalgoczi self-assigned this Oct 16, 2024
@ggalgoczi

In order to perform the best profiling, the following things are needed. @plexoos, could you assist with these? I remember you dug into the PTX stuff.

-- set the OptixModuleCompileOptions to OPTIX_COMPILE_DEBUG_LEVEL_MODERATE

If that doesn’t fix it, try setting the environment variable OPTIX_FORCE_DEPRECATED_LAUNCHER only while profiling

From: https://forums.developer.nvidia.com/t/need-help-profiling-an-optix-application/265266/5
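
For reference, a minimal sketch of the first point using the stock OptiX 7 option struct and enum names; where exactly Opticks sets these options in its OptiX module creation code is an assumption on my part:

    // Sketch only (not the exact Opticks code): request moderate debug info
    // in the module compile options so Nsight can attribute time to the
    // OptiX kernels, while keeping optimizations on.
    #include <optix.h>

    OptixModuleCompileOptions module_compile_options = {};
    module_compile_options.optLevel   = OPTIX_COMPILE_OPTIMIZATION_DEFAULT;
    module_compile_options.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_MODERATE;

The OPTIX_FORCE_DEPRECATED_LAUNCHER variable from the second point would just be exported in the shell that launches the profiled run, and only while profiling.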

@ggalgoczi

In addition to Nsight Systems, we need to install Nsight Compute too.


plexoos commented Oct 21, 2024

Okay, it appears they have a GUI nsys-ui to visualize profiling results. Trying to install and run it...


plexoos commented Oct 21, 2024

Argh... Unfortunately, it fails in the container:

$ nsys-ui 
Warning: Failed to get OpenGL version. OpenGL version 2.0 or higher is required.
OpenGL version is too low (0). Falling back to Mesa software rendering.
/opt/nvidia/nsight-systems/2024.6.1/host-linux-x64/CrashReporter: error while loading shared libraries: libGLX.so.0: cannot open shared object file: No such file or directory


plexoos commented Oct 21, 2024

> In order to perform the best profiling, the following things are needed. @plexoos, could you assist with these? I remember you dug into the PTX stuff.
>
> -- set the OptixModuleCompileOptions to OPTIX_COMPILE_DEBUG_LEVEL_MODERATE
>
> If that doesn’t fix it, try setting the environment variable OPTIX_FORCE_DEPRECATED_LAUNCHER only while profiling
>
> From: https://forums.developer.nvidia.com/t/need-help-profiling-an-optix-application/265266/5

Yes, I think I know where it should be set... But what exactly did not work for you? What have you tried to run?

@ggalgoczi

I have not tried anything yet in the Docker container. Could you install Nsight Compute and Nsight Systems there?

The GUI that you mentioned does not need to be there. I tried and used that on my own PC.


plexoos commented Oct 21, 2024

Were you able to install Nsight Compute on your PC?

@ggalgoczi

That one I have not tried yet. I installed Nsight Systems. For Nsight Compute I downloaded a .run file for Ubuntu but have not run it yet.


plexoos commented Oct 21, 2024

Yay! I think I figured out the dependencies for both nsight-systems and nsight-compute. Both tools and their UIs can be used from the container:

esi-shell -t debug nsys-ui -- -e HOME=$HOME -w $HOME -e DISPLAY=$DISPLAY --net=host

@ggalgoczi

When I try to open esi-shell with the command provided, the shell opens with the GUI. However, once I close the GUI, the shell automatically closes. How can I keep the container open?


plexoos commented Oct 21, 2024

Just don't specify the command and the shell will be interactive, e.g.

esi-shell -t debug -- -e HOME=$HOME -w $HOME -e DISPLAY=$DISPLAY --net=host

In this case you would need to type the command, of course.

@ggalgoczi

Thanks, but I am still getting errors. Once I open the container and run:

cd $HOME
cmake -S esi-g4ox -B build
cmake --build build

I get:

CMake Error at /opt/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-11.4.0/cmake-3.27.7-kd6xihnhlzyfwafeilow6t2mvolyryor/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Failed to find XercesC (missing: XercesC_VERSION) (Required is at least
  version "3.2.4")
Call Stack (most recent call first):
  /opt/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-11.4.0/cmake-3.27.7-kd6xihnhlzyfwafeilow6t2mvolyryor/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /opt/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-11.4.0/cmake-3.27.7-kd6xihnhlzyfwafeilow6t2mvolyryor/share/cmake-3.27/Modules/FindXercesC.cmake:112 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  /opt/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-11.4.0/cmake-3.27.7-kd6xihnhlzyfwafeilow6t2mvolyryor/share/cmake-3.27/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
  /opt/spack/opt/spack/linux-ubuntu22.04-x86_64_v3/gcc-11.4.0/geant4-11.1.2-djnxepkcbdv5nknlkhjcn2fs7uqkn5zy/lib/cmake/Geant4/Geant4Config.cmake:311 (find_dependency)
  CMakeLists.txt:31 (find_package)


plexoos commented Oct 21, 2024

Does this work?

esi-shell -t debug 'cd $HOME && cmake -S esi-g4ox -B build && cmake --build build' -- -e HOME=$HOME -w $HOME

@ggalgoczi

Unfortunately no. I sent the trace by e-mail; I do not want to put it here, it is too long.


plexoos commented Oct 21, 2024

Works for me. I assume you created a new build directory and are using the current HEAD in esi-g4ox, 666cc0f.

@ggalgoczi

Let me repeat the steps I did; maybe I did something wrong, let me know.

I ssh into my account on npps0 and just call the command you shared:

esi-shell -t debug 'cd $HOME && cmake -S esi-g4ox -B build && cmake --build build' -- -e HOME=$HOME -w $HOME

Do I also have to pull the newest GitHub repository to my npps0 home folder? Or what step did I miss?


plexoos commented Oct 21, 2024

Try deleting the existing 'build' directory or use a different name in the above command. Also, I don't know if you have any local changes in your esi-g4ox directory; I am just assuming that your esi-g4ox is at the current HEAD of the main branch.


plexoos commented Oct 21, 2024

Maybe it is not clear from the command, but your entire HOME is mounted in the container.

@ggalgoczi

I tried deleting build and pulling the newest GitHub repo. It still does not work. Thanks for the idea; I guess something I installed in my $HOME a while back interferes with the image. Not sure, so I will resort to not mounting it and using:

esi-shell -t debug -- -e DISPLAY=$DISPLAY --net=host


plexoos commented Oct 21, 2024

Here is a completely isolated test:

esi-shell -t debug 'cd /tmp && git clone https://github.com/BNLNPPS/esi-g4ox.git && cmake -S esi-g4ox -B build && cmake --build build'

It must work for everyone with an account on npps0 😕

Also, you can make sure your esi-shell executable matches mine:

[dmitri@npps0:~] 
$ esi-shell --version
1.0.0-583deac
[dmitri@npps0:~] 
$ which esi-shell
/usr/local/bin/esi-shell

@ggalgoczi

The isolated test works!

Also I get the same:

[galgoczi@npps0 ~]$ esi-shell --version
1.0.0-583deac
[galgoczi@npps0 ~]$ which esi-shell
/usr/local/bin/esi-shell

@ggalgoczi

Managed to get Opticks running on CPU instead of GPU. The magic trick is to set

G4CXOpticks::NoGPU = true;

I put this here for later use when performing testing.

@ggalgoczi

It seems the G4CXOpticks photon simulation cannot run on the CPU, since in
G4CXOpticks::simulate

we have

if(NoGPU) return ;

What confused me was that the geometry translation is done even in this case :)


ggalgoczi commented Nov 24, 2024

Very useful info from Simon:

> Comparing A:Opticks and B:Geant4 simulations when using input photons (i.e. the exact same CPU-generated photons in both A and B) is a powerful way to find geometry and other issues.
>
> The so-called "record" array records every step point of the photon history. This detailed step history can also be recorded from the Geant4 side using the U4Recorder, allowing the photon histories from Geant4 to be recorded within Opticks SEvt format NumPy arrays.
>
> Statistical comparison between the A and B NumPy arrays is the first thing to do for validation.
>
> Going further, it is possible to arrange for Geant4 to provide the same set of precooked randoms that curand generates (by replacing the Geant4 "engine", see u4/U4Random.hh). I call that aligned running: it means scatters, reflections, transmissions, etc. all happen at the same places in the two simulations, so the resulting arrays can be compared directly, unclouded by statistics.

https://groups.io/g/opticks/message/542
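
As a rough illustration of the mechanism (not the actual U4Random implementation): Geant4 pulls its randoms from whichever CLHEP engine is installed, so replaying a precooked sequence amounts to swapping that engine.

    // Sketch of the "aligned running" idea: install a CLHEP engine that
    // replays precooked curand values so G4UniformRand() consumes the same
    // sequence as the GPU simulation. "precookedEngine" is a placeholder;
    // u4/U4Random.hh provides the real engine in Opticks.
    #include "CLHEP/Random/Random.h"
    #include "CLHEP/Random/RandomEngine.h"

    void installAlignedRandoms(CLHEP::HepRandomEngine* precookedEngine)
    {
        CLHEP::HepRandom::setTheEngine(precookedEngine);
    }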


plexoos commented Dec 21, 2024

@ggalgoczi Did you see these calls to SProf?

https://github.com/BNLNPPS/esi-opticks/blob/83742b32d42e6f25447374ae49331213b47ae3e7/qudarap/QSim.cc#L352

The output goes into ALL0/run_meta.txt, from which it should be possible to extract the deltas for each event.

@ggalgoczi

That is very interesting! An example output I get when running the InputPhotonList (main branch):

SEvt__BeginOfRun:1734917594862311,12545804,381984
SEvt__beginOfEvent_FIRST_ECPU:1734917594862342,12545804,381984
SEvt__setIndex_B000:1734917594862368,12545804,381984
SEvt__beginOfEvent_FIRST_EGPU:1734917594866044,12546484,382776
SEvt__setIndex_A000:1734917594866064,12546484,382776
SEvt__endIndex_A000:1734917594876498,12550252,392772
SEvt__EndOfRun:1734917594877216,12550252,392772
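
For when we get to extracting the deltas, a minimal sketch of parsing such lines. It assumes the first field after the colon is a microsecond timestamp and the remaining two fields are memory figures; the file path is taken from the earlier comment and may differ in practice.

    // Sketch: print time deltas between consecutive SProf stamps in
    // ALL0/run_meta.txt. Assumes each line has the shape "tag:stamp_us,vm,rss"
    // with stamp_us a microsecond timestamp; lines without the expected
    // ':' and ',' separators are skipped.
    #include <cstddef>
    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    int main()
    {
        std::ifstream in("ALL0/run_meta.txt");
        std::vector<std::pair<std::string, int64_t>> stamps;
        std::string line;
        while (std::getline(in, line))
        {
            auto colon = line.find(':');
            if (colon == std::string::npos) continue;
            auto comma = line.find(',', colon);
            if (comma == std::string::npos) continue;
            stamps.emplace_back(line.substr(0, colon),
                                std::stoll(line.substr(colon + 1, comma - colon - 1)));
        }
        for (std::size_t i = 1; i < stamps.size(); ++i)
            std::cout << stamps[i - 1].first << " -> " << stamps[i].first << " : "
                      << (stamps[i].second - stamps[i - 1].second) / 1e3 << " ms\n";
        return 0;
    }

On the sample above this would report, for example, roughly 10 ms between SEvt__setIndex_A000 and SEvt__endIndex_A000.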

@ggalgoczi

First comparison on a real-life application:
-- simulating 50k 5 GeV electrons in the pfrich_min_added_parameters.gdml geometry
-- the electrons created more than 10 million photons
-- the number of hits in G4 and Opticks matches
-- metric: for Geant4, strictly the optical photon stacking and simulation time; for Opticks, the GPU time needed for the "simulate" call

Observed performance:
-- Geant4 single thread: ~ 60 s
-- Geant4 40 threads: ~ 3 s
-- Opticks: ~ 0.4 s
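
Going by these rough numbers, that is roughly a 150× speedup for Opticks over single-threaded Geant4 (60 s / 0.4 s) and roughly 7.5× over Geant4 with 40 threads (3 s / 0.4 s).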
