
Multiprocessing #486 (Draft)

wants to merge 169 commits into base: master
Conversation

@nkrah (Collaborator) commented Oct 9, 2024

Enable GATE 10 to split a simulation into multiple parallel processes.
THIS IS WORK IN PROGRESS

First implemented items:

  • split run timing intervals
  • adapt dynamic objects (run-based)
  • spawn processes via Pool
  • write output into a separate subfolder per process

Still missing:

  • merge actor output from different processes
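The split-and-spawn steps listed above can be sketched roughly as follows. All names here are hypothetical, not the actual GATE 10 API: each process receives one sub-interval of the run timing and its own output subfolder, and the processes are dispatched via a multiprocessing Pool.

```python
import multiprocessing
from pathlib import Path


def split_interval(start, end, n):
    """Split one timing interval [start, end) into n equal sub-intervals."""
    step = (end - start) / n
    return [(start + i * step, start + (i + 1) * step) for i in range(n)]


def run_one_process(args):
    """Stand-in for running one simulation over one sub-interval; a real
    implementation would create the subfolder and run the engine there."""
    index, (t_start, t_stop), output_dir = args
    subfolder = Path(output_dir) / f"process_{index}"
    return {"process": index, "interval": (t_start, t_stop), "output": str(subfolder)}


if __name__ == "__main__":
    n_processes = 4
    tasks = [
        (i, iv, "output")
        for i, iv in enumerate(split_interval(0.0, 1.0, n_processes))
    ]
    # each worker gets an independent sub-interval and output subfolder
    with multiprocessing.Pool(n_processes) as pool:
        results = pool.map(run_one_process, tasks)
```

Note that Geant4-based workers would typically require the "spawn" start method rather than "fork"; that detail is omitted here for brevity.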

Code under review:

    output = se.run_engine()
    return output

    def run(self, start_new_process=False):
    def generate_run_timing_interval_map(self, number_of_processes):
    if number_of_processes % len(self.run_timing_intervals) != 0:
Contributor commented:
Why? I thought we just divide all time intervals by number_of_processes.

@nkrah (Collaborator, Author) replied:

Yes, but letting the user define the total number of processes, rather than the number of processes per run, is more intuitive and will not require an API change if we implement a more advanced splitting scheme in the future. So I think it's better this way.
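A minimal standalone sketch of the splitting logic under discussion, assuming the user passes the total process count. The real method lives on the Simulation class; the function signature and return shape here are guesses for illustration:

```python
def generate_run_timing_interval_map(run_timing_intervals, number_of_processes):
    """Map each process index to its share of the run timing intervals.

    The user specifies the *total* number of processes; each original
    interval is split into number_of_processes / len(run_timing_intervals)
    sub-intervals, so the total must be a multiple of the number of runs.
    """
    if number_of_processes % len(run_timing_intervals) != 0:
        raise ValueError(
            "number_of_processes must be a multiple of the number of run timing intervals"
        )
    per_run = number_of_processes // len(run_timing_intervals)
    interval_map = {}
    p = 0
    for start, end in run_timing_intervals:
        step = (end - start) / per_run
        for i in range(per_run):
            # each process handles one sub-interval of one run
            interval_map[p] = [(start + i * step, start + (i + 1) * step)]
            p += 1
    return interval_map
```

For example, two runs split across four processes assigns each process half of one run's interval.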

@nkrah (Collaborator, Author) commented Oct 11, 2024 via email

@nkrah (Collaborator, Author) commented Oct 11, 2024

I figured out a flexible mechanism to merge data back into a single actor output (provided the data is mergeable: true for images, not yet for ROOT).
We will need a new type of method, common to all actors, namely FinalizeSimulation(), to be triggered from the Simulation after all processes have finished. Writing the combined output (from the processes) to disk will be done in FinalizeSimulation(). EndOfSimulation(), where writing currently takes place, is called inside the process and therefore before the output is combined. We can also add an option to not store intermediate, i.e. per-process, output on disk when it is not needed. For example, images are accessible directly in memory and can be merged that way; there is no need to read the data back from disk.

Note: FinalizeSimulation() will not have access to engines because they do not exist any more outside of the subprocess.
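A rough sketch of the proposed pattern. Only the FinalizeSimulation()/EndOfSimulation() split comes from the comment above; ImageActor and its merge logic are invented for illustration:

```python
class ActorBase:
    """Sketch of the proposed actor interface (hypothetical)."""

    def EndOfSimulation(self):
        # runs inside each subprocess, i.e. before outputs are combined
        pass

    def FinalizeSimulation(self, per_process_outputs):
        # runs once in the main process after all subprocesses have finished;
        # no engine access here: engines only existed inside the subprocesses
        raise NotImplementedError


class ImageActor(ActorBase):
    """Image-like output is mergeable: sum the per-process voxel values."""

    def FinalizeSimulation(self, per_process_outputs):
        merged = list(per_process_outputs[0])
        for other in per_process_outputs[1:]:
            merged = [a + b for a, b in zip(merged, other)]
        # writing the combined image to disk would happen here
        return merged
```

For example, ImageActor().FinalizeSimulation([[1, 2], [3, 4]]) yields [4, 6] entirely in memory, without any per-process files on disk.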

@nkrah (Collaborator, Author) commented Oct 28, 2024

New:
The following actors now work in multiprocessing (local machine):

  • SimulationStatisticsActor: data is merged in memory and accessible after the simulation; written to disk if requested

  • Actors with ROOT output: ROOT files (from the per-process subdirectories) are merged into a new ROOT file in the main output folder structure. Event IDs are automatically incremented. Run IDs are recreated as in the original simulation.

Works with test019_phsp_actor -> created a new variant of the test.

Still need to create variants of other tests that use ROOT output to check.
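The event-ID renumbering described above can be illustrated with a pure-Python stand-in. The real merge operates on ROOT files; this dict-based version only sketches the offset logic:

```python
def merge_event_ids(per_process_events):
    """Concatenate per-process event lists, offsetting each process's
    EventIDs so they stay unique and monotonic in the merged output."""
    merged = []
    offset = 0
    for events in per_process_events:
        for e in events:
            merged.append({**e, "EventID": e["EventID"] + offset})
        if events:
            # next process continues numbering after the last merged event
            offset = merged[-1]["EventID"] + 1
    return merged
```

Two processes that each numbered their events from 0 thus end up with one continuous ID sequence in the merged file.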

@BishopWolf commented Nov 13, 2024

@nkrah I think all actors should have atomic variables; that way, all actors will be thread-safe by default. See this library: https://pypi.org/project/atomicx/ . It already implements atomic doubles, added at my suggestion.

from atomicx import AtomicFloat

# Create an atomic float with an initial value of 0.0
atom = AtomicFloat()
print(f"Initial Value: {atom.load()}")

# Perform atomic operations
atom.store(3.14)
value = atom.load()
print(f"Value: {value}")

# See docs for more operations

@nkrah (Collaborator, Author) commented Nov 13, 2024

@BishopWolf Thanks for the suggestion. I think atomic doubles will be useful for certain parts of the actors.

Bear in mind that this PR is about multiprocessing, i.e. running an (independent) simulation in a newly spawned process. There is no issue with shared-memory handling in this case.

Concerning multithreading: we are actually using the multithreading architecture from Geant4, which means that not every part of a simulation runs in separate threads; only certain methods do. Therefore, only certain shared data structures, e.g. images into which all threads write, need to be thread-safe. Currently, there is no Python-side function that accesses shared data on a per-thread basis, only C++ functions. If this changes in the future, I think the package you suggest could be a good option.
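For contrast, here is a minimal lock-based illustration of the kind of thread safety that only shared structures need (independent subprocesses, as in this PR, do not). This is purely illustrative, not GATE code; the atomicx package suggested above would serve the same role without an explicit lock:

```python
import threading


class ThreadSafeAccumulator:
    """Lock-protected scoring array, mimicking an image all threads deposit into."""

    def __init__(self, size):
        self._data = [0.0] * size
        self._lock = threading.Lock()

    def deposit(self, index, value):
        # read-modify-write must be protected when several threads share the array
        with self._lock:
            self._data[index] += value

    def total(self):
        with self._lock:
            return sum(self._data)


def worker(acc, n_deposits):
    for i in range(n_deposits):
        acc.deposit(i % 4, 1.0)


acc = ThreadSafeAccumulator(4)
threads = [threading.Thread(target=worker, args=(acc, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 4 threads x 1000 deposits of 1.0 each -> total of 4000.0
```

With one atomic double per voxel, as atomicx provides, the explicit lock would be unnecessary.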

@nkrah (Collaborator, Author) commented Nov 28, 2024

I will pick this up again once PR #599 is done.
