scr: add init/finalize and start/complete output #4233
Do I understand correctly that this PR and the PR at ADIOS2 would both be necessary? So we would still write our IO/checkpointing logic in terms of openPMD/ADIOS2, but use an SCR-enabled ADIOS2? And these changes inside PIConGPU are then necessary to have SCR running in the background? I can look for the best place to put these calls inside the openPMD plugin; for the rest of PIConGPU, René will be the better person to ask.
See adammoody#1 (Note: I'm on holiday for the next two weeks.)
Thanks @franzpoeschel! Correct. Both PRs are necessary, and this PR depends on an SCR-enabled ADIOS2 to work properly. The existing SCR API is such that some calls go in ADIOS2 and some go in openPMD/PIConGPU. The SCR calls in PIConGPU are there mainly to distinguish which datasets are checkpoints and which are not. The SCR calls added to ADIOS2 provide it with the file system path at which to open each physical file that ADIOS2 creates to store the dataset. This might be a temporary path, such as a node-local SSD. SCR then migrates the dataset from that temporary path to the parallel file system.
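On the ADIOS2 side, the pattern is roughly: before opening each physical file, ask SCR where that file should live and open the returned path instead. The following is a minimal sketch, assuming the router has the shape of SCR's `SCR_Route_file(const char* name, char* file)`, which fills a caller-provided buffer (sized `SCR_MAX_FILENAME` in SCR). The router is passed in as a function pointer so the snippet builds without SCR; `routeFile`, `kMaxFilename`, and `kRouteSuccess` are hypothetical names, and the buffer size of 1024 and success value of 0 are assumptions, not SCR's definitions.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>

// Assumed buffer size; with SCR, size the buffer with SCR_MAX_FILENAME.
constexpr std::size_t kMaxFilename = 1024;

// Assumed success code; the real API reports success via SCR_SUCCESS.
constexpr int kRouteSuccess = 0;

// Function-pointer type modeled on SCR_Route_file(const char* name, char* file):
// the router writes the physical path for `name` into `file`.
using RouteFn = int (*)(const char* name, char* file);

// Ask the router where a file should physically be written (for example a
// node-local SSD). Returns std::nullopt if routing fails.
std::optional<std::string> routeFile(const std::string& logicalPath, RouteFn route) {
    std::array<char, kMaxFilename> buf{};
    if (route(logicalPath.c_str(), buf.data()) != kRouteSuccess) {
        return std::nullopt;
    }
    buf.back() = '\0';  // defensive: guarantee the C string is terminated
    return std::string(buf.data());
}
```

In a unit test, a fake router that simply prefixes a path such as `/local/ssd/` can stand in for SCR; the file name `simData_100.bp/data.0` used there is purely illustrative.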
Thanks for your PR. I changed the destination branch to
These are the places where we call … (picongpu/include/pmacc/Environment.hpp, lines 416 to 438 at commit 9b83a0e).
To mark the output of our plugins, this should be the right place. We have several output "plugins", and all of them are called from these lines:
What's the status of this, @franzpoeschel, @adammoody, @psychocoderHPC? |
I like the way SCR describes which data must be stored in a checkpoint. Nevertheless, we don't have time on our side to try SCR :-(
@franzpoeschel, @psychocoderHPC
This is getting ahead of things, since it's not yet apparent that SCR would be needed or useful. However, I had started the integration work just in case it might be. I wanted to open this PR, since I thought it'd be easier to discuss things here than in an email thread.
First, I've got a PR open that adds SCR calls into ADIOS2. That includes the commands to download and build ADIOS2 with SCR. I need to work with the ADIOS team to iron things out there, but it does work well enough that we can do testing.
ornladios/ADIOS2#3294
I think there are also spots in PIConGPU where we'll want to add some SCR calls.
First, we'll want to call SCR_Init and SCR_Finalize. These calls are typically placed near MPI_Init() and MPI_Finalize(), respectively. In practice, I often end up placing SCR_Init after the application has processed any user-specified options. It's common to configure SCR according to user settings, such as the output directory for checkpoints, or to let the user name a particular checkpoint to read during restart. Most of those settings need to be applied before calling SCR_Init. For now, I've just added it near MPI_Init().
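To make the intended ordering concrete, here is a minimal sketch assuming an RAII style, which PIConGPU may or may not want to adopt. `ScopedLifecycle` and `demoStartupOrder` are hypothetical names, and a string log stands in for the real MPI and SCR calls so the snippet compiles without either library. It shows user options being processed (where SCR settings would be applied, e.g. via `SCR_Config` in SCR 3.x, if we read its docs correctly) before `SCR_Init`, and `SCR_Finalize` running before `MPI_Finalize`.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical RAII helper: runs `init` on construction and `finalize` on
// destruction. Real code would pass e.g. [] { SCR_Init(); } and
// [] { SCR_Finalize(); }.
class ScopedLifecycle {
public:
    ScopedLifecycle(std::function<void()> init, std::function<void()> finalize)
        : finalize_(std::move(finalize)) {
        init();
    }
    ~ScopedLifecycle() {
        if (finalize_) {
            finalize_();
        }
    }
    ScopedLifecycle(const ScopedLifecycle&) = delete;
    ScopedLifecycle& operator=(const ScopedLifecycle&) = delete;

private:
    std::function<void()> finalize_;
};

// Hypothetical startup sequence: MPI first, then option parsing, then SCR.
// Objects are destroyed in reverse order of construction, so SCR_Finalize
// runs before MPI_Finalize.
std::string demoStartupOrder() {
    std::string log;
    {
        ScopedLifecycle mpi([&] { log += "MPI_Init;"; }, [&] { log += "MPI_Finalize;"; });
        log += "parse-options;";  // user settings for SCR would be applied here
        ScopedLifecycle scr([&] { log += "SCR_Init;"; }, [&] { log += "SCR_Finalize;"; });
        log += "run;";
    }
    return log;
}
```

The design choice here is that the reverse destruction order of C++ scopes enforces the "SCR inside MPI" nesting automatically, even on early returns.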
Second, we need to add SCR_Start_output and SCR_Complete_output calls. These two calls should bookend the checkpoint logic such that all files belonging to a given checkpoint are created and written in between those two calls. The SCR_Start_output call also takes a name that is meaningful to the application/user, and some bit flags that the application would set.
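One way to guarantee that every rank reaches the closing call is a scope guard around each output step. The following is a minimal sketch under that assumption; `ScopedOutput` and `markValid` are hypothetical names. The start and complete actions are injected so the code builds without SCR; in PIConGPU they would presumably wrap `SCR_Start_output(name.c_str(), flags)` (with flags such as `SCR_FLAG_CHECKPOINT`) and `SCR_Complete_output(valid ? 1 : 0)`.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical guard bracketing one output/checkpoint: the start action runs
// in the constructor, the complete action in the destructor.
class ScopedOutput {
public:
    using StartFn = std::function<void(const std::string& name, int flags)>;
    using CompleteFn = std::function<void(bool valid)>;

    ScopedOutput(const std::string& name, int flags, StartFn start, CompleteFn complete)
        : complete_(std::move(complete)) {
        start(name, flags);
    }

    // Call once this rank has finished writing all of its files.
    void markValid() { valid_ = true; }

    // Always runs, including during exception unwinding; the output is then
    // reported as invalid because markValid() was never reached.
    ~ScopedOutput() { complete_(valid_); }

    ScopedOutput(const ScopedOutput&) = delete;
    ScopedOutput& operator=(const ScopedOutput&) = delete;

private:
    CompleteFn complete_;
    bool valid_ = false;
};
```

The destructor-based design means a failed write on one rank still ends in a matching complete call marked invalid, instead of leaving the other ranks blocked in what is described above as a collective operation. Note that the complete action must not throw, since it may run during unwinding.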
All four of these calls are implicitly collective over MPI_COMM_WORLD.
Are there good spots that you'd recommend for adding these?
I've found a few potential spots that look interesting, but I don't know whether I'm headed down the right path.
If you have time, would you please take a look and let me know?
Thanks!