BackTransformDiagnostics: position offset does not have consistent number of macroparticle #3389
Comments
I still see this issue running WarpX on Summit, even with the new fix:
I rebuilt with the latest branch as of today, 9/23/2022. The input file is attached |
Thanks for reporting this @RTSandberg. I just tried the above script with a recent version of WarpX (git hash
Never mind, I was actually able to reproduce this bug when running on 6 GPUs. |
I took a closer look at this issue (by adding lots of
here: https://github.com/ECP-WarpX/WarpX/blob/development/Source/Diagnostics/WarpXOpenPMD.cpp#L985
here: https://github.com/ECP-WarpX/WarpX/blob/development/Source/Diagnostics/WarpXOpenPMD.cpp#L992
|
Let's try |
Thanks @ax3l ! |
Note that this error still occurs even when turning off load balancing by commenting out this line:
|
It seems that when I do:
then the problem disappears. However, I am a bit confused: why does turning the fields off result in a fix on the particle metadata? |
Thank you for this hint and test. |
@RevathiJambunathan That's a good suggestion, but it does not seem that there is any issue with the |
I have another input deck that sees this issue running on 1 node, 6 GPUs on Summit:
|
On one of the production simulations, I just noticed that using:
allows us to circumvent the problem. @RTSandberg Are you able to reproduce this observation with the above input deck? |
Thanks Remi! I'll take this on in the 2nd half of the week with the input set reported by Ryan:
Update: could not dig into this during the week - will pick up after my vacation. |
@ax3l Yes, it seems that the following workaround did work in my case. Here is the corresponding input script:
So in summary, there seem to be 2 potential workarounds:
|
The HDF5 workaround works for me. At one time I thought I had a case where the split diagnostic workaround didn't work, but at present it works on every case I have tried lately. |
I haven't observed this issue when using HDF5. However, with ADIOS the split diagnostic workaround is not guaranteed; I still get incomplete particle metadata sometimes.
from openpmd_viewer.addons import LpaDiagnostics
btd = LpaDiagnostics('diags/diag_btd')
btd.get_particle(iteration=0, var_list=['id','x','y','z','ux','uy','uz'])
results in
Here are the batch and input scripts for running on 1 node, 6 GPUs on Summit:
It is very interesting to note that this issue does not arise if the field BTD is removed, for example by changing the line
I observed both behaviors on Perlmutter as well. |
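To make the symptom in the issue title concrete, here is a minimal sketch of the kind of consistency check that fails: every record of a species should hold the same number of macroparticles. The helper name `find_inconsistent_records` and the record lengths below are hypothetical and made up for illustration; they are not taken from the actual output files or from openPMD-viewer itself.

```python
from collections import Counter


def find_inconsistent_records(record_lengths):
    """Return the records whose length differs from the most common length.

    record_lengths: dict mapping a record name (e.g. 'position/x') to the
    number of macroparticles stored in that record.
    """
    if not record_lengths:
        return {}
    # Treat the most frequent length as the expected particle count.
    expected, _ = Counter(record_lengths.values()).most_common(1)[0]
    return {name: n for name, n in record_lengths.items() if n != expected}


# Hypothetical lengths, made up for illustration only.
lengths = {
    "id": 1000,
    "position/x": 1000,
    "positionOffset/x": 800,  # shorter than the other records
    "momentum/x": 1000,
}
bad = find_inconsistent_records(lengths)
print(bad)
```

In practice the lengths would be read from the written series (for example from the shapes reported by a tool such as bpls) rather than typed in by hand.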
that is indeed quite weird! |
I am running an even smaller version of the above inputs on my laptop: inputs.txt
cmake -S . -B build -DWarpX_PSATD=ON -DWarpX_DIMS=RZ
cmake --build build -j 3
This finishes after 512 steps and, e.g. for lab station 1, does not write all the particle metadata (e.g., charge and mass) properly, even though it shuts down cleanly. That means |
Ah, I see. Thanks. I can trace this with the input file you shared and see what's happening with |
@ax3l I used your input file and ran this locally. At the 512th timestep, the snapshots are not full and hence
So I ran the simulation up to 2850 timesteps and I am able to visualize the data using openpmd-viewer. |
I think that's an orthogonal issue to fix - I think we should close out BTD steps (finalize particle metadata and zero out partly filled fields) on simulation end. With our restart strategy, we would copy the partly filled lab stations into checkpoints and keep them open (un-finalized) in case we want to continue. For now, I will continue focusing on the issue posted by Ryan above. |
@RTSandberg I cannot reproduce the issue with the latest openPMD-viewer + openPMD-api and running on CPU. (Works w/o an error.) Will try again on GPU on Summit... |
Can reproduce on Summit 👍
|
Since this shows up in the first lab station, we can simplify to:
and run faster: inputs.txt
The problem disappears if I remove
Seems important to run it from 6 MPI ranks (and/or with CUDA) to trigger with the inputs. |
I am generally able to use the particle BTD in RZ with ADIOS as the openPMD backend without issue as long as I don't also get field output, i.e. set |
Met with @RemiLehe and @n01r to discuss the issue further. We realized that it is not the metadata that is necessarily inconsistent, but that the particle arrays themselves are corrupted. For example, if we run input and batch scripts: |
Yes, that is correct. I forgot to post this here and only mentioned it on Slack: That means, meta-data that is set in that last append of particles is correct but the variable (data) is too small. |
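The distinction above (correct metadata, but a data variable that is too small) can be illustrated with a toy model. This is only a sketch under stated assumptions: `GrowableRecord`, `num_particles_attr`, and `declared_extent` are hypothetical names, and the skipped extent update is one imaginable way such a mismatch could arise, not the diagnosed cause in WarpX or openPMD-api.

```python
class GrowableRecord:
    """Toy model of a 1D particle record that is appended to over several flushes."""

    def __init__(self):
        self.num_particles_attr = 0  # metadata: particle count set on each append
        self.declared_extent = 0     # size of the data variable seen by a reader
        self.written = []            # all values handed to the writer so far

    def append(self, values, update_extent=True):
        self.written.extend(values)
        self.num_particles_attr = len(self.written)
        if update_extent:
            # Assumption: the variable must be grown explicitly on every append.
            self.declared_extent = len(self.written)

    def read(self):
        """A reader only sees what fits inside the declared extent."""
        return self.written[: self.declared_extent]


def run(skip_last_extent_update):
    rec = GrowableRecord()
    chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    for i, chunk in enumerate(chunks):
        is_last = i == len(chunks) - 1
        rec.append(chunk, update_extent=not (is_last and skip_last_extent_update))
    return rec.num_particles_attr, len(rec.read())


print(run(skip_last_extent_update=False))  # metadata and data agree
print(run(skip_last_extent_update=True))   # metadata is right, data is too short
```

Comparing the two returned numbers is the same check a reader effectively performs when it complains about an inconsistent number of macroparticles.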
I used this input file and saw that the number of particles is correct:
../bpls -l diag_btd/openpmd_000000.bp
../bpls -A -l diag_btd/openpmd_000000.bp | grep shape |
Testing a fix in #3657 |
Idea for a fast test case of the issue (if we need it again):
|
Please note that ADIOS2 master now has support for Joined Arrays (in BP4 only for now), where you don't need to specify Shape on the writing side (nor Starts), and the reader will put all blocks into a Global Array. This is basically Local Arrays where the blocks are joined in one dimension together to form a Global Array, and is a good match for this use case. |
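As a conceptual illustration of the joined-array idea described above, the sketch below concatenates per-writer blocks along one dimension and derives each block's start offset on the reading side. It is plain Python and deliberately does not use the ADIOS2 API; `join_blocks` is a hypothetical helper, and the block contents are made up.

```python
def join_blocks(blocks):
    """Join 1D local blocks into one global array.

    Returns the joined data and the start offset of each block, i.e. the
    bookkeeping a writer would otherwise have to supply itself.
    """
    offsets = []
    joined = []
    for block in blocks:
        offsets.append(len(joined))  # where this block starts in the global array
        joined.extend(block)
    return joined, offsets


# Hypothetical blocks, e.g. one per rank or per flush; a block may be empty.
blocks = [[1, 2, 3], [], [4, 5], [6]]
data, starts = join_blocks(blocks)
print(data)
print(starts)
```

The appeal for the BTD use case is that the global shape is simply the sum of the block sizes, so no writer has to announce it in advance.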
Thanks a lot Norbert, I'll check this out next week! |
When running this input file:
inputs.txt
it seems that the metadata of the BackTransformed diagnostics is inconsistent. For instance, running the following Python code:
results in the following error: