null pointer in MinMax #1745
I don't see any errors before this one.
This is in the /projects directory on Theta, so it is a Lustre file system.
@philip-davis Is this happening only with the BP4 engine, or with SST as well? If it's only BP4, then either @pnorbert or @lwan86 is probably the best person to ask.
Even if it only happens with the BP4 engine, I'd still like to ask a few further questions for them to consider. Were you staging through files, or purely writing files? Could you verify whether it happens in only one of the two cases, or in both?
@philip-davis Thanks for reporting this. If it helps, the error states that the variable is null (false in our API). It'd be good practice to check the variable status after a call to…
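The comment above says the error means the variable handle is null (false in the API) and recommends checking it before use. Here is a minimal sketch of that guard pattern. To keep it compilable without ADIOS2 it uses hypothetical stand-ins rather than the real library classes: a `VariableHandle` type that converts to false when the variable was not found, a `LookupVariable` helper, and a `SafeMinMax` function that checks the handle before calling `MinMax`.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a typed variable handle (NOT the ADIOS2 class).
// Like the handle described in this thread, it converts to false when the
// variable could not be found.
template <typename T>
class VariableHandle {
public:
    VariableHandle() = default;  // null handle: variable not found
    explicit VariableHandle(std::vector<T> data)
        : data_(std::move(data)), valid_(true) {}

    explicit operator bool() const { return valid_; }

    // Calling MinMax on a null handle is an error, which is the situation
    // reported in this issue.
    std::pair<T, T> MinMax() const {
        if (!valid_) throw std::invalid_argument("variable is null in MinMax");
        if (data_.empty()) throw std::invalid_argument("variable has no data");
        auto [lo, hi] = std::minmax_element(data_.begin(), data_.end());
        return {*lo, *hi};
    }

private:
    std::vector<T> data_;
    bool valid_ = false;
};

// Hypothetical representation of one step: variable name -> values.
using Step = std::map<std::string, std::vector<double>>;

// Hypothetical lookup: returns a null handle when the name is not in the step.
VariableHandle<double> LookupVariable(const Step& step, const std::string& name) {
    auto it = step.find(name);
    if (it == step.end()) return VariableHandle<double>{};
    return VariableHandle<double>{it->second};
}

// Guarded pattern: check the handle before calling MinMax.
std::optional<std::pair<double, double>> SafeMinMax(const Step& step,
                                                    const std::string& name) {
    VariableHandle<double> var = LookupVariable(step, name);
    if (!var) {
        std::cerr << "variable '" << name << "' not found in this step\n";
        return std::nullopt;
    }
    return var.MinMax();
}
```

In a real reader, the same check would sit between looking up the variable and calling `MinMax`. When the check fails, the reader could skip or retry the step instead of crashing.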
@JasonRuonanWang Sorry for assigning you incorrectly. It's only BP4. I don't see this when I do post-processing, i.e., let the simulation run to completion and then run the analysis. @williamfgc Unfortunately, I can't inspect the BP file with bpls (I don't get any output when I try, other than…
@philip-davis If the BP4 output is not too large, would you mind sharing it so I can look into why bpls crashes?
The data is too big (1.3TB). Would the metadata be useful? It's only about 15MB total. |
Yes, the metadata would be useful.
Also, can you tell us the setup: number of processors, settings.json, timing, and adios2.xml?
Thanks
On Mon, Sep 16, 2019 at 7:19 PM Philip Davis wrote:
The data is too big (1.3TB). Would the metadata be useful? It's only about 15MB total.
Here is a tarball of all the relevant configuration and metadata files. This run uses 4096 simulation processes on 128 nodes and 128 analysis processes on 32 nodes. The simulation domain is 2048^3.
512 substreams.
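To give a sense of scale, here is a back-of-the-envelope breakdown of the run described above. It rests on assumptions the thread does not confirm. Fields are assumed to be `double`. The grid is assumed to be split evenly across the 4096 writers and the 128 readers, and writers are assumed to be spread evenly over the 512 substreams. All constant names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Back-of-the-envelope sizes for the gray-scott / pdf_calc run in this issue.
// ASSUMPTIONS (not stated in the thread): double-precision fields and even
// splits of the grid across writers, readers, and substreams.
constexpr std::int64_t kGridPerDim = 2048;  // 2048^3 simulation domain
constexpr std::int64_t kWriters = 4096;     // simulation processes
constexpr std::int64_t kReaders = 128;      // analysis processes
constexpr std::int64_t kSubstreams = 512;   // BP4 substreams
constexpr std::int64_t kBytesPerValue =
    static_cast<std::int64_t>(sizeof(double));  // assumed field type

constexpr std::int64_t kCellsTotal = kGridPerDim * kGridPerDim * kGridPerDim;
constexpr std::int64_t kCellsPerWriter = kCellsTotal / kWriters;
constexpr std::int64_t kBytesPerWriterPerField = kCellsPerWriter * kBytesPerValue;
constexpr std::int64_t kWritersPerSubstream = kWriters / kSubstreams;
constexpr std::int64_t kWritersPerReader = kWriters / kReaders;
constexpr std::int64_t kCellsPerReader = kCellsTotal / kReaders;
constexpr std::int64_t kBytesPerReaderPerField = kCellsPerReader * kBytesPerValue;

// The even-split assumptions only hold if these divisions are exact.
static_assert(kCellsTotal % kWriters == 0, "writers get equal shares");
static_assert(kCellsTotal % kReaders == 0, "readers get equal shares");
static_assert(kWriters % kSubstreams == 0, "substreams get equal writers");
```

Under these assumptions, each writer holds 2,097,152 cells (16 MiB per field), which would be a 128^3 block if the writers form a 16 x 16 x 16 grid. Each substream aggregates 8 writers. Each reader covers 67,108,864 cells (512 MiB per field), or the data of 32 writers.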
I have now seen the same thing on Cori with 256 writers and 8 readers.
Hi @philip-davis, please open a separate issue, as this seems to be related to how the MPI changes affect bpls on Cori. We'll probably need a lot more info to understand and debug this.
Good question; the API object has the…
Just a note: the BP file is fine, so the writer is okay. Some unexpected condition arises in the reader.
@philip-davis Can you please try #1772 on Theta or Cori? I wonder whether this fix for incorrectly handled timeouts already fixes this bug. I saw the same error on non-first steps on my VM, and this PR fixes that. But my job on Theta is not making progress, so I cannot test whether it fixes your first-step issue. #1773 should fix this problem once and for all, but it is a complicated implementation to fix a problem that I imagined and that may or may not be an actual problem. I am curious whether #1772 alone is enough to make your issues go away. Thanks.
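#1772 is described above as a fix for timeouts that were not handled correctly. The thread does not show that fix, so what follows is only an application-side sketch of the general idea, not the ADIOS2 change. When a timed `BeginStep` reports that no step is ready, the reader should retry (or give up) rather than go on to inquire variables and call `MinMax` on a step it never opened. `StepStatus` is a local enum whose values mirror the names of ADIOS2's `adios2::StepStatus`. `ScriptedStream`, `ProcessSteps`, and `max_retries_per_step` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Local stand-in whose values mirror the names of adios2::StepStatus.
enum class StepStatus { OK, NotReady, EndOfStream, OtherError };

// Hypothetical scripted stream: each BeginStep call returns the next status in
// the script, and EndOfStream once the script is exhausted.
class ScriptedStream {
public:
    explicit ScriptedStream(std::vector<StepStatus> script)
        : script_(std::move(script)) {}

    StepStatus BeginStep() {
        if (next_ >= script_.size()) return StepStatus::EndOfStream;
        return script_[next_++];
    }
    void EndStep() {}

private:
    std::vector<StepStatus> script_;
    std::size_t next_ = 0;
};

// Treat NotReady (a timeout) as "no step is open yet". Tolerate up to
// max_retries_per_step consecutive timeouts instead of inquiring variables or
// calling MinMax. Returns the number of steps actually processed.
int ProcessSteps(ScriptedStream& stream, int max_retries_per_step) {
    int processed = 0;
    int retries = 0;
    while (true) {
        const StepStatus status = stream.BeginStep();
        if (status == StepStatus::NotReady) {
            if (++retries > max_retries_per_step) {
                std::cerr << "giving up after " << retries
                          << " consecutive timeouts\n";
                break;
            }
            continue;  // in this sketch no step was opened, so skip EndStep
        }
        if (status != StepStatus::OK) break;  // EndOfStream or OtherError
        retries = 0;
        // ... inquire variables, check the handles, call MinMax, compute PDFs ...
        ++processed;
        stream.EndStep();
    }
    return processed;
}
```

If the real `BeginStep` call is given a timeout, each call already blocks for up to that long. A bounded retry count is one simple policy on top of that. Logging or sleeping between retries are alternatives.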
@pnorbert On Theta, I am seeing:
This goes on for millions of lines. I see this with a non-streaming run as well. I am going to try Cori too, but I am having trouble building this branch on Cori for some reason.
@pnorbert @philip-davis Please check on your end whether the current release branch solves the issue. Thanks!
This issue was caused by multiple bugs. After #1773, the gray-scott example works fine in BP4 streaming mode. I tested it on Cori. My job on Theta has been in the queue for a long time.
Yes, this is working with #1773.
I'm doing a large streaming file run with gray-scott coupled to pdf_calc on Theta, and I am sometimes (not always) seeing pdf_calc crash with many of the following strangely formatted errors:
This is using BP4 with the following configuration (I have combined SST and BP4 parameters for reuse):
This occurs on the first timestep of the reader. The writer runs to completion. For reference, here is some of the pdf_calc code that precedes the MinMax call:
https://github.com/pnorbert/adiosvm/blob/75bf69b13638f7c67981f43d269d2a19e269da20/Tutorial/gray-scott/analysis/pdf_calc.cpp#L182-L216