nc_put_vars_double (or float) fails in parallel #448

Open · gsjaardema opened this issue Aug 1, 2017 · 7 comments

@gsjaardema
Contributor

Environment Information

  • What platform are you using? (please provide specific distribution/version in summary)
    • Linux
    • Windows
    • OSX
    • Other
    • NA
  • 32 and/or 64 bit?
    • 32-bit
    • 64-bit
  • What build system are you using?
    • autotools (configure)
    • cmake
  • Can you provide a sample netCDF file or C code to recreate the issue?
    • Yes (please attach to this issue, thank you!)
    • No
    • Not at this time

Summary of Issue

NOTE: my dvarput.c is modified from 4.5.1-devel as described in #447 -- the early return if nels==0 has been removed.

If nc_put_vars_double is called in parallel with stride != 1, some processors have data to output and some do not, and netcdf-4 (hdf5-based) output is being used in collective mode, then the code will hang, since only the processors with data to output call down into the H5Dwrite function. That function assumes that all processors call it whether they have data or not, and it uses a PMPI_Allreduce farther down the call stack.

The issue arises in NCDEFAULT_put_vars. If stride is 1, everything works, since all processors call NC_put_vars at line 246 of dvarput.c (4.5.1-devel).

However, if the stride is not 1, the code falls through to the odometer code below that. All processors call odom_init, but the while loop is only entered by the processors that have data (some lines deleted below):

  odom_init(&odom, rank, mystart, myedges, mystride);
  while (odom_more(&odom)) {
      int localstatus = NC_NOERR;
      localstatus = NC_put_vara(ncid, varid, odom.index, nc_sizevector1, memptr, memtype);
      memptr += memtypelen;
      odom_next(&odom);
  }

If netcdf-4 (hdf5-based) collective output is being done, the code will hang below H5Dwrite because the hdf5 library calls PMPI_Allreduce.

I don't have a suggested fix for this issue. I tried rewriting my code to use nc_put_vara_double instead, but that is not easily done for this particular call.

This does work if I use pnetcdf non-collective output, and probably also with netcdf-4 non-collective output.
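
For illustration, a minimal reproducer of the scenario described above might look like the sketch below (hypothetical code, not the example attached to this issue; the file name, dimension size, and two-rank layout are assumptions, and it presupposes a netCDF-4 build with parallel HDF5):

  #include <mpi.h>
  #include <netcdf.h>
  #include <netcdf_par.h>
  #include <stdio.h>

  /* Abort on any netCDF error so a hang is not masked by an error path. */
  #define CHK(e) do { int _s = (e); if (_s != NC_NOERR) { \
      fprintf(stderr, "netCDF error: %s\n", nc_strerror(_s)); \
      MPI_Abort(MPI_COMM_WORLD, 1); } } while (0)

  int main(int argc, char **argv)
  {
      int rank, ncid, dimid, varid;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      CHK(nc_create_par("vars_hang.nc", NC_NETCDF4,
                        MPI_COMM_WORLD, MPI_INFO_NULL, &ncid));
      CHK(nc_def_dim(ncid, "x", 10, &dimid));
      CHK(nc_def_var(ncid, "v", NC_DOUBLE, 1, &dimid, &varid));
      CHK(nc_enddef(ncid));
      CHK(nc_var_par_access(ncid, varid, NC_COLLECTIVE));

      /* Rank 0 writes every other element; rank 1 has nothing to write.
         With stride != 1 only rank 0 reaches H5Dwrite, so per the report
         above the collective write never completes. */
      size_t start[1] = {0};
      size_t count[1] = {rank == 0 ? 5 : 0};
      ptrdiff_t stride[1] = {2};
      double data[5] = {0.0, 1.0, 2.0, 3.0, 4.0};

      CHK(nc_put_vars_double(ncid, varid, start, count, stride, data));

      CHK(nc_close(ncid));
      MPI_Finalize();
      return 0;
  }

Running this with mpiexec -n 2 on a collective netCDF-4 file would be expected to hang rather than fail cleanly.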

@ckhroulev
Contributor

ckhroulev commented Sep 30, 2017

I think I reported this bug back in 2012; please see the first message and the follow-up.

(I had to use the Wayback Machine to dig up the corresponding ticket (NCF-152)...)

A fix would have to use the method described in the HDF5 FAQ.
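
For reference, the approach in that FAQ amounts to having every rank participate in the collective H5Dwrite call even when it has nothing to contribute, by selecting nothing in both the memory and file dataspaces. A rough sketch of the idea (the helper name and the 1-D layout are illustrative assumptions, not netCDF code):

  #include <hdf5.h>

  /* Collective 1-D write in which a rank may legitimately have no data.
     Assumes dset_id and filespace describe an open dataset, and plist_id
     is a transfer property list with H5FD_MPIO_COLLECTIVE set. */
  static herr_t collective_write(hid_t dset_id, hid_t filespace, hid_t plist_id,
                                 int have_data, hsize_t start, hsize_t count,
                                 const double *buf)
  {
      hid_t memspace;
      herr_t status;

      if (have_data) {
          memspace = H5Screate_simple(1, &count, NULL);
          H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &start, NULL, &count, NULL);
      } else {
          /* Empty selection: this rank still makes the collective call,
             but contributes zero elements. */
          hsize_t one = 1;
          memspace = H5Screate_simple(1, &one, NULL);
          H5Sselect_none(memspace);
          H5Sselect_none(filespace);
      }

      status = H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace,
                        plist_id, buf);
      H5Sclose(memspace);
      return status;
  }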

@edwardhartnett
Contributor

@gsjaardema is this issue still active or should it be closed?

If it's active, what should we do to fix it?

@ckhroulev
Contributor

@edwardhartnett After looking at the code in main (just now) I believe it is still active, but I did not check this today.

Note that my original bug report mentioned above includes a minimal example you can use to check this yourself and to create an automatic test.

To fix it you would need to add a block of code to NC4_put_vars that is analogous to this block in NC4_get_vars.

@ckhroulev
Contributor

@edwardhartnett Hmm. I may have written the comment above too soon. Sorry. Let me run that minimal example myself -- I'll report when I actually feel like I have something to say.

@ckhroulev
Contributor

@edwardhartnett All right. I rebuilt NetCDF 4.8.1 (with HDF5 1.12.0) with debugging symbols and built my minimal example (see the link above; it needs one more line, #include <netcdf.h>, near the top, but otherwise it's still in good shape), then ran mpiexec -n 2 a.out varm collective and attached gdb to both processes.

One of the two processes was waiting in an MPI_Barrier trying to close the file, and the other one was blocked in MPI_Allreduce inside H5Dwrite.

I was wrong about the way to fix this... but I wasn't too far off: you do need to create "empty" write requests to use with H5Dwrite, similar to the H5Dread calls in the block I mentioned above.

One fix I can imagine would alter this loop:

  • If the current rank has data to write, write it.
  • If the current rank is done writing data but there is a rank that is not, issue an "empty" write request.
  • If the current rank is done writing and all other ranks are done, exit the loop.

I realize that this happens at the dispatch level (i.e. this code does not know that we're writing to an HDF5 file), so it may be necessary to alter code for other backends to make sure they can handle "empty" requests.
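
A very rough sketch of that loop change at the dispatch level could look like the following (heavily hedged: count_odometer_steps and comm are hypothetical, since how the dispatch layer would obtain the communicator, and how non-HDF5 back ends would handle zero-sized requests, are exactly the open questions above):

  int localstatus = NC_NOERR;

  /* Number of contiguous segments this rank will write (hypothetical helper). */
  long long nwrites_local = count_odometer_steps(&odom);
  long long nwrites_max = nwrites_local;

  /* Every rank must call NC_put_vara the same number of times, so pad to
     the maximum segment count over the communicator. */
  MPI_Allreduce(&nwrites_local, &nwrites_max, 1, MPI_LONG_LONG, MPI_MAX, comm);

  size_t zero_count[NC_MAX_VAR_DIMS] = {0};   /* all edge lengths zero */

  for (long long i = 0; i < nwrites_max; i++) {
      if (i < nwrites_local) {
          /* Normal case: write the next contiguous segment. */
          localstatus = NC_put_vara(ncid, varid, odom.index, nc_sizevector1, memptr, memtype);
          memptr += memtypelen;
          odom_next(&odom);
      } else {
          /* This rank is done but others are not: issue an "empty" request
             so the collective H5Dwrite underneath is still called by all. */
          localstatus = NC_put_vara(ncid, varid, mystart, zero_count, memptr, memtype);
      }
  }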

I explained what is going on in my follow-up e-mail from 10 years ago (some code locations have changed and H5Dread is no longer an issue, but the reasons for this failure have not changed):

When the collective parallel access mode is selected, all processors in a communicator have to call H5Dread() (or H5Dwrite()) the same number of times.

In nc_put_varm_*, NetCDF breaks data into contiguous segments that can be written one at a time (see NCDEFAULT_put_varm(...) in libdispatch/var.c, lines 479 and on). In some cases the number of these segments varies from one processor to the next.

As a result, as soon as one of the processors in a communicator is done writing its data, the program locks up, because now only a subset of processors in this communicator are calling H5Dwrite(). (Unless all processors have the same number of "data segments" to write, that is.)

@ckhroulev
Contributor

I feel like an idiot (and I may need more coffee). I finally realized that I keep talking about an issue with ..._varm_... while this issue is about ..._vars_.... (However, the varm issue is real.)

Sorry about all this noise.

@edwardhartnett
Contributor

OK, note that the varm functions are deprecated.

Basically they are so complicated that no one here even understands what they do or how they are supposed to work. ;-)
