bug: uninitialized MPI_Status object #2332

Closed
mpichbot opened this issue Oct 14, 2016 · 5 comments


mpichbot commented Oct 14, 2016

Originally by jhammond on 2016-03-28 13:14:51 -0500


Message 1:

I notice that an uninitialized MPI_Status object can make MPI_Get_count return a wrong result
when the amount of data in the MPI operation is zero (although I only checked MPI-IO).
Attached is a test program that uses an MPI collective read in which only the root process has
a non-zero amount of data to read. The expected result from MPI_Get_count is 0 for all
non-root processes. To mimic an uninitialized MPI_Status object, I call memset to make
the object non-zero before the read.

Here is the code fragment.

    if (rank == 0) len = 10;
    else len = 0;

    MPI_File_read_all(fh, buf, len, MPI_BYTE, &status);

    MPI_Get_count(&status, MPI_BYTE, &get_size);

For processes with rank > 0, get_size may come back nonzero.
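
The poison-then-check idea behind the test can be sketched without MPI. This is not the attached get_count.c; `mock_status`, `mock_io_fn`, and the two stand-in operations are hypothetical names invented for illustration, and the single `count` field is a simplification of the real MPI_Status.

```c
#include <string.h>

/* Hypothetical stand-in for MPI_Status with a single count field. */
typedef struct mock_status {
    int count;
} mock_status;

/* Hypothetical signature for an I/O-like call that reports its size in status. */
typedef void (*mock_io_fn)(int len, mock_status *status);

/* Well-behaved operation: always writes the status, even for len == 0. */
static void io_always_sets_status(int len, mock_status *status)
{
    status->count = len;
}

/* Mimics the reported behaviour: a zero-length call leaves status untouched. */
static void io_skips_zero_len(int len, mock_status *status)
{
    if (len > 0)
        status->count = len;
}

/* Fill status with a non-zero byte pattern (as the test does with memset),
 * run the operation, and return 1 if the reported count matches len, else 0. */
static int status_survives_poison(mock_io_fn op, int len)
{
    mock_status status;
    memset(&status, 0xA5, sizeof status);
    op(len, &status);
    return status.count == len;
}
```

Calling `status_survives_poison(io_skips_zero_len, 0)` returns 0, which is the failure the test program is designed to expose; the same check on a zero-initialized status would pass by accident and hide the problem.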

Message 2:

My test program can be compiled with the command "mpicc get_count.c -o get_count"
and run with "mpiexec -n 4 get_count". If the result is correct (as expected), nothing
is printed on stdout. Otherwise, error messages are printed.

My point is that MPI_Get_count does not report the correct result because
the MPI collective I/O call fails to initialize the MPI_Status object.

I found that OpenMPI and an earlier version of MPICH (2-1.2.1) run
this test code correctly.

The test failed when I ran MPICH 3.1.4 and the current version from the Git repo.

@mpichbot

Originally by jhammond on 2016-03-28 13:15:02 -0500


Attachment added: get_count.c (1.4 KiB)

@mpichbot mpichbot self-assigned this Oct 14, 2016
@mpichbot

Originally by robl on 2016-03-28 16:31:19 -0500


Wei-keng continues his investigation:

I found the cause of the MPI_Get_count warning message.

Valgrind considers the status argument passed to MPI_Get_count
to be uninitialized. However, in PnetCDF, status is returned
from a call to either MPI_File_write_all or MPI_File_read_all,
which should initialize the status object "entirely". Entirely means
all members defined in the C struct MPI_Status. I guess MPICH
fails to do that, and hence Valgrind complains about it.

typedef struct MPI_Status {
    int count_lo;
    int count_hi_and_cancelled;
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} MPI_Status;

It appears count_lo and count_hi_and_cancelled are not set in MPI-IO calls
or MPI_Iprobe. I tested this on MPI_Iprobe and it has the same problem.

To verify, I added the following 2 lines before calling MPI-IO and MPI_Iprobe,
and valgrind stopped complaining.

    mpistatus.count_lo = 0;
    mpistatus.count_hi_and_cancelled = 0;
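
To see why stale values in those two fields matter, here is a rough MPI-free sketch. `mock_status` copies the field names from the struct quoted above but is not the real MPI_Status. The decoding in `mock_get_count` is an assumption (low bit of count_hi_and_cancelled as a cancelled flag, remaining bits as the high part of the count, 32-bit int); the actual MPICH macros may differ. All `mock_*` names are hypothetical.

```c
#include <stdint.h>

/* Mock with the same field names as the struct quoted above; not the real type. */
typedef struct mock_status {
    int count_lo;
    int count_hi_and_cancelled;
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} mock_status;

/* ASSUMED encoding: bit 0 of count_hi_and_cancelled is a cancelled flag,
 * the remaining bits are the high part of the count. Assumes 32-bit int. */
static int64_t mock_get_count(const mock_status *s)
{
    uint64_t lo = (uint32_t) s->count_lo;
    uint64_t hi = ((uint32_t) s->count_hi_and_cancelled) >> 1;
    return (int64_t) ((hi << 32) | lo);
}

/* Stand-in for an operation that does not touch the count fields when
 * len == 0, which is the behaviour reported in this thread. */
static void mock_read_all(mock_status *s, int len)
{
    if (len == 0)
        return;
    s->count_lo = len;
    s->count_hi_and_cancelled = 0;
}

/* Caller-side workaround from the thread: clear both count fields first. */
static void mock_prezero_count(mock_status *s)
{
    s->count_lo = 0;
    s->count_hi_and_cancelled = 0;
}
```

With a status whose bytes are all 0xFF, `mock_read_all(&s, 0)` followed by `mock_get_count(&s)` yields a large nonzero value, whereas calling `mock_prezero_count(&s)` first yields 0. This only masks the problem on the caller side; the status should be filled in by the library call itself.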


@mpichbot

Originally by balaji on 2016-03-29 12:45:36 -0500


This needs a patch similar to the one Yanfei wrote for the valgrind warnings.

http://git.mpich.org/mpich.git/commitdiff/b7db27e2b52595ce089f0cd81a1ed89e4e31056e

@raffenet raffenet assigned roblatham00 and unassigned mpichbot Oct 17, 2016
@wkliao

wkliao commented Nov 19, 2018

I am wondering about the status of this issue.

@pavanbalaji pavanbalaji changed the title from "uninitialized MPI_Status object" to "bug: uninitialized MPI_Status object" Jan 17, 2019
@roblatham00 roblatham00 removed their assignment Jan 17, 2019
@raffenet

This was fixed in 4b7d553.

raffenet added a commit to raffenet/mpich that referenced this issue Nov 22, 2019
Until [4b7d553], ROMIO was failing to fill in the status object
for zero-byte operations. See pmodels#2332.

Co-authored-by: Jeff Hammond <[email protected]>
raffenet added a commit to raffenet/mpich that referenced this issue Nov 22, 2019
Until [4b7d553], ROMIO was failing to fill in the status object
for zero-byte operations. Add test to confirm the fix. See
pmodels#2332.

Co-authored-by: Jeff Hammond <[email protected]>