
May not be correct to cast MPI_Offset pointers to size_t pointers in some cases #79

Closed
edhartnett opened this issue Jun 20, 2016 · 6 comments

@edhartnett
Collaborator

edhartnett commented Jun 20, 2016

NetCDF uses size_t for sizes of things, like the sizes of data arrays.

PIO uses PIO_Offset for this purpose. PIO_Offset is defined in pio.h to be MPI_OFFSET, or MPI_LONG_LONG if MPI_OFFSET is not defined.

As far as I can tell from the MPICH mpi.h header file, it is using long long as MPI_OFFSET.

The size_t type is an unsigned long long on every platform I'm familiar with.

If the sizes of MPI_OFFSET and size_t ever differ, the PIO code will fail quite dramatically. But that is unlikely: it is hard to imagine an MPI platform that does not use some kind of 64-bit integer, and size_t is a 64-bit integer with every modern C compiler on a 64-bit platform.

However, the real problem is that size_t is unsigned and MPI_OFFSET is signed. Could someone be trying to set sizes larger than the maximum signed 64-bit integer (2^63 − 1, or 9,223,372,036,854,775,807)? I don't know; that's a pretty big number.
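To make the signed/unsigned concern concrete, here is a minimal, self-contained sketch (plain long long stands in for MPI_Offset, and a typical 64-bit platform is assumed; this is an illustration, not PIO code):

```c
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    /* Stand-in for an MPI_Offset array (long long under MPICH). */
    long long start[1] = {-1};   /* a bogus negative offset */

    /* The kind of cast currently used to silence the netCDF prototype
     * warnings. (Where size_t is unsigned long rather than unsigned
     * long long, as with glibc on x86_64, this access is technically
     * also an aliasing violation, even though the sizes match.) */
    size_t *sp = (size_t *)start;

    /* On a typical 64-bit platform this prints
     * 18446744073709551615, not -1. */
    printf("%zu\n", sp[0]);
    return 0;
}
```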

An additional source of confusion is that both PIO_OFFSET and PIO_Offset are defined, as are MPI_OFFSET and MPI_Offset.

It is PIO_Offset that is actually used in the code, and that is defined to be MPI_Offset, which is defined in pio_internal.h to be a long long. So MPI_OFFSET is not actually used.

Some possible approaches:

  • Would be nice to remove PIO_OFFSET and MPI_Offset, leaving MPI_OFFSET (an MPI-defined macro, in most cases) and PIO_Offset, which is used all over the code. We should probably set PIO_Offset to MPI_OFFSET, since that is what the user will expect.
  • Would be nice to confirm that there is even a possibility of an issue. Some things (such as ndims) cannot be larger than NC_MAX_DIM, so by looking carefully at the netCDF API we may find that numbers larger than 2^15 − 1 are not even possible.
  • Right now we are just suppressing all the warnings by casting MPI_OFFSET pointers to size_t pointers. This is a bit sleazy; we should find a better way.
  • We could create (in every function that currently casts) size_t arrays, copy the values over from the MPI_OFFSET arrays, and then no cast is needed (see the sketch after this list). This is probably the best solution, though somewhat laborious.
  • We could change PIO_OFFSET to size_t, but that would introduce these warnings into all existing user code. :-(
  • We could check the size of MPI_OFFSET against size_t in the CMake step. If they differ, notify the user and fail the build. That way we can rely on safely casting between these types. (But if anyone ever builds a system where MPI_OFFSET and size_t have different sizes, PIO is hosed there and needs major editing to work.)
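To sketch the copy-and-convert option (the fourth bullet), here is roughly what such a helper could look like. The function name is hypothetical and long long stands in for MPI_Offset so the sketch compiles without mpi.h; it is not existing PIO code.

```c
#include <stdlib.h>

/* Hypothetical helper: copy a start/count array of signed offsets
 * (long long standing in for MPI_Offset) into a freshly allocated
 * size_t array, so netCDF functions that expect size_t pointers can
 * be called without any pointer cast. Returns NULL on allocation
 * failure or if any value is negative (which would otherwise wrap
 * around to a huge unsigned size). The caller frees the result. */
static size_t *offsets_to_size_t(const long long *in, int ndims)
{
    size_t *out = malloc((size_t)ndims * sizeof *out);
    if (!out)
        return NULL;
    for (int i = 0; i < ndims; i++)
    {
        if (in[i] < 0)
        {
            free(out);
            return NULL;
        }
        out[i] = (size_t)in[i];
    }
    return out;
}
```

Since start/count arrays have at most NC_MAX_DIM entries, the extra copy is a handful of words per call and should be negligible next to the I/O itself. The build-time size check in the last bullet could then shrink to a check_type_size() test in CMake (CheckTypeSize module) or a static assertion that sizeof(MPI_Offset) is not larger than sizeof(size_t).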

Any suggestions welcome. I wanted to capture this in an issue because I will probably forget all about it when I go on vacation next week. ;-)

@jayeshkrishna
Contributor

Another possibility is a safe typecast macro that checks corner cases for debug builds (and does a plain cast for non-debug builds).
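For concreteness, a minimal sketch of what such a macro might look like (the name PIO_SIZE_T_CAST is hypothetical, not an existing PIO symbol). Note that it checks values, not pointers, so it would pair with copying rather than with the current pointer casts:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checked conversion from a signed offset value to
 * size_t. In debug builds the assert fires on a negative value; with
 * NDEBUG defined, assert() compiles away and only the plain cast
 * remains. In debug builds the argument is evaluated twice, so it
 * must not have side effects. */
#define PIO_SIZE_T_CAST(x) (assert((x) >= 0), (size_t)(x))
```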

@edhartnett
Collaborator Author

I'm not a huge fan of macros with any kind of logic in them. That makes the C code work very poorly with a debugger.

@jedwards4b
Contributor

NetCDF uses size_t for sizes of things

How does this work for parallel netCDF? Is it using size_t and then casting back to MPI_Offset when interfacing with MPI-IO? I would say option 2 is best, but I worry about too many copies slowing things down.


@edhartnett
Collaborator Author

I need to check carefully to see if it is ever even possible for this to cause a problem, before any action is taken. So I will do that. It may be just fine as it is, in which case I will document my results and move on. I just didn't want to forget this issue. ;-)

The parallel-netcdf library does not call the netCDF library directly, so it does not face the same problem. They use MPI_Offset (instead of size_t) in their prototypes. I see at least one case where they cast a size_t to an MPI_Offset, but use a macro to check that the value is not too large.

In terms of performance, it will only be a factor in the data reads and writes. Metadata operations all take place once, whereas data operations take place over and over. As far as I can see there is no way to have a valid start, count, or stride which is longer than NC_MAX_DIM, so we can just check at build time that this is OK and we're good. That leaves the metadata functions to check for potential issues.
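For reference, the overflow check in the other direction (size_t into a signed offset, as described above for parallel-netcdf) only takes a few lines. This is a generic sketch, not parallel-netcdf's actual macro, with long long again standing in for MPI_Offset:

```c
#include <limits.h>
#include <stddef.h>

/* Sketch: convert a size_t to a signed 64-bit offset, failing
 * cleanly instead of wrapping if the value is too large to
 * represent. Returns 0 on success, -1 on overflow. */
static int size_to_offset(size_t in, long long *out)
{
    if (in > (size_t)LLONG_MAX)
        return -1;              /* would not fit in a signed offset */
    *out = (long long)in;
    return 0;
}
```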

@jedwards4b
Contributor

I meant to say netCDF-4/HDF5.


@edhartnett
Collaborator Author

I will investigate that question.

@edhartnett changed the title from "not correct to cast MPI_Offset pointers to size_t pointers" to "May not be correct to cast MPI_Offset pointers to size_t pointers in some cases" on Aug 4, 2016
@edhartnett self-assigned this Aug 4, 2016