May not be correct to cast MPI_Offset pointers to size_t pointers in some cases #79
Comments
Another possibility is a safe typecast macro that checks corner cases for debug builds (and does a plain cast for non-debug builds).
I'm not a huge fan of macros with any kind of logic in them. That makes the C code work very poorly with a debugger.
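For illustration, here is a minimal sketch of a debug-checked cast. It is written as a `static inline` function rather than a macro, and the names `sketch_offset_t` and `sketch_offset_to_size` are hypothetical; `sketch_offset_t` stands in for MPI_Offset (assumed here to be a signed 64-bit `long long`) so that the example compiles without mpi.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for MPI_Offset (assumption: a signed long long). */
typedef long long sketch_offset_t;

/* Hypothetical helper: convert a signed offset to size_t.
 * In debug builds the asserts trap negative values and values too large
 * for size_t. With NDEBUG defined the asserts compile away and only the
 * plain cast remains. */
static inline size_t sketch_offset_to_size(sketch_offset_t off)
{
    assert(off >= 0);
    assert((unsigned long long)off <= SIZE_MAX);
    return (size_t)off;
}
```

A call such as `size_t n = sketch_offset_to_size(count);` then behaves like a plain cast in release builds and aborts on a bad value in debug builds.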
On Sun, Jul 31, 2016 at 10:10 PM, Imperial Fleet Commander Tarth <
Jim Edwards CESM Software Engineer |
I need to check carefully whether this can ever actually cause a problem before any action is taken, so I will do that. It may be just fine as it is, in which case I will document my results and move on. I just didn't want to forget this issue. ;-) The parallel-netcdf library does not call the netCDF library directly, so it does not face the same problem. It uses MPI_Offset (instead of size_t) in its prototypes. I see at least one case where it casts a size_t to an MPI_Offset, but uses a macro to check that the value is not too large. In terms of performance, the check will only be a factor in the data reads and writes: metadata operations take place once, whereas data operations take place over and over. As far as I can see there is no way to have a valid start, count, or stride which is longer than NC_MAX_DIM, so we can just check at build time that this is OK and we're good. That leaves the metadata functions to check for potential issues.
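A build-time check along these lines could be sketched with C11 `static_assert`. This is only an illustration: `sketch_offset_t` is a hypothetical stand-in for MPI_Offset (assumed to be `long long`), and the helper `sketch_sizes_match` is invented for the example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stddef.h>

/* Stand-in for MPI_Offset (assumption: a signed long long). */
typedef long long sketch_offset_t;

/* Fail the build if the two types do not have the same width, since a
 * pointer cast between them would then read the wrong number of bytes. */
static_assert(sizeof(sketch_offset_t) == sizeof(size_t),
              "offset type and size_t must have the same size");

/* Fail the build if the offset type is narrower than 64 bits. */
static_assert(CHAR_BIT * sizeof(sketch_offset_t) >= 64,
              "offset type must be at least 64 bits wide");

/* Hypothetical runtime mirror of the first check, e.g. for logging. */
static inline int sketch_sizes_match(void)
{
    return sizeof(sketch_offset_t) == sizeof(size_t);
}
```

Because the checks run at compile time, they add no cost to the data read and write paths.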
I meant to say netcdf4 hdf5. On Mon, Aug 1, 2016 at 9:40 AM, Imperial Fleet Commander Tarth <
Jim Edwards CESM Software Engineer |
I will investigate that question.
NetCDF uses size_t for sizes of things, like the size of data arrays,
PIO uses PIO_Offset for this purpose. PIO_Offset is defined in pio.h to be MPI_OFFSET, or MPI_LONG_LONG if MPI_OFFSET is not defined.
As far as I can tell from the MPICH mpi.h header file, it is using long long as MPI_OFFSET.
The size_t type is a 64-bit unsigned integer (unsigned long or unsigned long long) on every platform I'm familiar with.
If the sizes of MPI_OFFSET and size_t ever differ, then the PIO code will fail quite dramatically. But that is unlikely. It's hard to imagine an MPI platform that is not going to use a 64-bit int of some kind. Similarly, size_t is a 64-bit integer on every modern 64-bit platform.
However, the real problem is that size_t is unsigned, and MPI_OFFSET is signed. Could someone be trying to set things to sizes larger than the maximum signed 64-bit integer (2^63 − 1, or 9,223,372,036,854,775,807)? I don't know, that's a pretty big number.
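To make the signed/unsigned concern concrete, here is a minimal sketch of a range check and an element-by-element conversion that avoids casting the pointer itself. All names are hypothetical, and `sketch_offset_t` stands in for MPI_Offset under the assumption that it is a signed `long long`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for MPI_Offset (assumption: a signed long long). */
typedef long long sketch_offset_t;

/* Returns 1 if n fits in the non-negative range of the signed offset
 * type, 0 otherwise. */
static inline int sketch_size_fits_offset(size_t n)
{
    return (unsigned long long)n <= (unsigned long long)LLONG_MAX;
}

/* Copy n size_t values into an offset array, checking each one.
 * Returns 0 on success, or -1 at the first value that does not fit
 * (dst may be partially written in that case). */
static inline int sketch_sizes_to_offsets(const size_t *src,
                                          sketch_offset_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!sketch_size_fits_offset(src[i]))
            return -1;
        dst[i] = (sketch_offset_t)src[i];
    }
    return 0;
}
```

Copying through a temporary array costs one pass over start, count, or stride, which are short arrays, so the overhead should be small compared with the data transfer itself.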
An additional source of confusion is that both PIO_OFFSET and PIO_Offset are defined, as are MPI_OFFSET and MPI_Offset.
It is PIO_Offset that is actually used in the code, and that is defined to be MPI_Offset, which is defined in pio_internal.h to be a long long. So MPI_OFFSET is not actually used.
Some possible approaches:
Any suggestions welcome. I wanted to capture this in an issue because I will probably forget all about it when I go on vacation next week. ;-)