HDF error on reading back NC_VLEN variable with fill value and chunking #2212

Open
krisfed opened this issue Feb 1, 2022 · 13 comments

@krisfed

krisfed commented Feb 1, 2022

We are using netcdf-c 4.8.1 and seeing an HDF error when using nc_get_var on an NC_VLEN variable that has (1) a fill value set, (2) not all elements filled, and (3) some chunking applied.

I am not sure whether this is expected behavior, meaning that the chunking or some other step is set up incorrectly (but then shouldn't the error occur on writing rather than on reading?), or whether this is a bug.

Here is a minimal reproduction. It defines an NC_VLEN (of NC_DOUBLE) variable with one dimension of size 4 and writes only the first 2 elements. A fill value is set ({0, 101}) and chunking is applied (chunk size 1).

#include <iostream>
#include "netcdf.h"

void checkErrorCode(int status, const char* message){
    if (status != NC_NOERR){
        std::cout << "Error code: " << status << " from " << message << std::endl;
        std::cout << nc_strerror(status) << std::endl << std::endl;
    }
}

int main(int argc, const char * argv[]) {
    
    // ================ WRITE ==================
    
    // Setup data
    const size_t DATA_LENGTH = 4;
    nc_vlen_t data[DATA_LENGTH];
    
    const int first_size = 2;
    double first[first_size] = {2, 5};
    data[0].p = first;
    data[0].len = first_size;
    
    const int second_size = 3;
    double second[second_size] = {88, 96, 42};
    data[1].p = second;
    data[1].len = second_size;

    // Create file
    int ncid;
    int retval;
    
    retval = nc_create("vlenFillValue.nc", NC_NETCDF4, &ncid);
    checkErrorCode(retval, "nc_create");
    
    // Define vlen type named RAGGED_DOUBLE
    nc_type vlen_typeID;
    retval = nc_def_vlen(ncid, "RAGGED_DOUBLE", NC_DOUBLE, &vlen_typeID);
    checkErrorCode(retval, "nc_def_vlen");
    
    // Define dimension
    int dimid;
    retval = nc_def_dim(ncid, "xdim", DATA_LENGTH, &dimid);
    checkErrorCode(retval, "nc_def_dim");
    
    // Define vlen variable
    int varid;
    retval = nc_def_var(ncid, "var", vlen_typeID, 1, &dimid, &varid);
    checkErrorCode(retval, "nc_def_var");
    
    // Define chunking
    const size_t chunk = 1; //error also with 3
    retval = nc_def_var_chunking(ncid, varid, NC_CHUNKED, &chunk);
    checkErrorCode(retval, "nc_def_var_chunking");
    
    // Define fill value
    nc_vlen_t fillValue;
    double fv[2] = {0, 101};
    fillValue.p = fv;
    fillValue.len = 2;
    retval = nc_def_var_fill(ncid, varid, NC_FILL, &fillValue);
    checkErrorCode(retval, "nc_def_var_fill");
    
    // Write vlen variable
    size_t start = 0;
    size_t count = 2;
    retval = nc_put_vara(ncid, varid, &start, &count, data);
    checkErrorCode(retval, "nc_put_vara");
    
    retval = nc_close(ncid);
    checkErrorCode(retval, "nc_close (1)");
    
    
    // ================ READ ==================
    
    // open file
    retval = nc_open("vlenFillValue.nc", NC_NOWRITE, &ncid);
    checkErrorCode(retval, "nc_open");
    
    nc_vlen_t* data_read = new nc_vlen_t[DATA_LENGTH];
    
    retval = nc_get_var(ncid, varid, data_read);
    checkErrorCode(retval, "nc_get_var");
    
    retval = nc_close(ncid);
    checkErrorCode(retval, "nc_close (2)");
    
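    // Release the array itself. After a successful read, the vlen payloads
    // would also need nc_free_vlens(DATA_LENGTH, data_read) before this.
    delete[] data_read;

    return retval;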
}

Here is the output (this was run on macOS 11.2.3, but we see the issue on other operating systems too):

$ ./a.out 
Error code: -101 from nc_get_var
NetCDF: HDF error

I see that ncdump also errors out on the produced file:

$ ncdump vlenFillValue.nc 
netcdf vlenFillValue {
types:
  double(*) RAGGED_DOUBLE ;
dimensions:
	xdim = 4 ;
variables:
	RAGGED_DOUBLE var(xdim) ;
		RAGGED_DOUBLE var:_FillValue = {0, 101} ;
data:

NetCDF: HDF error

@DennisHeimbigner
Collaborator

You might look at the conversation associated with this: #2179
In particular, your example looks like the known bug #1.

@krisfed
Author

krisfed commented Feb 2, 2022

Thank you, Dennis! It does look potentially related to the known bug about NC_VLEN and fill values mentioned in #2179, although I am not sure what kind of failures were observed there (errors or crashes, on reading or writing, etc.). Another issue possibly related to the same NC_VLEN/FillValue problem is how the crash in #2181 persists even when the data was "zeroed out" if a fill value was used.

But it is good to know that this looks like a bug, and we will keep monitoring it.

@DennisHeimbigner
Collaborator

My current hypothesis is that HDF5 is not doing a deep copy somewhere in its handling of fill values. That would lead to freeing data that is shared between the client and the HDF5 library, which in turn causes some kind of failure. But I cannot prove it.

@krisfed
Author

krisfed commented Feb 24, 2022

It sounds a little bit similar to #1985 too...

@krisfed
Author

krisfed commented Aug 25, 2022

Hi Dennis! Just wanted to check - is this still an active issue? I know there were a bunch of NC_VLEN fixes in v4.9.0, but I think you mentioned above that this specific scenario might not be addressed...

@DennisHeimbigner
Collaborator

There has been no progress on this.

@krisfed
Author

krisfed commented Feb 7, 2023

Hi Dennis! Just checking in - any recent work on this?

@krisfed
Author

krisfed commented Aug 22, 2023

Hi,

Sorry, just checking in again - are there any plans to possibly address this (admittedly low-frequency) issue in the future?

@DennisHeimbigner
Collaborator

No progress. And it is unlikely to get solved since I believe it is an HDF5 error.

@krisfed
Author

krisfed commented Aug 23, 2023

Thank you for the prompt response, @DennisHeimbigner! I have reached out to The HDF Group about this issue in case it is something that can be addressed from their side. If needed, would you be able to help narrow down how to reproduce the issue with HDF5 code, outside of netCDF? I imagine it would be especially relevant to know exactly how netCDF uses HDF5 functionality to store NC_VLEN data and to read it back.

@DennisHeimbigner
Collaborator

I tried to do that when I first encountered the error, even debugging down into the HDF5 code, but had no success. My guess is that the HDF5 group will say they need a pure HDF5 program showing the problem, and I was never able to produce such a program (I do not even know for sure that the problem is in HDF5 rather than netcdf-c).

@krisfed
Author

krisfed commented Aug 25, 2023

Thanks, Dennis! Ah, it's too bad that creating standalone HDF5 code turned out not to be as straightforward as simply converting the above reproduction steps to calls to the underlying HDF5 APIs... We will see what The HDF Group says.

@DennisHeimbigner
Collaborator

Frankly, I hope that the HDF5 people can show that the problem is in netcdf-c, because then I can eventually find a fix.
