Blosc HDF5 codec sometimes fails via nc_def_var_blosc() and nccopy #2458
Comments
It would be better if you attached the .nc files (if they are not too large).
This is very odd. Apparently the chunksize values are being passed in an
I figured out the problem, but fixing that produced this error, which I have
Ok, this is weird.
Further notes: I tried it using libblosc directly, and it reported that the data is incompressible.
I could get it to compress if I turned on shuffle.
Thanks much for looking into this, Dennis. Always turning on Blosc-shuffle when any Blosc is required is an easy change to make in NCO (though ideally it would not be necessary). Above you mention "fixing some things". Have you made or do you foresee making any changes to netCDF to get Blosc codecs working better? I'm still not sure whether the problems I'm having with Blosc are due to NCO, netCDF, the Blosc filter itself, or some combination.
I found and fixed a couple of errors in the HDF5 Blosc filter. I will put up a PR.
Great. Looking forward to it. I will re-test for robustness across datasets and sub-compressors when it lands in the main branch.
I ran across this also: Blosc/c-blosc#307
Ahhh. Quite relevant. This suggests that the calling application should skip invoking the Blosc codec for block sizes < 4 KB. Sound reasonable?
Re: Issue Unidata#2458. The above GitHub issue revealed some bugs in the file netcdf-c/plugins/H5Zblosc.c. Fixed them and added a test case. Also discovered that the Blosc LZ sub-compressors do not work well with small datasets. Misc. other change(s): I noticed that the file "dap4_test/baselinethredds/GOES16_CONUS_20170821_020218_0.47_1km_33.3N_91.4W.nc4.thredds" is still causing tar errors during "make distcheck", so I made some changes to do the rename at test time.
Fixed by #2461
All of this was done with today's main trunk of netcdf-c on the latest macOS. The input file `in.nc` and the successful output file `foo.nc` are attached as CDL text (because I could not figure out how to attach .nc files directly) in in.txt and foo.txt, respectively.
I get mixed results with the HDF5 Blosc filter in netCDF 4.9.X.
First, `nc_def_var_[deflate,zstd,bzip2]` work fine in the same framework. Yay! Blosc is more complex, and I'm not sure whether the problems are due to my NCO invocation of the filter or to something else.
The symptoms are that Blosc often works fine for me with one or a few variables in small test datasets, yet always fails on more complex datasets.
For example, this invokes the default Blosc sub-compressor at level 1 using the `nc_def_var_blosc()` API:
The output file (attached, along with the input file) appears to be valid, and the data are good. I will use this output file below to show inexplicable behavior with `nccopy`.
Problem #1: I can pick any number of other variables in this same file that NCO fails to compress with Blosc via `nc_def_var_blosc()`. I do not expect Unidata to debug NCO. Below I'll show some `nccopy` behavior that results in similar, though not identical, failures. This first problem is mainly intended to demonstrate how capricious the Blosc filter behavior is: Blosc works for me on `three_dmn_rec_var` (above), so why not on `time_bnds`?
Problem #2: `nccopy` does not work for me with the above `_Filter` string. Given the complexity of the Blosc filter, I'm not sure it should work with that filter string. Any clarification would help me debug this issue.
Problem #3: `nccopy` also fails to copy the (apparently valid) output file from above. This time I include the logging info because it mentions some `_Quantize` attributes that are not used at all in this output file (perhaps that is a red herring, but I thought it might be relevant):
That's enough to start this thread. My immediate goal is to help isolate whether the problem(s) are in my code, in my understanding of how to invoke Blosc, and/or in the netCDF-C implementation. Any guidance on any of these three problems is welcome. Thanks for reading this far!