UnicodeDecodeError attempting to open an NCO-generated netcdf file #442

Open
durack1 opened this issue Feb 16, 2022 · 4 comments

@durack1
Member

durack1 commented Feb 16, 2022

Describe the bug
cdms 3.1.5 fails with a UnicodeDecodeError when attempting to open an NCO-generated netcdf file

To Reproduce
Steps to reproduce the behavior:

In [2]: import cdms2 as cdm
In [3]: f = '/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r3i1p1f2/Eday/rivo/gn/v20181012/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_18500501-18591231.nc'
In [4]: fH = cdm.open(f)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 11: invalid start byte
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/tmp/ipykernel_70886/1508437202.py", line 1, in <module>
    fH = cdm.open(f)
  File "/home/durack1/anaconda3/envs/cdms315spy515cart020/lib/python3.9/site-packages/cdms2/dataset.py", line 523, in openDataset
    file = CdmsFile(path, mode, hostObj)
  File "/home/durack1/anaconda3/envs/cdms315spy515cart020/lib/python3.9/site-packages/cdms2/dataset.py", line 1295, in __init__
    self._file_ = Cdunif.CdunifFile(path, mode)
SystemError: <built-in function CdunifFile> returned a result with an error set
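
For readers unfamiliar with this error message, here is a minimal sketch of what it means at the byte level. It is not taken from cdms2 or Cdunif; it only shows that a lone 0xb0 byte is not valid UTF-8 (it has the bit pattern of a continuation byte), while the same byte is the degree sign in Latin-1 (ISO-8859-1). That the file's text was written as Latin-1 is an assumption.

```python
# A lone 0xb0 byte cannot start a UTF-8 sequence.
raw = b"\xb0"

try:
    raw.decode("utf-8")
    message = ""
except UnicodeDecodeError as err:
    # Same wording as the error in the traceback, apart from the position.
    message = str(err)

# Assumption: the producer wrote the text as Latin-1, where 0xb0 is U+00B0.
latin1_text = raw.decode("latin-1")

print(message)
print(repr(latin1_text))
```

If that assumption holds, the file probably contains a degree sign in one of its text attributes.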

Expected behavior
I expected cdms to open the file; ncdump -h reads it fine.

Screenshots or traceback
The reproducible steps above should provide enough information.

Desktop (please complete the following information):

  • OS: RHEL 7.9
@durack1
Member Author

durack1 commented Feb 16, 2022

The same issue occurs for the /p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r3i1p1f2/Eday/rivo/gn/v20181012/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_18500501-18591231.nc file

@durack1
Member Author

durack1 commented Feb 16, 2022

And /p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r3i1p1f2/Emon/wtd/gn/v20181012/wtd_Emon_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_185005-185912.nc. If I find more, I'll just edit this comment.

Also
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r3i1p1f2/fx/areacellr/gn/v20181012/areacellr_fx_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r4i1p1f2/Eday/rivo/gn/v20181012/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r4i1p1f2_gn_18500701-18591231.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r4i1p1f2/Emon/wtd/gn/v20181012/wtd_Emon_CNRM-CM6-1_abrupt-4xCO2_r4i1p1f2_gn_185007-185912.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r4i1p1f2/fx/areacellr/gn/v20181012/areacellr_fx_CNRM-CM6-1_abrupt-4xCO2_r4i1p1f2_gn.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r1i1p1f2/Eday/rivo/gn/v20180705/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r1i1p1f2_gn_19500101-19991231.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r1i1p1f2/Emon/wtd/gn/v20180705/wtd_Emon_CNRM-CM6-1_abrupt-4xCO2_r1i1p1f2_gn_185001-199912.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r1i1p1f2/fx/areacellr/gn/v20180705/areacellr_fx_CNRM-CM6-1_abrupt-4xCO2_r1i1p1f2_gn.nc

There are more, but I gave up adding to this list; these should be enough to debug the behaviour.

@jypeter
Member

jypeter commented Feb 21, 2022

This reminds me of #432 ...

@durack1
Member Author

durack1 commented Feb 22, 2022

@jypeter I think you're right. I wonder whether adding string.encode('utf-8') might be a way of getting around the problem? I think the failure is buried in the C code that calls netcdf-c, somewhere in Src/Cdunifmodule.c, and I'm not sure there is an equivalent of .encode in C.
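
One note on the idea above: the traceback reports a decode failure (bytes to str), so the relevant operation is probably .decode rather than .encode. The sketch below shows a tolerant decode in Python. The helper name decode_attribute is hypothetical and the Latin-1 fallback is an assumption about how the bytes were written; none of this is code from cdms2.

```python
def decode_attribute(raw: bytes) -> str:
    """Decode attribute bytes, preferring UTF-8.

    Hypothetical helper: falls back to Latin-1, which maps every byte
    to a code point and therefore never raises.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


# Valid UTF-8 is returned unchanged; a stray 0xb0 becomes a degree sign.
print(decode_attribute("1/2\u00b0".encode("utf-8")))
print(decode_attribute(b"1/2\xb0"))
```

On the C side, the CPython C-API functions PyUnicode_DecodeUTF8 and PyUnicode_DecodeLatin1 exist, and PyUnicode_DecodeUTF8 takes an errors argument such as "replace". Whether Src/Cdunifmodule.c can use them at the failing call site has not been checked here.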

Here is an ncdump of one of the problem files; I can't see the offending character after a quick scan:

ncdump -h /p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r3i1p1f2/Eday/rivo/gn/v20181012/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_18500501-18591231.nc
netcdf rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_18500501-18591231 {
dimensions:
	lat = 360 ;
	lon = 720 ;
	time = UNLIMITED ; // (3532 currently)
	axis_nbounds = 2 ;
variables:
	float lat(lat) ;
		lat:axis = "Y" ;
		lat:standard_name = "latitude" ;
		lat:long_name = "Latitude" ;
		lat:units = "degrees_north" ;
	float lon(lon) ;
		lon:axis = "X" ;
		lon:standard_name = "longitude" ;
		lon:long_name = "Longitude" ;
		lon:units = "degrees_east" ;
	double time(time) ;
		time:axis = "T" ;
		time:standard_name = "time" ;
		time:long_name = "Time axis" ;
		time:calendar = "gregorian" ;
		time:units = "days since 1850-01-01 00:00:00" ;
		time:time_origin = "1850-01-01 00:00:00" ;
		time:bounds = "time_bounds" ;
	double time_bounds(time, axis_nbounds) ;
	float rivo(time, lat, lon) ;
		rivo:long_name = "River Discharge" ;
		rivo:units = "m3 s-1" ;
		rivo:online_operation = "average" ;
		rivo:cell_methods = "area: mean where land time: mean" ;
		rivo:interval_operation = "1800 s" ;
		rivo:interval_write = "1 d" ;
		rivo:_FillValue = 1.e+20f ;
		rivo:missing_value = 1.e+20f ;
		rivo:coordinates = "" ;
		rivo:standard_name = "water_flux_to_downstream" ;
		rivo:description = "water_flux_from_upstream" ;
		rivo:history = "none" ;
		rivo:cell_measures = "area: areacellr" ;

// global attributes:
		:name = "/scratch/work/voldoire/outputs/CMIP6/DECK/CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2/18500501/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_%start_date%-%end_date%" ;
		:Conventions = "CF-1.7 CMIP-6.2" ;
		:creation_date = "2018-07-23T14:10:00Z" ;
		:description = "DECK: abrupt-4xCO2" ;
		:title = "CNRM-CM6-1 model output prepared for CMIP6 / CMIP abrupt-4xCO2" ;
		:activity_id = "CMIP" ;
		:contact = "[email protected]" ;
		:data_specs_version = "01.00.21" ;
		:dr2xml_version = "1.13" ;
		:experiment_id = "abrupt-4xCO2" ;
		:experiment = "abrupt quadrupling of CO2" ;
		:external_variables = "areacellr" ;
		:forcing_index = 2 ;
		:frequency = "day" ;
		:further_info_url = "https://furtherinfo.es-doc.org/CMIP6.CNRM-CERFACS.CNRM-CM6-1.abrupt-4xCO2.none.r3i1p1f2" ;
		:grid = "regular 1/2? lat-lon grid" ;
		:grid_label = "gn" ;
		:nominal_resolution = "50 km" ;
		:initialization_index = 1 ;
		:institution_id = "CNRM-CERFACS" ;
		:institution = "CNRM (Centre National de Recherches Meteorologiques, Toulouse 31057, France), CERFACS (Centre Europeen de Recherche et de Formation Avancee en Calcul Scientifique, Toulouse 31057, France)" ;
		:license = "CMIP6 model data produced by CNRM-CERFACS is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at http://www.umr-cnrm.fr/cmip6/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
		:mip_era = "CMIP6" ;
		:parent_experiment_id = "piControl" ;
		:parent_mip_era = "CMIP6" ;
		:parent_activity_id = "CMIP" ;
		:parent_source_id = "CNRM-CM6-1" ;
		:parent_time_units = "days since 1850-01-01 00:00:00" ;
		:parent_variant_label = "r3i1p1f2" ;
		:branch_method = "standard" ;
		:branch_time_in_parent = 0. ;
		:branch_time_in_child = 0. ;
		:physics_index = 1 ;
		:product = "model-output" ;
		:realization_index = 3 ;
		:realm = "land" ;
		:references = "http://www.umr-cnrm.fr/cmip6/references" ;
		:source = "CNRM-CM6-1 (2017):  aerosol: prescribed monthly fields computed by TACTIC_v2 scheme atmos: Arpege 6.3 (T127; Gaussian Reduced with 24572 grid points in total distributed over 128 latitude circles (with 256 grid points per latitude circle between 30degN and 30degS reducing to 20 grid points per latitude circle at 88.9degN and 88.9degS); 91 levels; top level 78.4 km) atmosChem: OZL_v2 land: Surfex 8.0c ocean: Nemo 3.6 (eORCA1, tripolar primarily 1deg; 362 x 294 longitude/latitude; 75 levels; top grid cell 0-1 m) seaIce: Gelato 6.1" ;
		:source_id = "CNRM-CM6-1" ;
		:source_type = "AOGCM" ;
		:sub_experiment_id = "none" ;
		:sub_experiment = "none" ;
		:table_id = "Eday" ;
		:variable_id = "rivo" ;
		:variant_label = "r3i1p1f2" ;
		:EXPID = "CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2" ;
		:CMIP6_CV_version = "cv=6.2.3.0-7-g2019642" ;
		:dr2xml_md5sum = "92ddb3d0d8ce79f498d792fc8e559dcf" ;
		:xios_commit = "1442-shuffle" ;
		:nemo_gelato_commit = "49095b3accd5d4c_6524fe19b00467a" ;
		:arpege_minor_version = "6.3.2" ;
		:tracking_id = "hdl:21.14100/21b7ff0f-63f7-4702-a965-fa94b6fa6ad1" ;
		:history = "Tue Jul 24 19:23:12 2018: ncatted -O -a tracking_id,global,m,c,hdl:21.14100/21b7ff0f-63f7-4702-a965-fa94b6fa6ad1 /scratch/work/voldoire/outputs/CMIP6/DECK/CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2/assembled/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_18500501-18591231.nc\nnone" ;
		:NCO = "\"4.5.5\"" ;
}
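
One candidate that a quick scan can miss is the grid global attribute, which prints above as "regular 1/2? lat-lon grid". If the "?" stands for a raw 0xb0 byte (the Latin-1 degree sign), that byte would sit at index 11 of the value, which matches "position 11" in the error message. This is an assumption and is not confirmed anywhere in this thread. The sketch below, with a hypothetical find_undecodable helper and hand-written attribute bytes rather than values read from the file, shows how such a scan could be done.

```python
def find_undecodable(attrs):
    """Return (name, index, byte) for each value that is not valid UTF-8."""
    problems = []
    for name, raw in attrs.items():
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the index of the first offending byte.
            problems.append((name, err.start, raw[err.start]))
    return problems


# Hand-written stand-ins for two global attributes from the dump above.
attrs = {
    "grid_label": b"gn",
    # Assumption: the "?" printed by ncdump is a raw 0xb0 byte.
    "grid": b"regular 1/2\xb0 lat-lon grid",
}

problems = find_undecodable(attrs)
for name, index, byte in problems:
    print(f"{name}: byte {byte:#04x} at position {index}")
```

To run this against the real file, the attribute values would have to be read as raw bytes by some other tool, since cdms2 fails before exposing them.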
