DataArray attributes not present in DataSet. Coherency problem between DataSet and NetCDF file #5208

Open
oliviermarti opened this issue Apr 22, 2021 · 4 comments

@oliviermarti

When I create a DataSet from DataArrays, attributes are lost.

When I create attributes in a Dataset, they are not shown by print(Dataset), but they are written to the NetCDF file.

Below is Python code showing the xarray behaviour in detail.

My requests:

  • When creating a Dataset from DataArrays, the DataArrays' attributes should be incorporated into the Dataset (perhaps optionally).
  • Attributes present in a Dataset should appear in print(Dataset), as they do for DataArrays.

Thanks,

Olivier

#!/usr/bin/env python
# coding: utf-8
import numpy  as np
import xarray as xr

# Creates DataArrays
nt = 4
time = np.arange (nt) * 86400.0
time = xr.DataArray (time, coords=[time,], dims=["time",])
aa   = time * 2.0

# Adding attributes to DataArrays
time.attrs['units'] = "second"
aa.attrs['units']   = "whatever"

# Attributes are visible in the DataArrays
print ('----------> time DataArray: ')
print (time)
print ('----------> aa DataArray : ' )
print (aa)
print ('----------> aa attributes : ')
print (aa.attrs )

# Creating a Dataset
ds = xr.Dataset(
    { "aa": (["time",], aa),  },
    coords={"time": (["time",], time), },   ) 

# Attributes are not visible in the Dataset
print ('----------> DataSet before setting attributes')
print (ds)
# My request #1 : attributes of the DataArrays should be added to the DataSet (may be optional)
print ('----------> Attributes of aa in DataSet : none')
print ( ds['aa'].attrs )
print ('----------> Attributes of aa outside DataSet : still here')
print ( aa.attrs )

print ('----------> Attributes are not written to the NetCDF file')
ds.to_netcdf ('sample1.nc')

# Adding attributes directly to the Dataset
ds['time'].attrs['units'] = "second"
ds['aa'].attrs['units']   = "whatever"

# Attributes are still not visible in the Dataset
print ('----------> DataSet after setting attributes : attributes not shown' )
print (ds)
# My request #2 : attributes added to the DataSet should be printed

print ('----------> But they are written in the NetCDF file')
ds.to_netcdf ('sample2.nc')
# MyRequest : coherency between the DataSet and the NetCDF file

# What if I read a NetCDF file
dt = xr.open_dataset ( 'sample2.nc')

print ('----------> DataSet read in a NetCDF file : Attributes are not shown')
print (dt)

print  ('----------> Attributes of aa in DataSet : present')
print ( dt['aa'].attrs )

@dcherian dcherian added the topic-metadata Relating to the handling of metadata (i.e. attrs and encoding) label Apr 22, 2021
@dcherian
Contributor

# Creating a Dataset
ds = xr.Dataset(
    { "aa": (["time",], aa),  },
    coords={"time": (["time",], time), },   ) 

This syntax recreates the DataArray aa from its dimensions and values only. Since you don't provide attributes in (["time",], aa), the resulting DataArray has no attributes.

Instead, try:

ds = xr.Dataset({"aa": aa})  # works when aa is already a DataArray
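To make the difference concrete, here is a minimal sketch comparing the two constructions, plus the tuple form with attributes passed explicitly as a third element. The variable names (ds_tuple, ds_tuple_attrs, ds_direct) are made up for illustration, and aa.values is used in the tuple form on the assumption that passing raw values is the least ambiguous option across xarray versions.

```python
import numpy as np
import xarray as xr

# A DataArray carrying an attribute, similar to `aa` in the report
aa = xr.DataArray(np.arange(4) * 2.0, dims=["time"], attrs={"units": "whatever"})

# Tuple form (dims, data): only dimensions and values are passed, so attrs are dropped
ds_tuple = xr.Dataset({"aa": (["time"], aa.values)})

# Tuple form (dims, data, attrs): attributes are passed explicitly
ds_tuple_attrs = xr.Dataset({"aa": (["time"], aa.values, aa.attrs)})

# Passing the DataArray itself keeps its attributes
ds_direct = xr.Dataset({"aa": aa})

print(ds_tuple["aa"].attrs)
print(ds_tuple_attrs["aa"].attrs)
print(ds_direct["aa"].attrs)
```

Only the first Dataset should end up with an empty attrs dict on aa.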

@dcherian dcherian added usage question and removed topic-metadata Relating to the handling of metadata (i.e. attrs and encoding) labels Apr 22, 2021
@oliviermarti
Author

dcherian, thank you for your help.

But a problem (or feature?) remains: when a Dataset is created, print(Dataset) doesn't show variable attributes. But attributes are shown when the Dataset is read from a NetCDF file. Is there a reason for this difference in behaviour?

Olivier

@shoyer
Member

shoyer commented Apr 28, 2021

You can see variable attributes if you write dataset.info(). They aren't included in the default for print(dataset) because variable attributes can be very long for datasets with many variables.
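As a rough illustration of that difference, the sketch below builds a small Dataset, takes its default repr, and captures the output of info() in a buffer. The global attribute title and the names summary and listing are hypothetical and only serve the example.

```python
import io

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"aa": (["time"], np.arange(4) * 2.0, {"units": "whatever"})},
    attrs={"title": "demo"},  # hypothetical global attribute
)

# The default repr lists variables and global attributes,
# but leaves out per-variable attributes
summary = repr(ds)

# info() writes an ncdump-style listing; capture it instead of printing to stdout
buf = io.StringIO()
ds.info(buf=buf)
listing = buf.getvalue()

print(summary)
print(listing)
```

Calling ds.info() with no argument prints the same listing directly to standard output.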

@JavierRuano

JavierRuano commented Apr 29, 2021

import numpy  as np
import xarray as xr

# Creates DataArrays
nt = 4
time = np.arange (nt) * 86400.0
time = xr.DataArray (time, coords=[time,], dims=["time",])
aa   = time * 2.0

# Adding attributes to DataArrays
time.attrs['units'] = "second"
aa.attrs['units']   = "whatever"

# Attributes are visible in the DataArrays
print ('----------> time DataArray: ')
print (time)
print ('----------> aa DataArray : ' )
print (aa)
print ('----------> aa attributes : ')
print (aa.attrs )

# Creating a Dataset
ds = xr.Dataset(
    { "aa": (["time",], aa),  },
    coords={"time": (["time",], time), },   ) 

# Attributes are not visible in the Dataset
print ('----------> DataSet before setting attributes')
print (ds)
# My request #1 : attributes of the DataArrays should be added to the DataSet (may be optional)
print ('----------> Attributes of aa in DataSet : none')
print ( ds['aa'].attrs )
print ('----------> Attributes of aa outside DataSet : still here')
print ( aa.attrs )

print ('----------> Attributes are not written to the NetCDF file')
ds.to_netcdf ('sample1.nc')

# Adding attributes directly to the Dataset (global) and to its variables
print ('----------> DataSet after setting attributes : attributes not shown' )
ds = ds.assign_attrs({'Visible': 'NotInvisibleMan'})
ds['time'].attrs['units'] = "second"
ds['aa'].attrs['units'] = "whatever"
ds.to_netcdf('safeReturn.nc')

# Reading the file back: global and variable attributes are all present
saved = xr.open_dataset('safeReturn.nc')
print(saved.attrs)
print(saved['aa'].attrs)
print(saved['time'].attrs)
