This repository was archived by the owner on Aug 29, 2023. It is now read-only.

Not possible to download SOILMOISTURE dataset #546

Closed
JanisGailis opened this issue Mar 7, 2018 · 20 comments

@JanisGailis
Member

Trying to download the SOILMOISTURE dataset with the CLI, API or GUI.

Expected behavior

Cate downloads the dataset and makes it local.

Actual behavior

Cate errors out.

Steps to reproduce the problem

In CLI:

cate ds copy esacci.SOILMOISTURE.day.L3S.SSMV.multi-sensor.multi-platform.COMBINED.03-2.r1 --name SOIL_2007 --time "2007-01-01,2007-12-31" --region "72,8,85,17" --vars "sm,sm_uncertainty"

Or use the GUI with the same constraints.

Specifications

cate --version
cate 2.0.0.dev2
@JanisGailis
Member Author

Stack trace from the GUI:

Cate Desktop, version 2.0.0-dev.2

set_workspace_resource() call raised exception: "free variable 'remote_dataset' referenced before assignment in enclosing scope"

An error (code 20) occurred in Cate Core:

Traceback (most recent call last):
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\xarray\conventions.py", line 169, in decode_cf_datetime
    pd.to_timedelta(flat_num_dates.min(), delta) + ref_date
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\pandas\core\tools\timedeltas.py", line 97, in to_timedelta
    box=box, errors=errors)
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\pandas\core\tools\timedeltas.py", line 143, in _coerce_scalar_to_timedelta_type
    result = tslib.convert_to_timedelta64(r, unit)
  File "pandas/_libs/tslib.pyx", line 3092, in pandas._libs.tslib.convert_to_timedelta64
  File "pandas/_libs/tslib.pyx", line 3136, in pandas._libs.tslib.convert_to_timedelta64
  File "pandas/_libs/tslibs/timedeltas.pyx", line 96, in pandas._libs.tslibs.timedeltas.cast_from_unit
OverflowError: int too big to convert

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\xarray\conventions.py", line 416, in __init__
    result = decode_cf_datetime(example_value, units, calendar)
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\xarray\conventions.py", line 183, in decode_cf_datetime
    calendar)
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\xarray\conventions.py", line 114, in _decode_datetime_with_netcdf4
    dates = np.asarray(nc4.num2date(num_dates, units, calendar))
  File "netCDF4\_netCDF4.pyx", line 5744, in netCDF4._netCDF4.num2date
  File "netcdftime\_netcdftime.pyx", line 887, in netcdftime._netcdftime.utime.num2date
  File "netcdftime\_netcdftime.pyx", line 274, in netcdftime._netcdftime.DateFromJulianDay
ValueError: Julian Day must be positive

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\cate\cate\ds\esa_cci_odp.py", line 729, in _make_local
    remote_dataset = xr.open_dataset(dataset_uri)
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\xarray\backends\api.py", line 305, in open_dataset
    return maybe_decode_store(store, lock)
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\xarray\backends\api.py", line 225, in maybe_decode_store
    drop_variables=drop_variables)
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\xarray\conventions.py", line 1155, in decode_cf
    decode_coords, drop_variables=drop_variables)
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\xarray\conventions.py", line 1088, in decode_cf_variables
    stack_char_dim=stack_char_dim)
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\xarray\conventions.py", line 1023, in decode_cf_variable
    data = DecodedCFDatetimeArray(data, units, calendar)
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\site-packages\xarray\conventions.py", line 425, in __init__
    raise ValueError(msg)
ValueError: unable to decode time units 'days since 1970-01-01 00:00:00 UTC' with the default calendar. Try opening your dataset with decode_times=False.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\cate\cate\util\web\jsonrpchandler.py", line 192, in send_service_method_result
    result = future.result()
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\concurrent\futures\_base.py", line 425, in result
    return self.__get_result()
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\janis\Miniconda3\envs\cate-env\lib\concurrent\futures\thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "c:\cate\cate\util\web\jsonrpchandler.py", line 269, in call_service_method
    result = method(*method_params, monitor=monitor)
  File "c:\cate\cate\webapi\websocket.py", line 284, in set_workspace_resource
    monitor=monitor)
  File "c:\cate\cate\core\wsmanag.py", line 323, in set_workspace_resource
    workspace.execute_workflow(res_name=res_name, monitor=monitor)
  File "c:\cate\cate\core\workspace.py", line 599, in execute_workflow
    self.workflow.invoke_steps(steps, context=self._new_context(), monitor=monitor)
  File "c:\cate\cate\core\workflow.py", line 627, in invoke_steps
    steps[0].invoke(context=context, monitor=monitor)
  File "c:\cate\cate\core\workflow.py", line 318, in invoke
    self._invoke_impl(_new_context(context, step=self), monitor=monitor)
  File "c:\cate\cate\core\workflow.py", line 980, in _invoke_impl
    return_value = self._op(monitor=monitor, **input_values)
  File "c:\cate\cate\core\op.py", line 215, in __call__
    return_value = self._wrapped_op(**input_values)
  File "c:\cate\cate\ops\io.py", line 81, in open_dataset
    monitor=monitor)
  File "c:\cate\cate\core\ds.py", line 510, in open_dataset
    monitor=monitor)
  File "c:\cate\cate\ds\esa_cci_odp.py", line 897, in make_local
    raise e
  File "c:\cate\cate\ds\esa_cci_odp.py", line 890, in make_local
    self._make_local(local_ds, time_range, region, var_names, monitor=monitor)
  File "c:\cate\cate\ds\esa_cci_odp.py", line 762, in _make_local
    local_ds.meta_info['variables'] = [var_info for var_info in variables_info
  File "c:\cate\cate\ds\esa_cci_odp.py", line 764, in <listcomp>
    in remote_dataset.variables.keys() and
NameError: free variable 'remote_dataset' referenced before assignment in enclosing scope
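The final NameError hides the real decoding error. A minimal sketch of the pattern that produces it, assuming the metadata update in `_make_local` runs in cleanup code (for example a `finally` block) after `xr.open_dataset` has already raised. The names `failing_open`, `make_local_buggy` and `make_local_fixed` are hypothetical stand-ins, not Cate code.

```python
def failing_open(uri):
    # Hypothetical stand-in for xr.open_dataset() failing on undecodable times
    raise ValueError("unable to decode time units 'days since 1970-01-01 00:00:00 UTC'")


def make_local_buggy(uri, var_names, open_fn=failing_open):
    meta = {}
    try:
        remote_dataset = open_fn(uri)
        # ... copy the requested subset to local storage ...
    finally:
        # Runs even if open_fn() raised, i.e. when remote_dataset was never bound.
        # The comprehension then raises NameError (UnboundLocalError on
        # Python 3.12+), chained onto and masking the original ValueError.
        meta["variables"] = [v for v in var_names if v in remote_dataset]
    return meta


def make_local_fixed(uri, var_names, open_fn=failing_open):
    meta = {}
    remote_dataset = None  # bind before the try so cleanup code can test it
    try:
        remote_dataset = open_fn(uri)
        # ... copy the requested subset to local storage ...
    finally:
        if remote_dataset is not None:
            meta["variables"] = [v for v in var_names if v in remote_dataset]
    return meta
```

With the variable bound up front, the user sees the original "unable to decode time units" error instead of a confusing NameError.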

@JanisGailis
Member Author

Possibly related:
#444

@kbernat
Collaborator

kbernat commented Mar 7, 2018

@JanisGailis
There is a data problem; it looks like the t0 variable causes it.
The other issue is the unclear Cate error message; I will take care of that.

@kbernat
Collaborator

kbernat commented Mar 7, 2018

It's definitely not a regression: the previous product release (product version 02-2) works perfectly fine.

@kbernat
Collaborator

kbernat commented Mar 7, 2018

@JanisGailis @forman
should we try to open the dataset without time decoding? It's the only option at the moment.

@JanisGailis
Member Author

Then I'd prefer not to open it at all and to remove that version from the whitelist. It's not usable for UC6 anyway.

That, or implement something that lets the user add a workable time dimension to it after reading it manually.
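One way such an operation could work, as a sketch assuming xarray and pandas are available, that the time dimension is named `time`, and that the product is strictly daily with no gaps. `assign_daily_time` is a hypothetical name, not an existing Cate operation.

```python
import pandas as pd
import xarray as xr


def assign_daily_time(ds: xr.Dataset, start: str, dim: str = "time") -> xr.Dataset:
    """Replace an undecodable time coordinate with a regular daily one.

    Assumes one time step per day, starting at `start`, with no gaps.
    """
    n_steps = ds.sizes[dim]
    times = pd.date_range(start, periods=n_steps, freq="D")
    return ds.assign_coords({dim: times})


# Usage (placeholder path):
#   raw = xr.open_dataset("path/to/file.nc", decode_times=False)
#   ds = assign_daily_time(raw, "2007-01-01")
```

The "no gaps" assumption is the weak point: if any day is missing from the files, every later label is shifted, so a real operation would rather derive dates from file names or metadata.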

@kbernat
Collaborator

kbernat commented Mar 8, 2018

The data issue has already been reported in #326.

@JanisGailis
Copy link
Member Author

But that means it is a regression after all, because that thread makes clear that we could open that dataset with the CLI command above.

I'm a bit confused now.

Can you reproduce it?

@forman
Member

forman commented Mar 8, 2018

I can:

An error (code 20) occurred in Cate Core:

Traceback (most recent call last):
  File "D:\Miniconda3\envs\cate-env\lib\site-packages\xarray\conventions.py", line 155, in decode_cf_datetime
	pd.to_timedelta(flat_num_dates.min(), delta) + ref_date
  File "D:\Miniconda3\envs\cate-env\lib\site-packages\pandas\core\tools\timedeltas.py", line 89, in to_timedelta
	box=box, errors=errors)
  File "D:\Miniconda3\envs\cate-env\lib\site-packages\pandas\core\tools\timedeltas.py", line 134, in _coerce_scalar_to_timedelta_type
	result = tslib.convert_to_timedelta64(r, unit)
  File "pandas/_libs/tslib.pyx", line 3526, in pandas._libs.tslib.convert_to_timedelta64 (pandas\_libs\tslib.c:62190)
  File "pandas/_libs/tslib.pyx", line 3570, in pandas._libs.tslib.convert_to_timedelta64 (pandas\_libs\tslib.c:61660)
  File "pandas/_libs/tslib.pyx", line 4028, in pandas._libs.tslib.cast_from_unit (pandas\_libs\tslib.c:68471)
OverflowError: int too big to convert

In GUI:

[screenshot]

@JanisGailis
Member Author

@forman, thanks for checking this too!

So is this a pandas regression? Apparently we were able to open exactly that dataset before.

@forman
Member

forman commented Mar 8, 2018

I haven't looked into the code yet. Is it the missing time dim?
If so, I really suggest relaxing the constraint to have a time dim in normalize().
Many operations should also work without time; so does visualisation.

@JanisGailis
Member Author

I haven't looked into the dataset myself. But as far as I understand, we do have a time dimension; we just can't decode it anymore, although we used to be able to.

Without a time dimension, or with an undecoded (wow, is that a word?) one, we can't use it for UC6 anyway. So 'letting it through' only makes sense if we have a new operation that lets the user 'fix' the dataset by constructing the time dimension for it.

@JanisGailis
Member Author

JanisGailis commented Mar 8, 2018

I'd prefer to temporarily downgrade pandas, though, and file a bug with them, if this is a pandas issue.

@kbernat
Collaborator

kbernat commented Mar 8, 2018

@forman it's a known issue: the 't0' variable contains wrong data that can't be decoded as a time representation.
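Why such values surface as `OverflowError: int too big to convert`: pandas stores timestamps as signed 64-bit nanosecond counts, so a 'days since 1970-01-01' offset only decodes if it fits that range (roughly 1677 to 2262). The stdlib-only sketch mimics that range check; `decode_days_since_epoch` is a hypothetical helper, not xarray's or pandas' implementation. The netCDF4 fallback's "Julian Day must be positive" suggests the offending `t0` values lie far in the past, but that is an inference from the trace, not something verified against the data.

```python
from datetime import datetime, timedelta

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
NS_PER_DAY = 86_400 * 10**9
EPOCH = datetime(1970, 1, 1)


def decode_days_since_epoch(days: int) -> datetime:
    """Decode an integer 'days since 1970-01-01' value, rejecting offsets that
    do not fit in a signed 64-bit nanosecond count (the datetime64[ns] range)."""
    ns = days * NS_PER_DAY  # exact, since Python ints do not overflow
    if not INT64_MIN <= ns <= INT64_MAX:
        raise OverflowError(f"{days} days does not fit in int64 nanoseconds")
    return EPOCH + timedelta(days=days)
```

For example, 13514 decodes to 2007-01-01, 106751 is the last whole day that fits (2262-04-11), and 106752 already overflows.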

@JanisGailis
Member Author

@kbernat But what changed? We worked around it by excluding t0 automatically, and we used that dataset for UC6, which included several operations relying on a decoded time dimension. It used to work, as you can see in #326, and now it doesn't.

@kbernat
Collaborator

kbernat commented Mar 8, 2018

@JanisGailis, the data source name has changed, and the hotfix uses hardcoded names. I will update the names and later work on a more flexible solution.
EDIT: it's not the name after all, but I will try to fix it tomorrow anyway.

@JanisGailis
Member Author

But... I think the fix was to exclude t0. If you check the CLI call I used, I explicitly exclude it anyway. So, it should have worked.

kbernat pushed two commits that referenced this issue Mar 9, 2018
@forman
Member

forman commented Mar 9, 2018

@kbernat could you please put all those dataset fixes in a dedicated function, e.g. fix_known_dataset_issues(ds), and comment each fix with its issue number and other context, so we can trace back why we do what we do? The data providers will likely fix their datasets one day.
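A minimal sketch of that suggestion, using a plain dict as a stand-in for an xarray Dataset so it runs without dependencies. The function name comes from the comment above; the extra `ds_id` argument, `KnownFix`, `KNOWN_FIXES`, `_drop_t0` and the id-prefix matching are hypothetical. Matching on a data source id prefix instead of exact hardcoded names is one possible route to a more flexible solution.

```python
from typing import Callable, Dict, List, NamedTuple

Dataset = Dict[str, object]  # stand-in for xr.Dataset in this sketch


class KnownFix(NamedTuple):
    issues: str                          # issue numbers, for traceability
    ds_id_prefix: str                    # data sources the fix applies to
    apply: Callable[[Dataset], Dataset]  # the fix itself


def _drop_t0(ds: Dataset) -> Dataset:
    # #326, #546: 't0' holds values that cannot be decoded as time, so
    # opening the whole dataset fails. Remove once the provider fixes the data.
    return {name: var for name, var in ds.items() if name != "t0"}


KNOWN_FIXES: List[KnownFix] = [
    KnownFix("#326, #546", "esacci.SOILMOISTURE.", _drop_t0),
]


def fix_known_dataset_issues(ds_id: str, ds: Dataset) -> Dataset:
    """Apply every registered fix whose prefix matches the data source id."""
    for fix in KNOWN_FIXES:
        if ds_id.startswith(fix.ds_id_prefix):
            ds = fix.apply(ds)
    return ds
```

With real xarray data, a fix like dropping `t0` has to take effect before time decoding (for instance through the `drop_variables` argument of `xr.open_dataset`), since decoding is what fails.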

@VPriemer

Current error in CLI (cate 2.0.0.dev3):

cate ds: error: Data source "esacci.SOILMOISTURE.day.L3S.SSMV.multi-sensor.multi-platform.COMBINED.03-2.r1": Copying remote data source failed: unable to decode time units 'days since 1970-01-01 00:00:00 UTC' with the default calendar. Try opening your dataset with decode_times=False.

I need help opening the dataset with decode_times=False.
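A sketch of working with the files through xarray directly (outside Cate), assuming xarray is installed and the files are available locally; the paths are placeholders and `decode_without` is a hypothetical helper. Since the undecodable values sit in `t0`, dropping that variable before decoding keeps the regular `time` coordinate decodable, whereas `decode_times=False` on its own leaves every time variable as raw numbers.

```python
import xarray as xr


def decode_without(raw: xr.Dataset, bad_vars=("t0",)) -> xr.Dataset:
    """CF-decode `raw` (opened with decode_times=False), skipping `bad_vars`."""
    return xr.decode_cf(raw, drop_variables=list(bad_vars))


# Usage with a local file (placeholder path):
#   raw = xr.open_dataset("path/to/soilmoisture_file.nc", decode_times=False)
#   ds = decode_without(raw)
# or drop 't0' before any decoding happens, in one step:
#   ds = xr.open_dataset("path/to/soilmoisture_file.nc", drop_variables=["t0"])
```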

@HelenClifton

Confirmed fixed in the cate-2.0.0-dev.7 GUI.
