Too many open files when opening large datasets #102
It seems that the file limit is global: I can open datasets consisting of X files separately, but not one after another.

>>> from cate.core.ds import DATA_STORE_REGISTRY
>>> from cate.core.monitor import ConsoleMonitor
>>> import cate.ops as ops
>>> monitor = ConsoleMonitor()
>>> data_store = DATA_STORE_REGISTRY.get_data_store('esa_cci_odp')
>>> sm = ops.open_dataset('esacci.SOILMOISTURE.day.L3S.SSMV.multi-sensor.multi-platform.COMBINED.02-2.r1','2000-01-01','2001-12-31', sync=True, monitor=monitor)
>>> sst = ops.open_dataset('esacci.SST.day.L4.SSTdepth.multi-sensor.multi-platform.OSTIA.1-1.r1','2000-01-01','2001-12-31', sync=True, monitor=monitor)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ccitbx/Development/cate-core/cate/ops/io.py", line 53, in open_dataset
File "/home/ccitbx/Development/cate-core/cate/core/ds.py", line 396, in open_dataset
File "/home/ccitbx/Development/cate-core/cate/ds/esa_cci_odp.py", line 510, in open_dataset
File "/home/ccitbx/Development/cate-core/cate/core/ds.py", line 413, in open_xarray_dataset
File "/home/ccitbx/miniconda3/envs/ect_env/lib/python3.5/site-packages/xarray-0.8.2-py3.5.egg/xarray/backends/api.py", line 300, in open_mfdataset
File "/home/ccitbx/miniconda3/envs/ect_env/lib/python3.5/site-packages/xarray-0.8.2-py3.5.egg/xarray/backends/api.py", line 300, in <listcomp>
File "/home/ccitbx/miniconda3/envs/ect_env/lib/python3.5/site-packages/xarray-0.8.2-py3.5.egg/xarray/backends/api.py", line 210, in open_dataset
File "/home/ccitbx/miniconda3/envs/ect_env/lib/python3.5/site-packages/xarray-0.8.2-py3.5.egg/xarray/backends/netCDF4_.py", line 188, in __init__
File "netCDF4/_netCDF4.pyx", line 1811, in netCDF4._netCDF4.Dataset.__init__ (netCDF4/_netCDF4.c:12262)
OSError: Too many open files
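For reference (not part of the original report): on POSIX systems the per-process limit behind this OSError can be inspected, and the soft limit raised, with the standard-library `resource` module. A minimal sketch of that workaround (it does not fix the underlying file-handle leak):

```python
import resource

# Query the per-process limit on open file descriptors (POSIX only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Workaround: raise the soft limit toward the hard limit before
# opening a large multi-file dataset. Never lower an already-high limit.
wanted = 4096
if hard != resource.RLIM_INFINITY:
    wanted = min(wanted, hard)
new_soft = max(soft, wanted)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```

Raising the soft limit only postpones the error for sufficiently large datasets, which is why the proper fix landed in xarray itself.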
See also #118
@JanisGailis, can you please see if pydata/xarray#1198 fixes your problem above?
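For context: the approach in pydata/xarray#1198 is to close each underlying netCDF file after it is accessed and reopen it lazily on demand, so the number of live OS handles stays bounded no matter how many files the dataset spans. A self-contained sketch of that idea in plain Python (no xarray; class and file names are illustrative):

```python
import os
import tempfile

class LazyFile:
    """Open the underlying file only while reading, then close it again.

    Mimics the idea behind the xarray fix: many LazyFile objects can
    exist, but an OS-level handle is only held during each read.
    """
    def __init__(self, path):
        self.path = path

    def read_value(self):
        # The handle is opened here and closed when the block exits.
        with open(self.path, "rb") as f:
            return int.from_bytes(f.read(), "big")

# Demo: 500 lazily-opened files, but never more than one open handle.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(500):
    path = os.path.join(tmpdir, f"part{i}.bin")
    with open(path, "wb") as f:
        f.write(i.to_bytes(2, "big"))
    paths.append(path)

files = [LazyFile(p) for p in paths]
total = sum(f.read_value() for f in files)
print(total)  # 124750, i.e. sum of 0..499
```

The trade-off is reopen overhead on every access, which is why xarray made the behavior opt-in.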
@pwolfram Yes it does, great work! This is really good news for us.

from cate.core.ds import DATA_STORE_REGISTRY
from cate.util.monitor import ConsoleMonitor
import cate.ops as ops
monitor = ConsoleMonitor()
sst = ops.open_dataset('esacci.SST.day.L4.SSTDepth.multi-sensor.multi-platform.OSTIA.1-1.r1','2000-01-01','2002-12-31',sync=True, monitor=monitor)
print(sst)
sm = ops.open_dataset('esacci.SOILMOISTURE.day.L3S.SSMV.multi-sensor.multi-platform.COMBINED.02-2.r1','2000-01-01','2001-12-31', sync=True, monitor=monitor)
sst = ops.open_dataset('esacci.SST.day.L4.SSTdepth.multi-sensor.multi-platform.OSTIA.1-1.r1','2000-01-01','2001-12-31', sync=True, monitor=monitor)
print(sm)
print(sst)
sm = ops.open_dataset('esacci.SOILMOISTURE.day.L3S.SSMV.multi-sensor.multi-platform.COMBINED.02-2.r1','2000-01-01','2003-12-31', sync=True, monitor=monitor)
print(sm)

yields:
Altogether it opened ~4k files. Note that these are fairly 'difficult' datasets with high compression (each SST file uncompresses to 1 GB) and a lot of CF decoding to do (NaN handling and the like). I should also mention that open_mfdataset from the master branch seemed faster than the one in xarray 0.9.1, though that may be just an impression; I didn't run any benchmarks. When can we expect 0.9.2? :) All in all, great job! Thanks a lot!
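The CF decoding mentioned above refers to unpacking scaled integer data on read: applying attributes such as `scale_factor` and replacing `_FillValue` entries with NaN, which xarray does per variable. A hand-rolled sketch of what that decoding amounts to (attribute values here are made up, not taken from the SST product):

```python
import numpy as np

# Packed int16 values as they might be stored on disk;
# -32768 marks missing data (hypothetical _FillValue).
raw = np.array([3000, 3050, -32768], dtype=np.int16)
fill_value = -32768
scale_factor = 0.01  # hypothetical scale_factor attribute

# Decode: apply the scale factor and replace fill values with NaN.
# This forces a conversion from int16 to a float64 array, which is
# part of why decoding large compressed files is expensive.
decoded = np.where(raw == fill_value, np.nan, raw * scale_factor)
print(decoded)
```

Doing this for every variable across thousands of files adds real cost on top of decompression, consistent with the slowness observed above.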
Good news. Thanks @pwolfram for fixing this and @JanisGailis for testing.
Thanks obviously go to @shoyer too, who provided clutch help! Can this issue be closed now, @mzuehlke and @JanisGailis?
@pwolfram The xarray issue, sure! This one I guess we'll close once we've bumped the xarray version.
Fixed upstream |
When opening large datasets consisting of 1000+ files via the 'esa_cci_odp' data store, the software crashes with an OSError ("Too many open files").
There is an open xarray issue tracking this:
pydata/xarray#463