
Fixes OS error arising from too many files open #1198

Merged
merged 1 commit into pydata:master from fix_too_many_open_files on Mar 23, 2017

Conversation

pwolfram
Contributor

@pwolfram pwolfram commented Jan 10, 2017

Previously, DataStore did not judiciously close files, so opening a large number of files could raise an OSError about too many open files. This merge provides a solution for the netCDF, scipy, and h5netcdf backends.
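
For context, a minimal usage sketch of the autoclose option this PR adds (the file paths and the variable name foo are hypothetical):

    import glob
    import xarray as xr

    # hypothetical file paths; 'foo' is an illustrative variable name
    paths = sorted(glob.glob('output/run_*.nc'))

    # autoclose=True (added by this PR, default False) closes each underlying
    # file after it is accessed instead of keeping every handle open at once
    ds = xr.open_mfdataset(paths, autoclose=True)
    print(ds.foo.sum().values)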

@pwolfram
Contributor Author

@pwolfram pwolfram changed the title Fixes OS error arrising from too many files open (netCDF and scripy backends) WIP: Fixes OS error arrising from too many files open Jan 10, 2017
Member

@shoyer shoyer left a comment

It's very nice to see some progress on this!

# netCDF4 only allows closing the root group
while ds.parent is not None:
ds = ds.parent
if ds.isopen():
Member

Maybe put this while loop in a helper function, something like _find_root?

Contributor Author

Sure, sounds good.
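
For reference, a minimal sketch of that helper, lifted directly from the loop in the diff above:

    def _find_root(ds):
        # netCDF4 only allows closing the root group, so walk up to it
        while ds.parent is not None:
            ds = ds.parent
        return ds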

@@ -249,6 +248,8 @@ def maybe_decode_store(store, lock=False):
else:
ds2 = ds

store.close()
Member

Probably guard this behind an option?

self._filename = filename
self._mode = 'a' if mode == 'w' else mode
self._opener = functools.partial(_open_netcdf4_group, filename,
Member

can we reuse the same partial created above for opener, maybe just with a different value for the mode argument? (would be nice to have less code duplication)

Contributor Author

Changed to self._opener = functools.partial(opener, mode=self._mode). Is this the best way to do this, compared with the previous usage of

    self._opener = opener(mode=self._mode,
                          group=group, clobber=clobber,
                          diskless=diskless, persist=persist,
                          format=format)

?

@@ -492,6 +492,21 @@ def create_tmp_file(suffix='.nc', allow_cleanup_failure=False):
if not allow_cleanup_failure:
raise

@contextlib.contextmanager
def create_tmp_files(nfiles, suffix='.nc', allow_cleanup_failure=False):
Member

Any reason why you can't reuse create_tmp_file internally here?

This would be most straightforwardly done with contextlib.ExitStack, though we would need a backport to Python 2.7:
https://docs.python.org/3.6/library/contextlib.html#contextlib.ExitStack

Contributor Author

Thanks @shoyer, this makes the code cleaner. However, it may run somewhat slower because more context managers are in play. It does remove redundant code, which is always a plus.
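
A minimal sketch of the ExitStack-based reuse being discussed (signatures taken from the test helpers shown above; ExitStack needs a backport on Python 2.7):

    import contextlib  # provides both contextmanager and ExitStack

    @contextlib.contextmanager
    def create_tmp_files(nfiles, suffix='.nc', allow_cleanup_failure=False):
        # reuse the single-file helper and let ExitStack unwind every
        # temporary file when the block exits
        with contextlib.ExitStack() as stack:
            files = [stack.enter_context(create_tmp_file(suffix,
                                                         allow_cleanup_failure))
                     for _ in range(nfiles)]
            yield files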

@pwolfram pwolfram changed the title WIP: Fixes OS error arrising from too many files open WIP: Fixes OS error arising from too many files open Jan 10, 2017
@pwolfram pwolfram force-pushed the fix_too_many_open_files branch 4 times, most recently from 7070fd7 to 349e7db Compare January 11, 2017 16:39
@pwolfram
Contributor Author

@shoyer, all the checks "pass" but there are still errors in the "allowed" list. If you get a chance, could you please give me some perspective on whether these are errors on my end or not? I'm not exactly sure how to interpret them.

Once I know this code is correct, I plan to address the inline comments you graciously highlighted above. I think we are getting close here, assuming I have enough testing to demonstrate we have accurately fixed the too-many-open-files issue. Any additional ideas you have for tests would be really helpful too.

@shoyer
Member

shoyer commented Jan 11, 2017

@pwolfram the allowed failures are pre-existing, not related to this change.

@pwolfram
Contributor Author

Thanks @shoyer. Does that mean if the checks pass the code is at least minimally correct in terms of not breaking previous design choices? E.g., does this imply that we are ok except for cleanup / implementation details on this PR?

@shoyer
Member

shoyer commented Jan 11, 2017

Does that mean if the checks pass the code is at least minimally correct in terms of not breaking previous design choices? E.g., does this imply that we are ok except for cleanup / implementation details on this PR?

If the checks pass, it means that this doesn't directly break anything we have tests for, which should cover most functionality. However, we'll still need to be careful not to introduce performance regressions -- we don't have any automated performance tests yet.

@pwolfram
Contributor Author

@shoyer, I just realized this might conflict with #1087. Do you foresee this causing problems, and in what order do you plan to merge this PR and #1087 (which obviously predates this one...)? We are running into the snag with #463 in our analysis, and my personal preference would be to get some type of solution in place sooner rather than later. Thanks for considering this request.

Also, I'm not sure of the best way to test performance either. Could we potentially use something like the "toy" test cases for this purpose? Ideally we would have a test case with O(100) files to get a clearer picture of the performance cost of this PR.

Please let me know what you want me to do with this PR -- should I clean it up in anticipation of a merge, or wait for now to see if there are extra things that need to be fixed via additional testing? Note that I have the full scipy, h5netcdf, and pynio implementations ready for review as well; they weren't available when you did your review yesterday.

@pwolfram pwolfram changed the title WIP: Fixes OS error arising from too many files open Fixes OS error arising from too many files open Jan 12, 2017
@shoyer
Member

shoyer commented Jan 12, 2017 via email

@shoyer
Member

shoyer commented Jan 12, 2017

This should be totally fine without performance or compatibility concerns as long as we set autoclose=False by default.

In the long term, it would be nice to handle autoclosing automatically (invoking it when the number of open files exceeds some limit), but we should probably be a little more clever for that.
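
Not part of this PR, but a hypothetical sketch of that longer-term idea (all names here are illustrative): keep a bounded registry of open files and close the least recently used one once a limit is exceeded.

    from collections import OrderedDict

    class OpenFileCache(object):
        """Illustrative only: cap the number of simultaneously open files."""

        def __init__(self, maxfiles=128):
            self.maxfiles = maxfiles
            self._cache = OrderedDict()  # key -> open file handle

        def acquire(self, key, opener):
            # return a cached handle, or open a new one and evict the
            # least recently used handle if the cap is exceeded
            if key in self._cache:
                handle = self._cache.pop(key)
            else:
                handle = opener()
            self._cache[key] = handle  # re-insert as most recently used
            if len(self._cache) > self.maxfiles:
                _, oldest = self._cache.popitem(last=False)
                oldest.close()
            return handle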

@pwolfram
Contributor Author

Thanks @shoyer. This makes sense. I think the path forward for the next round of edits should include making sure the existing tests that use open_mfdataset run with both autoclose options. If we do this, we future-proof against accidentally breaking this new functionality and avoid contaminating existing workflows with performance regressions.

Documentation is also obviously required.

FYI as a heads up, I probably won't be able to get to this until mid-week at the earliest, but it appears we are close to a viable solution.

@PeterDSteinberg

I appreciate your work on this too-many-open-files error -- I think your fixes will add a lot of value to the NetCDF multi-file functionality. In this notebook, which uses K-Means clustering on multi-file NetCDF data sets, I have repeatedly experienced the too-many-open-files error, even with attempts to adjust via ulimit. I can test the notebook again once this PR is finalized.

@pwolfram
Contributor Author

@PeterDSteinberg, did this PR fix the issue for you? I obviously need to update it, but I just wanted to confirm that the current branch resolves the too-many-open-files error. Also, do you have any idea of the performance impact of the changes I'm proposing?

@pwolfram pwolfram force-pushed the fix_too_many_open_files branch from 5acafb6 to 1460a07 Compare January 31, 2017 19:50
@pwolfram
Contributor Author

@shoyer and @PeterDSteinberg I've updated this PR to reflect requested changes.

@pwolfram pwolfram changed the title Fixes OS error arising from too many files open WIP: Fixes OS error arising from too many files open Feb 2, 2017
@pwolfram
Contributor Author

pwolfram commented Feb 2, 2017

There are still a few more issues that need to be ironed out. I'll let you know when I've resolved them.

@vnoel
Contributor

vnoel commented Feb 3, 2017

I'm just chiming in to signify my interest in seeing this issue solved. I have just hit "OSError: Too many open files". The data itself is not even huge, but it's scattered across many files and it's a PITA to revert to manual concatenation -- I've grown used to dask doing the work for me ;-)

@shoyer
Member

shoyer commented Feb 4, 2017

@pwolfram this looks pretty close to me now -- let me know when it's ready for review.

@pwolfram pwolfram force-pushed the fix_too_many_open_files branch from 1460a07 to 73b601d Compare February 5, 2017 05:19
@pwolfram
Contributor Author

pwolfram commented Feb 5, 2017

@shoyer, the pushed code represents my current progress. The initial PR had a bug -- essentially, a calculation couldn't be performed following the load. This fixes that bug and adds a test to ensure it doesn't recur. However, I'm having trouble with h5netcdf, which I'm not very familiar with compared to netCDF. I just need some more time (or even inspiration from you) to sort out this last key issue...

I'm getting the following error:

================================================================================================================== FAILURES ==================================================================================================================
___________________________________________________________________________________________ OpenMFDatasetTest.test_4_open_large_num_files_h5netcdf ___________________________________________________________________________________________

self = <xarray.tests.test_backends.OpenMFDatasetTest testMethod=test_4_open_large_num_files_h5netcdf>

    @requires_dask
    @requires_h5netcdf
    def test_4_open_large_num_files_h5netcdf(self):
>       self.validate_open_mfdataset_large_num_files(engine=['h5netcdf'])

xarray/tests/test_backends.py:1040: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
xarray/tests/test_backends.py:1018: in validate_open_mfdataset_large_num_files
    self.assertClose(ds.foo.sum().values, np.sum(randdata))
xarray/core/dataarray.py:400: in values
    return self.variable.values
xarray/core/variable.py:306: in values
    return _as_array_or_item(self._data)
xarray/core/variable.py:182: in _as_array_or_item
    data = np.asarray(data)
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/numpy/core/numeric.py:482: in asarray
    return array(a, dtype, copy=False, order=order)
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/array/core.py:1025: in __array__
    x = self.compute()
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/base.py:79: in compute
    return compute(self, **kwargs)[0]
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/base.py:179: in compute
    results = get(dsk, keys, **kwargs)
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/async.py:537: in get_sync
    raise_on_exception=True, **kwargs)
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/async.py:500: in get_async
    fire_task()
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/async.py:476: in fire_task
    callback=queue.put)
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/async.py:525: in apply_sync
    res = func(*args, **kwds)
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/async.py:268: in execute_task
    result = _execute_task(task, data)
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/async.py:248: in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/async.py:248: in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/async.py:245: in _execute_task
    return [_execute_task(a, cache) for a in arg]
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/async.py:245: in <listcomp>
    return [_execute_task(a, cache) for a in arg]
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/async.py:249: in _execute_task
    return func(*args2)
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/dask/array/core.py:52: in getarray
    c = a[b]
xarray/core/indexing.py:401: in __getitem__
    return type(self)(self.array[key])
xarray/core/indexing.py:376: in __getitem__
    return type(self)(self.array, self._updated_key(key))
xarray/core/indexing.py:354: in _updated_key
    for size, k in zip(self.array.shape, self.key):
xarray/core/indexing.py:364: in shape
    for size, k in zip(self.array.shape, self.key):
xarray/core/utils.py:414: in shape
    return self.array.shape
xarray/backends/netCDF4_.py:37: in __getattr__
    return getattr(self.datastore.ds.variables[self.var], attr)
../../anaconda/envs/test_env_xarray35/lib/python3.5/contextlib.py:66: in __exit__
    next(self.gen)
xarray/backends/h5netcdf_.py:105: in ensure_open
    self.close()
xarray/backends/h5netcdf_.py:190: in close
    _close_ds(self.ds)
xarray/backends/h5netcdf_.py:70: in _close_ds
    find_root(ds).close()
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/h5netcdf/core.py:458: in close
    self._h5file.close()
../../anaconda/envs/test_env_xarray35/lib/python3.5/site-packages/h5py/_hl/files.py:302: in close
    self.id.close()
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper (/Users/travis/miniconda3/conda-bld/work/h5py-2.6.0/h5py/_objects.c:2840)
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper (/Users/travis/miniconda3/conda-bld/work/h5py-2.6.0/h5py/_objects.c:2798)
    ???
h5py/h5f.pyx:282: in h5py.h5f.FileID.close (/Users/travis/miniconda3/conda-bld/work/h5py-2.6.0/h5py/h5f.c:3905)
    ???
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper (/Users/travis/miniconda3/conda-bld/work/h5py-2.6.0/h5py/_objects.c:2840)
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper (/Users/travis/miniconda3/conda-bld/work/h5py-2.6.0/h5py/_objects.c:2798)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   RuntimeError: dictionary changed size during iteration

h5py/_objects.pyx:119: RuntimeError
============================================================================================ 1 failed, 1415 passed, 95 skipped in 116.54 seconds =============================================================================================
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x10f16e598>
Traceback (most recent call last):
  File "/Users/pwolfram/anaconda/envs/test_env_xarray35/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable

@pwolfram pwolfram force-pushed the fix_too_many_open_files branch from 73b601d to 923473c Compare February 5, 2017 05:39
@shoyer
Member

shoyer commented Feb 5, 2017 via email

@pwolfram pwolfram force-pushed the fix_too_many_open_files branch 3 times, most recently from 6c40f49 to 3ddf9c1 Compare March 22, 2017 19:25
@shoyer
Member

shoyer commented Mar 22, 2017 via email

@pwolfram
Contributor Author

@shoyer, if we generally cover test_backends for autoclose=True, then we should get the pickle testing for free:

xarray/tests/test_backends.py:181:    def test_pickle(self):
xarray/tests/test_backends.py:191:    def test_pickle_dataarray(self):
xarray/tests/test_backends.py:792:    def test_bytesio_pickle(self):

or was there some other test that is needed?

@shoyer
Member

shoyer commented Mar 22, 2017

@shoyer, if we generally cover test_backends for autoclose=True, then we should get the pickle testing for free

Agreed, that should do it.

@pwolfram
Contributor Author

@shoyer, that subclass-based approach you outlined worked (fixture parameters really don't work with classes, as far as I could tell). We now have more comprehensive, named testing. Note that there was one minor point, surfaced by the more rigorous testing, that required more explicit specification:
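
The pattern in question, sketched with a hypothetical base class whose tests all read self.autoclose, so that the subclass re-runs the entire inherited suite with autoclosing enabled:

    from unittest import TestCase  # stand-in for the project's TestCase

    class NetCDF4DataTest(TestCase):
        autoclose = False  # existing default behavior; tests read self.autoclose

    class NetCDF4DataTestAutocloseTrue(NetCDF4DataTest):
        autoclose = True   # every inherited test now runs with autoclose=True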

The scipy backend can handle objects like BytesIO that really aren't file handles and there doesn't appear to be a clean way to close these types of objects. So, at present I'm explicitly setting _autoclose=False if they are encountered in the datastore. If this needs to be changed, particularly since it doesn't affect existing behavior, I'd prefer this be resolved in a separate issue / PR if possible.

@pwolfram pwolfram force-pushed the fix_too_many_open_files branch from beef5af to 8f2fb8c Compare March 23, 2017 17:04
@pwolfram
Contributor Author

@shoyer, I had a minor bug that is now removed. The last caveat is no longer applicable:

The scipy backend can handle objects like BytesIO that really aren't file handles and there doesn't appear to be a clean way to close these types of objects. So, at present I'm explicitly setting _autoclose=False if they are encountered in the datastore. If this needs to be changed, particularly since it doesn't affect existing behavior, I'd prefer this be resolved in a separate issue / PR if possible.

I'll let you know when tests pass and this is ready for your final review.

@pwolfram pwolfram force-pushed the fix_too_many_open_files branch 4 times, most recently from a531b10 to 8f2fb8c Compare March 23, 2017 17:16
@pwolfram
Contributor Author

@shoyer, this is ready for the final review now. Coveralls appears to have hung but other tests pass.

@pwolfram
Contributor Author

@shoyer, all tests (including coveralls) passed. Please let me know if you have additional concerns. If we could merge fairly soon (e.g., because of MPAS-Dev/MPAS-Analysis#151), I would really appreciate it.

from .netCDF4_ import (_nc4_group, _nc4_values_and_dtype,
_extract_nc4_variable_encoding, BaseNetCDF4Array)


class H5NetCDFFArrayWrapper(BaseNetCDF4Array):
Member

spelling: extra F

@@ -96,28 +110,38 @@ def __init__(self, filename_or_obj, mode='r', format=None, group=None,
raise ValueError('invalid format for scipy.io.netcdf backend: %r'
% format)

# if the string ends with .gz, then gunzip and open as netcdf file
if type(filename_or_obj) is str and filename_or_obj.endswith('.gz'):
Member

Use isinstance(filename_or_obj, basestring) instead of type(filename_or_obj) is str.

Also, move this logic into _open_scipy_netcdf -- otherwise it won't work to reopen a gzipped file.

version=version)
except TypeError as e:
# TODO: gzipped loading only works with NetCDF3 files.
if 'is not a valid NetCDF 3 file' in e.message:
Member

This should only be triggered when reading a gzipped file. Right now, it can be triggered whenever an invalid netCDF3 file is read, gzipped or not.

This should be easier to fix when you move gzip.open into this helper function (see my comment below).

Contributor Author

Thanks for pointing this out-- sorry for this sloppiness.
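
A sketch of how both review points might be folded into the helper. The helper's exact signature is assumed; the sketch uses str instead of the Python 2 basestring shim the review asks for, and str(e) instead of the diff's e.message, purely so it runs on Python 3:

    import gzip

    import scipy.io

    def _open_scipy_netcdf(filename, mode, mmap, version):
        # handle gzipped paths inside the opener so reopening a .gz file works
        if isinstance(filename, str) and filename.endswith('.gz'):
            try:
                return scipy.io.netcdf_file(gzip.open(filename), mode=mode,
                                            mmap=mmap, version=version)
            except TypeError as e:
                # TODO: gzipped loading only works with NetCDF3 files; only
                # translate the error here, where we know the file was gzipped
                if 'is not a valid NetCDF 3 file' in str(e):
                    raise ValueError('gzipped file loading only supports '
                                     'NetCDF 3 files.')
                raise
        return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap,
                                    version=version)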

# autoclose = True

class OpenMFDatasetTest(TestCase):
autoclose = True
Member

I don't think you need this class variable, since this test class does not use inheritance.

#class H5NetCDFDataTestAutocloseTrue(H5NetCDFDataTest):
# autoclose = True

class OpenMFDatasetTest(TestCase):
Member

rename to something like OpenMFDatasetManyFilesTest (we have other tests of open_mfdataset)

@@ -1139,6 +1257,8 @@ def test_dataarray_compute(self):
self.assertTrue(computed._in_memory)
self.assertDataArrayAllClose(actual, computed)

class DaskTestAutocloseTrue(DaskTest):
autoclose=True
Member

Please run some sort of PEP8 check, e.g., git diff upstream/master | flake8 --diff. There should be spaces around the = sign here.

Contributor Author

I'm going to ignore PEP8 for xarray/core/pycompat.py because the code in there is essentially a copy / paste.

Member

sounds good


# check that calculation on opened datasets works properly
ds = open_mfdataset(tmpfiles, engine=readengine,
autoclose=self.autoclose)
Member

just set autoclose=True here.

# split into multiple sets of temp files
for ii in original.x.values:
(
original.isel(x=slice(ii, ii+1))
Member

I think this would be slightly more readable with an intermediate variable, or removing the line break on the line above.
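
A sketch of the suggested refactor with an intermediate variable (the to_netcdf call is assumed, since the excerpt above is truncated):

    # split into multiple sets of temp files
    for ii in original.x.values:
        subds = original.isel(x=slice(ii, ii + 1))
        subds.to_netcdf(tmpfiles[ii])  # hypothetical continuation of the loop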

Includes testing to demonstrate an OSError associated
with opening too many files as encountered
using open_mfdataset.

Fixed for the following backends:
 * netCDF4 backend
 * scipy backend
 * pynio backend

Open/close operations with h5netcdf appear to trigger an error
within the h5netcdf library itself, per correspondence with
@shoyer. There are thus still challenges with h5netcdf, so
support for it is currently disabled.

Note, by default `autoclose=False` for open_mfdataset so standard
behavior is unchanged unless `autoclose=True`.

This default favors standard xarray performance over unconditional
avoidance of the OSError associated with opening too many files via
open_mfdataset.
@pwolfram pwolfram force-pushed the fix_too_many_open_files branch from 8f2fb8c to 20c5c3b Compare March 23, 2017 18:50
@shoyer
Member

shoyer commented Mar 23, 2017

OK, in it goes!

@shoyer shoyer merged commit 371d034 into pydata:master Mar 23, 2017
@pwolfram
Contributor Author

Thanks a bunch @shoyer!
