Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent h5py #3368

Merged
merged 3 commits into from
Feb 11, 2021
Merged

Concurrent h5py #3368

merged 3 commits into from
Feb 11, 2021

Conversation

woutdenolf
Copy link
Contributor

@woutdenolf woutdenolf commented Jan 29, 2021

Closes #3354

Tested with h5py 2.8, 2.9, 2.10, 3.1

Stress tests: https://gitlab.esrf.fr/denolf/concurrent_h5py

The main API:

  • silx.io.h5py_utils.File: replace h5py.File mainly to handle file locking
  • silx.io.h5py_utils.retry: retry the method whenever you get an HDF5 IO exception
  • silx.io.h5py_utils.retry_in_subprocess: retry in a subprocess to capture segfaults
  • silx.io.h5py_utils.retry_contextmanager: retry entering the context manager

The retry logic can also be used for other things than HDF5 reading:

  • silx.utils.retry.retry
  • silx.utils.retry.retry_in_subprocess
  • silx.utils.retry.retry_contextmanager

Common use case of silx.io.h5py_utils for processing a Bliss Nexus file:

import silx.io.h5py_utils

@silx.io.h5py_utils.retry()
def process_scan(filename, name):
    """The method will be executed again if
    any HDF5 IO fails.
    """
    with silx.io.h5py_utils.File(filename) as h5file:
        scan = h5file[name]
        I0 = scan["measurement/I0"][()]
        It = scan["measurement/It"][()]
        return It/I0

scans = silx.io.h5py_utils.safe_top_level_names("myfile.h5")

for name in scans:
    process_scan("myfile.h5", name)

@woutdenolf woutdenolf added this to the 0.15 milestone Jan 29, 2021
@woutdenolf woutdenolf force-pushed the concurrent_h5py branch 3 times, most recently from 1a538b1 to 45b3c6b Compare February 3, 2021 11:40
@woutdenolf woutdenolf force-pushed the concurrent_h5py branch 2 times, most recently from 28f5598 to 68c8568 Compare February 7, 2021 13:43
silx/io/h5py_utils.py Outdated Show resolved Hide resolved
@payno payno self-requested a review February 8, 2021 10:24
@woutdenolf woutdenolf requested review from t20100 and vasole February 8, 2021 12:17
@woutdenolf woutdenolf changed the title WIP: Concurrent h5py Concurrent h5py Feb 8, 2021
@woutdenolf woutdenolf marked this pull request as ready for review February 8, 2021 12:18
@t20100
Copy link
Member

t20100 commented Feb 8, 2021

Nice to have the generic retry stuff apart from the HDF5 utility!

I hope that one day we won't need this any more, but in the (probably very long) meantime it makes it work.

A few minor "cosmetic" suggestions (up to you):

  • silx/io/h5py_utils.py:
    • Make is_h5py_exception, retry_h5py_error, include_default private?
    • I would not put a open function since h5py does not provide one
  • silx/utils/retry.py:
    • Make default_retry_on_error private?
    • Remove retry_ prefix from arguments retry_timeout, retry_period and retry_on_error since function are named retry*

@woutdenolf
Copy link
Contributor Author

Thanks for the review!

Nice to have the generic retry stuff apart from the HDF5 utility!

I hope that one day we won't need this any more, but in the (probably very long) meantime it makes it work.

Exactly, we make it work like this and work on a better solution afterwards.

A few minor "cosmetic" suggestions (up to you):

* `silx/io/h5py_utils.py`:
  
  * Make `is_h5py_exception`, `retry_h5py_error`, `include_default` private?

Ok, except include_default as it may be useful for open_item as well (None is the default there). I will rename it to group_is_complete or something.

  * I would not put a `open` function since `h5py` does not provide one

Ok.

* `silx/utils/retry.py`:
  
  * Make `default_retry_on_error` private?

OK

  * Remove `retry_` prefix from arguments `retry_timeout`, `retry_period` and `retry_on_error` since function are named `retry*`

I added the prefix because you can override the timeout (and also the others) in each individual call while the argument names without the "retry_" prefix are rather generic and have higher probability of colliding with the method's named arguments. So for example:

@retry(retry_timeout=10)
def method(..., timeout=None):
     # "timeout" has nothing to do with retrying

# I want to override 10 by 1 while the method already has
# a timeout argument for some other purpose
method(retry_timeout=1, timeout=5)

@t20100
Copy link
Member

t20100 commented Feb 8, 2021

I added the prefix because you can override the timeout (and also the others) in each individual call

OK. That's the kw.pops.

Copy link
Member

@payno payno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty convenient to use (regarding the context :) and nice test suite)

I was wondering if it would be beneficial to have a redefinition of silx.io.utils.get_data(url, timeout) function in silx.io.h5py_utils.
This can be done in another PR by someone else.
But from my point of view it could be convenient to have
something simple like:

@retry()
def get_data(url):
    return silx_get_data(url)

or something more advance like:
get_data(url, retry_timeout, retry_invalid...) and call open_item if this is an hdf5 file otherwise call the original get_data. But maybe this is overkill.

silx/io/h5py_utils.py Show resolved Hide resolved
silx/io/h5py_utils.py Show resolved Hide resolved
@woutdenolf
Copy link
Contributor Author

woutdenolf commented Feb 10, 2021

I was wondering if it would be beneficial to have a redefinition of silx.io.utils.get_data(url, timeout) function in silx.io.h5py_utils.

I'm not quite sure a method in silx.io.h5py_utils should be able to handle non-HDF5 things. It should be the other way around imo: silx.io.h5py_utils.get_data (HDF5-only with retry) which gets called by silx.io.utils.get_data but then it's probably wise to overwrite the default retry_timeout=None. Otherwise this call will block forever when the HDF5 file is corrupt.

Lets keep that for another PR.

@woutdenolf woutdenolf requested a review from payno February 10, 2021 17:19
@t20100 t20100 merged commit ced52a5 into silx-kit:master Feb 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concurrent HDF5 reading and writing without SWMR
3 participants