Hickling NumPy arrays with dtypes that are not supported by h5py #90

Closed
1313e opened this issue Feb 15, 2019 · 1 comment · Fixed by #134
Comments

@1313e
Collaborator

1313e commented Feb 15, 2019

I was just trying out hickle a bit and I am happy to see that it can easily hickle, for example, lists that contain multiple different types of objects (such lists cannot be converted to a regular NumPy array and thus normally cannot be stored quickly by h5py).
However, I know that h5py will fail to save any NumPy array that has an unsupported dtype. Some of these dtypes are quite far-fetched, but others should simply be supported in the first place.
It seems that hickle does not support these either; it would be nice if support for them were added.

Examples of non-supported NumPy array dtypes (I know that there are more, but I cannot remember them as easily at the moment):

  • dtype('O'), which NumPy uses if you give it anything that cannot be converted to a regular array (like dicts: np.array({1: 2}) produces array({1: 2}, dtype=object)).
  • dtype('<UX') with X being any integer, which NumPy uses for arrays holding unicode strings (probably Python 3 only). Saving a NumPy array of strings in Python 2 will work, as they are byte strings.

The first example is obviously a bit far-fetched, as there is no reason to store a dict (or anything that cannot be converted to an array) in a NumPy array, but the second example is something that has its uses.
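
To make the two dtypes above concrete, here is a minimal sketch that builds both kinds of array with NumPy alone. The helper `to_bytes_array` is hypothetical (it is not part of hickle or h5py); it only illustrates one possible workaround, assuming UTF-8 is an acceptable encoding: turning a unicode array into a byte-string array, which is the kind of array described above as saving fine in Python 2.

```python
import numpy as np

# A dict cannot be converted to a regular array, so NumPy wraps it
# in a 0-d array with dtype object (kind 'O').
obj_arr = np.array({1: 2})

# On Python 3, str elements give a unicode dtype (kind 'U').
str_arr = np.array(["spam", "eggs"])

print(obj_arr.dtype.kind, str_arr.dtype.kind)


def to_bytes_array(arr, encoding="utf-8"):
    """Hypothetical helper: encode a unicode array as a byte-string array.

    Arrays of any other dtype are returned unchanged.
    """
    if arr.dtype.kind != "U":
        return arr
    return np.char.encode(arr, encoding)


encoded = to_bytes_array(str_arr)            # byte strings, kind 'S'
decoded = np.char.decode(encoded, "utf-8")   # back to unicode, kind 'U'
```

A real fix would also need to record that the data was originally unicode, so that loading can decode it again; that bookkeeping is left out of this sketch.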

Also, I would like to mention that while the hickle.load function ensures that the opened HDF5 file is closed by using a with-statement (which, btw, makes the try-statement around it redundant, unless the finally-clause will do something else in the future), the hickle.dump function does not ensure this.
Therefore, when trying to hickle a NumPy array with one of the dtypes described above, the HDF5 file is never closed properly after h5py raises its error.
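
As a rough illustration of the closing behaviour described above, the sketch below writes through a with-statement so the file is closed even when h5py rejects the dtype. This is not hickle's actual dump code: `dump_array` and the dataset name "data" are made up for the example, and it assumes that h5py raises a TypeError for a unicode array.

```python
import os
import tempfile

import h5py
import numpy as np


def dump_array(arr, path):
    """Hypothetical stand-in for a dump function.

    The with-statement closes the file on success and on error alike.
    """
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=arr)


path = os.path.join(tempfile.mkdtemp(), "test.h5")

try:
    # Unicode arrays are one of the unsupported dtypes mentioned above.
    dump_array(np.array(["spam", "eggs"]), path)
except TypeError as exc:
    print("h5py refused the array:", exc)

# Because the file was closed despite the error, it can be reopened
# for writing straight away.
dump_array(np.arange(5), path)
with h5py.File(path, "r") as f:
    restored = f["data"][...]
```

Without the with-statement (or an equivalent try/finally that calls close()), the first failed call would leave the file handle open.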

@telegraphic
Owner

Cheers @1313e. I think the first is a relatively straightforward enhancement and the unclosed file is a bug, so I will split this into two issues when I next do package maintenance.

There are some string/bytes/unicode handling changes on the to-do list to let Py2 and Py3 attempt a best-effort loading of each other's hickle files, which might be a good time to look at <U handling.
