Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several improvements concerning dicts, passing open HDF5-files and more #94

Merged
merged 10 commits into from
Feb 25, 2019
Merged

Conversation

1313e
Copy link
Collaborator

@1313e 1313e commented Feb 21, 2019

This PR contains the following changes:

I have tried to remain as true as possible to the original style of coding (except I use parenthesis for return, since I hate using it as a statement).
All changes should be backwards compatible, except for hickled dicts that had their dtype saved incorrectly.

Removed use of ModuleNotFoundError, as it was introduced in Python 3.6 and therefore incompatible with Python 3.5.
However, since it is a subclass of ImportError, catching ImportError will also catch ModuleNotFoundError.
…n hickle.py.

Fixed problem with HDF5-files not being closed properly during dump if an error was raised.
…utomatically closes it.

If any errors are raised during hickle.dump or hickle.load, the file is automatically closed unless it was already open.
Set the working directory for running tests to the operating system's temporary directory (this also allows for all os.remove statements to be removed).
Written tests for the new features.
Removed all cases of using os.remove() or any file removal, as files are now created in the temporary directory of the system (which is emptied automatically by either the computer or pytest).
The dtype of a dict is now always properly saved (before, it was sometimes saved as a binary type and sometimes as a normal type).
Added tests for checking if empty dicts are unhickled properly.
Using tuples as dict keys will currently raise an error when loading it back in (I am working on a fix).
@1313e
Copy link
Collaborator Author

1313e commented Feb 22, 2019

Actually, you may want to wait with accepting the PR, as I'm planning on doing one final upgrade to the keys of dicts, which will allow for both '1' and 1 to be used and saved as dict keys (which is currently not possible).

… (in which case they are saved as double strings).

Dict keys are now converted back to their original type using the eval()-function (which converts everything to what it used to be if it was a string to begin with).
Added tests for the new functionalities.
…t[:], as this works in both cases and the latter is deprecated.
@telegraphic
Copy link
Owner

Thanks, looks like there's some great stuff here!

I am however going to push back against the use of eval, for security reasons. A nefarious soul could create a hdf5 file with a key 'os.system("rm -rf /"), with attribute "<class 'tuple'>", which would not be fun.

That said, loading pickles is in itself dangerous if you didn't make it yourself, e.g. in the pickle docs:

Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

As hickle loads data mapped from HDF5, it should in theory be much safer*. There is however in hickle.load a safe=True flag so a user can explicitly decide to unpickle arbitrary objects.

I think in this case, the cons of using eval outweigh the pros, and existing users may not forsee arbitrary code execution when loading a HDF5 file. Perhaps there should be more discussion of safety in the documentation, as it is not currently stated?

* Of course, don't trust any code! Obligatory disclaimers in the LICENSE.

@1313e
Copy link
Collaborator Author

1313e commented Feb 22, 2019

@telegraphic I was considering ACE as well, but it is not possible to create a dataset with a key that is a tuple but in reality is not.
The only way of doing that would be by manually adjusting the attribute yourself after hickle dumped the data.
In that case, it is not really hickle's fault.

But, I will have a look at it.
Maybe I can come up with a different solution (or make sure that using eval cannot be used for any ACE at all).

@telegraphic
Copy link
Owner

Thanks, hope you're having fun ;). It's was a learning experience for me dealing with bug reports due to weird & wonderful use cases I didn't think of (or even know you could do)! It's a bit of a balancing act to find sensible ways to map between Python and HDF5 data structures.

@1313e
Copy link
Collaborator Author

1313e commented Feb 22, 2019

Thanks, hope you're having fun ;). It's was a learning experience for me dealing with bug reports due to weird & wonderful use cases I didn't think of (or even know you could do)! It's a bit of a balancing act to find sensible ways to map between Python and HDF5 data structures.

Oh, I am having fun with this.
How much focus do you want me to put on avoiding the opportunity for ACE?

@telegraphic
Copy link
Owner

Good question, difficult to answer! I think an estimate of likelihood of use case is required. As little as possible?

For example, I can certainly see use cases where you have tuple indexes all of the same type, e.g. a = {(0, 0, 0) : np.array(...)} for tying sparse data to an array index. So supporting tuples consisting of data all of the same type might be a reasonable compromise in terms of code simplicity and usefulness.

If you do find use cases that really really need eval, I would request it is tied to the safe= keyword in load, so a user gets a warning.

There is very basic support for registering a new class to hickle, which gives a path to allow users to add their own creations, but this is still limited and won't necessarily override default handling of e.g. dict.

@1313e
Copy link
Collaborator Author

1313e commented Feb 22, 2019

I actually have a case where I am using tuples that have integers, floats and strings in them.
However, using the eval function simply makes saving and loading the tuples much, much easier.

@1313e
Copy link
Collaborator Author

1313e commented Feb 22, 2019

Alright, I have currently removed using the eval function for everything besides tuples (it was also required for strings before, to make sure that 1 and '1' could be used).
Now looking into avoiding eval for tuples.

@1313e
Copy link
Collaborator Author

1313e commented Feb 22, 2019

Alright, I just found out about the ast.literal_eval function (link) which does not allow for ACE to occur, but I can still convert tuple strings back to tuples.
I guess this satisfies the condition?

@telegraphic
Copy link
Owner

Thanks @1313e, ast.literal_eval is a nice find! Happy to accept all these changes, fantastic stuff :)

@telegraphic telegraphic merged commit 160d037 into telegraphic:master Feb 25, 2019
@1313e 1313e deleted the dev_1313e branch February 25, 2019 02:57
@1313e
Copy link
Collaborator Author

1313e commented Feb 25, 2019

@telegraphic Would it be possible to generate a new release for PyPI, such that I can update the requirements in PRISM and make a new version there as well?

@telegraphic
Copy link
Owner

sure thing, just rerunning tests, the scipy sparse_matrix test is failing for some reason on Py 3.5 and 3.7 (but not Py 2.7 or 3.7), even though PR passed

@1313e
Copy link
Collaborator Author

1313e commented Feb 25, 2019

I have been having some issues with Travis lately as well, where it was simply giving me errors about timing out or something.

@1313e
Copy link
Collaborator Author

1313e commented Feb 25, 2019

Ugh, it is astropy, again...
Seems that astropy 3.1.2 is the problem, as the PR used 3.1.1, but the current builds are using 3.1.2.

@1313e
Copy link
Collaborator Author

1313e commented Feb 25, 2019

You may want to set a minimum required version for NumPy, as I get the feeling that that is the problem (it raises errors of NumPy 1.7, which was released a very long time ago).

@telegraphic
Copy link
Owner

@1313e -- I think this is an upstream bug, so pushed to PyPi

@1313e
Copy link
Collaborator Author

1313e commented Feb 25, 2019

@telegraphic I saw it, thanks.

@1313e
Copy link
Collaborator Author

1313e commented Feb 25, 2019

@telegraphic Also, if you need some help with Python2/Python3 compatibility (especially strings) or anything else, let me know.

@1313e 1313e mentioned this pull request Sep 10, 2019
@1313e 1313e mentioned this pull request Oct 1, 2019
hernot pushed a commit to hernot/hickle that referenced this pull request Dec 1, 2020
Several improvements concerning dicts, passing open HDF5-files and more
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants