Add save() and load() method to solution. #52
Comments
AFAIK there are 2 main options. The first one (and in my opinion the best) is to pickle the I am no expert at this, perhaps @MDAnalysis/coredevs will know more but see the following link: https://docs.python.org/3/library/pickle.html. The other option is to parse out the data to and from some kind of set of JSON files or suchlike. I am less in favour of this as it is fiddly and will require some introspection into class state etc., which is a bit complicated. It may also separate stuff into multiple files, which is a lot less clean. On the plus side, these can then be human readable, but I think the downsides outweigh the positives.
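For reference, a minimal sketch of what the pickle option looks like, using only the standard library. The `state` dictionary and its keys are made up for illustration and are not the project's real attributes. Note that pickling an instance of a custom class stores a reference to the class by import path, so the file can only be loaded where that class is importable.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Write obj to path using the highest pickle protocol available."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Read an object back. Only unpickle files from a trusted source,
    because loading a pickle can execute arbitrary code."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def roundtrip_demo():
    # Hypothetical state; the key names and values are illustrative only.
    state = {
        "radii": {"water": 2.8},
        "solvation_data": [(0, "water", 2.1), (0, "water", 2.6)],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "solution.pkl"
        save_pickle(state, path)
        return state, load_pickle(path)
```

The appeal is that no schema has to be written by hand; the cost is that the file is opaque and tied to Python.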
Pickle is quick but not a good format for data. It can happen that you can’t process a pickle file with a different version of Python IIRC. Results such as RDFs should be in a good data format anyway. CSV (compressed) is the lowest common denominator. HDF5 is quite flexible and widely used but it is a heavy dependency. Overall, I would spend some time figuring out how your workflow should work out. It’s often cleaner to have data producers and data analyzers and reduce coupling between the two.
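As a rough illustration of the compressed-CSV route for a result such as an RDF, here is a sketch using only the standard library. The column names `r_angstrom` and `g_r` and the function names are assumptions made for this example, not part of any existing API.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def write_rdf_csv(path, r_values, g_values):
    """Write two equal-length sequences as a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["r_angstrom", "g_r"])  # hypothetical column names
        writer.writerows(zip(r_values, g_values))


def read_rdf_csv(path):
    """Read the file back into two lists of floats."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    r = [float(row["r_angstrom"]) for row in rows]
    g = [float(row["g_r"]) for row in rows]
    return r, g


def rdf_roundtrip_demo():
    r, g = [0.5, 1.0, 1.5], [0.0, 1.2, 0.9]  # made-up numbers
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "rdf.csv.gz"
        write_rdf_csv(path, r, g)
        return (r, g), read_rdf_csv(path)
```

A file like this can be opened by any tool that reads CSV, which keeps the data producer and the data analyzer decoupled.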
If we save the output of expensive operations like Maybe as @orbeckst suggests it's best to decouple the data production and analysis and not bother with implementing I favor JSON because it fits with the other infrastructure I use, but that's my personal bias. CSV would likely be more space efficient for the DataFrames.
As you suggest, perhaps the best initial target is to implement saving functions for analyses in simple, easy-to-use formats. JSON or CSV is fine, but as @orbeckst says, perhaps CSV is the lowest common denominator. If we were to be dumping state and making it loadable, I would favour PyHDF5, as everything can be contained in a single space-efficient file. However writing The coupling with
Agreed @hmacdope. I just moved this off the v0.2 roadmap. I'll make a new issue that specifically identifies creating a When we return to this later, I think your points are spot on.
Sounds good. :)
This important functionality is currently missing. As is, users would need to rerun their analysis.
There should be methods to serialize and load a solution object in a single file.
I believe we would need to save:
solvation_data
dataframe
That should be sufficient to reconstruct the Solution.
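A possible shape for single-file save and load, sketched with the standard library only. Everything here is an assumption for illustration: the function names, the `format_version` field, and the representation of `solvation_data` as a list of row dictionaries with made-up column names. The real object holds richer state and this is not the project's implementation.

```python
import json
import tempfile
from pathlib import Path

FORMAT_VERSION = 1  # hypothetical schema version, used to reject unknown files


def save_solution(path, solvation_data):
    """Write the table (a list of row dicts) to a single JSON file."""
    payload = {"format_version": FORMAT_VERSION, "solvation_data": solvation_data}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)


def load_solution(path):
    """Read the table back, checking the schema version first."""
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    if payload.get("format_version") != FORMAT_VERSION:
        raise ValueError("unsupported file format version")
    return payload["solvation_data"]


def solution_roundtrip_demo():
    # Made-up rows; the column names are illustrative only.
    rows = [
        {"frame": 0, "solute_ix": 0, "solvent_name": "water", "distance": 2.1},
        {"frame": 1, "solute_ix": 0, "solvent_name": "water", "distance": 2.4},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "solution.json"
        save_solution(path, rows)
        return rows, load_solution(path)
```

With this layout the expensive analysis would not need to be rerun: the saved table is read back and the object rebuilt from it.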