Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extending PR 106 #110

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Extending PR 106 #110

wants to merge 3 commits into from

Conversation

rofinn
Copy link
Member

@rofinn rofinn commented Jun 2, 2021

Closes #106

oliver and others added 3 commits May 20, 2021 18:41
fixed a couple typos and filled-in the text since I was here, then I edited the whole thing for my own clarity
fwiw in the third paragraph I'm not happy with: "Unfortunately this means one might accidentally install everything again and replicate their system.", since I think it's not clear what actions the user must take for this to occur and the word install doesn't give me an indication of what that means in Julia where you would add or load or use.  It could also indicate if this is primarily a memory problem and if so which memory, primary, secondary, something else? 🍰
unfilled-in lines
@rofinn rofinn requested a review from oxinabox June 2, 2021 19:18

This information makes it possible to reconstruct the original environment necessary to deserialize the objects if the internal serialization format has introduced breaking changes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't love this sentence, but I want to make it clear that the metadata serves more as breadcrumbs for resolving serialization issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is not great.
Even if the serializer hasn't introduced breaking changes, if the structure of the object itself has changed that is also a problem, that need to be dealt with via loading the old object.

Maybe more like

Suggested change
This information makes it possible to reconstruct the original environment necessary to deserialize the objects if the internal serialization format has introduced breaking changes.
In general, serializers are not able to handle deserialized objects if the type defintions of the stored objects have changed.
For example if a field has been added then there is no such field in the stored data so it can't be loaded using the new version of the package.
JLSO stores the package version, thus allowing you to load the old version of the package from before that change and so be able to recover the object.
Similar occurs if the serializer itself changes.

Even though it is longer

@rofinn rofinn mentioned this pull request Jun 2, 2021
@codecov
Copy link

codecov bot commented Jun 2, 2021

Codecov Report

Merging #110 (168586f) into master (dbf7be6) will increase coverage by 0.11%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #110      +/-   ##
==========================================
+ Coverage   97.56%   97.67%   +0.11%     
==========================================
  Files           6        6              
  Lines         205      215      +10     
==========================================
+ Hits          200      210      +10     
  Misses          5        5              
Impacted Files Coverage Δ
src/file_io.jl 94.44% <0.00%> (+0.10%) ⬆️
src/JLSOFile.jl 93.33% <0.00%> (+2.85%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dbf7be6...168586f. Read the comment docs.


This information makes it possible to reconstruct the original environment necessary to deserialize the objects if the internal serialization format has introduced breaking changes.
Since the metadata is formatted in language-agnostic BSON (no Julia extensions), all you need is a BSON reader.
Due to the size of this metadata, JLSO files are not ideal for storing many small files.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if some of this would be clearer in the context of an FAQ section like

Who is this intended for?

Why is this metadata useful?

Why is there a balancing act between performance, flexibility and reliability in serialization formats?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. Can be a follow up PR.
Some of that wouldn't want to be in a readme?

that employs a serializer, and a compressor, handles all the other concerns including metadata and saving.
Such that the serializer just needs to determine how to turn a julia object into a stream`Vector{UInt8}`,
and the compressor just needs to determine how to turn one stream of `UInt8`s into a smaller one (and the reverse).
Think of it less as a serialization format and more of a container for arbitrarily serialized objects and metadata.
Copy link
Member

@oxinabox oxinabox Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of calling it a container was not merely that that it contains things.
It is specifically referring to a container format
https://en.wikipedia.org/wiki/Container_format_(computing)
such as the Real Media format https://en.wikipedia.org/wiki/RealMedia
which can in turn wrap multiple different codecs/formats (serializers in our case)

With the key untility of JLSO that the serializer and compressors just have to work with Vector{UInt8} and JLSO takes care of hooking them up and of metadata.
(which is very much what container formats do).

I think moving that information away from the start hides this.
Possibly can solve by just linking to wikipedia on container format though


This information makes it possible to reconstruct the original environment necessary to deserialize the objects if the internal serialization format has introduced breaking changes.
Since the metadata is formatted in language-agnostic BSON (no Julia extensions), all you need is a BSON reader.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

Suggested change
Since the metadata is formatted in language-agnostic BSON (no Julia extensions), all you need is a BSON reader.
Since the metadata is formatted in language-agnostic BSON (no Julia extensions), in the worst case you can access these breadcumbs with any BSON reader.

Comment on lines +26 to +32
- JLSO version number
- Julia version number
- Serialization format (e.g., "julia_serialize", "bson")
- Compression method like "gzip"
- A docker image URI
- Project.toml
- Manifest.toml (possibly compressed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all this listed in the README?

Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do as you judge best

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants