-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON contents #516
JSON contents #516
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks like a great start to me! I've left a few comments -- mainly related to the merge function.
I am fine with the design, having a top-level dictionary makes sense.
src/irmin/contents.ml
Outdated
let open Merge.Infix in | ||
old () >>=* function | ||
| Some j -> | ||
if j = t1 then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if you want to use =
here or a custom equality on JSON objects.
src/irmin/contents.ml
Outdated
old () >>=* function | ||
| Some j -> | ||
if j = t1 then | ||
merge_object t1 t2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why merge here? you probably just want t2
(as t1 is in the past of t2)
src/irmin/contents.ml
Outdated
if j = t1 then | ||
merge_object t1 t2 | ||
else if j = t2 then | ||
merge_object t2 t1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here: t2
is in the past of t1
you probably just want to return t1
.
src/irmin/contents.ml
Outdated
|
||
let t = Type.(list (pair string json)) | ||
|
||
let merge_object a b = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you will have to define a proper 3-way merge function for JSON objects: you need to detect if a value has been updated or if it's a real conflict.
src/irmin/contents.ml
Outdated
merge_object t2 t1 | ||
else | ||
merge_object j t1 >>=* fun t1 -> | ||
merge_object t1 t2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here you probably want merge_object j t1 t2
: deciding if a value has been updated in either t1
or t2
can be done by checking if the association from j
has changed. A conflict should be raised only if such a value has been updated in both branches.
src/irmin/contents.ml
Outdated
let of_string s = | ||
let decoder = Jsonm.decoder (`String s) in | ||
Type.decode_json t decoder | ||
let merge = Merge.(option (idempotent t)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure to fully understand the purpose of Make
, but I guess you probably want to re-use the merge
function defined above here instead of using the default one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make
was just added as a clear way to make a JSON value from an exsiting type, rather than the user having to combine all those pieces manually. It is not entirely necessary, but I found it helpful when testing these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite it being convenient for my testing, I'm not sure that it will work because the merge function would have to be rewritten for this. I am going to remove it, but add it as an example to one of the tutorials.
Another random idea: I am not sure how to do this yet, but it would be great if the user could choose to either flatten the JSON object into a single file or to "project" it into a subtree. One interesting consequence of this is that |
That does sound pretty interesting! After I make the changes from your notes I will spend a little time seeing how feasible that is. |
I've got a super rough implementation of JSON objects projected onto a tree, it still needs some work before I’d be satisfied with it. I'm wondering where the best place to put it is, because right now it feels a little weird with the module argument. Is there somewhere that makes more sense? Any other thoughts on this? It is definitely an interesting feature and I think it should be included in Irmin if we can get the API right. I'm also curious if I can generalize it to work with any list of pairs. Do you think that is worth exploring, or should we just stick to JSON? |
Okay -- after some sleep, and further work I have cleaned it up quite a bit. However, I'm still wondering if there's a better way to do this that I'm missing. Also, I will just be focusing on the JSON based projection rather than trying to generalize it. Maybe I will get around to that in the future if there is interest. |
src/irmin/contents.ml
Outdated
else if equal old b then Merge.ok a | ||
else merge3 old a b | ||
| None -> | ||
merge2 a b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the proper way to write these merge functions is to define mutually recursive functions which match the shape of the json
type: for instance, you want the recursive merge function to be applied on array and dictionaries members too. I think the result should be very similar to what it is done in Tree.merge and Tree.Node.merge but with a different (and somehow a bit simpler, as their no indirection) datatype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thank you for the tip. I will stop trying to make things too simple!
src/irmin/contents.ml
Outdated
| Error _ -> Lwt.return tree) d tree | ||
| v -> S.Tree.add tree key v | ||
|
||
let get_tree (type n) (module S: STORE with type node = n) (tree: S.tree) key = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha yes I see, having a first-class store is not very nice here. I am wondering if providing Tree.concrete would just be enough for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was having some issues with Tree.to_concrete. It seems to only return the deepest content node of the tree. Is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a bug!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will try to track that down as well.
Yes, don't worry too much about the generalisation. |
Rename Proj to Json_tree and cleanup implementation
I've significantly refactored the merge operation and am a lot happier with it now. I didn't remove Let me know what you think! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the JSON merge functions looks very good now! I'm happy to merge this one.
The JSON_tree is looking great too. I think we are very close to something nice now. I would propose this:
module Json_tree(S: S with type contents = json) = struct
..
end
e.g. take a full json store as argument. Then your type tree is simple S.Tree.concrete
. And it is indeed fine to redefine the set*
and get*
functions there.
src/irmin/contents.ml
Outdated
| `Contents of json * M.t | ||
] | ||
|
||
let rec to_concrete_tree (j: json): tree = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is not tail recursive, this could be a problem on large JSON docs (the ones that you want to project to a tree).
src/irmin/contents.ml
Outdated
`Tree x | ||
| _ -> `Contents (j, M.default) | ||
|
||
let rec of_concrete_tree c : json = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
Actually, can you just keep your changes in that PR for the |
Sure, so should I create a new PR with only the |
Actually your new changes look great! I'll squash and merge as-is, no need for a new PR. |
…t, irmin-chunk, irmin-pack, irmin-mirage-graphql, irmin-mirage, irmin-graphql, irmin-http, irmin-git and irmin-test (2.0.0) CHANGES: #### Added - **irmin-pack** (_new_): - Created a new Irmin backend, `irmin-pack`, which uses a space-optimised on-disk format. - **irmin-graphql** (_new_): - Created a new package, `irmin-graphql`, which provides a GraphQL server implementation that can be used with both the MirageOS and Unix backends. Additionally, a `graphql` command has been added to the command-line interface for starting `irmin-graphql` servers. (mirage/irmin#558, @andreas, @zshipko) - Contents can now be queried directly using `irmin-graphql` with `Irmin_graphql.Server.Make_ext` and the `Irmin_graphql.Server.PRESENTER` interface. (mirage/irmin#643, @andreas) - **irmin-test** (_new_): - Added a new package, `irmin-test`, which allows for packages to access the Irmin test-suite. This package can now be used for new packages that implement custom backends to test their implementations against the same tests that the core backends are tested against. (mirage/irmin#508, @zshipko) - **irmin-unix**: - Add `Cli` module to expose some methods to simplify building command-line interfaces using Irmin. (mirage/irmin#517, @zshipko) - Add global config file `$HOME/.irmin/config.yml` which may be overridden by either `$PWD/.irmin.yml` or by passing `--config <PATH>`. See `irmin help irmin.yml` for details. (mirage/irmin#513, @zshipko) - **irmin-git**: - Allow import/export of Git repositories using Irmin slices. (mirage/irmin#561, @samoht) - **irmin-http**: - Expose a `/trees/merge` route for server-side merge operations. (mirage/irmin#714, @samoht) - **irmin**: - Add `Json_value` and `Json` content types. (mirage/irmin#516 mirage/irmin#694, @zshipko) - Add optional seed parameter to the `Irmin.Type` generic hash functions. (mirage/irmin#712, @samoht) - Add `V1` submodules in `Commit`, `Contents` and `Hash` to provide compatibility with 1.x serialisation formats. (mirage/irmin#644 mirage/irmin#666, @samoht) - Add `Store.last_modified` function, which provides a list of commits where the given key was modified last. (mirage/irmin#617, @pascutto) - Add a `Content_addressable.unsafe_add` function allowing the key of the new value to be specified explicitly (for performance reasons). (mirage/irmin#783, @samoht) - Add `save_contents` function for saving contents to the database. (mirage/irmin#689, @samoht) - Add pretty-printers for the results of Sync operations. (mirage/irmin#789, @craigfe) - `Private.Lock` now exposes a `stats` function returning the number of held locks. (mirage/irmin#704, @samoht) #### Changed - **irmin-unix**: - Rename `irmin read` to `irmin get` and `irmin write` to `irmin set`. (mirage/irmin#501, @zshipko) - Switch from custom configuration format to YAML. (mirage/irmin#504, @zshipko) - **irmin-git**: - Require `ocaml-git >= 2.0`. (mirage/irmin#545, @samoht) - Cleanup handling of remote stores. (mirage/irmin#552, @samoht) - **irmin-http**: - Rename `CLIENT` to `HTTP_CLIENT` and simplify the signatures necessary to construct HTTP clients and servers. (mirage/irmin#701, @samoht) - **irmin-mirage** - Split `irmin-mirage` into `irmin-{mirage,mirage-git,mirage-graphql}` to allow for more granular dependency selection. Any instances of `Irmin_mirage.Git` should be replaced with `Irmin_mirage_git`. (mirage/irmin#686, @zshipko) - **irmin**: - Update to use dune (mirage/irmin#534, @samoht) and opam 2.0. (mirage/irmin#583, @samoht) - Replace `Irmin.Contents.S0` with `Irmin.Type.S`. - Rename `Type.pre_digest` -> `Type.pre_hash` and `Type.hash` -> `Type.short_hash`. (mirage/irmin#720, @samoht) - Change `Irmin.Type` to use _incremental_ hash functions (functions of type `'a -> (string -> unit) -> unit`) for performance reasons. (mirage/irmin#751, @samoht) - Simplify the `Irmin.Type.like` constructor and add a new `Irmin.Type.map` with the previous behaviour. - Improvements to `Irmin.Type` combinators. (mirage/irmin#550 mirage/irmin#538 mirage/irmin#652 mirage/irmin#653 mirage/irmin#655 mirage/irmin#656 mirage/irmin#688, @samoht) - Modify `Store.set` to return a result type and create a new `Store.set_exn` with the previous exception-raising behaviour. (mirage/irmin#572, @samoht) - Rename store module types to be more descriptive: - replace `Irmin.AO` with `Irmin.CONTENT_ADDRESSABLE_STORE`; - replace `Irmin.AO_MAKER` with `Irmin.CONTENT_ADDRESSABLE_STORE_MAKER`; - replace `Irmin.RW` with `Irmin.ATOMIC_WRITE_STORE`; - replace `Irmin.RW_MAKER` with `Irmin.ATOMIC_WRITE_STORE_MAKER`. (mirage/irmin#601, @samoht) - Rename `export_tree` to `save_tree` (mirage/irmin#689, @samoht) and add an option to conditionally clear the tree cache (mirage/irmin#702 mirage/irmin#725, @samoht). - Change hash function for `Irmin_{fs,mem,unix}.KV` to BLAKE2b rather than SHA1 for security reasons. (mirage/irmin#811, @craigfe) - Move `Irmin.remote_uri` to `Store.remote`, for stores that support remote operations. (mirage/irmin#552, @samoht) - Simplify the error cases of fetch/pull/push operations. (mirage/irmin#684, @zshipko) - A `batch` function has been added to the backend definition to allow for better control over how groups of operations are processed. (mirage/irmin#609, @samoht) - A `close` function has been added to allow backends to close any held resources (e.g. file descriptors for the `FS` backend). (mirage/irmin#845, @samoht) - Simplify `Private.Node.Make` parameters to use a simpler notion of 'path' in terms of a list of steps. (mirage/irmin#645, @samoht) - Rename `Node.update` to `Node.add`. (mirage/irmin#713, @samoht) #### Fixed - **irmin-unix**: - Fix parsing of commit hashes in `revert` command. (mirage/irmin#496, @zshipko) - **irmin-git**: - Fix `Node.add` to preserve sharing. (mirage/irmin#802, @samoht) - **irmin-http**: - Respond with a 404 if a non-existent resource is requested. (mirage/irmin#706, @samoht) - **irmin**: - Fix a bug whereby `S.History.is_empty` would return `true` for a store with exactly one commit. (mirage/irmin#865, @pascutto) #### Removed - **irmin**: - Remove `pp` and `of_string` functions from `Irmin.Contents.S` in favour of `Irmin.Type.to_string` and `Irmin.Type.of_string`. - Remove `Bytes` content type. (mirage/irmin#708, @samoht) - Remove `Cstruct` dependency and content type. If possible, switch to `Irmin.Contents.String` or else use `Irmin.Type.map` to wrap the Cstruct type. (mirage/irmin#544, @samoht)
This is my current implementation of a JSON datatype -- it is roughly modeled after a document store, like CouchDB or MongoDB, so it restricts top-level values to objects only.
I think it is a nice addition because even though the
String
type can already be used to store JSON contents, this allows for the JSON encoding to be validated when using the command line tool.I'm not totally sure if it's ready to be merged but wanted to at least open this up to discussion in order to get some feedback before I continue.