Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON contents #516

Merged
merged 19 commits into from
Jul 6, 2018
Merged

JSON contents #516

merged 19 commits into from
Jul 6, 2018

Conversation

zshipko
Copy link
Contributor

@zshipko zshipko commented Jun 12, 2018

This is my current implementation of a JSON datatype -- it is roughly modeled after a document store, like CouchDB or MongoDB, so it restricts top-level values to objects only.

I think it is a nice addition because even though the String type can already be used to store JSON contents, this allows for the JSON encoding to be validated when using the command line tool.

I'm not totally sure if it's ready to be merged but wanted to at least open this up to discussion in order to get some feedback before I continue.

Copy link
Member

@samoht samoht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like a great start to me! I've left a few comments -- mainly related to the merge function.

I am fine with the design, having a top-level dictionary makes sense.

let open Merge.Infix in
old () >>=* function
| Some j ->
if j = t1 then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if you want to use = here or a custom equality on JSON objects.

old () >>=* function
| Some j ->
if j = t1 then
merge_object t1 t2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why merge here? you probably just want t2 (as t1 is in the past of t2)

if j = t1 then
merge_object t1 t2
else if j = t2 then
merge_object t2 t1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here: t2 is in the past of t1 you probably just want to return t1.


let t = Type.(list (pair string json))

let merge_object a b =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you will have to define a proper 3-way merge function for JSON objects: you need to detect if a value has been updated or if it's a real conflict.

merge_object t2 t1
else
merge_object j t1 >>=* fun t1 ->
merge_object t1 t2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you probably want merge_object j t1 t2: deciding if a value has been updated in either t1 or t2 can be done by checking if the association from j has changed. A conflict should be raised only if such a value has been updated in both branches.

let of_string s =
let decoder = Jsonm.decoder (`String s) in
Type.decode_json t decoder
let merge = Merge.(option (idempotent t))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to fully understand the purpose of Make, but I guess you probably want to re-use the merge function defined above here instead of using the default one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make was just added as a clear way to make a JSON value from an exsiting type, rather than the user having to combine all those pieces manually. It is not entirely necessary, but I found it helpful when testing these changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite it being convenient for my testing, I'm not sure that it will work because the merge function would have to be rewritten for this. I am going to remove it, but add it as an example to one of the tutorials.

@samoht
Copy link
Member

samoht commented Jun 13, 2018

Another random idea: I am not sure how to do this yet, but it would be great if the user could choose to either flatten the JSON object into a single file or to "project" it into a subtree. One interesting consequence of this is that merge should work the same way for both structures so this could help designing the merge function for JSON object: for instance nodes.ml is using Merge.alist so the JSON merge function should somehow do it too I guess :-)

@zshipko
Copy link
Contributor Author

zshipko commented Jun 13, 2018

That does sound pretty interesting! After I make the changes from your notes I will spend a little time seeing how feasible that is.

@zshipko
Copy link
Contributor Author

zshipko commented Jun 22, 2018

I've got a super rough implementation of JSON objects projected onto a tree, it still needs some work before I’d be satisfied with it. I'm wondering where the best place to put it is, because right now it feels a little weird with the module argument. Is there somewhere that makes more sense? Any other thoughts on this? It is definitely an interesting feature and I think it should be included in Irmin if we can get the API right.

I'm also curious if I can generalize it to work with any list of pairs. Do you think that is worth exploring, or should we just stick to JSON?

@zshipko
Copy link
Contributor Author

zshipko commented Jun 23, 2018

Okay -- after some sleep, and further work I have cleaned it up quite a bit. However, I'm still wondering if there's a better way to do this that I'm missing.

Also, I will just be focusing on the JSON based projection rather than trying to generalize it. Maybe I will get around to that in the future if there is interest.

else if equal old b then Merge.ok a
else merge3 old a b
| None ->
merge2 a b
Copy link
Member

@samoht samoht Jun 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the proper way to write these merge functions is to define mutually recursive functions which match the shape of the json type: for instance, you want the recursive merge function to be applied on array and dictionaries members too. I think the result should be very similar to what it is done in Tree.merge and Tree.Node.merge but with a different (and somehow a bit simpler, as their no indirection) datatype.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thank you for the tip. I will stop trying to make things too simple!

| Error _ -> Lwt.return tree) d tree
| v -> S.Tree.add tree key v

let get_tree (type n) (module S: STORE with type node = n) (tree: S.tree) key =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha yes I see, having a first-class store is not very nice here. I am wondering if providing Tree.concrete would just be enough for now.

Copy link
Contributor Author

@zshipko zshipko Jun 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having some issues with Tree.to_concrete. It seems to only return the deepest content node of the tree. Is this expected?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a bug!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will try to track that down as well.

@samoht
Copy link
Member

samoht commented Jun 25, 2018

Also, I will just be focusing on the JSON based projection rather than trying to generalize it. Maybe I will get around to that in the future if there is interest.

Yes, don't worry too much about the generalisation.

@zshipko
Copy link
Contributor Author

zshipko commented Jul 4, 2018

@samoht

I've significantly refactored the merge operation and am a lot happier with it now. I didn't remove Json_tree.get or Json_tree.set because even though they're kind of ugly I've found them to be very conventient to avoid having to convert between concrete trees all the time.

Let me know what you think!

Copy link
Member

@samoht samoht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the JSON merge functions looks very good now! I'm happy to merge this one.

The JSON_tree is looking great too. I think we are very close to something nice now. I would propose this:

module Json_tree(S: S with type contents = json) = struct
   ..
end

e.g. take a full json store as argument. Then your type tree is simple S.Tree.concrete. And it is indeed fine to redefine the set* and get* functions there.

| `Contents of json * M.t
]

let rec to_concrete_tree (j: json): tree =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is not tail recursive, this could be a problem on large JSON docs (the ones that you want to project to a tree).

`Tree x
| _ -> `Contents (j, M.default)

let rec of_concrete_tree c : json =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@samoht
Copy link
Member

samoht commented Jul 5, 2018

Actually, can you just keep your changes in that PR for the Json and Json_value store here (so I can merge the PR right away) and open an new PR for the Json_tree stuff? I have a few others ideas with the JSON tree that I would like to discuss/check before merging these bits. Many thanks!!

@zshipko
Copy link
Contributor Author

zshipko commented Jul 5, 2018

Sure, so should I create a new PR with only the Json and Json_value stuff and split all the Json_tree updates into another? Or future Json_tree updates should go in a new PR?

@samoht
Copy link
Member

samoht commented Jul 6, 2018

Actually your new changes look great! I'll squash and merge as-is, no need for a new PR.

@samoht samoht merged commit 0593e2b into mirage:master Jul 6, 2018
@zshipko zshipko deleted the json-datatype branch October 26, 2018 00:06
samoht added a commit to samoht/opam-repository that referenced this pull request Nov 20, 2019
…t, irmin-chunk, irmin-pack, irmin-mirage-graphql, irmin-mirage, irmin-graphql, irmin-http, irmin-git and irmin-test (2.0.0)

CHANGES:

#### Added

- **irmin-pack** (_new_):
  - Created a new Irmin backend, `irmin-pack`, which uses a space-optimised
    on-disk format.

- **irmin-graphql** (_new_):
  - Created a new package, `irmin-graphql`, which provides a GraphQL server
    implementation that can be used with both the MirageOS and Unix backends.
    Additionally, a `graphql` command has been added to the command-line
    interface for starting `irmin-graphql` servers. (mirage/irmin#558, @andreas, @zshipko)

  - Contents can now be queried directly using `irmin-graphql` with
    `Irmin_graphql.Server.Make_ext` and the `Irmin_graphql.Server.PRESENTER`
    interface. (mirage/irmin#643, @andreas)

- **irmin-test** (_new_):
  - Added a new package, `irmin-test`, which allows for packages to access the
    Irmin test-suite. This package can now be used for new packages that
    implement custom backends to test their implementations against the same
    tests that the core backends are tested against. (mirage/irmin#508, @zshipko)

- **irmin-unix**:
  - Add `Cli` module to expose some methods to simplify building command-line
    interfaces using Irmin. (mirage/irmin#517, @zshipko)

  - Add global config file `$HOME/.irmin/config.yml` which may be overridden by
    either `$PWD/.irmin.yml` or by passing `--config <PATH>`. See `irmin help
    irmin.yml` for details. (mirage/irmin#513, @zshipko)

- **irmin-git**:
  - Allow import/export of Git repositories using Irmin slices. (mirage/irmin#561, @samoht)

- **irmin-http**:
  - Expose a `/trees/merge` route for server-side merge operations. (mirage/irmin#714,
    @samoht)

- **irmin**:
  - Add `Json_value` and `Json` content types. (mirage/irmin#516 mirage/irmin#694, @zshipko)

  - Add optional seed parameter to the `Irmin.Type` generic hash functions.
    (mirage/irmin#712, @samoht)

  - Add `V1` submodules in `Commit`, `Contents` and `Hash` to provide
    compatibility with 1.x serialisation formats. (mirage/irmin#644 mirage/irmin#666, @samoht)

  - Add `Store.last_modified` function, which provides a list of commits where
    the given key was modified last. (mirage/irmin#617, @pascutto)

  - Add a `Content_addressable.unsafe_add` function allowing the key of the new
    value to be specified explicitly (for performance reasons). (mirage/irmin#783, @samoht)

  - Add `save_contents` function for saving contents to the database. (mirage/irmin#689,
    @samoht)

  - Add pretty-printers for the results of Sync operations. (mirage/irmin#789, @craigfe)

  - `Private.Lock` now exposes a `stats` function returning the number of held
    locks. (mirage/irmin#704, @samoht)

#### Changed

- **irmin-unix**:
  - Rename `irmin read` to `irmin get` and `irmin write` to `irmin set`. (mirage/irmin#501,
  @zshipko)

  - Switch from custom configuration format to YAML. (mirage/irmin#504, @zshipko)

- **irmin-git**:
  - Require `ocaml-git >= 2.0`. (mirage/irmin#545, @samoht)

  - Cleanup handling of remote stores. (mirage/irmin#552, @samoht)

- **irmin-http**:
  - Rename `CLIENT` to `HTTP_CLIENT` and simplify the signatures necessary to
    construct HTTP clients and servers. (mirage/irmin#701, @samoht)

- **irmin-mirage**
  - Split `irmin-mirage` into `irmin-{mirage,mirage-git,mirage-graphql}` to
    allow for more granular dependency selection. Any instances of
    `Irmin_mirage.Git` should be replaced with `Irmin_mirage_git`. (mirage/irmin#686,
    @zshipko)

- **irmin**:
  - Update to use dune (mirage/irmin#534, @samoht) and opam 2.0. (mirage/irmin#583, @samoht)

  - Replace `Irmin.Contents.S0` with `Irmin.Type.S`.

  - Rename `Type.pre_digest` -> `Type.pre_hash` and `Type.hash` ->
    `Type.short_hash`. (mirage/irmin#720, @samoht)

  - Change `Irmin.Type` to use _incremental_ hash functions (functions of type
    `'a -> (string -> unit) -> unit`) for performance reasons. (mirage/irmin#751, @samoht)

  - Simplify the `Irmin.Type.like` constructor and add a new `Irmin.Type.map`
    with the previous behaviour.

  - Improvements to `Irmin.Type` combinators. (mirage/irmin#550 mirage/irmin#538 mirage/irmin#652 mirage/irmin#653 mirage/irmin#655 mirage/irmin#656
    mirage/irmin#688, @samoht)

  - Modify `Store.set` to return a result type and create a new `Store.set_exn`
    with the previous exception-raising behaviour. (mirage/irmin#572, @samoht)

  - Rename store module types to be more descriptive:
     - replace `Irmin.AO` with `Irmin.CONTENT_ADDRESSABLE_STORE`;
     - replace `Irmin.AO_MAKER` with `Irmin.CONTENT_ADDRESSABLE_STORE_MAKER`;
     - replace `Irmin.RW` with `Irmin.ATOMIC_WRITE_STORE`;
     - replace `Irmin.RW_MAKER` with `Irmin.ATOMIC_WRITE_STORE_MAKER`. (mirage/irmin#601,
       @samoht)

  - Rename `export_tree` to `save_tree` (mirage/irmin#689, @samoht) and add an option to
    conditionally clear the tree cache (mirage/irmin#702 mirage/irmin#725, @samoht).

  - Change hash function for `Irmin_{fs,mem,unix}.KV` to BLAKE2b rather than
    SHA1 for security reasons. (mirage/irmin#811, @craigfe)

  - Move `Irmin.remote_uri` to `Store.remote`, for stores that support remote
    operations. (mirage/irmin#552, @samoht)

  - Simplify the error cases of fetch/pull/push operations. (mirage/irmin#684, @zshipko)

  - A `batch` function has been added to the backend definition to allow for
    better control over how groups of operations are processed. (mirage/irmin#609, @samoht)

  - A `close` function has been added to allow backends to close any held
    resources (e.g. file descriptors for the `FS` backend). (mirage/irmin#845, @samoht)

  - Simplify `Private.Node.Make` parameters to use a simpler notion of 'path' in
    terms of a list of steps. (mirage/irmin#645, @samoht)

  - Rename `Node.update` to `Node.add`. (mirage/irmin#713, @samoht)

#### Fixed

- **irmin-unix**:
   - Fix parsing of commit hashes in `revert` command. (mirage/irmin#496, @zshipko)

- **irmin-git**:
  - Fix `Node.add` to preserve sharing. (mirage/irmin#802, @samoht)

- **irmin-http**:
  - Respond with a 404 if a non-existent resource is requested. (mirage/irmin#706, @samoht)

- **irmin**:
  - Fix a bug whereby `S.History.is_empty` would return `true` for a store with
    exactly one commit. (mirage/irmin#865, @pascutto)

#### Removed

- **irmin**:
  - Remove `pp` and `of_string` functions from `Irmin.Contents.S` in favour of
    `Irmin.Type.to_string` and `Irmin.Type.of_string`.

  - Remove `Bytes` content type. (mirage/irmin#708, @samoht)

  - Remove `Cstruct` dependency and content type. If possible, switch to
    `Irmin.Contents.String` or else use `Irmin.Type.map` to wrap the Cstruct
    type. (mirage/irmin#544, @samoht)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants