
Use gitlab_git #9

Closed
eugenk opened this issue Sep 26, 2016 · 9 comments · Fixed by #102

@eugenk
Member

eugenk commented Sep 26, 2016

Use gitlab_git to

  • create a git repository
  • commit new files to master
  • commit changed files to master
  • commit deleted files to master

This shall be done in service objects that use the models File and Commit (ontohub/ontohub-models#13).
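
A rough sketch of the intended service-object shape (every name below is hypothetical and only illustrates the idea; the actual git operations would be backed by gitlab_git):

class CommitNewFile
  def initialize(repository, user)
    @repository = repository # Repository model instance
    @user = user
  end

  def call(path, content, message)
    # Commit the file to master via the git library (hypothetical call).
    sha = @repository.git.commit_file(@user, path, content, message)
    # Record the result in the Commit and File models
    # (from ontohub/ontohub-models#13).
    Commit.create!(repository: @repository, sha: sha, message: message)
  end
end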

@eugenk eugenk added this to the Minimal git integration milestone Sep 26, 2016
@phyrog
Contributor

phyrog commented Feb 2, 2017

It looks like gitlab_git has been absorbed into gitlab and is no longer maintained as a separate gem. I guess that means we should fall back to using rugged directly.
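
For reference, the basic operations from the issue description look roughly like this with rugged (a sketch along the lines of the rugged README; Ontohub would presumably use bare repositories and wrap this in the service objects mentioned above):

require 'rugged'

# Create a git repository.
repo = Rugged::Repository.init_at('/tmp/example')

# Commit a new file to master: write the blob, stage it, commit.
oid = repo.write('file contents', :blob)
index = repo.index
index.add(path: 'README.md', oid: oid, mode: 0100644)

Rugged::Commit.create(repo,
  tree: index.write_tree(repo),
  author: { name: 'A. Hacker', email: 'hacker@example.com', time: Time.now },
  committer: { name: 'A. Hacker', email: 'hacker@example.com', time: Time.now },
  message: 'Add README.md',
  parents: repo.empty? ? [] : [repo.head.target],
  update_ref: 'HEAD')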

@eugenk
Member Author

eugenk commented Feb 14, 2017

I "reviewed" the code of gitlab_git and found that it supports everything we need except for two features:

  1. git clone /git svn clone / git svn pull
  2. On commit of a file: Check if the HEAD has changed and return with a merge conflict in this case.

While the first missing feature can be easily implemented using the git commandline client (which is a dependency anyway), the second one should be implemented with a decorator of gitlab_git. It can be wrapped around the commit method.
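
A minimal sketch of such a decorator (assuming the wrapped repository object exposes hypothetical head_sha and commit_file methods):

class ConflictCheckingRepository
  class HeadChangedError < StandardError; end

  def initialize(repository)
    @repository = repository
  end

  # expected_head_sha is the HEAD the client based its edit on; if the
  # branch has moved on since then, we refuse to commit.
  def commit_file(expected_head_sha, *args)
    if @repository.head_sha != expected_head_sha
      raise HeadChangedError, 'HEAD has changed - merge conflict'
    end
    @repository.commit_file(*args)
  end

  # Everything else is delegated to the wrapped repository.
  def method_missing(name, *args, &block)
    @repository.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @repository.respond_to?(name, include_private) || super
  end
end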


There is the problem of maintenance, though: I pulled the changes from gitlab-ce into a fork of gitlab_git and adjusted them to the gem. There were a few lines in the code that depended on gitlab-ce, but they could easily be rewritten such that gitlab_git is self-contained (see the branch update_to_gitlab_edb8ed36 on https://github.com/ontohub/gitlab_git).

It took me about an hour to adjust the code. It might get more difficult in the future with a tighter integration of gitlab_git into gitlab-ce, but I think it's still easier to pull in the changes on every gitlab-ce release than to maintain our own git library.

@eugenk eugenk self-assigned this Feb 14, 2017
@eugenk
Member Author

eugenk commented Feb 20, 2017

The routes for this should look similar to the following:

/:user/:repository[/rev/:revision]/tree/:path_to_the_directory   GET
/:user/:repository[/rev/:revision]/tree/:path_to_the_file   GET
/:user/:repository[/rev/:branch]/tree                    POST (file, mkdir)
/:user/:repository[/rev/:branch]/tree/:path_to_the_file  PATCH
/:user/:repository[/rev/:branch]/tree/:path_to_the_file  DELETE

------------- BEGIN NEW SUGGESTION -------------

/:user/:repository[/rev/:revision]/document/:library//oms # index
/:user/:repository[/rev/:revision]/document/:library//oms//:oms # show
/:user/:repository[/rev/:revision]/document/:library.dol//oms//:oms # show
/:user/:repository[/rev/:revision]/document/:native_document//oms # show

/:user/:repository[/rev/:revision]/document/:library//oms//:oms//symbols # index
/:user/:repository[/rev/:revision]/document/:library//oms//:oms//symbols//:symbol # show
/:user/:repository[/rev/:revision]/document/:oms//symbol//:symbol # show

/:user/:repository[/rev/:revision]/document/:library//oms//:oms//sentences # index
/:user/:repository[/rev/:revision]/document/:library//oms//:oms//sentences//:sentence # show
/:user/:repository[/rev/:revision]/document/:oms//oms//sentences//:sentence # show

/:user/:repository[/rev/:revision]/document/:library//mappings # index
/:user/:repository[/rev/:revision]/document/:library//mappings//:mapping # show

-------------- END NEW SUGGESTION --------------
------------- BEGIN OLD SUGGESTION -------------

/:user/:repository[/rev/:revision]/oms/:path_to_the_library   GET index
/:user/:repository[/rev/:revision]/oms/:path_to_the_library.dol   GET show
/:user/:repository[/rev/:revision]/oms/:path_to_the_library//:oms  GET show
/:user/:repository[/rev/:revision]/oms/:path_to_the_library.dol//:oms  GET show
/:user/:repository[/rev/:revision]/oms/:path_to_the_oms.owl  GET show

/:user/:repository[/rev/:revision]/symbols/:path_to_the_library//:oms  GET index
/:user/:repository[/rev/:revision]/symbols/:path_to_the_library//:oms//:symbol  GET show
/:user/:repository[/rev/:revision]/symbols/:path_to_the_oms//:symbol  GET show

/:user/:repository[/rev/:revision]/sentences/:path_to_the_library//:oms  GET index
/:user/:repository[/rev/:revision]/sentences/:path_to_the_library//:oms//:sentence  GET show
/:user/:repository[/rev/:revision]/sentences/:path_to_the_oms//:sentence  GET show

/:user/:repository[/rev/:revision]/mappings/:path_to_the_library  GET index
/:user/:repository[/rev/:revision]/mappings/:path_to_the_library//:mapping  GET show

-------------- END OLD SUGGESTION --------------

/:user/:repository/branches   GET index
/:user/:repository/branches   POST create
/:user/:repository/branches/:branch   GET show (identical to /:user/:repository/rev/:branch/commits)
/:user/:repository/branches/:branch   DELETE destroy

/:user/:repository/refs   GET index # do we really need this?

/:user/:repository/tags   GET index
/:user/:repository/tags   POST create
/:user/:repository/tags/:tag   GET show (with additional information like release notes)
/:user/:repository/tags/:tag   DELETE destroy

/:user/:repository[/rev/:revision]/commit   GET show (the commit)
/:user/:repository[/rev/:revision]/commits/:path_to_the_directory   GET
/:user/:repository[/rev/:revision]/commits/:path_to_the_file   GET

/:user/:repository/diff/:revision   GET
/:user/:repository/diff/:revision_from..:revision_to   GET
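
As a sketch, the tree routes above could be expressed in Rails' config/routes.rb like this (controller and action names are placeholders, nothing is decided yet; format: false keeps dots in file names from being parsed as format suffixes):

scope ':user/:repository' do
  scope '(rev/:revision)' do
    get    'tree/*path', to: 'trees#show',    format: false
    post   'tree',       to: 'trees#create'
    patch  'tree/*path', to: 'trees#update',  format: false
    delete 'tree/*path', to: 'trees#destroy', format: false
  end
end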

@phyrog
Contributor

phyrog commented Mar 5, 2017

We might want to start thinking about how we want to transfer files from the client to the server. We have three options:

  1. a normal multipart file upload that does not use the JSON API (or at least uses a different MIME type)
  2. base64-encode files and embed them in a JSON payload (which increases the size by a factor of about 1.33, since every 3 input bytes become 4 output characters)
  3. ignore binary files and embed the file's content as a string in the JSON payload

IMO none of them is ideal, but a combination of 2 and 3 could work, with a field that indicates whether we're transferring a binary file that needs to be decoded on the server or plain text content. That way we only pay the increased size for binary files.

@eugenk
Member Author

eugenk commented Mar 5, 2017

I agree with you. I think a multipart file upload should be avoided in order to keep the API consistent. Base64-encoding plain text, which is our primary resource type, would be too much overhead. However, we do need to encode binary files into plain text, so a hybrid approach seems best.

@phyrog
Contributor

phyrog commented Mar 6, 2017

So, as it turns out, it's not as easy as I hoped. Distinguishing binary data from UTF-8-encoded strings is pretty much impossible (unless a BOM has been set, which is not required). It would also not actually help, since the characters that need to be escaped for JSON are ordinary ASCII characters and control sequences.

So our options are actually:

  • write something that escapes all forbidden characters from the JSON standard in a string (it would not matter if it is a binary file, as long as all forbidden byte sequences are escaped)
  • just use base64 for all files

For now I'm actually favouring the base64 option (it's good enough for now), but implementing it in a way that lets us replace the encoding easily later.

For example, we could let the client send a field "encoding" with the value "base64" for base64-encoded data, or other values for encodings we later decide to support (e.g. the escaping option).
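
A sketch of what such a payload could look like on the client side (all attribute names are made up for illustration):

require 'base64'
require 'json'

content = File.binread('pizza.owl')

payload = {
  file: {
    path: 'examples/pizza.owl',
    encoding: 'base64', # other values could be added later, e.g. for the escaping option
    content: Base64.strict_encode64(content)
  }
}.to_json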

@tillmo
Member

tillmo commented Mar 7, 2017

I agree. The factor of 1.3 is not a big problem.

@phyrog
Contributor

phyrog commented Mar 17, 2017

Relevant for server-to-client communication: http://davidbcalhoun.com/2011/when-to-base64-encode-images-and-when-not-to/. We probably don't want to compress the JSON payload on the client side before sending it to the server, but the other direction should be gzipped to reduce the base64 overhead (when the client accepts it and the payload is bigger than x).

@eugenk
Member Author

eugenk commented Mar 18, 2017

Two alternative thoughts on this:

  • We could configure the Apache web server to gzip every JSON response with AddOutputFilterByType DEFLATE application/json
  • We could provide a link to the binary resource in the response.

I think gzipping some content depending on factors like file size is not a good idea at this point in the development of Ontohub. We should postpone this until the application is usable, because it's more of an optimisation problem.
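
If we wanted to keep this inside the application instead of Apache, Rack ships a middleware that does the same thing; a sketch (it only compresses when the client sends Accept-Encoding: gzip):

# config/application.rb
config.middleware.use Rack::Deflater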
