Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Overhaul REST API #964

Closed
nagmat84 opened this issue Apr 10, 2021 · 13 comments · Fixed by #1217
Closed

[Enhancement] Overhaul REST API #964

nagmat84 opened this issue Apr 10, 2021 · 13 comments · Fixed by #1217
Assignees
Labels
enhancement New feature or request

Comments

@nagmat84
Copy link
Collaborator

nagmat84 commented Apr 10, 2021

I am working on an Android client for Lychee. The current state of the REST API drives me nuts. The way the API is right now make it rather difficult to write a client with the Java libraries GSON+Retrofit2, cache the responses in local SQL database with Room and drive the client GUI from the local cache during offline scenarios.

Maybe, the API is the way it is due to historic reasons and because nobody had the time or an reason to overhaul the API. But maybe there are also good reasons why the API must be the way it is and which a I am not aware of. In the first case, I might try and spend some time to improve the API on-the-go while I am working an the Android client. In the second case, I would like to be enlighten why the API has to be the way it is.

If I had to categorize my findings, I would come up with the following categories

  1. Varying actual type of attributes for the same formal type in JSON responses
  2. Conversion to human-readable strings on the server side instead on the client side including partial but inconsistent localization
  3. Inconsistent naming of attributes

The categories are explained in detail in the addendum. If you think it might be doable to overhaul the API with manageable effort and if there is an option how I could help out, let me know.


Addendum

I. -- Mismatch of formal types and actual types in JSON objects / Prevalence of strings

Most of the time the server responds with attributes encoded as strings although the should actually be of a different type. I guess this might be a problem with PHP on the server-side as PHP tends to convert everything into a string if one does not take care. More specifically:

  • Boolean: Booleans in responses are reported as actual JSON boolean, i.e. { visbible : true } and { visbible : false }, but also as JSON numbers 0 and 1 and as literal strings "0", "1", "true", "false".

  • Numbers: Numbers in responses are reported as actual JSON numbers, i.e. { id : 42 }, but also as literal strings "42". In particular this affect most of the "ID"-like attributes, i.e. photoID, albumID ..., counters for arrays and dimensional attributes (width and height of an image)

  • Arrays of objects: If the array is non-emtpy the response looks like { photos : [ ... ] } and everything is fine. But empty arrays are correctly reposted as [] but also as null and even worse as false. In particular the latter is a problem. While a Java list object can perfectly be null, one cannot assign false to it.

All this might be no problem, if the attributes are only used to be displayed to a user via an HTML page. However, it makes it rather difficult to deserialize the JSON object into a proper Java object with GSON+Retrofit2. In particular, if an attribute is expected to be an array of other objects, the GSON libraray expects to find the tokens [...] which will be deserialized into a Java-List<>. But an empty array must never be false. This seems to be typical PHP quirk.

II. -- Human-readable string conversion on the server side

Many attributes are already converted into a human-readable form on the server side. This mostly affect timestamps (e.g. the attribute takestamp and friends). Two particular ugly findings:

  • Precision varies with the context: for an album which includes a multitude of photos, the maximal and minimal timestamps are reported up to a monthly precision, for an album which only includes some photos from the same month, the timestamps are reported with daily precision. A timestamp of a photo entity is reported with hourly precision, if this photo entity is part of an array of photos in an album (i.e. as part of the response from /api/Album::get), but the timestamp of the very same photo entity the timestamp is reported up to a precision with seconds.
  • Localization depends on server-settings and is inconsistent: The formatted timestamp depends on the localization settings of the PHP configuration. However, not all timestamps are returned localized, some are always returned in the C-locale.

Both aspects make it nearly impossible to deserialize the response on the Android client side and store the response in a local database cache, because it needs a lot of clever guessing what time format is used and then convert this time into a true Java time object or SQL timestamp. Moreover, if the same entity is received from different responses and with different timestamps it is not clear which one is "more" correct.

IMHO, localization should only take place on the client side on the GUI level right before the timestamp is going to be displayed, but should not happen on the server backend. I believe it would be best to report all times as a JSON number (Java/Javascript long) in seconds from epoch or another suitable unambiguous format. A JSON-like ISO 8601 string would also be fine. Even Javascript in a browser support to convert those timestamps into a human-readable format and will also honor the browsers locale settings.

The same problem also affects other attributes:

  • File sizes are reported as strings formatted as "<localized quantity> <unit>" with quantity being a possibly fractional number and the unit something like "kB", "MB" etc, i.e "12.5 kB". With a different server-side locale, the same size could also be reported as "12,5kB" (comma instead of decimal point, no between space between value and unit). IMHO, one should use a fixed unit, e.g. bytes, and only report the value as an integer. Conversion into a more human-friendly format should be left to the GUI layer.
  • Shutter speed is reported as a string with a true fraction, i.e. "1/50 s", or as a decimal fraction, i.e "0.02 s", or with a different unit, i.e. "20ms". IMHO, one should use a fixed base unit, e.g. µs, and report the shutter speed as an integer multitude of that base unit
  • and so on

III. -- Inconsistent naming of attributes

This is the least problem and probably the easiest to fix. Some attributes are named inconsistently over the API or the same name has different meaning. Of course, this is nitpicking but is makes it harder to implement a client fluently. For example, lets take the attribute size. When it comes to images this meaning is ambiguous: size of file on disk or geometrical size of the photo? Unfortunately, the API uses the name for both depending on the context. IMHO, the wording should at least be consistent across an API and unambiguous at the optimum. In this particular case: filesize and dimension.

My apologies for this very long post. But I wanted to point out all aspects at once.

@ildyria
Copy link
Member

ildyria commented Apr 11, 2021

My apologies for this very long post. But I wanted to point out all aspects at once.

No worries. I do agree that an overhaul of the API is needed. More precisely, changing some the return values, especially on error.

With regard to the ID. This is a legacy thing and I do agree it is not intuitive at all. The reason why ID were considered as strings, was due to limitations of size with regard to 32 bit architecture and large numbers...

I am currently considering going for ID as strings client side, number on server side. Something like that is considered: https://github.com/vinkla/hashids 🤔

The reason is that it is "possible" to parse all the endpoint and find the public hidden folders. Which I would like to fix by adding stronger randomness on the ID sent. I don't care how it is stored in the database.

Never the less. If someone wants to make a PR with the respective API change server side (here) and client side (Lychee-front), we will for sure have a look. 👍

The reason why this is not on my priority list is because I am planning to rewrite the full front-end with Livewire. As a result, the current front-end API could be seen as "depreciated". However I don't mind keeping the current REST entry points, actually it would be an idea to add a version number in the REST API access point. I would say, we leave this as an exercise for the reader. 😉

@ildyria ildyria added the enhancement New feature or request label Apr 11, 2021
@nagmat84
Copy link
Collaborator Author

Never the less. If someone wants to make a PR with the respective API change server side (here) and client side (Lychee-front), we will for sure have a look. +1

OK, that's great to hear. Just wanted to know, if you are open to that before I really consider to start it. Probably, I going to start with some minor corrections which should be easily to fix like issue #962 but would help a lot with respect to Retrofit2+GSON.

The reason why this is not on my priority list is because I am planning to rewrite the full front-end with Livewire. As a result, the current front-end API could be seen as "depreciated". However I don't mind keeping the current REST entry points, actually it would be an idea to add a version number in the REST API access point. I would say, we leave this as an exercise for the reader. wink

I already thought about that, too, in particular with respect to the problem with the timestamps. If I would start to correct that in the current API, e.g. changing all timestamps from locale strings to something more machine-readable, I probably had to change a lot in the Lychee frontend, too. If this is going to be rewritten sooner or later, then this would mean much work for nothing.
So, I already though to correct some of the minor errors first in both the API backend and the frontend (see above), and then branch from that, create a new API, e.g. like /api2/..., which mostly follows the old API but with better types. Thus the old API stays intact.

With regard to the ID. This is a legacy thing and I do agree it is not intuitive at all. The reason why ID were considered as strings, was due to limitations of size with regard to 32 bit architecture and large numbers...

OK, so I think with modern architectures, int64 should be no problems.

I am currently considering going for ID as strings client side, number on server side. Something like that is considered: https://github.com/vinkla/hashids thinking

The reason is that it is "possible" to parse all the endpoint and find the public hidden folders. Which I would like to fix by adding stronger randomness on the ID sent. I don't care how it is stored in the database.

I do not get what you are talking about, probably, because I do not know what public, but hidden folders are supposed to be. However, from a cryptographers point of view your approach sounds fishy, though. If you want people not to retrieve something, then you should not rely on people inability to guess an ID. That sounds like security by obscurity. I believe there should be better approaches to that. Maybe I can help out. However, I first need to understand what you aim at, i.e. what this public-but-hidden-thingy is. But we should take that discussion offline and not litter this tread with that.

@ildyria
Copy link
Member

ildyria commented Apr 11, 2021

OK, so I think with modern architectures, int64 should be no problems.

About that... We do have some user on some old Raspberry Pi or still using PHP 32-bit. We do advise in the Diagnostic to upgrade for 64-bit arch.

I do not get what you are talking about, probably, because I do not know what public, but hidden folders are supposed to be. However, from a cryptographers point of view your approach sounds fishy, though. If you want people not to retrieve something, then you should not rely on people inability to guess an ID.

I'm actually a Cryptographer so I know extremely well what you are talking about. :) I just did not bother updating the original code since LycheeOrg took the code from Lychee v3...

For the sake of simplicity, let's say that there are 3 kinds of albums:

  • public: directly accessible AND visible on the gallery
  • hidden: directly accessible via link but NOT visible on the gallery.
  • private: directly accessible if you are actually logged in (we do have ownership security check etc).

The process to access the data of an album is the same in 3 cases: an api/Album::get with the desired ID.
The current problem is that this ID is actually the timestamp of the date of creation of the album. This is thus easily guessable.

From there I can see two "solutions":

  1. we randomize the ID, which would need a rewrite of some of the add functions, nothing too complex, but I still fear that they would be guessable without any sort of secret input as the full code of Lychee is open-source. This could for example make use of a hash function and a secret input (admin defined) and convert back the number into a 64bit integer...

  2. we add an encryption (and decryption) function that convert IDs into a Youtube-like string, the key for such encryption is naturally kept secret and admin defined, guaranteeing the non-predictability of the result.

Why has this not be done before? Well lack of time, more pressing matters at hand...

This is not security by obscurity as we reveal the full source of Lychee as per the 2nd Kerckhoffs's principle. 😉

@nagmat84
Copy link
Collaborator Author

nagmat84 commented Apr 11, 2021

OK, so I think with modern architectures, int64 should be no problems.

About that... We do have some user on some old Raspberry Pi or still using PHP 32-bit. We do advise in the Diagnostic to upgrade for 64-bit arch.

I get your point. In this case I have changed my opinion and I would recommend to stay with strings for IDs, especially if they become longer in the future (like a hash) and may not fit into 64bit neither.

In that case /api/Album::setPublic and friends should be fixed, because those API methods expect an integer as ID. I wonder why this hasn't caused any problems given the use case you mentioned above. A 32bit architecture should not be able to make that API call, right?

IDs as strings don't cause any trouble for Retrofit+GSON as long as their type is consistent across the whole API.

For the same reason (32bit architectures, really?), I change my opinion with respect to the type for timestamps, too. First, I thought seconds since epoch, would be best, but this requires at least a 64bit counter. Maybe, it is better to encode them as a ISO 8601 strings in order to support legacy architectures.

I will look into all that, I give it a try next or the next-but-one weekend.

  • hidden: directly accessible via link but NOT visible on the gallery.

I got it. In other words, it is equivalent to a sort of password-protected album without the need to have an actual user account for Lychee and the secret "password" is the ID of the album itself. This frees users from typing in the password as the password is transported as part of the URL. And the URL is protected by TLS.

This is not security by obscurity as we reveal the full source of Lychee as per the 2nd Kerckhoffs's principle. 😉

With obscurity, I referred to the confidentiality of the album's ID not the source code. 😉 But sure, you are right. If you consider the ID of the album as the shared secret between the creator of the album and the watchers, then everything is just fine.

I'm actually a Cryptographer

Really? Cool, me too.

@kamil4
Copy link
Contributor

kamil4 commented Apr 14, 2021

In that case /api/Album::setPublic and friends should be fixed, because those API methods expect an integer as ID. I wonder why this hasn't caused any problems given the use case you mentioned above. A 32bit architecture should not be able to make that API call, right?

In Lychee v4 we are forced to use 32-bit IDs on 32-bit architectures. 64-bit IDs simply don't seem to work with Laravel on 32-bit archs (very annoying, as you can imagine). So the outcome is that either strings or ints currently work for IDs.

In previous Lychee version (v3), which didn't use Laravel, 64-bit IDs were used also on 32-bit architectures, so there IDs had to be strings.

We currently prefer strings for IDs, and we have been fixing the instances of ints in the API.

I also agree with you with respect to partial localization of strings. I documented some of the issues around that in #829. The major underlying issue here is that we store them in this butchered form in the database...

@ildyria
Copy link
Member

ildyria commented Apr 17, 2021

I get your point. In this case I have changed my opinion and I would recommend to stay with strings for IDs, especially if they become longer in the future (like a hash) and may not fit into 64bit neither.

Personally I would stay as ID as big int in the database, because this gives us access to relationships in Laravel. However, outputing id as strings would be something I am fine with.

@nagmat84
Copy link
Collaborator Author

Personally I would stay as ID as big int in the database, because this gives us access to relationships in Laravel. However, outputing id as strings would be something I am fine with.

I'm fine with that. I would not change the DB neither. IMHO, it should be an integer (64bit) everywhere. That's what I proposed in the first place

But then you told me that there are still 32bit architectures which are forced to use strings, because they cannot put 64bit into their native integer. So, I assumed we were talking about using 64bit integers in the database backend (MySQL, PostgreSQL, sqlite), but strings on the PHP layer. The former is possible, because the databases support big integers even on 32bit architectures, but it is required to use strings within PHP enable 32bit support. That would give us the following simplified picture which I had in the back of my mind:

               /     +---------------------------------+                           +--------+
              |      |    JSON API (IDs are strings)   |     <--- network  -->     | client |
              |      +---------------------------------+                           +--------+
PHP (32bit)  -|      |   OO-Model + "Business logic"   |
              |      +---------------------------------+
              |      |    Laraval  (IDs are strings)   |
               \     +=================================+
                     |  DB backend (IDs are integers)  |
                     +---------------------------------+

Now, you are telling me, that Laravel requires integers anyway and thus seems to be capable of handling 64bit integers. But Laravel is part of the PHP layer. So, where exactly is the 32bit problem? I am completely confused now.

@nagmat84
Copy link
Collaborator Author

I also agree with you with respect to partial localization of strings. I documented some of the issues around that in #829. The major underlying issue here is that we store them in this butchered form in the database...

I see. I would call that a typical "wtf?!". Changing the backend to store a better, i.e. "raw" encoding, like seconds since epoch for timestampe is one thing. But it is a completely different pair of shoes to migrate existing databases to a new format.

@ildyria
Copy link
Member

ildyria commented Apr 17, 2021

Basically when I started to work on the v4 of Lychee, I had decided to rewrite everything in Laravel, because it makes it significantly easier to manage the databases connection, requests etc. The parent-children & ownership relations in Laravel works only on numerical ID. So it has to be either Int (32bit) or BigInt (64bit). At that time, I was still unaware of the php 32bit problem.

As our id are epoch based, I decided to go 64-bit id instead of 32-bit id, this in order to avoid the year 2038 problem (17 years left...). It is only later that we realized the problem of php32 bit.

As of now, 32-bit ID are generated by truncating the last 3 digits of the epoch time (not the prettiest solution). Why does this matters? Because as you mentioned the DB are not the problem here, it is PHP. When retrieving a 64-bit integer value from the DB into PHP, there is some kind of truncation done by PHP, and as a result, the id do not match anymore...

We only discovered later that for this reason the original author of Lychee was using strings as ID in the database to circumvent this 32-bit problem.

@ildyria
Copy link
Member

ildyria commented Apr 17, 2021

The architecture I am considering switching is :

               /     +-----------------------------------------+                           +--------+
              |      |    JSON API (IDs are ENCRYPTED strings) |     <--- network  -->     | client |
              |      +-----------------------------------------+                           +--------+
PHP (32bit)  -|      |   OO-Model + "Business logic"           |
              |      +-----------------------------------------+
              |      |    Laraval  (IDs are integers)          |
               \     +=========================================+
                     |  DB backend (IDs are integers)          |
                     +-----------------------------------------+

The reason being, the problem mentioned here and here and for uniformity for the front-end. Which is why I am strongly in favor of solution 2:

  1. we add an encryption (and decryption) function that convert IDs into a Youtube-like string, the key for such encryption is naturally kept secret and admin defined, guaranteeing the non-predictability of the result.

@nagmat84
Copy link
Collaborator Author

nagmat84 commented May 2, 2021

I have started working to fix the low-hanging fruits, i.e. added some ..._raw attributes to the JSON API. However, I came across a problem understanding some code. IMHO, it should never have worked in the first place, but it obviously does. So I guess it is some of those PHP "miracles". (I'm not a good PHP developer, I must admit).

My prime question relates to App\Models\Extensions\PhotoCast::urls(array &$return). On top, I have some minor questions regarding the workflow.

However, I don't want to clutter this thread with my questions. Maybe, we could take them offline? What is the right place to discuss those questions?

@ildyria
Copy link
Member

ildyria commented May 2, 2021

IMHO, it should never have worked in the first place, but it obviously does. So I guess it is some of those PHP "miracles".

😆

We have gitter: https://gitter.im/LycheeOrg/Lobby You can just login with your github ID. :)

@reptilex
Copy link

I'm thinking (I might never find the time but ...) about writing an IOS client because I would love to access this through my apple tv. So I will be following this conversation. Thanks for all the great work so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants