Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign/revamp oc_storages to fix various issues #11830

Closed
PVince81 opened this issue Oct 29, 2014 · 17 comments
Closed

Redesign/revamp oc_storages to fix various issues #11830

PVince81 opened this issue Oct 29, 2014 · 17 comments

Comments

@PVince81
Copy link
Contributor

Currently the oc_storages has no real unique database id but requires the PHP code to build ids on the fly.

Here is how the oc_storages table looks like:

MariaDB [owncloud]> select * from oc_storages;
+---------------------------------------+------------+
| id                                    | numeric_id |
+---------------------------------------+------------+
| 77703924cafa1097220e2ff1a8d2b0e7      |          3 |
| home::root                            |          1 |
| local::/srv/www/htdocs/owncloud/data/ |          2 |
+---------------------------------------+------------+

First, the id column must be computed in PHP based on the following logic:

  1. If we're looking for the storage of user "someuser", then the id will be "home::someuser"
  2. If we're looking for the entry of a specific external storage, the id is computed based on a custom algorithm which depends on the external storage backend. For example SMB would generate an id like smb://user@host/rootpath.

Additionally if the id's length is bigger than 64 characters, then the md5() of that id will be used.

The current schema is troublesome, because:

  1. Reverse lookups by id are not possible: when the md5() function kicks in, it is not possible to get the user id directly. One needs to iterate over ALL users (imagine LDAP with millions of users) to find out to who this storage belongs.
  2. We already have another id called "numeric_id", why not use that instead ?

@icewind1991 @DeepDiver1975 does anyone know why we have these string ids in the first place ?
Why not use the numeric_id ? Every user could have a oc_preferences entry mapping them to a given storage. This would prevent strange and bad issues like the legacy storage + data dir moving issue (#10897).

What speaks against discarding the storage id string column and working only with the numeric_id ?
We cannot rely on the storage id column to detect duplicate mount configurations (from external storage).

I suggest to do the following:

  1. Get rid of oc_storages. If not, at least get rid of the id column and use a purely numeric_id based access.
  2. Every user gets a storage_id in their oc_preferences table that points to their home storage id. This way there will be no more slippages (legacy storage issues with lost shares, etc...)
  3. The external storage mount configuration mount.json will also be extended with a "storage_id" parameter that points to the id of the matching storage. That id is created as soon as the mount point is added.
  4. Optional: move mount.json into a table ? Are there may use cases for that file ?
  5. Optional: add a column "owner" to the oc_storages table which could make resolving file owners from shares easier instead of having to jump through hoops

What do you think ?

@PVince81
Copy link
Contributor Author

Maybe oc_storages could look like this:

column1: unique id (formerly known as numeric_id)
column2: storage type: "home", "external"
column3: owner: user name

Having the owner name here would make it much easier to find out "who is the owner of the file with id 123", which is a common problem with sharing. CC @schiesbn

@icewind1991
Copy link
Contributor

The problem with dropping the string id is that it would remove the way to get the numeric storage id from an instance of the storage class, which is needed to be able to access it's cache etc

@PVince81
Copy link
Contributor Author

@icewind1991 why do we need to map a storage class instance to a string id ? Is there a better way ?

If mount.json contains the matching numeric_id and oc_preferences the one for the user, there is no more need for this id.

@PVince81
Copy link
Contributor Author

This also means that we need to tell that storage instance (through the constructor) what numeric_id it has in the database.

@PVince81
Copy link
Contributor Author

I'd also like to remind again how much pain this (unstable id) has caused for the legacy storages, not only for me for implementing a repair step, but also for all the people who had to manually their database and needed to "guess" the ids because they were in md5 format. Also: this pain is still not over, as many people will continue to run into this issue when they decided to move the data folder. I'd like to avoid such pains in the future.

Another pain for external storage that has no stable id, as it is generated by the backend config. Which means that if you change the ext storage config (ex: user name), its string id changes, resulting in a new numeric_id being created, so it becomes a different storage entry. This makes it much harder to manage storages in the first place (and detect duplicates). I'd expect from any database object that they get a unique stable id that cannot be change.

Let's work together to find a suitable solution, possibly with a cleaner database design. (as part of technical debt fixing)

@icewind1991
Copy link
Contributor

Which means that if you change the ext storage config (ex: user name), its string id changes, resulting in a new numeric_id being created, so it becomes a different storage entry.

This also has positive effects, causing changes to the external storage config will trigger a rescan which is needed in a lot of cases.

In general I'm very much in favor for cleaning the storage system

@PVince81
Copy link
Contributor Author

I think the rescan can be achieved through other means, for example a dirty flag or a "last scanned mtime" column in that table.
I'd say the external storage app must explicitly mark the storage as dirty / reset the mtime column (for rescan) after changing the config instead of relying on that implicit behavior.
.

@PVince81
Copy link
Contributor Author

@Xenopathic made a nice suggestion for storage the ext storage config in the database and have a storage to user mapping: #11261 (comment)

@PVince81
Copy link
Contributor Author

Another case of storage id slippage, related to LDAP: #12035

@mmattel
Copy link
Contributor

mmattel commented Apr 7, 2015

As described in the top:

I like the suggestion of @PVince81 to do a oc_storages cleanup and a extension of mount.json
This would be much more consistant and managable than it is right now.
Keep in oc_storages the columns,
...numeric_id is a unique number identifying the storage, a reference number
...id will only keep the type and path, so it is visible where the numeric_id belongs to

And as a extension, all files or folders shared need to get in their table a storage_id in addition.
Or in other words, all tables containing any reference to a file or folder will get a column storage_id added and referenced

@PVince81
Copy link
Contributor Author

PVince81 commented Apr 7, 2015

@mmattel I agree about the storage_id column/mapping. I thought I had documented this already but can't find it any more.

@mmattel
Copy link
Contributor

mmattel commented Apr 7, 2015

I can not support you from the coding point of view. My knowledge is limited here.
But it is imho urgently necessary to do these adoptions due to the success of owncloud and its broader acceptance. It is maybe not easy to do, but as later it is done, as more complicated it will become...

@PVince81
Copy link
Contributor Author

PVince81 commented Apr 7, 2015

Something related: mapping user name to storage #15166

@DeepDiver1975 I'll set this to 8.2

Also CC @MorrisJobke who also went through the storage pains before

@PVince81
Copy link
Contributor Author

Another case of storage id mismatch which leads to loss of shares: #17800 😡

@PVince81
Copy link
Contributor Author

This huge task is out of scope now as we're past feature freeze, moving to 9.0

@PVince81 PVince81 modified the milestones: 9.0-next, 8.2-current Sep 21, 2015
@PVince81
Copy link
Contributor Author

We've made progress, there is now a table "oc_mounts" containing the mount points of every user:
#20768

@lock
Copy link

lock bot commented Aug 6, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 6, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants