Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition to account ids #29503

Closed
PVince81 opened this issue Nov 7, 2017 · 38 comments
Closed

Transition to account ids #29503

PVince81 opened this issue Nov 7, 2017 · 38 comments

Comments

@PVince81
Copy link
Contributor

PVince81 commented Nov 7, 2017

Currently, most tables in OC are based on the immutable user id.
Changing all tables to use account id instead would take a long time and the benefit of being able to change a user id is deferred to the moment where all tables were adjusted.
Also changing too many at once in the hope to have this feature soon is time costly and dangerous.

I suggest to use an alternate temporary route:

  1. Introduce a new column oc_accounts.internal_user_id. The column defaults to the same value as oc_accounts.user_id at user creation or import.
  2. Let the admin rename the user id, in which case only oc_accounts.user_id is changed.
  3. Internally, we always use internal_user_id when mapping with other tables. So the code needs to be rewired to work with internal_user_id for internal operations, but keep user_id when dealing with logins or user:sync.

Once this is done, we already gain the benefit of user id renaming.

After that, we can take our time to migrate all other tables one by one to use oc_account.id instead (and use the opportunity to add a foreign key there...).

Once all tables are migrated, we can get rid of oc_accounts.internal_user_id and make it use user_id again.

@tomneedham @cdamken @butonic @DeepDiver1975 @jvillafanez @pmaier1

@cdamken
Copy link
Contributor

cdamken commented Nov 8, 2017

Let the admin rename the user id, in which case only oc_accounts.user_id is changed.

This depends on the backend and if they want to update additional stuff, for example, the storage path.

would that be over a occ command?

@butonic
Copy link
Member

butonic commented Nov 8, 2017

how about taking this one step further: instead of oc_accounts.internal_user_id let us add oc_accounts.uuid.

  • For any existing users the userid will be null.
  • For new DB users a RFC 4122 UUID will be generated.
  • For new LDAP users the UUID attribute will be used.

This way, the account can fall back to the userid if the uuid is null. New users will always get a uuid. Not all admins have configured a proper UUID attribute for ldap, we should add a warning in the log whenever a non RFC4122 UUID is encountered. But for correct ones we could fill the uuid column when changing the existing userid in all the other tables.

Same approach as with the oc_accounts.internal_user_id But it would allow us to keep the uuid column and use its nice properties in the future, eg: merging instances, create primary key without a trip to the db, no need for autoincrement primary key.

Hm ... the drawback is that storing the uuid efficiently requires a binary(16) column. which means we would have to alter the columns of all the db tables ... or live with the performance implications when storing as string. Well, it does not get any worse, because we already use string everywhere. we could even limit the length to 36chars.

Yuck, in the addressbook cards and federated share ids we store the userid:

BEGIN:VCARD
VERSION:3.0
PRODID:-//Sabre//Sabre VObject 4.1.2//EN
UID:8d9f02f6-d23b-1035-960f-9bb74d28f7e6
FN:Junge\, Susan
N:Susan;Junge\,;;;
CLOUD:8d9f02f6-d23b-1035-960f-9bb74d28f7e6@localhost:8000/oc/core
END:VCARD

This means our userids are now out there and must never be renamed. We should make new users use a uuid, so this becomes more stable and uniform. But for existing users the userid must not change, independently of the effort it takes to implement a migration for all existing users. ... We could try to implement some kind of aliases for all endpoints that expose the user id ...

We might need to check if uuids from ldap are really unique or at least log when a collision is detected? Can we detect it?

also see

@ghost
Copy link

ghost commented Nov 8, 2017

Ref to an old feature request / discussion about that: #14476

@jvillafanez
Copy link
Member

What's the point of going through the temporary route? What's the problem we're trying to solve that can't wait to do things "properly"? If we have to make the transition then we should schedule time and do it. If what we're going to code we're already killing it, I think we're planning a waste of time.

Now, changing the perspective a bit. What if the "temporary route" is permament? What are the benefits of adding (permanently) a mapping between the account and the user backend? I think this is something we must think about first before going through it.
If there are benefits, we can implement it properly, add value to the product and then make the transition.

@DeepDiver1975
Copy link
Member

DeepDiver1975 commented Nov 8, 2017

The internal id (integer) is the id which is to be used dir Database integrity as Foreign Keys.

The uiud Could be an externally used Identifier e.g. for federated Sharing

@cdamken
Copy link
Contributor

cdamken commented Nov 8, 2017

Why integer?

@DeepDiver1975
Copy link
Member

Why integer?

Why put Guacamole into Chilli-Wraps? - Because we do it this way since our for-fathers 🤣

@PVince81
Copy link
Contributor Author

PVince81 commented Nov 9, 2017

Why integer?

Because that's how one properly designs a database schema. You give numeric ids to your table entries as primary key so you can link/join multiple tables based on that, ideally with foreign keys... (btw #13143)

@cdamken
Copy link
Contributor

cdamken commented Nov 29, 2017

I meant why integer if we can use it for more stuff, like https://github.com/owncloud/platform/issues/154

an Unique_ownCloud_User_ID would also help for future stuff.

@butonic
Copy link
Member

butonic commented Dec 12, 2017

int as primary key is like ... really old school: https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439 ;-)

@PVince81
Copy link
Contributor Author

#14476 contains a lot of input

@butonic
Copy link
Member

butonic commented Feb 26, 2018

On step after another. I thunk we should first change the column names in the account table to reflect the reality:

  1. user_id - cannot be changed and is used throughout all our tables as de facto foreign key. rename to uuid. Leave existing users as they are. for new users generate a uuid.
  2. lower_user_id - always lowercase, unique -> rename to username, used as login and in search results.

Also see https://cloudplatform.googleblog.com/2018/01/12-best-practices-for-user-account.html

This will allow users to change their login / username as long as it is unique.

The ldap backend can implement a new IProvidesUUID interface so the uuid stays the same and admins can identify users by the same uuid everywhere. Also the ldap backend should implement a new IProvidesUsername interface that allows syncing the current login (for AD eg samaccountname or userPrincipalName)

Will require changes to the user listing because we should show the username, not the uuid.

But this can be done by starting to use the account tables column in a slightly different way.

@DeepDiver1975
Copy link
Member

Altering column names is tricky on the various dbms we support. Not sure if we should take that risk with respect to long running migrations ....

@DeepDiver1975
Copy link
Member

While I'm pretty aware of the old schoolness of ints and uuids being more cool (yes - they have real benefits - I know) - I still don't see this as high prio as well.

No matter how we name the columns or if we use int vs uuid - we need to carry on the foreign key mechanism and start referencing the user in other tables.
For me using ints for this is still the way to go.

TBH: I have no idea if we can use uuids in all supported dbs in the way we want to. Because I expect them to be autoincremented/generated by the system on insert of a new record set. Ans what about doctrine - does it support uuid as pk/autoincrement.
Too many open questions to make a decision right away.

@tomneedham
Copy link
Contributor

Not sure if we should take that risk with respect to long running migrations ....

For oc_accounts tables up to 1m rows this should still be acceptable - not as expensive as changing column types on oc_filecache.

@butonic
Copy link
Member

butonic commented Feb 26, 2018

@DeepDiver1975 I agree that moving to account ID as the foreign kay is the final goal. But we should only expose a uuid to external systems. and yes that should be ints.

for now I am happy with storing uuids as strings, as it will make the migrations a lot easier because the existing user_id can just be taken as the uuid. Remember. the current user_id is unique and cannot change. In effect we can treat it as a uuid.

@DeepDiver1975
Copy link
Member

In effect we can treat it as a uuid.

and in case of ldap it is - most of the time ;-)
but this means we need to add yet another column for the login name(s)

@butonic
Copy link
Member

butonic commented Feb 26, 2018

nope ... lower_user_id already is the login column ... the same as username.

@butonic
Copy link
Member

butonic commented Feb 26, 2018

working in #30617

@jvillafanez
Copy link
Member

The ldap backend can implement a new IProvidesUUID interface so the uuid stays the same and admins can identify users by the same uuid everywhere. Also the ldap backend should implement a new IProvidesUsername interface that allows syncing the current login (for AD eg samaccountname or userPrincipalName)

Please, no more of these interfaces. This is a a really bad practice IMHO.

@butonic
Copy link
Member

butonic commented Feb 27, 2018

@jvillafanez Not all user backends can provide emails or even avatars. What do you suggest? A single interface with all the methods? Or an abstract class that backends can extend with the default method implementations all throwing NotImplementedException? Please elaborate!

@jvillafanez
Copy link
Member

jvillafanez commented Feb 27, 2018

abstract class that backends can extend with the default method implementations all throwing NotImplementedException

This seems a better option for extension.

The main problem is to provide a way to include optional features / methods that might not be implemented by the backends. Using an abstract class to provide the default behaviour seems a reasonable choice.

We have a IUser interface. Make an abstract class AbstractUser or DefaultUserImpl that implements the IUser interface and provides the default implementations of all the optional methods (either provide a default value or throw a NotImplementedException, as long as it's properly documented in the IUser interface, it shouldn't matter). Then make all the backends extend from the abstract class and let them provide custom implementations.

In order to add a new method to the interface, add it in the IUser interface and provide a default implementation in the abstract class if the method is intended to be optional. If the method has to be implemented by the backend (or a default can't be provided), just make it abstract in the abstract class (note that this will break the apps at least temporary).

The overall idea would be:

  • Create AbstractRWUser and AbstractROUser
  • The AbstactRWUser:
    • Will have all the "setWhatever" methods abstract, forcing backends to implement those methods
    • Required "getWhatever" methods (such as the id and the displayname) will also be abstract
    • Optional "getWhatever" methods will provide a default value. Backends can override the method if needed
  • The AbstractROUser:
    • Will have all the "setWhatever" methods will throw a NotImplementedException by default. These methods shouldn't be call anyway.
    • Required "getWhatever" methods (such as the id and the displayname) will be abstract
    • Optional "getWhatever" methods will provide a default value. Backends can override the method if needed

The only potential problem is a new "setWhatever" method in the AbstractRWUser. My proposal, in this case, is to throw a NotImplementedException by default in the AbstractRWUser and warn the developers that the method will be required in OC version Y. When we reach that version, make it abstract and clean this.

@mrow4a
Copy link
Contributor

mrow4a commented Feb 28, 2018

In addition, we also have autoincreamented id ;o maybe we should drop it and replace with uuid?

@DeepDiver1975
Copy link
Member

@jvillafanez long story short (we had this discussion already in the scope of ldap): ownClouds API is made of interfaces. We don't use the mechanism of abstract base classes.

While there seem to be benefits with respect to abstract base classes I prefer to keep clean on the API side. We use interfaces.

@DeepDiver1975
Copy link
Member

In addition, we also have autoincreamented id ;o maybe we should drop it and replace with uuid?

integer id is internal reference - we use it for foreign keys. the uuid is a unique identifier which is available outside as well.

@mrow4a
Copy link
Contributor

mrow4a commented Feb 28, 2018

@DeepDiver1975 you can use uuid as foreign key as well (and not use id)

@butonic
Copy link
Member

butonic commented Feb 28, 2018

nope

uuid is for exposal to other systems, eg other oc instances when establishing federated shares. while you can use the [email protected] to send the share to a user with the current user name username it should resolve to the [email protected] as the federated sharing id.

We might need to give these things different names to better clarify and document how it works.

the account id is used internally to keep foreign keys short and fast.

@butonic
Copy link
Member

butonic commented Feb 28, 2018

real uuid would require BINARY(16) culumns as well as reordering the bytes of the uuid so they sort nicely with indexes. For more details see https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439

@DeepDiver1975
Copy link
Member

real uuid would require BINARY(16) culumns as well as reordering the bytes of the uuid so they sort nicely with indexes. For more details see https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439

hopefully this works on all dbs ......... 😨

@mrow4a
Copy link
Contributor

mrow4a commented Feb 28, 2018

Convinced

@butonic
Copy link
Member

butonic commented Feb 28, 2018

we cannot move to binary uuid because we have legacy userids that are not uuids. but they are out there as federated sharing ids ... unless we decide to kill all federated shares and force a reconnect ... a migration might be possible ... in the context of ocm. cc @tomneedham

@mrow4a
Copy link
Contributor

mrow4a commented Feb 28, 2018

We need to check if BINARY works as expected first (as a test), and then maybe discuss federated shares (which to me is still immature feature which is under discussion in ocm). I would kill legacy fed shares.. and give proper notification/activity to user that it was deleted maybe?

@tomneedham
Copy link
Contributor

Created a ticket (#30690) to track impacts of this change on federation to keep the two discussions isolated.

@butonic
Copy link
Member

butonic commented Feb 12, 2019

we actually now have username und userid. introduced accidentially with https://github.com/owncloud/core/pull/32587/files#diff-3030ca4989f892c50dea24eaa3f1ae38R133, even as public interface https://github.com/owncloud/core/pull/32587/files#diff-dc2e2ee403cebc838875eddf32a8a7beR50

We can now add apis to search byUsername and byUserId and even allow renaming users. Existing userids will just stay the same ... new ones could always be uuids ...

@TommyTran732
Copy link

Is this still being worked on?

@DeepDiver1975
Copy link
Member

we have account ids in place. The owncloud php code base will no longer receive changes of this scale -> closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants