Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Rewrite userdir to be faster #4537

Merged
merged 71 commits into from
Mar 7, 2019
Merged

Rewrite userdir to be faster #4537

merged 71 commits into from
Mar 7, 2019

Conversation

hawkowl
Copy link
Contributor

@hawkowl hawkowl commented Jan 31, 2019

Fixes #4581

@codecov-io
Copy link

codecov-io commented Jan 31, 2019

Codecov Report

Merging #4537 into develop will increase coverage by 0.21%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           develop    #4537      +/-   ##
===========================================
+ Coverage    75.13%   75.34%   +0.21%     
===========================================
  Files          340      340              
  Lines        34984    34878     -106     
  Branches      5739     5705      -34     
===========================================
- Hits         26285    26279       -6     
+ Misses        7082     6986      -96     
+ Partials      1617     1613       -4

@hawkowl hawkowl requested a review from a team February 5, 2019 11:19
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said to you earlier: there's a lot of stuff going on here; it's hard to get a handle on all of it at once, and breaking it up into smaller chunks would be very helpful, both to help my understanding of what's going on, and in giving confidence that the changes are sensible.

In terms of the actual schema change: it looks like we're simply splitting users_who_share_rooms in two, based on the value of the share_private flag. I'm certainly not saying that's a silly thing to do, but at face value it looks like we end up doing twice the read queries (since we have to check each table separately), and I'm not following why this gives a dramatic improvement in the write path. Most likely I've just missed the relevant bit of code among all the other changes: can you spell it out?

It looks like we're dropping the old table but I'm not immediately seeing any code to populate the new ones: am I missing it, is it yet to be written, has it been forgotten?

As a general note on this sort of PR (and following up from what I was saying earlier): it's really helpful to a potential reviewer if:

  • you can give a high-level walkthrough of the changes - it's very hard to see where to start when github just presents it as a big diff sorted alphabetically.
  • you can give some sort of overview of the state of the PR ("I'm aware that this is going to need more comments, tests, and better naming, but feedback on the schema changes would be useful") - otherwise I tend to assume it's a done deal.
    Obviously no need for an essay on either front, but "Rewrite userdir to be faster" doesn't give me much to go on!

@hawkowl hawkowl requested a review from richvdh February 27, 2019 23:04
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this certainly looks way saner. thank you.

A few thoughts and questions...

@@ -51,14 +51,14 @@ class UserDirectoryHandler(object):
INITIAL_USER_SLEEP_MS = 10

def __init__(self, hs):
self.hs = hs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a rule, we try to avoid generic self.hs references, in favour of pulling things out of hs in the constructor. It makes dependencies more explicit and gives a better chance of detecting broken references early.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also makes it impossible to reload or edit config during tests :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if you want to argue that the advantages of this style outweigh the disadvantages, it's a topic for a retro or something. For now please put it back as it was.

synapse/storage/user_directory.py Outdated Show resolved Hide resolved
synapse/handlers/user_directory.py Outdated Show resolved Hide resolved
synapse/handlers/user_directory.py Show resolved Hide resolved
tests/handlers/test_user_directory.py Outdated Show resolved Hide resolved
synapse/handlers/user_directory.py Show resolved Hide resolved

# Then, re-add them to the tables.
for user_id, profile in iteritems(users_with_profile):
yield self._handle_new_user(room_id, user_id, profile)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels like we're (still) repeating ourselves here.

Suppose there are 3 local users in the room: A, B, C.

First we call _handle_new_user with A. That will add entries: (A, A), (A, B), (A, C), (B, A), (C, A).
Then we call _handle_new_user with B. That will add entries: (B, B), (B, A), (B, C), (A, B), (C, B).
Finally we call _handle_new_user with C. That will add entries: (C, C), (C, A), (C, B), (A, C), (B, C).

So most of those entries are being added twice.

(happy if you want to say that's an existing problem, to be ignored for now, but since we're rewriting this it seems a reasonable time to consider it, or at least add a comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the adding yourself, that drops one of those, but yes, it does repeat that, but that's an existing problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the adding yourself, that drops one of those

Ironically the "adding yourself" was the only entry which wasn't being redone.

yes, it does repeat that, but that's an existing problem.

Fair enough. I'd still be happier if you could add a TODO or something so that I know I'm not going mad next time I look at this code.

synapse/handlers/user_directory.py Show resolved Hide resolved
synapse/storage/user_directory.py Show resolved Hide resolved
synapse/storage/schema/delta/53/user_share.sql Outdated Show resolved Hide resolved
@hawkowl hawkowl requested a review from richvdh March 2, 2019 03:15
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good now other than #4537 (comment) and #4537 (comment).

Thanks!

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\o/

@hawkowl hawkowl merged commit f6135d0 into develop Mar 7, 2019
@hawkowl hawkowl deleted the hawkowl/speed-userdir branch March 7, 2019 09:33
richvdh added a commit that referenced this pull request Jul 8, 2019
Now that the userdir has been rewritten (#4537 etc), I reckon there's a good
chance of these tests working right. Let's find out.

Fixes #2306
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants