Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Speed up lazy loading #3827

Merged
merged 23 commits into from
Sep 11, 2018
Merged

Speed up lazy loading #3827

merged 23 commits into from
Sep 11, 2018

Conversation

ara4n
Copy link
Member

@ara4n ara4n commented Sep 7, 2018

Rather than pulling room member events out of state storage to calculate room summaries, instead generate it from the room_memberships table.

@ara4n
Copy link
Member Author

ara4n commented Sep 7, 2018

(this speeds up LL by about 2-3x)

@ara4n ara4n requested a review from erikjohnston September 7, 2018 23:14
Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only had a chance to look at the lazy loading bits of this. Can you please split out the fixes to get_filtered_current_state_ids to a separate PR please?

Things look broadly fine, other than its quite confusing trying to keep track of what r is any given situation. Hopefully the suggested changes will help clear things up a bit.

The only major thing is whether we are currently pulling the members out of the database. It feels odd to pull out 6 arbitrary users regardless of membership, while it seems the sync code assumes that e.g. if there are joined users in the room MemberSummary.members will be nonempty

" AND m.user_id = c.state_key"
" WHERE c.type = 'm.room.member' AND c.room_id = ?"
" GROUP BY m.membership"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI you can write:

sql = """
    SELECT count(*), m.membership FROM room_memberships as m 
    INNER JOIN current_state_events as c
    ON m.event_id = c.event_id
    AND m.room_id = c.room_id
    AND m.user_id = c.state_key
    WHERE c.type = 'm.room.member' AND c.room_id = ?
    GROUP BY m.membership
"""

rather than faffing around with quotes. It has the added benefit that it makes it much easier to c+p :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed; this was just trying to be consistent with what was already there.

txn.execute(sql, (room_id,))
res = {}
for r in txn:
summary = res.setdefault(to_ascii(r[1]), MemberSummary([], r[0]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nicer to write

for count, membership in txn:
    ...

so that its easier to verify that things match the intent

dict of membership states, pointing to a MemberSummary named tuple.
"""

def f(txn):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please name these something like _get_room_summary_txn or whatever, it makes stack traces a lot easier to understand

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this was following the pattern already present, fwiw)

txn.execute(sql, (room_id, 6))
for r in txn:
summary = res.get(to_ascii(r[1]))
members = summary.members
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this explode if summary is None?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it'll definitely not be None due the shape of the queries, but then we should probably not use get to make the intent a bit clearer.

synapse/storage/events.py Show resolved Hide resolved
Membership.BAN
):
for r in details.get(m, ([], 0))[0]:
member_ids[r[0]] = r[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not following what this is doing. Can we choose variable names and/or unpack tuples to help clarity please?

Unpacking this a bit seems to give:

for user_id, event_id in details.get(membership, empty_ms).members:
     member_ids[user_id] = event_id

I think?

@@ -51,6 +51,12 @@
"ProfileInfo", ("avatar_url", "display_name")
)

# "members" points to a truncated list of (user_id, event_id) tuples for users of
# a given membership type, suitable for use in calculating heroes for a room.
# "count" points to the total numberr of users of a given membership type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that members is a list of tuples rather than a dict?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes; i was worried about creating a new large cache in addition to the one we already maintain of room members, and so wanted it to be small (i'm assuming that tuples take up less room as they don't include key strings).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i'm assuming that tuples take up less room as they don't include key strings)

Well this tuple is including the same number of strings, right? It might save a bit by not including the hash map I guess

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're worried about the size of the cache then what we can do is have get_room_summary return an object with a custom __len__ implementation that returns the sum of the sizes of MemberSummary.members. This would ensure that the total number of tuples stored by the cache is less than the given cache size (as opposed to the number of room summaries being less than the given cache size)

synapse/handlers/sync.py Show resolved Hide resolved
# XXX: this may include false positives in the form of LL
# members which have snuck into state
batch.limited and
any(t == EventTypes.Member for (t, k) in state.keys())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use .keys() or .items() etc, they are expensive in py2 due to them copying the data into a list. Either use six.iteritems etc to get iterators, or in this case simply drop .keys() as iterating over a dict iterates over the keys anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL :)

# members which have snuck into state
batch.limited and
any(t == EventTypes.Member for (t, k) in state.keys())
) or
since_token is None
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can this have a comment to explain why we're not always including the summary and when we do include a summary? The logic is non-trivial and its hard to know whether its right or not without a guiding comment.

@ara4n
Copy link
Member Author

ara4n commented Sep 10, 2018

Can you please split out the fixes to get_filtered_current_state_ids to a separate PR please?

done, it's back in #3792

The only major thing is whether we are currently pulling the members out of the database. It feels odd to pull out 6 arbitrary users regardless of membership, while it seems the sync code assumes that e.g. if there are joined users in the room MemberSummary.members will be nonempty

Right. i guess it needs an order by membership in there to make sure we prefer joined users over invited ones and only use left users in a disaster.

@ara4n
Copy link
Member Author

ara4n commented Sep 10, 2018

@erikjohnston ptal

@ara4n ara4n assigned erikjohnston and unassigned ara4n Sep 10, 2018
@ara4n
Copy link
Member Author

ara4n commented Sep 11, 2018

i got an IRL lgtm from @erikjohnston on this. i think.

@ara4n
Copy link
Member Author

ara4n commented Sep 11, 2018

n.b. this is now twinned with matrix-org/sytest#489

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modulo perhaps wanting to further pin down the ordering of heroes

@ara4n
Copy link
Member Author

ara4n commented Sep 11, 2018

the failed test here seems to be unrelated; merging anyway.

@ara4n ara4n merged commit b041115 into develop Sep 11, 2018
@hawkowl hawkowl deleted the matthew/speed-up-ll branch September 20, 2018 14:00
hawkowl added a commit that referenced this pull request Sep 24, 2018
Features
--------

- Python 3.5 and 3.6 support is now in beta.
([\#3576](#3576))
- Implement `event_format` filter param in `/sync`
([\#3790](#3790))
- Add synapse_admin_mau:registered_reserved_users metric to expose
number of real reaserved users
([\#3846](#3846))

Bugfixes
--------

- Remove connection ID for replication prometheus metrics, as it creates
a large number of new series.
([\#3788](#3788))
- guest users should not be part of mau total
([\#3800](#3800))
- Bump dependency on pyopenssl 16.x, to avoid incompatibility with
recent Twisted.
([\#3804](#3804))
- Fix existing room tags not coming down sync when joining a room
([\#3810](#3810))
- Fix jwt import check
([\#3824](#3824))
- fix VOIP crashes under Python 3 (#3821)
([\#3835](#3835))
- Fix manhole so that it works with latest openssh clients
([\#3841](#3841))
- Fix outbound requests occasionally wedging, which can result in
federation breaking between servers.
([\#3845](#3845))
- Show heroes if room name/canonical alias has been deleted
([\#3851](#3851))
- Fix handling of redacted events from federation
([\#3859](#3859))
-  ([\#3874](#3874))
- Mitigate outbound federation randomly becoming wedged
([\#3875](#3875))

Internal Changes
----------------

- CircleCI tests now run on the potential merge of a PR.
([\#3704](#3704))
- http/ is now ported to Python 3.
([\#3771](#3771))
- Improve human readable error messages for threepid
registration/account update
([\#3789](#3789))
- Make /sync slightly faster by avoiding needless copies
([\#3795](#3795))
- handlers/ is now ported to Python 3.
([\#3803](#3803))
- Limit the number of PDUs/EDUs per federation transaction
([\#3805](#3805))
- Only start postgres instance for postgres tests on Travis CI
([\#3806](#3806))
- tests/ is now ported to Python 3.
([\#3808](#3808))
- crypto/ is now ported to Python 3.
([\#3822](#3822))
- rest/ is now ported to Python 3.
([\#3823](#3823))
- add some logging for the keyring queue
([\#3826](#3826))
- speed up lazy loading by 2-3x
([\#3827](#3827))
- Improved Dockerfile to remove build requirements after building
reducing the image size.
([\#3834](#3834))
- Disable lazy loading for incremental syncs for now
([\#3840](#3840))
- federation/ is now ported to Python 3.
([\#3847](#3847))
- Log when we retry outbound requests
([\#3853](#3853))
- Removed some excess logging messages.
([\#3855](#3855))
- Speed up purge history for rooms that have been previously purged
([\#3856](#3856))
- Refactor some HTTP timeout code.
([\#3857](#3857))
- Fix running merged builds on CircleCI
([\#3858](#3858))
- Fix typo in replication stream exception.
([\#3860](#3860))
- Add in flight real time metrics for Measure blocks
([\#3871](#3871))
- Disable buffering and automatic retrying in treq requests to prevent
timeouts. ([\#3872](#3872))
- mention jemalloc in the README
([\#3877](#3877))
- Remove unmaintained "nuke-room-from-db.sh" script
([\#3888](#3888))
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants