Split the state_group_cache in two #3726

ara4n · 2018-08-20T23:52:34Z

One half contains normal state events; the other contains member events.

The idea is that the common lazy-loading case of wanting a subset of the member events but all of the other events can be handled efficiently without having to make DictionaryCache aware of this sort of complicated semantics.

ara4n · 2018-08-21T01:22:32Z

no longer WIP; @richvdh ptal.

richvdh · 2018-08-21T11:24:57Z

synapse/storage/state.py

+            if members is True:
+                clause_to_args.append(("AND type = ?", (EventTypes.Member, )))
+            elif members is False:
+                clause_to_args.append(("AND type <> ?", (EventTypes.Member, )))


ok, so suppose I call _get_state_groups_from_groups([group_id], types=[('m.room.topic', '')], members=False).

I think this means we will do two separate selects:

SELECT {...} WHERE state_group IN (...) AND type='m.room.topic' AND state_key='';
SELECT {...} WHERE state_group IN (...) AND type<>'m.room.member';

... which doesn't sound right?

yup, you're right - i'd forgotten (again) that the postgres branch implements the clauses as a series of selects rather than concatenating them.
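
One way to avoid the stray second SELECT is to AND the member filter onto every per-type clause rather than appending it as a clause of its own. The following is only a minimal sketch of that idea: `build_where_clauses` is a hypothetical helper, not a Synapse function, and the fix that actually landed in the PR may look different.

```python
MEMBER_TYPE = "m.room.member"  # the value of EventTypes.Member


def build_where_clauses(types, members):
    """Return one (sql_fragment, args) pair per SELECT to be issued.

    Hypothetical helper for illustration only.
    types:   list of (type, state_key) pairs, or None for "no type filter".
    members: True (only members), False (no members), or None (no filter).
    """
    if types is None:
        # A single SELECT with an empty where clause and no extra args.
        clause_to_args = [("", [])]
    else:
        clause_to_args = []
        for etype, state_key in types:
            if state_key is None:
                clause_to_args.append(("AND type = ?", [etype]))
            else:
                clause_to_args.append(
                    ("AND type = ? AND state_key = ?", [etype, state_key])
                )

    if members is True:
        extra, extra_args = " AND type = ?", [MEMBER_TYPE]
    elif members is False:
        extra, extra_args = " AND type <> ?", [MEMBER_TYPE]
    else:
        extra, extra_args = "", []

    # The member filter narrows every SELECT instead of adding a new one.
    return [
        ((clause + extra).strip(), args + extra_args)
        for clause, args in clause_to_args
    ]
```

With `types=[('m.room.topic', '')]` and `members=False` this yields a single clause, so only one SELECT would be issued for the example above.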

richvdh

Ok, having pored over this for some time, I still think it's confusing, but I'll concede that I don't have any other bright ideas right now. Some (hopefully small) thoughts below.

Your tests are failing, possibly because of the whole ? vs %s thing.

richvdh · 2018-08-21T14:37:26Z

synapse/storage/state.py

+        )
+        self._state_group_members_cache = DictionaryCache(
+            "*stateGroupMembersCache*",
+            500000 * get_cache_factor_for("stateGroupMembersCache")


we'll need to remember to tune this when we deploy it.

richvdh · 2018-08-21T14:38:38Z

synapse/storage/state.py

@@ -61,7 +61,12 @@ def __init__(self, db_conn, hs):
        super(StateGroupWorkerStore, self).__init__(db_conn, hs)

        self._state_group_cache = DictionaryCache(


it would be useful to have a block comment somewhere (and here seems like a reasonable place) which explains in detail what's going on here, and what the two caches do.

richvdh · 2018-08-21T14:41:05Z

synapse/storage/state.py

@@ -284,6 +289,9 @@ def _get_state_groups_from_groups(self, groups, types):
            types (Iterable[str, str|None]|None): list of 2-tuples of the form
                (`type`, `state_key`), where a `state_key` of `None` matches all
                state_keys for the `type`. If None, all types are returned.
+            members (Boolean|None): whether we are limiting this to return just


bool|None. Also the description could do with being a bit clearer. How about:

"If not None, then, in addition to any filtering implied by types, the results are also filtered to only include member events (if True), or to exclude member events (if False)"

richvdh · 2018-08-21T14:47:22Z

synapse/storage/state.py

@@ -358,12 +366,18 @@ def _get_state_groups_from_groups_txn(
                # empty where clause with no extra args.
                clause_to_args = [("", [])]

+            additional_clause = ""
+            if members is True:
+                additional_clause = "AND type = '?'" % EventTypes.Member


you can't use % interpolation on a ? ... I guess you mean %s here

nit: generally we prefer % (foo, ) over % foo to avoid unpleasant surprises when foo turns out to be a tuple.

Could we, instead of building additional_clause, do sql += " AND type = '%s'" % (EventTypes.Member,) (and put it at line 350)? I think I'd find that clearer than messing with additional_clause.
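
The "unpleasant surprises" in the nit above come from how the `%` operator treats a tuple on its right-hand side. A small sketch, using two hypothetical helper names, shows the difference:

```python
def interpolate_bare(value):
    # "% value": if value is itself a tuple, its items are consumed as
    # separate format arguments.
    return "AND type = '%s'" % value


def interpolate_tuple(value):
    # "% (value,)": value is always treated as exactly one argument.
    return "AND type = '%s'" % (value,)


print(interpolate_bare("m.room.member"))
print(interpolate_tuple("m.room.member"))

try:
    interpolate_bare(("a", "b"))
except TypeError as exc:
    # Two arguments were supplied for a single %s placeholder.
    print("bare form failed:", exc)

print(interpolate_tuple(("a", "b")))
```

For a plain string both forms give the same result; they only diverge once the value happens to be a tuple.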

richvdh · 2018-08-21T15:22:20Z

synapse/storage/state.py

+            Deferred[dict[int, dict[(type, state_key), EventBase]]]
+                a dictionary mapping from state group to state dictionary.
+        """
+        if types is not None:


I'm finding the multiple calls to _get_state_for_groups_using_cache quite hard to reason about, and the special-case of checking for filtered_types == [EventTypes.Member] is a bit sad (apart from the case of longer lists including EventTypes.Member, what if filtered_types is actually a tuple or a set? I know the docstring claims it should be a list but I can't see any reason for that, and the failure mode will be subtle if someone gets it wrong).

How about:

    if types is not None:
        non_member_types = [t for t in types if t[0] != EventTypes.Member]

        if filtered_types is not None and EventTypes.Member not in filtered_types:
            # we want all of the membership events
            member_types = None
        else:
            member_types = [t for t in types if t[0] == EventTypes.Member]
    else:
        non_member_types = None
        member_types = None

    non_member_state = yield self._get_state_for_groups_using_cache(
        groups, self._state_group_cache, non_member_types, filtered_types,
    )
    member_state = yield self._get_state_for_groups_using_cache(
        groups, self._state_group_members_cache, member_types, None,
    )

This is certainly much terser, but ironically i personally find it much more cryptic and hard to reason about, whereas the simple symmetricity of "if we're lazy-loading members, split the query intelligently. if we're filtering on types, split the query naively. otherwise, split the query without filtering" felt clearer by spelling out the flows we care about and following the same pattern for each branch.

However, I don't have strong feelings, and it's nice that this handles the whole "what if filtered_types is a longer list that contains EventTypes.Member" scenario better (which I'd considered, but fell through to the naive handler). So i've gone with it (with a comment to explain the somewhat magical 'None' in the final line).

I think if the param was "we're lazy-loading members", I'd agree with you. But for better or worse, it's not.
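
To see how the suggested split behaves for different inputs, the type-splitting part can be lifted into a standalone function. This is a sketch only: `split_types` is a hypothetical name, and it assumes (as the thread implies) that event types not listed in filtered_types are returned without applying the `types` filter.

```python
MEMBER_TYPE = "m.room.member"  # the value of EventTypes.Member


def split_types(types, filtered_types):
    """Split a `types` filter into (non_member_types, member_types).

    None in either position means "fetch everything from that cache".
    Hypothetical helper mirroring the suggestion in the review.
    """
    if types is None:
        return None, None

    non_member_types = [t for t in types if t[0] != MEMBER_TYPE]

    if filtered_types is not None and MEMBER_TYPE not in filtered_types:
        # Members are not subject to the filter: we want all of them.
        member_types = None
    else:
        member_types = [t for t in types if t[0] == MEMBER_TYPE]

    return non_member_types, member_types


types = [("m.room.member", "@alice:example.org"), ("m.room.topic", "")]

# filtered_types may be any container supporting `in`: list, tuple or set.
print(split_types(types, ["m.room.member"]))
print(split_types(types, {"m.room.member", "m.room.name"}))
print(split_types(types, ("m.room.topic",)))
print(split_types(None, None))
```

Because the check is a membership test rather than an equality comparison against a one-element list, longer lists, tuples and sets are all handled the same way.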

richvdh · 2018-08-21T15:39:44Z

synapse/storage/state.py

@@ -725,7 +810,7 @@ def _get_state_for_groups(self, groups, types=None, filtered_types=None):
                types_to_fetch = types

            group_to_state_dict = yield self._get_state_groups_from_groups(
-                missing_groups, types_to_fetch
+                missing_groups, types_to_fetch, cache == self._state_group_members_cache,


this is a bit horrid and feels a bit backwards.

Can you pass a bool param (use_members_cache?) into this function instead of cache?

not sure i follow. passing the right cache object into _get_state_for_groups_using_cache seems fine to me, and simplifies all the other cache references to just be cache.get etc? And _get_state_groups_from_groups is already taking the bool|None to say whether it should be limiting to members or non-members or not. Or is the problem just the inlining the boolean expression?

sorry, this was unclear.

I'm suggesting:

rather than pass a cache into _get_state_for_groups_using_cache, pass a bool which is True to use _state_group_members_cache and False to use _state_group_cache. Then on the first line (or wherever) of _get_state_for_groups_using_cache, do

cache = self._state_group_members_cache if use_members_cache else self._state_group_cache

and then on this line here you can just use use_members_cache instead of going back to a bool.

It just felt odd to be going back to a bool here. Though I don't feel that strongly about it so if you want to leave it alone that's fine.

okay, that's how i interpreted, but am unsure that a cryptic bool flying around the place is better than saying "use this cache please", even though it does mean we end up with the slightly backwards comparison here back to a bool. i'd rather leave it as is if you're borderline.

ara4n · 2018-08-21T21:48:57Z

@richvdh ptal; have incorporated all changes other than failing to understand why you object to cache == self._state_group_members_cache for determining which cache is being used. i'm also failing to see which bit is confusing in general - is it the splitting of the cache? or the clarity of the code?

richvdh

lgtm

richvdh · 2018-08-21T22:34:29Z

synapse/storage/state.py

+        # and the other for tracking member_events.  This means that lazy loading
+        # queries can be made in a cache-friendly manner by querying both caches
+        # separately and then merging the result.  So for the example above, you
+        # would query the members cache for a specific subset of state types


I think this should be "subset of state keys" ?
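
The "query both caches separately and then merge" pattern described in that comment can be illustrated with a toy model. In this sketch plain dicts stand in for the two DictionaryCache instances, and `lookup` and `lazy_load_state` are hypothetical names; it makes no claim about the real DictionaryCache API.

```python
MEMBER_TYPE = "m.room.member"  # the value of EventTypes.Member


def lookup(cache, group, wanted_keys=None):
    """Return the cached entries for a state group.

    cache maps group -> {(type, state_key): event_id}.
    wanted_keys of None means "everything cached for this group".
    """
    entries = cache.get(group, {})
    if wanted_keys is None:
        return dict(entries)
    return {key: ev for key, ev in entries.items() if key in wanted_keys}


def lazy_load_state(non_member_cache, member_cache, group, user_ids):
    """All non-member state, plus membership for the given users only."""
    state = lookup(non_member_cache, group)
    wanted = {(MEMBER_TYPE, user_id) for user_id in user_ids}
    state.update(lookup(member_cache, group, wanted))
    return state


non_member_cache = {1: {("m.room.topic", ""): "$topic", ("m.room.name", ""): "$name"}}
member_cache = {
    1: {
        (MEMBER_TYPE, "@alice:example.org"): "$alice",
        (MEMBER_TYPE, "@bob:example.org"): "$bob",
    }
}

print(lazy_load_state(non_member_cache, member_cache, 1, ["@alice:example.org"]))
```

The members cache is asked for a subset of state keys, while the other cache is asked for everything, which matches the correction to the comment suggested above.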

richvdh · 2018-08-21T22:34:45Z

synapse/storage/state.py

+        # event IDs for the state types in a given state group to avoid hammering
+        # on the state_group* tables.
+        #
+        # The point using a DictionaryCache is that it can cache a subset


"The point of using"

richvdh · 2018-08-21T22:49:47Z

i'm also failing to see which bit is confusing in general - is it the splitting of the cache?

Don't take my confusion as a criticism of the PR specifically; it's the general complexity of the code as a whole, and a lot of it is down to the fact it's building on a codebase that is already somewhat confusing (as I think we've both agreed) and adding new dimensions of complexity to it.

That said there are things that I wish we weren't doing, though as I said I can't think of plausible better solutions:

- the fact that we are basically special-casing members
- the fact that _get_state_groups_from_groups takes a tri-state
- the fact that said tri-state overlaps in a not-entirely-obvious-way with types
- the fact that other bits of code have yet a third way of filtering keys called filtered_types (and _get_state_for_groups_using_cache has all three in one function, yay!)
- _get_state_for_groups's magic for splitting the queries among the two caches is magic. (I do think it's better now you've adopted my rewrite, and I don't think that's just NIH syndrome)

ara4n requested a review from richvdh August 20, 2018 23:52

ara4n mentioned this pull request Aug 20, 2018

Initial /sync isn't as fast as it should be when LL is enabled #3720

Closed

ara4n added 2 commits August 21, 2018 03:20

make it work

0cdb1fa

fix tests to match new split state_group_cache

04b7fd5

ara4n changed the title ~~WIP: split the state_group_cache in two~~ Split the state_group_cache in two Aug 21, 2018

ara4n mentioned this pull request Aug 21, 2018

Lazyload findings element-hq/element-web#7182

Closed

8 tasks

ara4n added 2 commits August 21, 2018 03:33

changelog

f2d105f

clean up _get_state_for_groups a bit

a49ceca

richvdh reviewed Aug 21, 2018

fix thinko in filtering members from SQL

26c10b5

richvdh suggested changes Aug 21, 2018

ara4n added 2 commits August 21, 2018 22:49

fix stupid interpolation bug and incorporate review

cda67b5

incorporate review

3bc1e8e

ara4n assigned richvdh Aug 21, 2018

richvdh approved these changes Aug 21, 2018

fix chianti-related typos

aa4d990

ara4n merged commit bb81e78 into develop Aug 21, 2018

hawkowl deleted the matthew/split_state_group_cache branch September 20, 2018 14:01
