
unread: Convert unread.streams to Immutable, and update efficiently #4448

Merged
merged 14 commits into zulip:master on Mar 18, 2021

Conversation

gnprice
Member

@gnprice gnprice commented Jan 30, 2021

This makes us quite a bit faster at handling a message being marked as read! That's a frequent and latency-sensitive path -- whenever the user is reading a conversation and sees some messages that were unread, this is part of the path to updating the "N unreads" banner at the top of their view. Speeding it up is #4438.

Specifically, in measurements with a test user with lots of unreads, the total time we spend handling an EVENT_UPDATE_MESSAGE_FLAGS action goes from about 300ms to 110-120ms -- an improvement of about 2.5x, or 180ms. (Details on measurement in the commit messages.)

Of that time, about 70-80ms was going into the reducer. After these changes that's 7ms in the slowest sample I've seen, or as little as <=1ms (the threshold of measurement).

The two key changes that get us this improvement are the two commits discussed below: converting unread.streams from the over-the-wire array to an Immutable.Map (91ccf50), and using state.messages to go straight to the affected conversation when updating (e0874fd).

The 110-120ms we're still spending, almost all of it now outside the reducer, still isn't great. But it's enough better that I think further optimization is no longer a top-priority thing for me to work on; and because the remaining problem isn't inside the reducer where I've been working and have built up the perf-logging tools to iterate on, it's beyond the scope of where it's just easy to keep going. So with this I'm declaring victory on #4438, the task of making this "unread" model efficient at marking a message as read.

When we do pursue further optimization in that direction, I think a key thing to do will be to have this reducer start taking over more of the jobs of the selectors that currently consume it. At present, when an update is to be made, the reducer efficiently updates the data in state.unreads... and then selectors like getUnreadStreamsAndTopics go and read through that whole tree and rebuild their own outputs from scratch. Instead, we should make an efficient incremental update to that data, too. In addition to saving time in the selector, this will mean that subparts of the output data remain ===-identical other than the specific parts that are updated, and that may enable optimizations in the React component tree.

Fixes: #4438
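
For concreteness, here is the new shape of `state.unread.streams` (it appears in a type annotation quoted in the review below), plus a rough sketch of the kind of targeted update it enables. This is only an illustration with made-up IDs, not the PR's actual reducer code, which handles batches and more cases:

    import * as Immutable from 'immutable';

    // New state shape, per the type annotation quoted in the review below:
    //   Immutable.Map<stream_id, Immutable.Map<topic, Immutable.List<message_id>>>
    let unreadStreams = Immutable.Map();

    // A new unread stream message: go straight to its conversation and append.
    // (This mirrors the `updateIn([stream_id, topic], …)` line quoted below.)
    unreadStreams = unreadStreams.updateIn(
      [42, 'some topic'],
      (ids = Immutable.List()) => ids.push(12345),
    );

    // Marking that message as read: again, only the one conversation is touched.
    unreadStreams = unreadStreams.updateIn([42, 'some topic'], ids =>
      ids.filter(id => id !== 12345),
    );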

Contributor

@chrisbobbe chrisbobbe left a comment

Thanks, I'm glad to have this! See a few small comments, below.

// e.g. the `setIn` call could be completely wrong and the type-checker
// wouldn't notice.
const result: Immutable.Map<number, Immutable.Map<string, Immutable.List<number>>> =
  Immutable.Map().asMutable();
Contributor

I can't think of how withMutations would be any more efficient than .asMutable() and .asImmutable(), but I notice that the doc says this on asMutable (emphasis mine):

If possible, use withMutations to work with temporary mutable copies as it provides an easier to use API and considers many common optimizations.

…Ah, and never mind, I see this goes away in a later commit. 🙂
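
For reference, a minimal sketch of the two idioms being compared; both are standard Immutable.js API, and the keys and values here are just placeholders:

    import * as Immutable from 'immutable';

    // asMutable / asImmutable: explicitly bracket a mutation phase.
    const mutableCopy = Immutable.Map().asMutable();
    mutableCopy.set('a', 1);
    mutableCopy.set('b', 2);
    const viaAsMutable = mutableCopy.asImmutable();

    // withMutations: scope the mutable copy to a callback; the result comes
    // back immutable, and the docs recommend this form where possible.
    const viaWithMutations = Immutable.Map().withMutations(map => {
      map.set('a', 1);
      map.set('b', 2);
    });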

@@ -253,6 +253,9 @@ const migrations: { [string]: (GlobalState) => GlobalState } = {
// Convert `messages` from object-as-map to `Immutable.Map`.
'23': dropCache,

// Convert `unread.streams` from over-the-wire array to `Immutable.Map`.
Contributor

91ccf50 unread: Convert unread.streams to Immutable!

After this change, it's usually <=1ms (too small to measure with our existing tools), and the longest I've seen is 3ms.

Not that we necessarily need to use it here, but I believe we have a new tool that lets us measure <=1ms times, courtesy of RN v0.63! 🙂 facebook/react-native@232517a

Member Author

Mmm, yeah, I guess this remark is out of date!

const initialStreamsState: UnreadStreamsState = Immutable.Map();

// Like `Immutable.Map#map`, but with the update-only-if-different semantics
// of `Immutable.Map#update`. Kept for comparison to `updateAllAndPrune`.
Contributor

// Like Immutable.Map#map, but with the update-only-if-different semantics
// of Immutable.Map#update.

Strange, I guess I'd have thought this should be built into Immutable.Map#map?

Contributor

Or at least that they had something else with a different name, that does this.

Member Author

Yeah, I think the idea is that if you're running through all the items in the data structure, then you're probably doing some transformation where the new value is some entirely different kind of thing from the old value, not where you're taking the old value and just maybe modifying it a bit.

If you are making such a transformation, so that all or nearly all the items will be new, then it's likely more efficient to allocate a single new thing and fill that in from scratch, which is what I expect the implementation of map to do (though I haven't checked), than to start from the old and make a bunch of intermediate updates.

There's also the fact that if you have an algorithm that relies on something like updateAll, where only a few or potentially no elements are actually being changed, then you're probably better off finding a variation of your algorithm where you go straight to those elements and don't sweep through all the others looking for them. As we do later in this branch 🙂
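
A rough sketch of what an `updateAll` with the semantics described in that code comment might look like; this is an illustration, not necessarily the branch's code, and it leans on `withMutations` returning the original collection when nothing was actually set:

    import * as Immutable from 'immutable';

    // Run `updater` over every value, but only write back entries that actually
    // changed, so a pass that changes nothing returns the original map.
    function updateAll(map, updater) {
      return map.withMutations(mapMut => {
        map.forEach((value, key) => {
          const newValue = updater(value);
          if (newValue !== value) {
            mapMut.set(key, newValue);
          }
        });
      });
    }

    // Example: bump only even values; a map with no even values comes back ===.
    const m2 = updateAll(Immutable.Map({ a: 1, b: 2 }), v => (v % 2 === 0 ? v + 1 : v));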

});
}

// Like `updateAll`, but prune values equal to `zero` given by `updater`.
Contributor

// Like updateAll, but prune values equal to zero given by updater.

—with an exception, I think. It doesn't prune a value equal to zero given by updater if the value was equal to zero in the first place, before running it through updater.

Member Author

Hmm true!

That never matters in our usage here, because we have an invariant that there are never values equal to zero. But certainly the comment should match the behavior.

Effectively this lets the "update only if different" semantics of update take precedence over the "prune" semantics. I don't have a firm view on which way is better from an API-design perspective.
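
A rough sketch of an `updateAllAndPrune` with the behavior discussed in this thread (the parameter order and the equality check via `Immutable.is` are guesses, not the branch's actual code). The update-only-if-different check comes first, so a value that already equals `zero` and is left unchanged by `updater` is not pruned:

    import * as Immutable from 'immutable';

    function updateAllAndPrune(map, zero, updater) {
      return map.withMutations(mapMut => {
        map.forEach((value, key) => {
          const newValue = updater(value);
          if (newValue !== value) {
            if (Immutable.is(newValue, zero)) {
              mapMut.delete(key); // the updater drove this entry to zero: prune it
            } else {
              mapMut.set(key, newValue);
            }
          }
          // else: unchanged; left alone even if it already equals `zero`.
        });
      });
    }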

@@ -142,19 +142,28 @@ describe('stream substate', () => {
};
};

const streamAction = args => mkMessageAction(eg.streamMessage(args));
Contributor

e0874fd unread: Use state.messages to optimize updating stream unreads.

as little as <1ms (the threshold of measurement)

(same comment about sub-millisecond timing)

const minSize = Math.min(list.size, toDelete.size);
let i = 0;
for (; i < minSize; i++) {
// This loop takes time O(log n) per iteration, O(k log n) total.
Contributor

O(k log n) total.

nit: Possibly not worth mentioning, but isn't it O(minSize log n), i.e., the lesser of O(k log n) and O(n log n)?

I can't think of why k would be greater than n, but I see that you've sensibly used minSize in case that happens.

Member Author

Hmm -- it is that, but that makes it also true that it's O(k log n)... and for that matter it's O(n log n), and O(n^2), and O(2^n) and anything else that's bigger 🙂 The definition of the "f = O(g)" notation is such that it behaves a lot like a form of less-than-or-equal-to, not like equal-to. It's a bit of a confusing choice of symbols for that semantics, but a very handy semantics.
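
For reference, the standard definition being appealed to, in LaTeX:

    f(n) = O(g(n)) \iff \exists\, C > 0,\ n_0 \ \text{such that}\ f(n) \le C \cdot g(n) \ \text{for all}\ n \ge n_0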

One reason that semantics is handy is that it lets you state an upper bound, and have it be true even if there's some more-precise upper bound that would also be true. Here, it'd be kind of pathological for k to be more than n -- it would mean we're being informed of some messages marked as read when we didn't know about them as unreads in the first place. The code's behavior needs to be correct in that case (the case can happen if e.g. the user has >50k unreads and goes and catches up on some of the older of them), and also our analysis should be truthful about that case, but it's fine for the analysis to not state the tightest possible bounds about it.

Similarly, for the overall bound lower down:

 * existing messages is much larger.  Specifically the time spent should be
 * O(N' log n + c log C), where the messages to delete appear in c out of a
 * total of C conversations, and the affected conversations have a total of
 * N' messages and at most n in any one conversation.  If the messages to be

the "at most n in any one conversation" thing probably isn't the most precise bound that'd be possible to give here, but it is accurate, and a more precise one would I think be more complicated to state (involving like the distribution of sizes of different conversations), so this is a convenient bound for thinking about it.

Contributor

Ah, right! Thanks!

continue;
}
const { stream_id, subject: topic } = message;
mut.updateIn([stream_id, topic], l => (l ?? Immutable.List()).push(id));
Contributor

nit: Could use a default parameter, like this, elsewhere in the branch:

      // prettier-ignore
      return state.updateIn([message.stream_id, message.subject],
        (perTopic = Immutable.List()) => perTopic.push(message.id));

The `??` unnecessarily handles `l` being `null`; `updateIn` passes `undefined` for a missing entry, so a default parameter is enough.

@@ -142,19 +142,28 @@ describe('stream substate', () => {
};
};

const streamAction = args => mkMessageAction(eg.streamMessage(args));
const messages = [
Contributor

e0874fd unread: Use state.messages to optimize updating stream unreads.

So with this I'm declaring victory on #4438, the task of making this "unread" model efficient at marking a message as read.

This is fine even though the branch just improves handling for stream messages (not PMs or mentions), right? 🙂

Member Author

Yeah. Hmm, it looks like I may not have written out this part of my reasoning: I expect that it's very rare for the other parts of this data structure to get really giant, the way the streams part easily does.

You can get 50k unread stream messages just by being in a busy org like chat.zulip.org, being subscribed to some busy streams (perhaps by default), and just not staying caught up all the time.

To get to even 1k unread PMs or unread mentions, or even 100 of them, seems a lot harder. You could get 100, perhaps, by being in a group-PM thread where some other people are talking to each other a lot; or from @everyone mentions in a busy org where those are common practice.

So I think making the other parts of this data structure efficient will have much less of a direct performance benefit. It'd still be good to do it, but the main benefits will be to deduplicate this logic (there's a lot that's shared between the four different reducers here) and perhaps to simplify some of the code that consumes this data.

Contributor

Makes sense!

@chrisbobbe
Contributor

chrisbobbe commented Feb 2, 2021

There's a sketchy check in the replacer logic for Immutable.Maps with numeric keys; we'll likely want to fix that before merging this. See discussion. Fixed in #4463!

@chrisbobbe
Contributor

@gnprice, this looks good except for a couple of small things, mentioned above (like #4448 (comment)). Do you have another revision on the way, or are you waiting for me for something? 🙂

gnprice added 14 commits March 17, 2021 22:35
A couple of bugs slipped by me at first in draft changes for
converting the "unreads" model to Immutable, because I wasn't
anticipating this.
We're going to switch the data structure we use for this to something
that's efficient to keep updated (zulip#4438).  The new one will be just
as well suited for code that consumes this data structure as the old
-- in fact better for some of them -- so we'll have the consumers
switch to using the new data structure, too.  That means the selector
that provides the old data structure is going away.

As a first step, we rename the old selector to a "legacy" name,
making room for a selector providing the new data structure to
take its name.
This provides a data structure that's just like the one we're going
to switch to maintaining as the state internally.  We'll migrate
consumers to use this one, and then we can change out the state and
reducer underneath it.

As we migrate consumers to this selector, it may initially cause a
certain performance regression: when the "unread" data structure gets
updated, we'll promptly be making this (reformatted) copy of it, as
the selectors that use it go to look for the new data.  But that extra
work will then get eliminated when we switch to this being the data
structure that we maintain in the first place.
The logic gets slightly simpler because we get to do our whole
computation for each stream all at once before moving on to the next.
This one also gets slightly simpler because we now get to handle
each entire stream all at once in turn.
This gets rather more efficient, because we no longer need to scan
through the entire data structure and can instead go straight to the
particular stream, or stream and topic, that we're interested in.
Oof, this had been quadratic!  We would scan through the entire
unreads data structure, looking just for one particular entry...
again and again for each topic in turn.

Now we just go straight to the single entry for this particular topic.
Much better.
We've converted all its consumers to use the new `getUnreadStreams`
data structure instead... except for the `getUnreadStreams` selector
itself.  That one will simplify out when we switch to maintaining
the new data structure in Redux, coming next.
Getting the same message ID twice just shouldn't happen -- it'd be
a break in the contract of the server API.  If some server bug does
ever cause it to happen, and we don't notice and just include the
additional message ID here, nothing particularly bad will happen;
at worst a glitch where the unread counts are too high.

Meanwhile, maintaining this data structure is a hot path, so it
won't make sense for us to defensively check for such a server bug.
Take out the test that checks for it.
This makes us quite a bit faster at handling a message being marked
as read!  That's a frequent and latency-sensitive path -- whenever
the user is reading a conversation and sees some messages that were
unread, this is part of the path to updating the "N unreads" banner
at the top of their view.  Measurements described below.

As we convert the other parts of the unread state, we'll want to fold
their reducers into this file too, and in fact combine the logic.  No
need to have four copies of all this, most of which is the same.

While making this conversion, I notice that this reducer doesn't reset
state on LOGIN_SUCCESS like it does for LOGOUT and ACCOUNT_SWITCH.  In
general we're pretty consistent about resetting state on those latter
two, but many of our reducers do so on LOGIN_SUCCESS while many others
don't.  I've filed zulip#4446 for fixing them all up to be consistent.

Performance measurements:

I made some manual performance measurements to evaluate this change
and the others in this series.  I used a test user with lots of
unreads on chat.zulip.org, on a few-years-old flagship phone: a
Pixel 2 XL running Android 10.  The test user has 50k unreads in this
data structure (that's the max the server will send in the initial
fetch), across about 3400 topics in 27 different streams.

Before this series, on visiting a topic with 1 unread, we'd spend
about 70ms in this reducer, which is a long time.  We'd spend 300ms in
total on dispatching the EVENT_UPDATE_MESSAGE_FLAGS action, including
the time spent in the reducer.  (The other time is probably some
combination of React re-rendering the components that use this data;
before that, our selectors that sit between those components and this
data recomputing their own results; and after that, React Native
applying the resulting updates to the underlying native components.
We don't yet have clear measurements to tell how much time those each
contribute.)

After this change, we spend about 30-50ms in the reducer, and a total
of 150-200ms in dispatch.  Still slow, but much improved!  We'll speed
this up further in upcoming commits.

For EVENT_NEW_MESSAGE, which is the other frequent update to this data
structure, not much changes: it was already "only" 4-9ms spent in this
reducer, which is too slow but far from our worst performance problem.
After this change, it's usually <=1ms (too small to measure with our
tools, until recently), and the longest I've seen is 3ms.  The total
dispatch time varies widely, like 70-200ms; not clear if it changed.

There is one performance regression: we now spend about 100ms here on
REALM_INIT, i.e. on handling the data from the initial fetch.
Previously that time was <=1ms; we just took the data straight off the
wire (well, the data we'd already deserialized from the JSON that came
off the wire), and now we have to copy it into our more
efficiently-usable data structure.  As is, that 100ms is already well
worth it: we save more than 100ms, of visible latency, every time the
user reads some unread messages.  But it's enough to be worth
optimizing too, and we'll do so later in this series.

Fixes-partly: zulip#4438
This is one property that only even becomes possible with the new
data structure!

This is useful in part because it might help downstream consumers
of the data make an optimization, by not recomputing their own work
if the particular piece of this data that they're interested in
hasn't changed.

The more immediate motivation for the test, though, is that it
effectively checks that we have a certain optimization working
within this model's own implementation.  A naive implementation
might end up copying the maps for all the streams when only one or
a few of them actually needed to change; in fact an early draft of
this implementation did just that.  If instead we successfully
returned the old maps, then that probably means we successfully
avoided doing the work of allocating new ones.
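
A self-contained illustration of the property this test is after, using toy data rather than the real reducer: a targeted update leaves every untouched subtree ===-identical, thanks to Immutable.js structural sharing.

    import * as Immutable from 'immutable';

    const prev = Immutable.Map([
      [1, Immutable.Map([['topic a', Immutable.List([10, 11])]])],
      [2, Immutable.Map([['topic b', Immutable.List([20])]])],
    ]);

    // Mark message 10 (stream 1, 'topic a') as read.
    const next = prev.updateIn([1, 'topic a'], ids => ids.filter(id => id !== 10));

    console.log(next.get(2) === prev.get(2)); // true: stream 2's map is the same object
    console.log(next.get(1) === prev.get(1)); // false: stream 1's map was rebuilt
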
Before this commit, when a message was marked as read we'd have to
search through this entire data structure to find where it was so we
could remove it.  In fact, even if the message was actually a PM we'd
end up searching through this whole data structure which is entirely
about stream messages, because we didn't use that information.

The same thing was true with the old data structure, before this
series.

Much better would be, when a message in a particular conversation gets
marked as read, to go straight to that particular conversation's part
of the data structure and update that without having to search through
anything else.  Do that.

Knowing what conversation the message is in requires looking that
information up in our data structures.  Happily we can do that now
(and without an intrusive hack like we've sometimes done in the past):
that was zulip#4437.

This reduces the time spent in this reducer to 7ms in the slowest
sample I've seen, or as little as <1ms (until recently the threshold
of measurement), and the total time spent in dispatch to 110-120ms.
Those compare with 30-50ms reducer / 150-200ms total before this
commit, and with 70ms reducer / 300ms total before the whole series,
using the old data structure.  (Based on measuring the same way as
described a few commits ago.)  So that's an improvement of about 2.5x,
or 180ms!

The 110-120ms we're still spending, almost all of it now outside the
reducer, still isn't *great*.  But it's enough better that I think
further optimization is no longer a top-priority thing for me to work
on; and because the remaining problem isn't inside the reducer where
I've been working and have built up the perf-logging tools to iterate
on, it's beyond the scope of where it's just easy to keep going.  So
with this I'm declaring victory on zulip#4438, the task of making this
"unread" model efficient at marking a message as read.

Fixes: zulip#4438
This substantially reduces the time we spend in constructing this
data structure at the initial fetch: from about 100ms, it goes down
to about 55ms.

The basic principle here is that if you have a whole bunch of items
you're going to build a data structure out of, it's faster to
construct the data structure with all of them at once than to start
with an empty data structure and add them one at a time.  That's
already what we routinely do in simple cases -- and in fact there's
a typical example at the innermost part of this code, the
`Immutable.List(unread_message_ids)`.

For this data structure as a whole (the Map of Maps of Lists), the
reason we didn't was just that the input comes to us as individual
topics, not grouped by stream.  But we can group them as a
preprocessing step, and then construct each Immutable data structure
all in one go without updates.  For the preprocessing step, we do need
a data structure we build up incrementally... but we can use plain
mutable built-in data structures for that, as it's temporary and
local; and even use just a plain Array for each stream's topics,
rather than any kind of Map, because we're only going to run through
the whole thing and never try to look up a particular item, or update
a particular existing item in it.  Those have a lot less overhead than
Immutable.Map, partly because they're built into the JS engine and
partly because an Array is a much simpler data structure than a map.
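
A rough sketch of that construction pattern, not the commit's exact code; the input item shape `{ stream_id, topic, unread_message_ids }` is an assumption about the server's per-topic unread entries:

    import * as Immutable from 'immutable';

    function buildUnreadStreamsState(perTopicEntries) {
      // Preprocess with cheap built-in structures: group the per-topic entries
      // by stream, using a plain Array for each stream's topics.
      const byStream = new Map();
      for (const { stream_id, topic, unread_message_ids } of perTopicEntries) {
        let topics = byStream.get(stream_id);
        if (topics === undefined) {
          topics = [];
          byStream.set(stream_id, topics);
        }
        topics.push([topic, Immutable.List(unread_message_ids)]);
      }

      // Then build each Immutable.Map in one go, with no incremental updates.
      const entries = [];
      for (const [streamId, topics] of byStream) {
        entries.push([streamId, Immutable.Map(topics)]);
      }
      return Immutable.Map(entries);
    }
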
Should be helpful when converting over the other parts of the
unread state.
@gnprice gnprice merged commit 09cbcec into zulip:master Mar 18, 2021
@gnprice
Member Author

gnprice commented Mar 18, 2021

Made those changes, and merged! Thanks for the review.
