Refetch server data from scratch after server upgrade #4793
Comments
Thinking about how often this logic would cause a refetch, I see on the API changelog that the feature level has been moving about every week or two. That's often enough that it could reduce somewhat the benefit of #3916 -- though every 1-2 weeks is still a lot less often than the status quo, for anyone who uses the app frequently.

In principle, we don't actually have any need to refetch on every upgrade -- only on upgrades that change how the app itself would behave through some feature-detection logic. In other words, when the version and/or feature level moves past one of the values that we compare them against somewhere in the app's code. There are only a handful of those per year; so one strategy that would reduce the number of refetches here to a very low level would be to keep all such versions in a central list somewhere in the code, and only trigger a refetch on an upgrade that changes where the version falls vs. that list.
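A minimal sketch of that strategy, with made-up values and hypothetical names (none of this is in the zulip-mobile codebase): keep every feature level the app compares against in one central list, and treat an upgrade as refetch-worthy only if it crosses one of them.

```js
// Hypothetical list of every feature level the app's feature-detection
// logic compares against; the real values would be collected from the code.
const featureLevelsWeCheck = [27, 44, 52, 89];

// True just if moving from `oldLevel` to `newLevel` changes the outcome of
// some feature-detection comparison, i.e. the upgrade crossed a threshold.
function crossesKnownThreshold(oldLevel, newLevel) {
  return featureLevelsWeCheck.some(
    threshold => (oldLevel >= threshold) !== (newLevel >= threshold),
  );
}

// e.g. crossesKnownThreshold(88, 90) === true  (crosses 89: refetch)
//      crossesKnownThreshold(90, 95) === false (no threshold crossed: skip)
```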
Also, when implementing this we should be sure to not just reload immediately, in order to avoid hitting the server with a thundering herd of reloading clients. Instead, following the API documentation at https://zulip.com/api/get-events#restart , we should spread the reload out randomly over a period of at least 10 minutes.
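For illustration, a sketch of that spread-out reload, assuming a plain timer is acceptable (hypothetical names; the real implementation would live in the app's event-handling code):

```js
// Spread the refetch over a window of at least 10 minutes, so the clients
// of a just-restarted server don't all reload at the same moment.
const RELOAD_SPREAD_MS = 10 * 60 * 1000;

function scheduleRefetchAfterUpgrade(refetchFromScratch) {
  const delayMs = Math.random() * RELOAD_SPREAD_MS;
  setTimeout(refetchFromScratch, delayMs);
}
```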
This thunk action includes registering an event queue, of course -- but it has also included some unrelated tasks: notification setup and an invocation of the broken outbox subsystem. Make this action only about registering a queue and starting a polling loop on it, pulling out that other stuff.

Also change its name to explicitly say what it does: not just the "initial fetch", which I think basically means POST /register [1], but also starting a polling loop, which should always be done synchronously after that. The fetch-PMs call stays in the function just because it's meant to fill in a shortcoming of the register-queue functionality on older servers. We still plan to remove that call when we desupport server 2.1.

Possibly the unrelated stuff should be grouped into a new thunk action of its own, but I don't yet see a nice name or description for such a group.

Next, we'll move this and its helpers to eventActions.js.

[1] See https://zulip.readthedocs.io/en/latest/subsystems/events-system.html#the-initial-data-fetch . But "initial" can be a bit misleading, because it could imply, for example, that this is only meant to run on startup and on first-time connection with an account. We run this with DEAD_QUEUE and ACCOUNT_SWITCH too, and will probably run it on `restart` events for zulip#4793.
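Roughly, the intended shape is something like the sketch below -- names like `registerAndStartPolling`, `registerComplete`, and `startEventPolling` are placeholders standing in for the codebase's register/polling helpers, not verified identifiers: the thunk does nothing but the POST /register and then starts polling the new queue.

```js
// Sketch only: register an event queue, record the initial data, and start
// the long-poll loop -- with notification setup and outbox work pulled out.
const registerAndStartPolling = () => async (dispatch, getState) => {
  // The "initial fetch": POST /register.
  const initialData = await api.registerForEvents(getAuth(getState()));
  dispatch(registerComplete(initialData));

  // Start polling the new queue, synchronously after registering it.
  dispatch(startEventPolling(initialData.queue_id, initialData.last_event_id));
};
```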
Since zulip/zulip#18205, the server sends its new version and feature level in an event each time it restarts; and since #4707, we use that to keep our record of the server's version and feature level up to date.
This is good because it means that when we subsequently make a request to the server, if we have any logic that needs to look at the feature level to interpret the results of that request, we'll use the correct feature level in order to behave correctly.
There still may be edge cases, though, where that isn't quite enough. As I wrote on the chat thread that led to the mentioned changes: there can have been places where we did something relying on the old feature level, and have data sticking around from that which is no longer quite right for the new one. There can also be places where we have something in flight at the time of the restart -- like some request we decided to make, where we prepared something based on the old version, and with the new version that's no longer right.
To be sure to put a bound on the effect of any such edge case, we should notice when a restart event indicates the server got upgraded (for which a change in `zulip_feature_level` should be enough), and then make sure to reload the data from scratch within some bounded period of time.

Right now we effectively get this automatically, because we already reload the server data from scratch on each app startup, plus whenever we've been offline for 10 minutes so that the queue was expired. Realistically that means we never go more than a day or so without reloading. So there's nothing we need to do about this now.
Once we're otherwise getting close to implementing long-lived event queues, though -- aka #3916 -- it'll start becoming possible for small data errors to persist indefinitely and to accumulate. This issue will be one of the things we should do to prevent such errors, before rolling long-lived event queues out broadly.
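Putting the pieces together, a minimal sketch of the eventual check (hypothetical names; `refetchFromScratch` stands in for whatever thunk reloads the server data): on a `restart` event, compare the feature level it carries with the one we have recorded, and if the server was upgraded, schedule a from-scratch reload at a random point within a bounded window.

```js
// At least 10 minutes, per https://zulip.com/api/get-events#restart .
const RELOAD_WINDOW_MS = 10 * 60 * 1000;

// Handle a `restart` event from the server's event queue.
function handleRestartEvent(event, recordedFeatureLevel, refetchFromScratch) {
  if (event.zulip_feature_level !== recordedFeatureLevel) {
    // The server was upgraded: reload from scratch within a bounded period,
    // spread out randomly to avoid a thundering herd of reloading clients.
    setTimeout(refetchFromScratch, Math.random() * RELOAD_WINDOW_MS);
  }
}
```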