[HTTP] How will Kibana server detect a different version UI on serverless? #159127

Closed
jloleysens opened this issue Jun 6, 2023 · 15 comments
Labels
Feature:http Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience)

Comments

@jloleysens
Contributor

jloleysens commented Jun 6, 2023

Original issue

Initial ideas about "detecting" a new version from the UI and notifying the user in some way. We might return to this in the future

(1) Adding custom response headers when an old UI build is detected
(2) A dedicated endpoint that gets checked by UIs (not the greatest...)
(3) Using service worker lifecycle events, although this might not result in detecting immediate updates if we rely on the bit-wise detection of an update
(4) ...what else?

@jloleysens jloleysens self-assigned this Jun 6, 2023
@jloleysens jloleysens added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Jun 6, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@rayafratkina rayafratkina added the Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) label Jun 6, 2023
@elasticmachine
Contributor

Pinging @elastic/appex-sharedux (Team:SharedUX)

@pgayvallet
Contributor

pgayvallet commented Jun 8, 2023

Since the plan is to allow for old Kibana UIs to communicate with Kibana server we should spend some time deciding how we want to communicate that an update is available

I think the first step would be to actually define what "an update is available" means, and how our system should detect that state / situation.

Is that deployment-based:

  • The Kibana deployment started rolling the new version
  • The Kibana deployment finished rolling the new version

Or just something at the application level:

  • The Kibana server that the UI hit is in a different version

Where I want to go is: do we need our system to communicate with some higher-level service (e.g the Kibana controller or something) to know about this "is an update available" information, or is this something we assume we will be able to detect in isolation, just between our UI and server?

Now my 2 cents regarding how we can communicate that info between client and server:

  • polling seems pretty bad, and retrieving the info passively (e.g. when requests are performed) seems sufficient
  • service workers are imho overkill (and would need some significant investment given we don't have anything in place atm, e.g. building the worker bundle)
  • header-based propagation works fine with licensing and I don't see why it wouldn't work here?

@lukeelmers
Member

service workers are imho overkill

++ Just wanted to leave a flyby comment to give a 👎 vote to service workers. Feels like overkill, and like they could add more risk than they help mitigate.

@jloleysens
Contributor Author

jloleysens commented Jun 13, 2023

Thanks both @lukeelmers @pgayvallet for the clarifying comments/questions.

The Kibana server that the UI hit is in a different version

I was thinking about it as: the Kibana UI would simply detect that it is talking to a newer server version and that would be the prompt to consider updating.

It could be that this assumption is wrong and that this is not how we want the "update available" prompt to be triggered. Maybe we want a more definitive "push" to clients that a new version is available, not just that you are talking to a new server, but as you point out this requires systems outside just the Kibana client/server.

some significant investment

++ Service workers would be the most work, but they do let us leverage browser capabilities like managing state across multiple open pages (if we wanted that). Still, I can see the case for not wanting to use this as the time to explore service workers.

header-based propagation

++ also primarily leaning in this direction as it seemed simplest to get working.

...risk than they help mitigate...

Just curious, is there a security risk you are thinking of?

@lukeelmers
Member

Just curious, is there a security risk you are thinking of?

Nope, the only risk I'm referring to here is risk to the project, i.e. risk of delays or unforeseen bugs due to starting with a more complex approach than is necessary for the MVP. That doesn't mean we can't go back later and change the implementation if we determine service workers are the only way to achieve the UX we want, I'd just caution against taking the more complex route up front.

@afharo
Member

afharo commented Jun 13, 2023

Just dropping my 2-cents:

I think a header returning the buildNumber could be enough for the browser to notice that it's running on a different version (mind that I'm not saying "newer" because rollbacks might be a use case to keep in mind).

I think that the core.http client on the browser could intercept that header and check it against the currently loaded buildNumber. If different, it could show a warning to the user, asking them to reload the page ASAP and noting that they could otherwise face unexpected behavior (I've seen Firefox notifying about that if you happen to upgrade the browser but refuse to restart it). Kind of a "refresh or use it at your own risk" message.

At the moment, if not refreshed, I would expect the UX to break as soon as the user requires a new chunk to be downloaded. This may be minimized if, in the future, we decide to use CDN for bundles and assets, but for the MVP, I think this could be good enough for now.

Finally, I'd like to raise some awareness of the main edge case: the moment we are upgrading the nodes, some requests may be served from the new nodes, and some from the old ones. Typically, the proxy is round-robin, so the chances of a single user hitting both nodes are really high (receiving a Please refresh message, no matter which buildNumber they loaded). I recall someone said that a typical use case is to have one Kibana UI node, so we may want to look at this issue post-MVP. Happy to discuss it now, if you think we should handle that situation right now.
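A minimal sketch of the header check described above, assuming the server reports its build number in a response header. The header name `kbn-build-number` and both function names are hypothetical and not part of Kibana's API; a real implementation would presumably live in a core.http response interceptor.

```typescript
// Hypothetical header name: the thread does not settle on one.
const BUILD_NUMBER_HEADER = 'kbn-build-number';

type HeaderBag = Record<string, string | undefined>;

interface MismatchResult {
  mismatch: boolean;
  loaded: number;
  served?: number;
}

// Compare the build number the page was loaded with against the one the
// server reported. "Different" rather than "newer": a rollback also counts.
function checkBuildNumber(loaded: number, headers: HeaderBag): MismatchResult {
  const raw = headers[BUILD_NUMBER_HEADER];
  if (raw === undefined) {
    return { mismatch: false, loaded };
  }
  const served = Number.parseInt(raw, 10);
  if (Number.isNaN(served)) {
    return { mismatch: false, loaded };
  }
  return { mismatch: served !== loaded, loaded, served };
}

// Wrap the check so the user is warned at most once per page load.
function createMismatchNotifier(loaded: number, warn: (message: string) => void) {
  let warned = false;
  return (headers: HeaderBag): void => {
    const result = checkBuildNumber(loaded, headers);
    if (result.mismatch && !warned) {
      warned = true;
      warn('Kibana was updated. Please refresh the page to avoid unexpected behavior.');
    }
  };
}
```

Note that this sketch does nothing about the round-robin edge case: during a rolling upgrade it would warn as soon as a single response comes from a node with a different build.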

@jloleysens
Contributor Author

jloleysens commented Jun 13, 2023

UX to break as soon as the user requires a new chunk to be downloaded

Yes, I tested this locally and you get "stuck" as the bundle download returns 404; however, the UI did not hard crash in my test case. Still not good.

some requests may be served from the new nodes...

Yeah, this is a good point. I called this the "flip-flop" issue 😄 . Another, related issue is the lack of forward compatibility (FC) between UI and server (maybe the new UI expects new versions of endpoints to be available but then gets routed to an N-1 server, which returns a 400).

I was hoping we could mitigate this at the orchestration layer OR we could make detection a little more sophisticated. For example:

  • Delaying when we notify of new UI being ready somehow. There are more and less hacky ways of doing this, we could conceptualise rolling-upgrade time as purely jitter since we expect it to be small -- would be quite a hack.
  • Comprehensive way to address this would be some way for Kibana to actually know when rollout is done and then only notify via update mechanism. This might be something like using ES as an observable for Kibana node statuses. This still would not entirely address potential FC issues though.
  • ...other ideas?
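One way the "delay the notification" idea could look on the client, purely as a heuristic sketch: only treat the mismatch as real after several consecutive responses report the same non-loaded build number. The function name and the streak threshold are assumptions for illustration; this does not tell the client that the rollout is actually done.

```typescript
// Returns an observer that is fed the build number seen on each response and
// answers true only once `requiredStreak` consecutive responses have reported
// the same build number that differs from the loaded one.
function createStableMismatchDetector(loaded: number, requiredStreak: number) {
  let candidate: number | undefined;
  let streak = 0;

  return (served: number): boolean => {
    if (served === loaded) {
      // A response from a node on our own build: likely mid-rollout, start over.
      candidate = undefined;
      streak = 0;
      return false;
    }
    if (served === candidate) {
      streak += 1;
    } else {
      candidate = served;
      streak = 1;
    }
    return streak >= requiredStreak;
  };
}
```

With a threshold of 3, a client that sees builds 2, 1, 2, 2, 2 while loaded on build 1 would only be notified on the last response.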

It might be worth waiting for our first set of Kibana releases (not in production) so that we can verify where the biggest issues during the rolling upgrade window are and then consider the best ways to mitigate or address them. Depending on how we address that, the mechanism for detecting updates (headers) could be orthogonal. Or do you think of them as fundamentally related, @afharo?

@afharo
Member

afharo commented Jun 13, 2023

It might be worth waiting for our first set of Kibana releases (not in production) so that we can verify where the biggest issues during the rolling upgrade window are and then consider best ways to mitigate or address.

++ I think we can address the edge case once we better understand its actual impact. I'd apply the 80/20 rule here and move on without thinking about the "flip-flop" edge case. We can address it later :)

@petrklapka petrklapka assigned sebelga and unassigned jloleysens Jun 14, 2023
@rudolf
Contributor

rudolf commented Jun 14, 2023

++

Our assumptions here are:

  • A1 upgrades are relatively infrequent, perhaps every 3 weeks
  • A2 the majority of internal APIs remain compatible between upgrades, just a small portion of APIs could trigger an incompatible UI/server error
  • A3 Upgrades are quick ~70s * number of pods and most clusters have just two pods

I would say the order of priorities is:

  1. Don't lose data or cause unexpected behaviour by allowing incompatible browser/api communication. An "unexpected exception occurred" is already better than losing data.
  2. Improve the UX a little bit by explaining why the above error occurred and how to remedy it: Kibana is busy upgrading, please refresh the page.
  3. (potentially post-MVP) Polish the user experience by avoiding the flip-flop problem where users refresh their browser and still get errors and have to try several times until eventually after 5 minutes it starts working. For this we'd need Kibana to know if the upgrade is complete or not.
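Priority 2 could be sketched roughly as follows, assuming the client knows its own build number and can read the server's from the failed response. The `FailedResponse` shape and `explainFailure` are hypothetical names for illustration; the upgrade message is the one suggested above.

```typescript
interface FailedResponse {
  statusCode: number;
  uiBuildNumber: number;
  // Undefined when the server did not report a build number.
  serverBuildNumber?: number;
}

const UPGRADE_MESSAGE = 'Kibana is busy upgrading, please refresh the page.';
const GENERIC_MESSAGE = 'An unexpected error occurred.';

// Only speaks up when a request actually failed. A build mismatch on a
// successful response is deliberately ignored here.
function explainFailure(res: FailedResponse): string | undefined {
  if (res.statusCode < 400) {
    return undefined;
  }
  const mismatch =
    res.serverBuildNumber !== undefined && res.serverBuildNumber !== res.uiBuildNumber;
  return mismatch ? UPGRADE_MESSAGE : GENERIC_MESSAGE;
}
```

The design choice here is to keep the generic error as the fallback, so a mismatch only changes the explanation, never whether the error is surfaced.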

@petrklapka petrklapka assigned Dosant and unassigned sebelga Jun 20, 2023
@lukeelmers
Member

Had a chance to re-read the design doc on this, and wanted to add a few questions/thoughts as I'm not sure if there is an agreed-upon approach yet.

  • Why do we need a dedicated endpoint for checking if the server version has changed, vs using /api/status which already contains the build info?
  • Are we sure we want to automatically show a refresh prompt on every release of the Kibana UI? This is partially a product question (@sixstringcode). If we are assuming that upgrades are relatively infrequent, and that only a small portion of releases will contain breaking changes, do we really want users to be aware every time a new release goes out?
  • I agree with Rudolf's order of priorities above. For MVP we should focus on avoiding data loss, and providing the best possible error state for users who encounter problems. I'm not convinced we should proactively be notifying users when there's a new release until we can have a strategy in place for only surfacing these notifications when we are certain it is necessary... and this isn't something we should rush. For MVP, it feels sufficient to prompt them to refresh only if there's an error, and take our time to polish the rest of the UX post-MVP.

cc @clintandrewhall

@clintandrewhall
Contributor

Are we sure we want to automatically show a refresh prompt on every release of the Kibana UI?

I agree with Luke: I'm not convinced we need to do this, yet. We don't know if this is (going to be) a problem yet, and it has a dead simple solution.

On the other hand, I'm in favor of some kind of release note dialog when the system is upgraded for the user experience. They're running this project to be fully managed, let's show them we're managing it with updates. That's a step short of "refresh your browser", though.

The most important thing, in my mind, is adding metadata to our logging with UI + Server build information. We could see, when an upgrade is done, how many projects lag. It may turn out that, after an upgrade, 30% of projects are mismatched and we see that drop to 2% in 24 hours. Or we could see a 50% mismatch and a drop to 15% over two weeks. Either way, we'll have data and curves to be able to understand the landscape and whether any UX solution is needed.

@jloleysens
Contributor Author

Related #161594

@Dosant Dosant removed their assignment Jul 19, 2023
@jloleysens jloleysens changed the title from [HTTP] How will Kibana communicate "upgrade available" to UIs to [HTTP] How will Kibana server detect a different version UI on serverless? Jul 20, 2023
@jloleysens
Contributor Author

Are we sure we want to automatically show a refresh prompt on every release of the Kibana UI?

With the help of product, we decided the answer is "no", at least for now. I like Clint's idea of having some indication that a new release is ready, but we can defer this to future iterations.


Since our thinking has evolved on the original issue, here are my thoughts on the technical details of an approach to just detecting old browsers/UIs on the Kibana server.

Proposal for detecting old UIs

On serverless we will be deploying versions "between" stack versions but we only communicate the stack version today (i.e. 8.9.0) via the kbn-version header.

We modify the version being sent via kbn-version to something that can represent a new Kibana build (as opposed to just a new stack version). Off the top of my head I can imagine two approaches:

  1. We update the kbn-version value with a build number: something like 8.9.0+123abcdefg and keep it SemVer compliant.
  2. Change kbn-version to something entirely new like a build hash 123abcdefg, non SemVer compliant

I prefer (1), I also think it will be easier to implement as it remains largely compatible with how this value is used today, but I can also see a case for (2) in that "stack version" is not really a thing on serverless anymore.
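To make option (1) concrete, here is a rough sketch of parsing a `kbn-version` value that carries the build as SemVer build metadata. `parseKbnVersion` and `isSameBuild` are hypothetical helpers, not existing Kibana functions, and the pattern is simplified: it does not handle pre-release tags.

```typescript
interface KbnVersion {
  stackVersion: string; // e.g. "8.9.0"
  build?: string; // e.g. "123abcdefg", taken from the part after "+"
}

// Simplified pattern: MAJOR.MINOR.PATCH with optional dot-separated build
// metadata. Pre-release tags are intentionally not handled in this sketch.
const KBN_VERSION_PATTERN = /^(\d+\.\d+\.\d+)(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$/;

function parseKbnVersion(value: string): KbnVersion | undefined {
  const match = KBN_VERSION_PATTERN.exec(value);
  if (match === null) {
    return undefined;
  }
  const [, stackVersion, build] = match;
  if (stackVersion === undefined) {
    return undefined;
  }
  return build === undefined ? { stackVersion } : { stackVersion, build };
}

// SemVer ignores build metadata when ordering versions, so "same build" has
// to be a separate, explicit equality check rather than a SemVer comparison.
function isSameBuild(a: KbnVersion, b: KbnVersion): boolean {
  return a.stackVersion === b.stackVersion && a.build === b.build;
}
```

A value like the bare hash in option (2) would fail to parse here, which is one way to see why (1) stays closer to how the header is consumed today.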

Note: we have a way to get a unique build number for Kibana, this will be a number like 64910 not an actual commit SHA. In development we just set this to Node.js' max int. See:

https://github.com/elastic/kibana/blob/a39531edce1c59e18c1f5b77e62cf77ac5e43ef2/src/dev/build/lib/get_build_number.ts

One other point, raised by @jbudz, is that the build number we generate today also communicates a sequence: higher numbers mean newer builds, which is relevant if we are considering adding a SHA-like value.

Let's discuss some other ideas too!


Detecting a mismatch is step 1, but we'd like to capture/trace this metadata to any errors that result from the request too.

Errors in Kibana can happen for any number of reasons and so we'd need to narrow down what information is readily available at the time of error. One idea to narrow this down is to say that we want to detect any >399 status codes being returned from Kibana to older UIs, when we detect this we log something specific and/or create a telemetry event. However this needs a bit more investigation.
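As a starting point for that investigation, the ">399 from an older UI" rule could be sketched like this. It assumes numeric, sequential build numbers on both sides; the event name, the `RequestOutcome` shape and `toOldUiErrorEvent` are all hypothetical and do not correspond to an existing Kibana logging or telemetry API.

```typescript
interface RequestOutcome {
  path: string;
  statusCode: number;
  // Undefined when the client did not send a build number.
  uiBuildNumber?: number;
  serverBuildNumber: number;
}

interface OldUiErrorEvent {
  eventType: 'old_ui_request_failed'; // hypothetical event name
  path: string;
  statusCode: number;
  uiBuildNumber: number;
  serverBuildNumber: number;
}

// Produce an event only when an older UI build received a >399 response.
// The caller decides whether to log it, send it as telemetry, or both.
function toOldUiErrorEvent(outcome: RequestOutcome): OldUiErrorEvent | undefined {
  const { path, statusCode, uiBuildNumber, serverBuildNumber } = outcome;
  if (uiBuildNumber === undefined || uiBuildNumber >= serverBuildNumber) {
    return undefined;
  }
  if (statusCode <= 399) {
    return undefined;
  }
  return { eventType: 'old_ui_request_failed', path, statusCode, uiBuildNumber, serverBuildNumber };
}
```

This only narrows down which responses are interesting; it says nothing about whether the error was actually caused by the version mismatch.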

@jloleysens
Contributor Author

Closing this issue in favor of #162332

10 participants