Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shell: Use the "load" event instead of polling to catch ready frames #21061

Merged
merged 4 commits into from
Oct 16, 2024

Conversation

mvollmer
Copy link
Member

@mvollmer mvollmer commented Oct 1, 2024

  • TestPages flakes with Firefox

Sorry, something went wrong.

@mvollmer mvollmer added the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Oct 1, 2024
@mvollmer mvollmer force-pushed the shell-no-frame-polling branch from 73cab72 to b919586 Compare October 8, 2024 13:26
@mvollmer mvollmer removed the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Oct 10, 2024
@mvollmer mvollmer force-pushed the shell-no-frame-polling branch 3 times, most recently from 28f9e82 to 2d23583 Compare October 10, 2024 09:36
@mvollmer
Copy link
Member Author

The two testFrameReload tests are suspicious: I don't know what requirement or feature they are supposed to test. They still pass, but maybe only by accident.

What they do:

  1. modify the DOM directly to remove the "data-ready" attribute of a iframe
  2. modify the DOM to change the "src" attrr of the iframe
  3. wait for the "data-ready" attribute to come back

Step 2) might simulate the code inside the iframe changing its own location, but that is not something Cockpit supports beyond changing the hash. Both the old and new Shell will just react to the "load" event on the iframe, run their hash tracking stuff, and then put the correct "src" attribute back. The old Shell would run its polling loop, but the new Shell just brings the DOM into sync with its own reality.

So, without knowing what testFrameReload is actually testing, I don't know whether the new Shell broke something.

@martinpitt
Copy link
Member

FTR, I'm afraid I don't know what that data-ready hackery in the test should achieve. It goes way back to 2015:

It seems to me that we should test this using the "Active pages" killer in the session menu (with Alt) and then loading it back from the menu -- otherwise it's just internal hackery and meaningless IMHO.

@mvollmer
Copy link
Member Author

It seems to me that we should test this using the "Active pages" killer in the session menu (with Alt) and then loading it back from the menu -- otherwise it's just internal hackery and meaningless IMHO.

Hmm, there is also the "Reload this frame" action in the browser menu. I wonder if we survive that.

37aedec is worried about the router, and that a page needs to be unregistered when it is reloaded. (so that the channels open for the previous incarnation of the frame are closed,) I think this happens now via the "unload" event that the router listens to. I'll experiment a bit with "Reload this frame" as well.

@mvollmer
Copy link
Member Author

I'll experiment a bit with "Reload this frame" as well.

Alright, reloading the frame would correctly unregister it from the router, but when it was loaded, we didn't do the necessary setup, like attaching the event handler for the idle timeout.

So let's attach our own permanent "load" and "unload" handlers and do the setup on every load. There is now also a teardown that unsets the ready flag, which helps with avoiding flicker in dark mode.

The testFrameReload tests should not touch the data-ready attribute, I'd say, but should instead check that open channels are closed properly. It would be best to reload the frame explicitly instead of messing with the "src" attribute. I would guess bidi can do that, right?

@martinpitt
Copy link
Member

@mvollmer: Yes, bidi has a reload() API, we already use it in testlib.py's Browser.reload() -- except that this always does self.switch_to_top() first, i.e. it always reloads the entire page. You could just add an option to it to not do that, and then just have it reload the current frame.

@mvollmer mvollmer force-pushed the shell-no-frame-polling branch from 2d23583 to 7455a9e Compare October 10, 2024 11:35
@mvollmer
Copy link
Member Author

You could just add an option to it to not do that, and then just have it reload the current frame.

I tried, but it seems to reload everything always. Also, I got testlib.Error: unknown error: navigation canceled. Kicking the "src" attribute is not wrong, according to StackOverflow, so...

@mvollmer mvollmer added the blocked Don't land until something else happens first (see task list) label Oct 10, 2024
@mvollmer mvollmer marked this pull request as ready for review October 10, 2024 11:38
@mvollmer
Copy link
Member Author

Let's get this in as a followup to #21012.

@mvollmer mvollmer force-pushed the shell-no-frame-polling branch 2 times, most recently from 341b8d4 to 06995fd Compare October 10, 2024 13:06
@mvollmer mvollmer removed the blocked Don't land until something else happens first (see task list) label Oct 10, 2024
Comment on lines +76 to +79
if (frame.ready) {
frame.ready = false;
state.update();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. they should. I'll check.

@mvollmer
Copy link
Member Author

Step 2) might simulate the code inside the iframe changing its own location, but that is not something Cockpit supports beyond changing the hash.

Just to clarify: step 2 is very likely only supposed to reload the frame without changing the url, not simulate loading a different url inside the frame.

And only show the frame once it is loaded. This side-steps the
problems with showing a white not-yet-loaded iframe in dark mode.

The two testFrameReload tests have been changed to more directly check
for what we want: The channels open by the previous incarnation of the
frame should be closed when it is reloaded. This also means we don't
need the "data-ready" attribute anymore in the DOM.
@mvollmer mvollmer force-pushed the shell-no-frame-polling branch from 6e6015b to 196edc0 Compare October 15, 2024 09:34
Comment on lines +76 to +79
if (frame.ready) {
frame.ready = false;
state.update();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed. We need to add the "unload" handler once every "load" event, since it sits on the contentWindow, which is a different one on every load.

This might remove a flake, and it also removed the OSTree special
case, somehow.
@mvollmer mvollmer force-pushed the shell-no-frame-polling branch from 196edc0 to 0bdc149 Compare October 15, 2024 13:01
@mvollmer
Copy link
Member Author

TestPages.testBasic and TestPages.testHistory are flaking pretty consistently on Firefox now. I'll check this out more.

@mvollmer
Copy link
Member Author

TestPages.testBasic and TestPages.testHistory are flaking pretty consistently on Firefox now. I'll check this out more.

Also testMenuSearch. I hope we have only uncovered existing races in the tests.

@martinpitt
Copy link
Member

TestPages.testHistory likely runs into https://bugzilla.mozilla.org/show_bug.cgi?id=941146.

-> we need a naughty for this.

Perhaps just skip the test (or that part of the test) on Firefox?

@mvollmer
Copy link
Member Author

@mvollmer
Copy link
Member Author

mvollmer commented Oct 16, 2024

Perhaps just skip the test (or that part of the test) on Firefox?

Nah, it's a good test for tricky business (getting browser history right) and we really should run it on Firefox. It does reach the end successfully. There is just the occasional oops. We could allow oopses, but that would be too broad.

We could invent machinery that ala allow_journal_messages, that allows specific oopses, but this here really is a bug in Firefox that might conceivably be fixed some day.

@mvollmer mvollmer force-pushed the shell-no-frame-polling branch from 0c1cea4 to 6ffdfcd Compare October 16, 2024 08:28
@mvollmer
Copy link
Member Author

@martinpitt
Copy link
Member

The naughty landed -- so a retry of the amplified run should go green now, with the naughty hitting?

Comment on lines 189 to 191
b.go("/system")
b.enter_page("/system")
b.switch_to_top()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is really weird -- can you please mark this as # HACK? its really not obvious why this happens. open_lang_modal() already does a switch_to_top(), and that is an entirely trivial operation (just setting an internal variable), so that line should go. More importantly, it's not clear why the language switcher would work from one frame but not the other one. Is that some "mouse click gets blocked by some element" situation again? how does it look like?

Copy link
Member Author

@mvollmer mvollmer Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about switch_to_top, I'll remove that.

The problem was doing the go("/system") after changing the language. It sometimes didn't have any effect. Changing the language always works.

I can debug this further, but I could never reproduce it locally, so that's going to be a bit of a pain...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Ahh, I think I know. The router starts working a bit later since the shell rewrite, because I though there can be no messages before we create iframes. But tests actually send messages to implement b.go, and if they come to early after a reload, we'll miss them. So let's wait for the shell to be fully loaded again before navigating.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes a flake.
@mvollmer mvollmer force-pushed the shell-no-frame-polling branch from 6ffdfcd to e8608f3 Compare October 16, 2024 09:52
@mvollmer mvollmer removed the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Oct 16, 2024
@mvollmer mvollmer force-pushed the shell-no-frame-polling branch from e8608f3 to f21b647 Compare October 16, 2024 10:02
@mvollmer
Copy link
Member Author

Moved to #21125

Sorry, something went wrong.

@mvollmer
Copy link
Member Author

The naughty landed -- so a retry of the amplified run should go green now, with the naughty hitting?

In principle yes, but the naughty pattern is too tight and doesn't trigger on a test named testHistory6...

martinpitt
martinpitt previously approved these changes Oct 16, 2024
Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A work of art! 🙇

@mvollmer
Copy link
Member Author

Hmm, TestPages.testMenuSearch on ubuntu also flakes. Some service error show up later in the test although they have been cleared at the beginning. Let's fix that as well...

@mvollmer
Copy link
Member Author

Hmm, TestPages.testMenuSearch on ubuntu also flakes. Some service error show up later in the test although they have been cleared at the beginning. Let's fix that as well...

It's fwupd-refresh.service that runs on a timer. It probly fails because it has no network connection.

Let's just reset the failure state immediately before the screenshot, that should make the time window impossible small to hit....

On ubuntu-stable, the fwupd-refresh service would otherwise sometimes
manage to fail between resetting the failures at the start of the test
and the time we take the screenshot.
@martinpitt
Copy link
Member

Thanks for tracking that down. That also seems fine to disable in test/vm.install.

@mvollmer mvollmer added the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Oct 16, 2024
@mvollmer
Copy link
Member Author

That also seems fine to disable in test/vm.install.

Yeah, but then it's a different service that fails later on...

@martinpitt
Copy link
Member

Let's just reset the failure state immediately

Ah, you meant a global systemctl reset-failed. Yes, that seems right.

@mvollmer mvollmer removed the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Oct 16, 2024
@mvollmer mvollmer force-pushed the shell-no-frame-polling branch from 36bf06f to 54a3ccd Compare October 16, 2024 12:17
@mvollmer
Copy link
Member Author

Oh no, the naughty pattern doesn't match, probably because the text it matches against doesn't actually contain the # testHistory header... Without that header, the pattern is very unspecific, hmm.

@mvollmer
Copy link
Member Author

mvollmer commented Oct 16, 2024

Oh no, the naughty pattern doesn't match, probably because the text it matches against doesn't actually contain the # testHistory header... Without that header, the pattern is very unspecific, hmm.

Luckily @martinpitt figured it out. Extra whitespace, fixed.

@mvollmer mvollmer requested a review from martinpitt October 16, 2024 14:44
Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it! 🚀

@martinpitt martinpitt merged commit fe19608 into cockpit-project:main Oct 16, 2024
80 of 82 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants