Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebViewWrapperFuture handling on job termination timeout #1752

Merged
merged 2 commits into from
Feb 4, 2025

Conversation

amartya4256
Copy link
Contributor

@amartya4256 amartya4256 commented Jan 22, 2025

This Pull Request introduces following behaviour in Edge browser:

  • Encapsulates the ICoreWebView*::Release for all the different webView* instances in the WebViewWrapper in a method WebViewWrapper::releaseWebViews
  • The futures complete exceptionally if there occurs a timeout at processOSMessagesUntil
  • If the webViewWrapperFuture finishes exceptionally (during the initialization of webViews), there's a streamlined rollback of all the resources.

contributes to #1466 and #2734

@amartya4256 amartya4256 force-pushed the edge_timeout branch 2 times, most recently from 7d16ca0 to 8195777 Compare January 22, 2025 12:01
@fedejeanne
Copy link
Contributor

fedejeanne commented Jan 22, 2025

I would additionally reduce the timeout introduced in #1677 to 10 seconds (+5 seconds from your new timeout) just to make sure that your change here actually covers all failing tests and use cases. If it does then I would expect to see no timeouts hitting 10s, only timeouts after 5s.

WDYT?

Copy link
Contributor

github-actions bot commented Jan 22, 2025

Test Results

   494 files  ±0     494 suites  ±0   9m 41s ⏱️ - 1m 47s
 4 333 tests ±0   4 319 ✅ ±0   14 💤 ±0  0 ❌ ±0 
16 574 runs  ±0  16 465 ✅ ±0  109 💤 ±0  0 ❌ ±0 

Results for commit 753b503. ± Comparison against base commit 009532c.

♻️ This comment has been updated with latest results.

@amartya4256
Copy link
Contributor Author

I would additionally reduce the timeout introduced in #1677 to 10 seconds (+5 seconds from your new timeout) just to make sure that your change here actually covers all failing tests and use cases. If it does then I would expect to see no timeouts hitting 10s, only timeouts after 5s.

WDYT?

Alright, I'll add it here.

@amartya4256 amartya4256 force-pushed the edge_timeout branch 2 times, most recently from 04d2500 to ae4fcac Compare January 24, 2025 15:34
Copy link
Contributor

@HeikoKlare HeikoKlare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to have another thorough look at what may cause Edge initialization to block the UI thread. When trying to test the timeout logic, I came across several places where I can introduce blocking operations that will not be caught by this proposal.

When systematically going through the public methods of Edge, I see that the calls to getWebView*() and isWebView*Availabile() are the ones that block in case initialization or any of the subsequently scheduled futures do not finish timely.
Then, taking a look at which calls may potentially block inside those methods, I see these two:

  • waitForFutureToFinish()
  • future.join()

The latter will be captured for the initialization future by the timeout added in this PR. The former will only partly be resolved, as inside waitForFutureToFinish() there is a call to processNextOSMessage(), which spins a loop that may be blocking as well. Thus, I think we need to break that potentially blocking operation as well. Maybe we can simply spin Display.readAndDisplay() in the loop checking for future completion (inside waitForFutureToFinish()) instead of having the additional loop inside processNextOSMessage()?

Then, what about the other callers of processNextOSMessage()? In particular, createEnvironment() is called synchronously when creating the first Edge instance. May that one block as well? I also see further calls of that method of which I am not sure if they are correct. We need to check them as well.

webViewFuture.orTimeout(1, TimeUnit.MILLISECONDS).exceptionally(exception -> {
// Throw exception on the Display thread directly to prevent CompletableFuture
// to wrap the exception and throw it silently
browser.getDisplay().execute(() -> SWT.error(SWT.ERROR_UNSPECIFIED, exception, "Edge Browser initialization timed out"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of only throwing an error, should we maybe hide the (uninitialized) browser widget and add a label informing the user about the failed initialization instead? Calling something like this inside the execution on display might do the trick:

	private void replaceWithErrorLabel() {
		browser.setVisible(false);
		Label errorLabel = new Label(browser.getParent(), SWT.WRAP);
		errorLabel.setForeground(browser.getDisplay().getSystemColor(SWT.COLOR_RED));
		errorLabel.setText("Edge browser initialization failed");
		errorLabel.setLocation(0, 0);
		errorLabel.setSize(browser.getSize());
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like his idea. But how about we combine this with also throwing an error even though it doesn't pop up in a dialog. The error is atleast visible in the Error Log of the IDE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if we don't use this execute method here, it will sill throw the error but the error will be wrapped in CompletionException and not SWTException since, the errors in the CompletableFuture are caught implicitly and wrapped with CompletionException. In case of SWT.error as well, I see the exception to be silent and there's no dialog coming up. What do you prefer? Is that okay to not wrap the exception with SWT.error and let it throw the CompletableFuture CompletionException and just replace the browser with a label?
image
image
The image above is for when I dont throw a wrapped SWT exception and let the future handle it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still be in favor of logging some kind of error. Even though the user will primarily see the information in the UI, someone debugging issues may only have a look at the logs, which is why having such an error documented there is reasonable.

Copy link
Contributor Author

@amartya4256 amartya4256 Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so replaceBrowser with label and log an error? the error is logged automatically by the future of exception. Do you mean that or should we throw an exception explicitly using, browser.getDisplay().execute(() -> SWT.error(SWT.ERROR_UNSPECIFIED, exception, "Edge Browser initialization timed out"));?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinion this, but I see two things to consider:

  1. If possible, we should avoid throwing a future-specific exception to consumers of SWT/Edge and rather provide proper information inside an SWTException to them.
  2. We should have a look at the where consumers may face that exception and how that conforms to the existing APIs they are using (usually those of Browser).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if one is interested in the exception, get() is better than join()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going with browser.getDisplay().execute(() -> SWT.error(SWT.ERROR_UNSPECIFIED, exception, "Edge Browser initialization timed out")); since with this we make sure our SWT exception is not caught by CompletableFuture and silenced on timeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very strange way to print an exception... and it will require a synchronous call to the SWT thread... why not use display.getErrorHandler () directly?!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree that looks better. But do you think its is good to check whether to use RunTimeHandler or ErrorHandler?

@amartya4256 amartya4256 linked an issue Jan 26, 2025 that may be closed by this pull request
@amartya4256 amartya4256 force-pushed the edge_timeout branch 2 times, most recently from 8b86b9b to 893b4dd Compare January 27, 2025 13:25
@amartya4256
Copy link
Contributor Author

When systematically going through the public methods of Edge, I see that the calls to getWebView*() and isWebView*Availabile() are the ones that block in case initialization or any of the subsequently scheduled futures do not finish timely. Then, taking a look at which calls may potentially block inside those methods, I see these two:

  • waitForFutureToFinish()
  • future.join()

I evaluated your points and i have implemented everything into a webviewWrapper which is responsible for containing all the webviews and we need just one future (webViewWrapperFuture) now which provides us this wrapper from where all the webview* can be obtained, so now the dependency flow is like this: webViewWrapperFuture -> lastWebViewTask. for every webView*, it will be responsible - in case of a webViewWrapperFuture timeout, everything times out.

@amartya4256
Copy link
Contributor Author

The latter will be captured for the initialization future by the timeout added in this PR. The former will only partly be resolved, as inside waitForFutureToFinish() there is a call to processNextOSMessage(), which spins a loop that may be blocking as well. Thus, I think we need to break that potentially blocking operation as well. Maybe we can simply spin Display.readAndDisplay() in the loop checking for future completion (inside waitForFutureToFinish()) instead of having the additional loop inside processNextOSMessage()?

Then, what about the other callers of processNextOSMessage()? In particular, createEnvironment() is called synchronously when creating the first Edge instance. May that one block as well? I also see further calls of that method of which I am not sure if they are correct. We need to check them as well.

I am wondering if addressing processNextOSMessage is necessary after this, since we are looping until the future is done eventually and it can happen either if the initialization completes or the timeout, so there always exists a path with which future will be done. In terms of temporal logic, this is verified: A F webViewWrapperFuture.isDone()

The question about if processNextOSMessage is a blocking anything: what kind of messages are processed in there and what control do we have? Looks like, we are just calling display.readAndDispatch(), which is just processing the messages on the display thread from the OS, if something unrelated to Edge was blocking it here, it could have been blocked anywhere in the platform. And We have all the handlers well defined to handling the Edge messages in the callbacks properly, my concern is about the block:

while (!OS.PeekMessage (msg, 0, 0, 0, OS.PM_NOREMOVE)) {
		display.sleep();
	}

Other than that, I think we should retest the new implementation and try to see if we still have tests / workspcaes freezing. If no freezes, at least we will have logs for the timeout.

@HeikoKlare
Copy link
Contributor

I evaluated your points and i have implemented everything into a webviewWrapper which is responsible for containing all the webviews and we need just one future (webViewWrapperFuture) now which provides us this wrapper from where all the webview* can be obtained, so now the dependency flow is like this: webViewWrapperFuture -> lastWebViewTask. for every webView*, it will be responsible - in case of a webViewWrapperFuture timeout, everything times out.

That sounds good. There are several good proposals in this PR now. Can we please separate into different smaller PRs to ease reviewing and testing? In particular, putting the web views inside a wrapper captured by one future is something that we might do as a simple "cleanup" and preparatory PR for this.

@HeikoKlare
Copy link
Contributor

The question about if processNextOSMessage is a blocking anything: what kind of messages are processed in there and what control do we have? Looks like, we are just calling display.readAndDispatch(), which is just processing the messages on the display thread from the OS, if something unrelated to Edge was blocking it here, it could have been blocked anywhere in the platform. And We have all the handlers well defined to handling the Edge messages in the callbacks properly, my concern is about the block:

while (!OS.PeekMessage (msg, 0, 0, 0, OS.PM_NOREMOVE)) {
		display.sleep();
	}

This code will put the thread to sleep until some OS message is received. If for some reason no message arrives anymore, this will block the whole application. Not sure how likely it is, but I would be in favor of avoiding the risk. As mentioned above, it might be possible to simplify this code anway.

@fedejeanne
Copy link
Contributor

The question about if processNextOSMessage is a blocking anything: what kind of messages are processed in there and what control do we have? Looks like, we are just calling display.readAndDispatch(), which is just processing the messages on the display thread from the OS, if something unrelated to Edge was blocking it here, it could have been blocked anywhere in the platform. And We have all the handlers well defined to handling the Edge messages in the callbacks properly, my concern is about the block:

while (!OS.PeekMessage (msg, 0, 0, 0, OS.PM_NOREMOVE)) {
		display.sleep();
	}

This code will put the thread to sleep until some OS message is received. If for some reason no message arrives anymore, this will block the whole application. Not sure how likely it is, but I would be in favor of avoiding the risk. As mentioned above, it might be possible to simplify this code anway.

Can it be that this already happened? I'm looking at the failing tests and I see this thread-dump:

"main" prio=5 Id=1 RUNNABLE
	at app//org.eclipse.swt.internal.win32.OS.PeekMessage(Native Method)
	at app//org.eclipse.swt.browser.Edge.processNextOSMessage(Edge.java:481)
	at app//org.eclipse.swt.browser.Edge$WebViewProvider.waitForFutureToFinish(Edge.java:460)
	at app//org.eclipse.swt.browser.Edge$WebViewProvider.getWebView(Edge.java:383)
	at app//org.eclipse.swt.browser.Edge.getUrl(Edge.java:910)
	at app//org.eclipse.swt.browser.Browser.getUrl(Browser.java:771)
	at app//org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.createBrowser(Test_org_eclipse_swt_browser_Browser.java:309)
	at app//org.eclipse.swt.tests.junit.Test_org_eclipse_swt_browser_Browser.setUp(Test_org_eclipse_swt_browser_Browser.java:185)
	...

Maybe because the tests produce less OS messages and therefore the issue with the empty queue is more likely to happen? What I'm thinking is: an environment without a mouse (e.g. the test environment) probably generates way less OS events.

@laeubi
Copy link
Contributor

laeubi commented Jan 29, 2025

@fedejeanne Display.sleep() is very dangerous and you are correct that it can sleep "forever" (many seconds on linux if you don't move the mouse).

@@ -316,10 +316,23 @@ ICoreWebView2 initializeWebView(ICoreWebView2Controller controller) {
return webView;
}

private CompletableFuture<WebViewWrapper> initializeWebViewFutureWithTimeOut() {
CompletableFuture<WebViewWrapper> webViewWrapperFuture = new CompletableFuture<>();
webViewWrapperFuture.orTimeout(3, TimeUnit.SECONDS).exceptionally(exception -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
webViewWrapperFuture.orTimeout(3, TimeUnit.SECONDS).exceptionally(exception -> {
webViewWrapperFuture.orTimeout(3, TimeUnit.SECONDS).exceptionallyAsync(exception -> {...}, browser.getDisplay())

@laeubi
Copy link
Contributor

laeubi commented Jan 29, 2025

Just a general remark here:

The usage of CompleteableFuture here (even though very convenient and powerful) does not seem to be the proper way when looking at the overall design here.

Instead one should probably better use a Task-List design here, so that different task are processed one after the other, this could be for example a (single thread) ExecutorService where one submit(...) tasks, or a Queue that is processed by a special thread (e.g. if SWT thread is needed) unless it is empty.

If ExecutorService is used then one can also combine it with CompleteableFuture, but then it should be used in an event driven way without ever using a join().

Beside that, if a browser control fails to initialize, I think it is fine to just show nothing / something broken / whatever to the user so no need to replace it with something else.

@laeubi
Copy link
Contributor

laeubi commented Jan 29, 2025

When systematically going through the public methods of Edge

I think the key is actually: What public API methods really require blocking operations.

These methods should not use join() at all, they must use a busy loop that drives the event queue while waiting on this similar to what we do here (stage == future!):

stage.handle((nil1, nil2) -> {
if (!display.isDisposed()) {
try {
display.wake();
} catch (SWTException e) {
// ignore then, this can happen due to the async nature between our check for
// disposed and the actual call to wake the display can be disposed
}
}
return null;
});

while (!future.isDone() && !display.isDisposed()) {
if (!display.readAndDispatch()) {
display.sleep();
}
}

the wake is important to break out the sleep in a timely way!

@HeikoKlare
Copy link
Contributor

Thanks for the feedback! The proposals make sense and revising the design again might be a good thing. But we should move that discussion into a separate issue. This PR is part of efforts to mitigate the risk of freezes / UI thread blocks by using Edge for the upcoming release. For that reason, we need to improve the current design instead of revising the design, which we may defer to the next development cycle.

@laeubi
Copy link
Contributor

laeubi commented Jan 29, 2025

@HeikoKlare as a workaround replace all parts that do a join() on the event thread with this code:

public static void waitForFuture(CompletableFuture<?> future, Display display) {
	if (!future.isDone()) {
		future.handle((nil1, nil2) -> {
			if (!display.isDisposed()) {
				try {
					display.wake();
				} catch (SWTException e) {
					// ignore then, this can happen due to the async nature between our check for
					// disposed and the actual call to wake the display can be disposed
				}
			}
			return null;
		});
		while (!future.isDone() && !display.isDisposed()) {
		 	if (!display.readAndDispatch()) {
		 		display.sleep();
		 	}
		 }
	}
}

Then you don't need special timeout handling and won't see UI freeze.

@laeubi
Copy link
Contributor

laeubi commented Jan 29, 2025

it looks odd, but I think

private <T> void waitForFutureToFinish(CompletableFuture<T> future) {
	while(!future.isDone()) {
		Display display = Display.getCurrent();
		future.handle((nil1, nil2) -> {
			if (!display.isDisposed()) {
				try {
					display.wake();
				} catch (SWTException e) {
					// ignore then, this can happen due to the async nature between our check for
					// disposed and the actual call to wake the display can be disposed
				}
			}
			return null;
		});
		MSG msg = new MSG();
		while (!OS.PeekMessage (msg, 0, 0, 0, OS.PM_NOREMOVE)) {
			if (!future.isDone()) {
				display.sleep();
			}
		}
		display.readAndDispatch();
	}
}

would probably prevent blocking (future itself does not use (a)sync exec).

@fedejeanne
Copy link
Contributor

This PR will probably be superseded by #1776 , right?

@HeikoKlare
Copy link
Contributor

This PR will probably be superseded by #1776 , right?

Not necessarily. We will need to check that in detail. E.g., #1776 does currently not cancel any futures. Also things like the "replacement" of the browser widget (if we agree that we want to try that), is something #1776 does not cover.

@HeikoKlare
Copy link
Contributor

Regarding the error label: I found that different consumers already have fallback logic when the browser instantiation failed, like the internal browser view showing an error message and the JDT control using a "simple" viewer with the note that the browser was not instantiated correctly:
image

So we should drop the idea of doing something similar in browser itself.

@amartya4256 amartya4256 force-pushed the edge_timeout branch 2 times, most recently from 5192586 to 70a4805 Compare January 31, 2025 14:34
@amartya4256 amartya4256 changed the title WIP: Edge Browser Scheduled Job timeout WebViewWrapperFuture handling on job termination timeout Jan 31, 2025
@amartya4256 amartya4256 marked this pull request as ready for review January 31, 2025 15:01
@amartya4256 amartya4256 force-pushed the edge_timeout branch 3 times, most recently from 135e841 to 3a4f8a7 Compare January 31, 2025 16:24
Copy link
Contributor

@HeikoKlare HeikoKlare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the changes look sound to me. I have rather comments on local design. I just do not understand the necessity for the added processOSMessagesUntil() call.

In addition, please revise the commit message to (1) be more descriptive and (2) refer to proper issues. They currently refer to some HiDPI issues, which are not related to Edge.

@amartya4256 amartya4256 force-pushed the edge_timeout branch 6 times, most recently from 9df68a3 to f5c10bc Compare February 3, 2025 13:34
Copy link
Contributor

@HeikoKlare HeikoKlare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one remaining question regarding correctness and two minor proposals for improvement.

This commit encapsulates the ICoreWeICoreWebView2*::Release call for all
the ICoreWebView2* instances in WebViewWrapper into a method
WebViewWrapper::releaseWebViews for a better encapsulation which
can be used to dispose off the created ICoreWebView2* resources.

Contributes to
eclipse-platform#1466
This commit conributes to handle how webViewWrapperFuture should
exceptionally complete if there happens an Edge operation timeout in
processOSMessagesUntil method.

Contributes to
eclipse-platform#1466 and
eclipse-platform/eclipse.platform.ui#2734
@HeikoKlare HeikoKlare merged commit 5f54080 into eclipse-platform:master Feb 4, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Application freezes due to using Edge
4 participants