-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request for Feedback: Improving Startup Latency in X11 #16033
Conversation
Before: 80-100 After: 66-81
You can test this PR using the following package version. |
|
@cla-avalonia agree |
src/Avalonia.X11/X11CursorFactory.cs
Outdated
var cursorShapes = GetAllCursorShapes(); | ||
_cursors = new Dictionary<CursorFontShape, IntPtr>(cursorShapes.Length); | ||
foreach (var shape in cursorShapes) | ||
_cursors[shape] = XLib.XCreateFontCursor(_display, shape); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is XCreateFontCursor
a bottleneck? If so, I think this call could be done lazily on GetCursor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a viable strategy aswell.
Without initializing the cursors at all, startup is around 50ms.
Hence why I noted the savings were less than what I expected, and there might have been some contention.
It might actually be better than the current solution; as the user is unlikely to immediately jump in and use all cursor styles. Let me make a quick commit to test this.
As for the force pushes, that was just me fixing up spacing as to not make unnecessary changes in the diff, aligning myself with the contributors guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some testing, I've it seems that lazy initialization for cursors is probably the better approach.
It gets a startup time of 40-51ms
on my machine, so around twice as fast than the baseline.
Loading a previously uncreated cursor seems to be less than 1ms, so no horrific frame spikes to be found.
I originally thought XCreateFontCursor
was doing a file read under the hood, since I was seeing fgets
in the stack trace, which is where I got the idea of sending that to background came from. But even if it was, that data would be cached one way or another, so the read would be fast.
I've committed bfb2ef2 which achieves that as the current commit and updated the opening post. The previous PR is still available in the branch history.
d9a2ff6
to
2c635b8
Compare
2c635b8
to
bfb2ef2
Compare
You can test this PR using the following package version. |
@kekekeks thoughts? |
Technically we can move GPU initialization to the render thread since we are using a separate X11 connection for rendering anyway. Cursors should be fine to lazy-initialize since the user code isn't likely to cycle through all of them at the same time. |
What does the pull request do?
This pull request reduces the latency required to display the first Window when running Avalonia under the X11 backend.
What is the updated/expected behavior with this PR?
On my machine, running Hyprland and a 5900X, a blank Avalonia project compiled with NativeAOT previously took
80-105ms
from runningAppBuilder.Configure
to runningInitialize
inApp.axaml.cs
.This PR improves it to
66-81ms
or40-51ms
depending on commit used.How was the solution implemented (if it's not obvious)?
Startup is largely bottlenecked on X11 by the following methods:
X11CursorFactory__ctor
InitializeGraphics(options, Info)
During the backend initialization.
These two methods alone take around half the startup time.
Because these methods don't depend on each other's inputs/outputs,
and have approximately the same runtime, they can be ran in parallel
to speed up booting.
Note: We call XInitThreads, which enables concurrent access to X11.
Two Implementations
There are two implementations in this PR, pick the preferred commit.
Baseline: 80-105ms
66-81ms
(0.825x the time).40-51ms
startup times. (0.5x the time).My lower end laptop with 4 threads yielded similar results.
Request for Feedback
Note
This concerns the original parallel implementation
PR was updated to instead make the lazy implementation the current one, since I believe it may be preferable.
I would like to know if this could potentially be further improved.
Oddly, I was expecting slightly greater savings, something approximately equal to not calling
X11CursorFactory
at all.Although I am seeing an improvement, the improvement is not as large as I expected it to be. I'm not sure if this is due to some sort of resource locking in the XWayland implementation powering my desktop behind the scenes, or whether it's due to another reason.
I did investigate the latency with which the threadpool tasks were immediately started; starting a dedicated thread, etc. And it seems to be somewhere in well, not Avalonia code.
Why?
I thought Startup could be faster.
So I made it a bit faster.
I think this can even be further improved by starting the processing right at the call to
UseX11(this AppBuilder builder)
, but I can't see that passing review, as that should be used for configuration, not execution.