x11rb-protocol takes very long to build #883
Random guess would be "lots of XML for protocol stuff resulting in even more generated Rust code that …"
I was curious about the enabled features, but that is apparently not easy to figure out with just a web browser. According to https://github.com/alacritty/alacritty/blob/a58fb39b68caa34b073f66911c0ac6945f56eac2/Cargo.lock, a dependency on x11rb 0.12.0 comes from winit, while x11-clipboard depends on x11rb 0.10.1. x11-clipboard just enables …; winit enables …. I am ignoring the non-X11-extension features (since I kind of expect the generated code to be the problem, as it really is huge). This leads to the following files being built in
And random time measurements with
So, to me this is clearly due to the huge generated code files. However, that still does not provide any hint on what to do about this. Adding more feature flags doesn't sound like a workable solution.
I guess the only way is to reduce the amount of things being generated, since, for example, the Wayland stuff on the other side is pretty fast to generate even though we use a lot of protocols.
I'm not familiar enough with X11 internals to say whether what you generate makes sense at the end of the day, but the amount of code you describe sounds like a lot (it's like alacritty + glutin + winit + some wayland crates combined)... Especially when those types have … One technique to reduce code with generated source code is …
I don't know much about Wayland, but there does not seem to be anything like nested structs in there. I hacked together a quick Python script to show which XML tags exist and found … Perhaps Wayland was actually designed to be simple. X11 certainly was not.
https://github.com/psychon/x11rb/blob/master/doc/generated_code.md gives an introduction to the generated code and shows how much we generate. Random example: … I feel like cutting down on the build time here means cutting down on features. :-(
https://fasterthanli.me/articles/why-is-my-rust-build-so-slow suggested some crimes:
The SVG from The …
It seems that the issue is the raw amount of generated code being passed to LLVM. It would be nice if there were a feature that disabled some of the lesser-used trait implementations (like the …
This won't help, unfortunately. The only thing that could probably help is making assumptions about the data and reducing the number of matches you have. For example, if you look at … Maybe, with the help of some unsafe code and assumptions about how things are laid out in memory, it could at least reduce the giant matches? You probably don't need a bunch of From impls from basic integer types for bitmasks, since you can simply cast them and then call …

The issue is not only about build times, but also about the amount of code that ends up in the binary. The stuff like …
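A minimal sketch of the bitmask suggestion, assuming a hypothetical generated mask type ExampleMask (not x11rb's real API): instead of one From impl per integer width, the type gets a single conversion to and from u32, and callers widen narrower integers themselves with u32::from before converting.

```rust
use std::ops::BitOr;

/// Hypothetical generated bitmask type (illustration only, not x11rb's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExampleMask(u32);

impl ExampleMask {
    pub const KEY_PRESS: Self = Self(1 << 0);
    pub const KEY_RELEASE: Self = Self(1 << 1);
    pub const BUTTON_PRESS: Self = Self(1 << 2);

    /// Raw bits of the mask.
    pub const fn bits(self) -> u32 {
        self.0
    }

    /// True if every bit set in `other` is also set in `self`.
    pub const fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl BitOr for ExampleMask {
    type Output = Self;
    fn bitor(self, rhs: Self) -> Self {
        Self(self.0 | rhs.0)
    }
}

// A single conversion from the widest wire type. Narrower integers are widened
// by the caller (e.g. `u32::from(byte)`) instead of getting their own impls.
impl From<u32> for ExampleMask {
    fn from(bits: u32) -> Self {
        Self(bits)
    }
}

// A single conversion back to the wire type.
impl From<ExampleMask> for u32 {
    fn from(mask: ExampleMask) -> Self {
        mask.0
    }
}
```

Each removed impl is one less function for rustc and LLVM to process per mask type, which adds up across the many mask types in the generated code.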
An idea similar to one I tried in …
I made an attempt to delete all code related to parsing requests (we normally only have to serialise them, but our …). The result does not build properly (e.g. x11rb is broken), but at least x11rb-protocol builds fine. The patch for the code generator:
diff --git a/generator/src/generator/namespace/request.rs b/generator/src/generator/namespace/request.rs
index f9d9d32..72d4d9d 100644
--- a/generator/src/generator/namespace/request.rs
+++ b/generator/src/generator/namespace/request.rs
@@ -193,6 +193,7 @@ pub(super) fn generate_request(
header = generator.ns.header,
lifetime = lifetime_block
));
+ /*
if gathered.has_fds() {
enum_cases.request_parse_cases.push(format!(
"{header}::{opcode_name}_REQUEST => return \
@@ -214,6 +215,7 @@ pub(super) fn generate_request(
name = name,
));
}
+ */
emit_request_function(
generator,
request_def,
@@ -299,7 +301,7 @@ pub(super) fn generate_request(
&reply_fields,
&[],
false,
- true,
+ false,
StructSizeConstraint::EmbeddedLength { minimum: 32 },
true,
true,
@@ -316,6 +318,7 @@ pub(super) fn generate_request(
));
}
+/*
if gathered.needs_lifetime {
enum_cases.request_into_owned_cases.push(format!(
"Request::{ns_prefix}{name}(req) => Request::{ns_prefix}{name}(req.into_owned()),",
@@ -329,6 +332,7 @@ pub(super) fn generate_request(
name = name,
));
}
+ */
}
fn generate_aux(
@@ -341,7 +345,7 @@ fn generate_aux(
let aux_name = format!("{}Aux", request_def.name);
if switch_field.kind == xcbdefs::SwitchKind::Case {
- switch::emit_switch_type(generator, switch_field, &aux_name, true, true, None, out);
+ switch::emit_switch_type(generator, switch_field, &aux_name, false, true, None, out);
} else {
let doc = format!(
"Auxiliary and optional information for the `{}` function",
@@ -351,7 +355,7 @@ fn generate_aux(
generator,
switch_field,
&aux_name,
- true,
+ false,
true,
Some(&doc),
out,
@@ -912,6 +916,7 @@ fn emit_request_struct(
outln!(out, "}}");
// Parsing implementation.
+ /*
outln!(
out,
"/// Parse this request given its header, its body, and any fds that go along \
@@ -1071,6 +1076,7 @@ fn emit_request_struct(
});
outln!(out, "}}");
}
+ */
});
outln!(out, "}}");
outln!(
@@ -1118,6 +1124,7 @@ fn emit_request_struct(
});
outln!(out, "}}");
+ /*
let request_trait = if request_def.reply.is_none() {
"crate::x11_utils::VoidRequest"
} else if gathered.reply_has_fds {
@@ -1136,6 +1143,7 @@ fn emit_request_struct(
outln!(out.indent(), "type Reply = {}Reply;", name);
};
outln!(out, "}}");
+ */
num_slices_opt.unwrap()
}
diff --git a/generator/src/generator/namespace/struct_type.rs b/generator/src/generator/namespace/struct_type.rs
index 73b7d86..be9f268 100644
--- a/generator/src/generator/namespace/struct_type.rs
+++ b/generator/src/generator/namespace/struct_type.rs
@@ -25,7 +25,7 @@ pub(super) fn emit_struct_type(
out: &mut Output,
) {
assert!(!generate_serialize || fields_need_serialize);
- assert!(matches!(parse_size_constraint, StructSizeConstraint::None) || generate_try_parse);
+ //assert!(matches!(parse_size_constraint, StructSizeConstraint::None) || generate_try_parse);
let deducible_fields = gather_deducible_fields(fields);
diff --git a/generator/src/generator/requests_replies.rs b/generator/src/generator/requests_replies.rs
index 4b29d82..6100fb1 100644
--- a/generator/src/generator/requests_replies.rs
+++ b/generator/src/generator/requests_replies.rs
@@ -276,6 +276,7 @@ pub(super) fn generate(out: &mut Output, module: &xcbdefs::Module, mut enum_case
);
});
outln!(out, "}}");
+ /*
outln!(
out,
"/// Get the matching reply parser (if any) for this request."
@@ -340,6 +341,7 @@ pub(super) fn generate(out: &mut Output, module: &xcbdefs::Module, mut enum_case
outln!(out, "}}");
});
outln!(out, "}}");
+ */
});
outln!(out, "}}");
outln!(out, "");
Diffstat for the generated code:
On one hand, saving two seconds is not much. On the other hand, saving about 15% of the time is a relatively large gain. I don't know how that number changes when the code is not simply deleted but actually hidden behind a flag. Besides that, not much caught my eye when looking through … Hm... perhaps the switch stuff can be simplified for the compiler with a helper function? I am talking about the …
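A minimal sketch of what such a helper could look like, assuming a switch that maps to a value mask plus a list of optional 32-bit values (the pattern behind the request "Aux" structs produced by generate_aux in the patch above). WindowAux and push_optional are hypothetical names, not x11rb's generated API, and writing values in native byte order is an assumption of the sketch. Each optional field costs one call to a single non-generic function instead of its own inlined if-block.

```rust
/// Hypothetical auxiliary struct: each field is an optional entry of a
/// value-mask switch.
#[derive(Debug, Default, Clone, Copy)]
pub struct WindowAux {
    pub background_pixel: Option<u32>,
    pub border_pixel: Option<u32>,
    pub event_mask: Option<u32>,
}

/// Shared helper: if `value` is present, set `bit` in `mask` and append the
/// value's bytes (native byte order, an assumption of this sketch).
fn push_optional(mask: &mut u32, out: &mut Vec<u8>, bit: u32, value: Option<u32>) {
    if let Some(v) = value {
        *mask |= bit;
        out.extend_from_slice(&v.to_ne_bytes());
    }
}

impl WindowAux {
    /// Returns the value mask and the value list. Values are appended in
    /// ascending bit order, matching the order of the mask bits.
    pub fn serialize(&self) -> (u32, Vec<u8>) {
        let mut mask = 0;
        let mut out = Vec::new();
        push_optional(&mut mask, &mut out, 1 << 0, self.background_pixel);
        push_optional(&mut mask, &mut out, 1 << 1, self.border_pixel);
        push_optional(&mut mask, &mut out, 1 << 2, self.event_mask);
        (mask, out)
    }
}
```

Whether this actually shortens build times would have to be measured; in release builds LLVM may inline the helper again.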
That helped surprisingly much. Removing …
diff --git a/generator/src/generator/namespace/helpers.rs b/generator/src/generator/namespace/helpers.rs
index f57ebb2..2c95235 100644
--- a/generator/src/generator/namespace/helpers.rs
+++ b/generator/src/generator/namespace/helpers.rs
@@ -225,12 +225,12 @@ impl Derives {
debug: true,
clone: true,
copy: true,
- default_: true,
- partial_eq: true,
- eq: true,
- partial_ord: true,
- ord: true,
- hash: true,
+ default_: false,
+ partial_eq: false,
+ eq: false,
+ partial_ord: false,
+ ord: false,
+ hash: false,
}
}
Combined with yesterday's patch, the result is 10.4 seconds. Hm... I would have expected more.
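One way to keep the derives that the patch above removes available to users who need them, without paying for them by default, is to gate them behind a Cargo feature with cfg_attr. This is a sketch assuming a hypothetical feature named extra-traits and a hypothetical Point struct; it is not necessarily how x11rb does it.

```rust
// Cheap, commonly needed derives stay unconditional.
#[derive(Debug, Clone, Copy)]
// The remaining derives are only generated when the hypothetical
// "extra-traits" Cargo feature is enabled.
#[cfg_attr(
    feature = "extra-traits",
    derive(Default, PartialEq, Eq, PartialOrd, Ord, Hash)
)]
pub struct Point {
    pub x: i16,
    pub y: i16,
}

// Code that relies on the gated traits must be gated the same way,
// otherwise it fails to compile when the feature is off.
#[cfg(feature = "extra-traits")]
pub fn same_point(a: &Point, b: &Point) -> bool {
    a == b
}
```

With the feature off, rustc never expands the gated derives, so neither rustc nor LLVM has to process those impls.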
Okay, so here are some measurements for …
For non-release x11rb, I had to apply the following patch to winit for it to build:
diff --git a/src/platform_impl/linux/x11/window.rs b/src/platform_impl/linux/x11/window.rs
index bab2f8c..e7c2f0e 100644
--- a/src/platform_impl/linux/x11/window.rs
+++ b/src/platform_impl/linux/x11/window.rs
@@ -1322,7 +1322,7 @@ impl UnownedWindow {
self.xwindow as xproto::Window,
xproto::AtomEnum::WM_NORMAL_HINTS,
)?
- .reply()?;
+ .reply()?.unwrap();
callback(&mut normal_hints);
normal_hints
.set(
@@ -1375,6 +1375,7 @@ impl UnownedWindow {
)
.ok()
.and_then(|cookie| cookie.reply().ok())
+ .flatten()
.and_then(|hints| hints.size_increment)
.map(|(width, height)| (width as u32, height as u32).into())
}
@@ -1764,6 +1765,7 @@ impl UnownedWindow {
WmHints::get(self.xconn.xcb_connection(), self.xwindow as xproto::Window)
.ok()
.and_then(|cookie| cookie.reply().ok())
+ .flatten()
.unwrap_or_default();
wm_hints.urgent = request_type.is_some();
I used
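The winit patch above adds .flatten() because, with the newer x11rb, reply() on these property cookies apparently yields an Option inside the Result (as the .reply()?.unwrap() change suggests). A self-contained sketch of the resulting Option<Option<T>> chain, using hypothetical stand-in types Cookie, Hints and ReplyError rather than the real x11rb or winit types:

```rust
/// Hypothetical stand-in for a size-hints property value.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Hints {
    pub size_increment: Option<(i32, i32)>,
}

/// Hypothetical stand-in for the reply error type.
#[derive(Debug)]
pub struct ReplyError;

/// Hypothetical cookie whose reply may be missing (property not set).
pub struct Cookie(Option<Hints>);

impl Cookie {
    pub fn reply(self) -> Result<Option<Hints>, ReplyError> {
        Ok(self.0)
    }
}

pub fn resize_increments(cookie: Result<Cookie, ReplyError>) -> Option<(u32, u32)> {
    cookie
        .ok() // Option<Cookie>
        .and_then(|cookie| cookie.reply().ok()) // Option<Option<Hints>>
        .flatten() // Option<Hints>
        .and_then(|hints| hints.size_increment)
        .map(|(width, height)| (width as u32, height as u32))
}
```

Without .flatten(), the following .and_then would receive an Option<Hints> instead of Hints and the closure would not type-check.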
So... #883 clearly helped, but we are still dominating build times. And somehow the total build time goes down when patching from the latest x11rb release to my local, newer copy, but the time for x11rb-protocol goes up (for a debug build)?!
I suggest testing a single-threaded build, because modern CPUs schedule things unpredictably and the core used for the longer crates could run at a lower clock. With build times as massive as ours, general testing can still be done 'normally', since any improvement will be noticeable. I'm afraid, though, that feature gating won't help unless it's enforced, since other crates (such as x11-clipboard) could enable features not used by winit, which would result in basically no performance change. That said, we could just suggest that everyone disable default features for now.
The new features are not enabled by default (at least for x11rb; x11rb-protocol has a default feature, but x11rb depends on it with default features disabled). So there is no need to have everyone disable default features for this.
During profiling of both debug and release builds of both alacritty and winit, I've noticed that x11rb-protocol takes around 5x longer to build than even the most massive deps (like serde). And if you happen to have 2 copies of x11rb-protocol in your tree, it'll really slow things down.

For example, alacritty on my system builds in around 20 seconds, and each of the x11rb-protocol copies takes around 12 seconds (I have 32 threads, so it doesn't really matter, but on slow systems it could really hurt build times).

I was testing with cargo build --timings and cargo build --release --timings on the tip of the alacritty/alacritty repo. You could do the same on the winit repo alone, for example. I tested with both rustc 1.72.0 and rustc 1.74.0-nightly (0288f2e19 2023-09-25), and both showed exactly the same compilation times.