
Feature request: GNU Make jobserver support #1381

Closed
orlitzky opened this issue Dec 20, 2024 · 16 comments

Comments

@orlitzky

This is a follow-up to #117 which I still think does not address the most common use case.

I have a 64-core machine, and want (roughly) to be running 64 jobs simultaneously to utilize them. A typical build system will launch many compile and link jobs at the same time. The problem with the current solution (from #117) is that it forces an all-or-nothing approach:

  1. I can launch each mold process with MOLD_JOBS=1. This will sometimes leave the system idle when it could be doing something. For example, on the very last link job. (This also defeats the purpose of using mold!)
  2. I can launch each mold process with "unlimited" jobs. This can overload the system when the linking phase begins because many mold processes (up to 64 of them with make -j64) will be started at the same time.

Since mold is deciding itself how many jobs to launch, it would be a lot more efficient if mold could coordinate with the other parts of the build system that are launching jobs. In my opinion, the GNU Make jobserver protocol is the perfect way to do that.

We may finally be getting a solution to this same problem in Ninja after many years (ninja-build/ninja#2506), and it would be great if mold could coordinate with both ninja and make.

@rui314
Owner

rui314 commented Dec 21, 2024

GNU Make jobserver is not a perfect protocol for this purpose. It's designed with a single-threaded model in mind, and we can't effectively represent multi-threaded programs within that model.

For example, let's say mold uses 32 threads simultaneously. You may think that mold should reserve 32 jobs from the jobserver, but that's not an ideal resource allocation, because mold wouldn't do anything until all 32 cores become idle.

I think the problem can be solved by allowing values larger than 1 for MOLD_JOBS. For example, for your environment, MOLD_JOBS=2 to MOLD_JOBS=4 should suffice.

@orlitzky
Author

> For example, let's say mold uses 32 threads simultaneously. You may think that mold should reserve 32 jobs from the jobserver, but that's not an ideal resource allocation, because mold wouldn't do anything until all 32 cores become idle.

I'm not sure I understand. Why wouldn't mold do anything? Currently, in the absence of the job server, each instance of mold starts running immediately. (That's why people were complaining about OOM issues.) If the job server only has 10 jobs available, then, for example, mold could use at most 10 threads rather than the default limit of 32, taking a token from the job pool for each thread it uses.

This isn't perfect: if one more job then becomes available, the next instance of mold might be launched with one thread. But I think that's just a fundamental limitation of this sort of resource management rather than a problem with the job server per se. If people choose to limit the number of overall build jobs with make -j, they are OK with those minor inefficiencies in exchange for being able to limit their resource usage.

@rui314
Owner

rui314 commented Dec 22, 2024

> I'm not sure I understand. Why wouldn't mold do anything? Currently, in the absence of the job server, each instance of mold starts running immediately. (That's why people were complaining about OOM issues.) If the job server only has 10 jobs available, then, for example, mold could use at most 10 threads rather than the default limit of 32, and take tokens from the job pool for each thread that it uses.

Unfortunately, the make jobserver doesn't work that way. That "if the job server only has 10 jobs available" is infeasible because there's no way to know how many jobs are available. Even worse, there's no way to reserve "up to N jobs"; the only thing you can do with the make jobserver is reserve jobs one at a time, and the reservation is a blocking protocol: if a job is not available, your process blocks until one becomes available. The protocol was not designed with multi-threaded jobs in mind and is not suitable for processes like mold.
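For readers unfamiliar with the protocol, the blocking handshake described above can be sketched with a plain pipe standing in for make's jobserver channel. This is a simplified stand-in, not make's actual setup: in real builds the descriptors (or a FIFO path) are advertised through `MAKEFLAGS` via `--jobserver-auth`, and make pre-loads the pipe with N-1 tokens for `-jN`.

```python
import os

# Simulate a jobserver pipe: for -j3, make pre-loads two token bytes
# (the implicit first job slot belongs to the process itself).
read_fd, write_fd = os.pipe()
os.write(write_fd, b"++")

def acquire_token(fd):
    # Blocking read of a single byte. If no token is in the pipe,
    # this call blocks until some other job writes one back.
    return os.read(fd, 1)

def release_token(fd, token):
    # A client must write back the same byte it read.
    os.write(fd, token)

tok = acquire_token(read_fd)   # succeeds immediately: tokens are available
release_token(write_fd, tok)
```

The blocking nature of `acquire_token` is exactly the constraint being debated here: there is no query for "how many tokens remain", only a read that either returns one token or parks the caller.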

@orlitzky
Author

A multi-threaded linker is the example given in the documentation :)

> As an example, suppose you are implementing a linker which provides for multithreaded operation. You would like to enhance the linker so that if it is invoked by GNU make it can participate in the jobserver protocol to control how many threads are used during link...

(from https://www.gnu.org/software/make/manual/html_node/Job-Slots.html)

Unless I have missed an important detail, you could e.g. dup() the file descriptor for the pipe, and then use fcntl() to make the new descriptor non-blocking.
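A sketch of that idea, again with a plain pipe as a stand-in for the jobserver descriptor. One caveat worth noting: dup() shares the open file description, so O_NONBLOCK set on the duplicate also applies to the original descriptor; a genuinely independent blocking mode requires a separate open(), which GNU Make's newer `--jobserver-auth=fifo:PATH` style makes possible.

```python
import fcntl
import os

r, w = os.pipe()          # stand-in for the jobserver pipe
os.write(w, b"+")         # one token currently available

# Duplicate the descriptor and set O_NONBLOCK on the duplicate.
# (Because dup() shares file status flags, this in fact affects
# the original descriptor too -- see the caveat above.)
dup_fd = os.dup(r)
flags = fcntl.fcntl(dup_fd, fcntl.F_GETFL)
fcntl.fcntl(dup_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def try_acquire(fd):
    """Return a token byte, or None immediately if none is available."""
    try:
        return os.read(fd, 1)
    except BlockingIOError:
        return None
```

With a non-blocking probe like `try_acquire`, a linker could opportunistically collect however many tokens happen to be free at startup instead of parking on a blocking read.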

@rui314
Owner

rui314 commented Dec 23, 2024

Multi-threaded linkers didn't exist until recently, so the example in the documentation was hypothetical and never demonstrated to be useful for a real use case. I think it actually isn't. To understand why, assume you are building a large program consisting of thousands of object files. At the end of the build, a few dozen executables/DSOs are linked. You are building it on a 32-core machine with -j32. For simplicity, assume one job retires every second.

Until the linker kicks in, everything is in equilibrium. One compiler process terminates and a new one starts every second. There's nothing the build system can do to accelerate building.

Now, assume that the build system starts invoking the linker. When the linker is invoked, there's only one job available, since only one compiler process has retired. So, let's assume the linker reserves one job and proceeds. The next second, the same thing happens -- we now have two linker processes. After 32 seconds, we have 32 linker processes running simultaneously. Each linker process is 32 times slower than it could be, and their aggregated memory usage is 32x higher than ideal. This is not the situation we want.
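The arithmetic in this scenario can be made explicit. This is only a toy model under the assumptions stated above (one job retires per second, link work parallelizes linearly with threads); the numbers are illustrative, not measurements of mold.

```python
CORES = 32   # machine size, and the make -j value

# Token-per-thread policy: each second one compiler job retires, freeing
# exactly one slot, so each newly launched linker reserves a single token
# and runs with one thread.
linker_threads = [1 for _ in range(CORES)]   # one new linker per second

# Compared against a single linker using all 32 cores:
per_linker_slowdown = CORES // linker_threads[0]   # each linker is 32x slower
peak_processes = len(linker_threads)               # 32 linkers resident at once
```

Under these assumptions the naive policy ends up with 32 single-threaded linkers in memory simultaneously, which is the 32x memory and 32x per-process slowdown figure given above.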

What would the ideal scheduling be? Run at most one linker at any time, and have that linker spawn 32 threads unconditionally, assuming the linker scales well with more threads. This is enough to keep all cores busy, and it is what MOLD_JOBS=1 does. Having more runnable threads than cores doesn't cause adverse effects such as thrashing, unlike having more processes, which increases memory pressure.

The jobserver protocol aims to solve the memory oversubscription issue by limiting the total number of processes to the number of available cores, but that's not our problem. Imagine, hypothetically, that the compiler didn't consume any memory. Then we wouldn't need fine-grained control like the jobserver at all, because we could invoke thousands of compiler processes simultaneously without causing any issues. Having more threads or processes is fine as long as it doesn't cause thrashing. The protocol tries to keep all cores busy under the assumption that each job consumes some amount of memory, and that assumption doesn't make much sense for our threads.

@orlitzky
Author

I am probably framing this the wrong way. Build jobs can use any amount of CPU, any amount of RAM, and depend on (or be depended upon by) any number of other targets. It's pretty easy to make up examples where any fixed strategy will be non-optimal. I can easily be lured into a discussion about optimizing builds because I find it interesting, but I think optimizing the build is beside the point here.

Ultimately, whatever my perceived problem is, I'm already using -jN to address it. By passing -jN to make, I am asking my toolchain to do N independent things at once, so the feature request is really just for mold to respect my wishes and cooperate with the rest of the toolchain, even if I ask it to do something stupid like -j1.

@rui314
Owner

rui314 commented Dec 24, 2024

In the whole thread, I was trying to explain to you that "just respect -jN" or something like that is not applicable to mold because of the impedance mismatch between the concept of the make jobserver's job and what mold does. You may be thinking that what we should do with the jobserver is obvious, but it's not. That's what I was trying to say.

@orlitzky
Author

I understand. There is already an impedance mismatch between the way most build systems work, and limiting the total number of "jobs." But that is irrelevant to me. If I ask my toolchain to limit the number of jobs, I'd like it to do that, even if doing so is suboptimal. Is there any problem with limiting the number of threads used by mold, other than that it may be inefficient in some cases?

@rui314
Owner

rui314 commented Dec 24, 2024

So you are assuming that a "job" in mold's context refers to a thread, and you seem to believe this is a simple fact, obvious to everyone, which led you to ask why I don't just do that. I was trying to convince you that it's not an obvious fact at all. If you think the details (or the explanation of why it's not obvious) are irrelevant to you, I don't know what else I can do for you. Sorry.

@orlitzky
Author

I wouldn't say it's obvious, but in the context of parallel make, I think it makes sense to think of a job as anything that can be run in parallel. I'm not trying to annoy you with my repeated replies, I'm just trying to ensure that I am communicating what I intend to communicate. If I have succeeded and if you still don't like the idea, we can stop and I'll thank you for listening.

@Ext3h

Ext3h commented Jan 9, 2025

> The jobserver protocol aims to solve the memory oversubscription issue by limiting the total number of processes to the number of available cores, but that's not our problem. [...] The jobserver protocol aims to solve the memory oversubscription issue while keeping all cores busy, assuming that each job consumes some amount of memory, and that doesn't make much sense for our threads.

It's not that simple. Please reconsider your understanding of what a job token in the jobserver protocol represents. It's not 1/Nth of the system's memory, but more generically 1/Nth of the scarcest resource in the system.

Memory oversubscription is the worst offender, but oversubscription of runnable tasks is likewise bad, as it results in unpredictable latency for some of those tasks. That leads to a pretty bad case of thrashing yet again, and it actually negates the positive aspects of threading in mold, as the oldest and most heavily threaded processes will typically lose in scheduling as they scrape the bottom of their time-slice bucket.

Especially on larger build servers (not so much on individual developers' workstations, which are woefully under-equipped in terms of RAM per logical CPU core, but rather on server systems that nowadays go into triple-digit CPU core counts with 4 GB+ of RAM per logical core) that keep being fed fresh work by a build orchestrator, properly "buying CPU cores" with job tokens is important to maintain at least soft real-time qualities (e.g. for unit tests and other bottlenecks such as CPU-bound IO tasks like packaging and network transfers) by strictly avoiding oversubscription of CPU time as well.

There has been plenty of work invested across various build systems pushing this as far as sharing a central, machine-wide jobserver across otherwise isolated docker containers and the like. And it's a pain having to manually dial in resource limits, priorities, and so on when it mostly "just works" with a jobserver keeping oversubscription strictly under wraps.

Mold theoretically being able to greedily grab more job tokens (but then only using as many threads as the CPU cores it could reserve) would be greatly appreciated in those situations. The way gcc dispatches more workers on -flto via an internal call to make (and thus hooks onto the jobserver) is actually a great fit for those systems.


I do understand your reservation, though, that it is actually more complicated with mold, as the TBB library has the pretty severe limitation of being unable to up-scale an arena once it has been initialized, and using oneapi::tbb::global_control(oneapi::tbb::global_control::max_allowed_parallelism, threads_num) with a static, application-wide control object is unfortunately not thread-safe. So the gcc/make approach of greedily grabbing more and more job tokens as they become available unfortunately doesn't work...

However, the whole concept of an arena in TBB exists for the exact same reason: over-subscribing the available cores is a pretty bad thing to do, as it makes you dependent on the OS scheduler, which will rarely "do the right thing". So it's kind of hard not to see the parallels between the in-process pool of threads managed by TBB and the tokens in the jobserver protocol.

@rui314
Owner

rui314 commented Jan 10, 2025

Thank you for your comment. It seems everyone, myself likely included, has strong opinions on what constitutes the "job" in the context of multi-threaded processes within the make jobserver. However, setting that aside, if you define a job as a thread, how do you manage this in mold?

By default, mold tries to use all available threads. Say we have a 16-core machine. Waiting for 16 job tokens to become available before starting would clearly be inefficient.

What if mold were to spawn the same number of threads as the available jobserver jobs? While this might seem like a reasonable approach, in practice, mold would only reserve a single job at startup. This happens because mold is launched by make after a single job has been freed. As a result, mold would effectively run as a single-threaded process, which is clearly not the desired behavior.

Adjusting the number of running threads dynamically using tbb::global_control is not straightforward. As you mentioned, dynamically increasing the thread count as more jobs become available is not easy.

@rui314
Owner

rui314 commented Jan 10, 2025

This is, after all, a job scheduling problem, and various scheduling policies can be devised to achieve different objectives, just like in other scheduling problems. Some may prioritize reducing total build time, even if it comes at the cost of other performance characteristics, such as system responsiveness. Others might focus on limiting the total number of threads to meet their specific requirements. There could be other policies for other goals.

Given that, it looks like we are addressing the problem at the wrong level. In my opinion, the build system, which has a global view of the progress and is responsible for invoking child processes, should be the entity making decisions based on a user-specified policy. The centralized perspective enables more effective scheduling than doing it at the leaf level. For better scheduling, the build system should be aware that mold is multi-threaded (the issue we are discussing, after all, stems from a lack of this awareness) and pass a --threads=N option if necessary to control the number of threads the build system permits the child to use.

Therefore, I believe that extending the schedulers of build systems, such as ninja, to enable better scheduling decisions is the correct approach. By enhancing their awareness of resource usage and workload characteristics, these systems can make more informed global decisions, improving overall efficiency and ensuring that not just individual subprocesses but the entire workload aligns with the specified policies.

@orlitzky
Author

> What if mold were to spawn the same number of threads as the available jobserver jobs? While this might seem like a reasonable approach, in practice, mold would only reserve a single job at startup. This happens because mold is launched by make after a single job has been freed. As a result, mold would effectively run as a single-threaded process, which is clearly not the desired behavior.

It is a reasonable approach. It's doing what I've asked it to do.

You're imagining the worst case, where there are a large number of link jobs and a small number of jobserver slots. The other extreme is also possible, for example if there are a large number of compile jobs and the results are linked into one executable when compiling is done. In that case, mold will be allowed to use many threads, because the other jobs will have completed before linking starts. If I'm using the job server, I accept both possibilities.

For me the alternative is not using mold in some theoretically optimal performance mode, the alternative is using GNU ld. If mold supported the job server, I could use it in every scenario where I now use GNU ld, and it would always perform better. Not optimally maybe, but better, and without causing any new problems.

> For better scheduling, the build system should be aware that mold is multi-threaded (the issue we are discussing, after all, stems from a lack of this awareness), and pass --threads=N option if necessary to control the number of threads that the build system permits the child to use.

The outcome of this is the same as if mold supported the job server, and I think that teaching mold to use the job server is a lot more feasible than trying to teach every build system about the command-line flags of every toolchain component.

@rui314
Owner

rui314 commented Jan 12, 2025

@orlitzky There are many different scheduling policies, and I already know you strongly favor one and believe it's the right choice. I’ve tried multiple times to explain that this issue is not as simple as you think, but it seems you’re just reiterating your opinion repeatedly. Please refrain from doing that. I do appreciate your interest in mold, but if this continues, I’ll have to close and freeze this feature request.

@bgemmill

> Therefore, I believe that extending the schedulers of build systems, such as ninja, to enable better scheduling decisions is the correct approach. By enhancing their awareness of resource usage and workload characteristics, these systems can make more informed global decisions, and that improves overall efficiency and ensures not individual subprocess but the entire workloads align with specified policies.

It's probably worth pointing out that ninja is trying to solve this problem by working towards make jobserver support.

It might not be the best scheduling policy, but it does seem to be a very common one. That itself is valuable to clients who would want to use mold by dropping it into a larger build system.

@rui314 closed this as not planned on Jan 14, 2025.