
mold concurrency issues with ninja #117

Closed
cruisercoder opened this issue Sep 21, 2021 · 51 comments

Comments

@cruisercoder

Mold works great when compiling for a single target binary, but when we do a complete build repo wide (ninja with no target), things start to grind to a halt toward the end of the build when concurrent linking is occurring. It seems like it might be best to turn off concurrent linking when using ninja as ninja is already trying to use available cores for its own parallelism. I have tried disabling mold concurrency with -Wl,--no-threads but I don't think that is correct. Do you have any guidance for ninja+mold?

@rui314
Owner

rui314 commented Sep 22, 2021

Thank you for raising an important point. mold is highly parallelized, and that performance characteristic is different from what an ordinary build system expects. Eventually we need to improve ninja so that it adjusts its expectations about the linker's CPU usage.

For now, did you try ninja's -l N option? It tells ninja not to spawn new processes if the load average is above N.

For the record, I'll leave some random thoughts here on what the problem is and what we should do:

  • ninja tries to spawn N processes if the machine has N cores, so that subprocesses don't have to compete with each other for CPU resources. If a subprocess is not single-threaded but spawns N threads, that assumption becomes wrong. However, competing just for CPU shouldn't hurt throughput -- ignoring the small overhead of task switching, the total amount of time the build system needs from start to finish should remain the same.
  • If subprocesses compete for memory, that's a real problem. They will try to kick each other out, which greatly reduces throughput. So there's no point in spawning more subprocesses once CPU usage is saturated.
  • Currently, I believe ninja's scheduling algorithm assumes that a subprocess is single-threaded. We need to tune it for multi-threaded linkers.
  • We could also add a new command line option to mold so that mold halts at the beginning of the main function until the load average drops below N. But it doesn't feel like the right thing to do, as this is more of a build system problem than a linker problem.

@brenoguim

This might be of interest: https://ninja-build.org/manual.html#ref_pool

The usage example shows how to limit the number of parallel jobs when linking, because linkers themselves are already parallel (and memory hungry).
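
For illustration, a minimal sketch of such a pool in ninja syntax (the rule, target, and file names here are made up):

```
# Allow at most one link job at a time; compile rules are unaffected.
pool link_pool
  depth = 1

rule link
  command = clang++ -fuse-ld=mold -o $out $in
  pool = link_pool

build app: link main.o util.o
```

If the ninja file comes from CMake, the equivalent is roughly `set_property(GLOBAL PROPERTY JOB_POOLS link_pool=1)` together with `set(CMAKE_JOB_POOL_LINK link_pool)`.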


@andrewchambers

andrewchambers commented Sep 28, 2021

A fairly common thing to do is to support the make jobserver protocol. It is simply a semaphore implemented with a Unix pipe:

https://www.gnu.org/software/make/manual/html_node/Job-Slots.html

Unfortunately ninja does not support this, but other tools, such as Rust's cargo, do (and GNU make, of course).
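
For reference, a minimal sketch of that protocol in C++, assuming the read/write file descriptors have already been parsed out of MAKEFLAGS (`--jobserver-auth=R,W`); error handling is omitted:

```cpp
#include <unistd.h>

// Each child of make already owns one implicit job slot; tokens are only
// needed for *extra* parallelism.

// Acquire one extra slot: a blocking read of a single token byte.
static bool acquire_token(int read_fd, char &token) {
  return read(read_fd, &token, 1) == 1;
}

// Release the slot: write the same byte back so other jobs can use it.
static void release_token(int write_fd, char token) {
  while (write(write_fd, &token, 1) != 1) {
  }
}
```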

@ilyapopov

There was a PR to implement the jobserver protocol in ninja but it has not been merged. See ninja-build/ninja#1140 and ninja-build/ninja#1139.

@glandium

It's also worth mentioning that GCC/ld support the make jobserver for LTO.

@rui314
Owner

rui314 commented Dec 17, 2021

I think one of the problems with the jobserver is that a process that requests a new job slot can block indefinitely until the request is satisfied. Requesting a job slot is just writing a byte to a fifo, so if mold wants to reserve 16 jobs for itself, for example, it blocks until 16 bytes are fully written to the fifo. I think that would leave lots of resources unused during compilation.

@andrewchambers

andrewchambers commented Dec 17, 2021

Minor note: IIRC requesting a job is reading a byte; you return the job by writing it back again.

@glandium

I guess the idea would be to adjust your threadpool dynamically when the jobserver allows you to get a token, not to wait for all the tokens to be available at once.

@rui314
Owner

rui314 commented Dec 18, 2021

@glandium Can you do that? I mean, is there any way to know how many bytes you can write to a fifo without writing to it?

@andrewchambers

andrewchambers commented Dec 18, 2021

@rui314 Unless -j is more than PIPE_BUF, I don't see how writing can block. Besides, to get a job you read, so you can use select or poll to continuously request job slots and return them with write when you are done. I guess you still need to make the read fd non-blocking.

https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html#POSIX-Jobserver
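
A sketch of that idea, assuming the jobserver read fd is known and has been set to O_NONBLOCK: poll the fd and only read a token when one is available right now, so the linker never blocks waiting for slots.

```cpp
#include <poll.h>
#include <unistd.h>

// Try to grab one token without blocking. Returns true and stores the token
// byte if a slot was available at this moment. The fd must be O_NONBLOCK,
// since another process may win the race between poll() and read().
static bool try_acquire_token(int read_fd, char &token) {
  pollfd pfd = {read_fd, POLLIN, 0};
  if (poll(&pfd, 1, 0) != 1 || !(pfd.revents & POLLIN))
    return false; // no free slot right now
  return read(read_fd, &token, 1) == 1;
}
```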

@rui314
Owner

rui314 commented Dec 18, 2021

@andrewchambers I think you are right (and I confused writing with reading again and again...)

Let me think about whether we should support the jobserver protocol or not. It needs to be considered in the big picture, and I'm not confident the jobserver can solve it entirely, as the CPU is not the only resource that limits compilation speed. Memory is sometimes the stricter constraint.

@bengtj

bengtj commented Aug 22, 2022

@rui314 I wonder: does mold need all threads to start at the same time because they communicate with each other? Otherwise I guess each thread could wait for a token from the make jobserver pipe separately.

In my experience the only thing that has worked really well for us was back when we used the make jobserver for everything in our big build. We have a top level Makefile orchestrating the whole build/packaging for different platforms etc. Then we introduced CMake and ninja, which do not cooperate with make. Since then we have parallel ninja processes running on the same machine, called from the top level Makefile. We try to share the cpu-threads between the ninja jobs and not overcommit too much, but it does not work that well (loading both too little and too much depending on ccache performance on a particular commit/platform). However, the LTO linking jobs actually help even things out, because ld.bfd creates a temporary makefile that makes the LTO linking obey the jobserver properly.

Also, in my experience the make load-average flag is superior to the ninja load-average flag. Make tries to estimate the load average when handling jobs, but with ninja the CPU load graph looked like a comb with spikes, as there is a delay in calculating the load average. With ccache the compilation jobs can be very quick and the kernel cannot keep up, I guess. This may have changed nowadays, IDK.

@rui314
Owner

rui314 commented Aug 22, 2022

We might be able to add threads to the thread pool when a new slot becomes available via the jobserver pipe, but I don't think that would make things much better. Let's say we have four linker processes. The worst scenario is that the four processes run simultaneously and each takes 4x longer to finish than it would have on an idle machine. While they run, they consume 4x the memory, which is the biggest problem.

So, what if we change the linker so that each process gradually adds more threads to its thread pool? Since it is, after all, CPU constrained, the total execution time wouldn't change much (besides the overhead of frequent process switches), and the total memory consumption would be the same too.

What we really want is to run fewer linker processes simultaneously. If the linker itself scales to many cores, there's not much point in running several instances simultaneously in the first place. If we run the linkers one by one, it still takes 4x time in total, but the peak memory usage becomes flat instead of 4x.

So, as you can see, the problem is that the make jobserver's concept doesn't fit mold very well. The jobserver assumes that each process is single-threaded and tries to optimize CPU usage in that problem space. mold breaks that assumption. That's the fundamental problem, I think.

@bengtj

bengtj commented Aug 22, 2022

@rui314 If I understand you correctly I agree in most cases. But I think our scenario is slightly different.

We have no control over how many linker processes are running at the same time, as we have several ninja instances for different platforms running simultaneously, orchestrated from a parent Makefile. They also share the CPU with a number of other packaging scripts, test run scripts, 3pp builds and other things. In these scripts we try to use Makefiles to parallelize whenever it is needed. And it works very well to keep the CPU under control as long as everything is using make. Sometimes we also have multiple of these builds on the same machine, where the load-average flag comes into play.

In order to make the ninja pools work properly I guess you will always need to have a single ninja instance? This can be quite hard to accomplish for bigger build systems with a lot of moving parts. In our case we can have call chains like "Makefile -> ninja * 5 -> test run script -> Makefile -> test suite * 40" or "Makefile -> ninja * 7 -> g++ -> ld.bfd -> Generated Makefile -> Linker thread * 30", for example. The make jobserver really helps keep the CPU (and indirectly memory) under control during the different build phases.

My point is, unless I've misunderstood ninja pools, that the make jobserver is vastly superior to pools for our use case. It just works with recursive makefiles mixed with scripts. There are some ninja forks with support for the make jobserver, and we will probably want to go there; people are forking ninja just to get this feature.

The method of dynamically allocating threads when you can read a token could fit the make model. In practice I guess that is how it works in the bfd/gold case with the generated temporary LTO makefiles, which will not start a new LTO-linking "thread-process" unless make can get a token first. In both cases it will keep the token until that linking thread is done, I guess.

@andrewchambers

Invoking ninja from make seems more like an anti pattern to me.

@rui314
Owner

rui314 commented Aug 23, 2022

Agreed. Ninja files are usually generated by CMake (or some other build file generator), so you may be able to generate Makefiles and use make throughout your project.

@bengtj

bengtj commented Aug 23, 2022

I agree that make calling ninja can be seen as an antipattern. It would be nice to not have to tune threading for ninja. But in real life it is often not that simple to change...

When migrating to CMake we started out using Makefiles, but we had to switch to ninja as it almost halved the build time.

The reason was a combination of ccache and how CMake generated Makefiles, which would call the compiler twice: first to get dependencies and then to compile. This caused a system load as high as 4/5 of the CPU due to the excessive file handling when you have a good ccache hit rate (mostly only preprocessing). The generated ninja files call the compiler only once and get the dependencies as a side effect. (It is possible that CMake can generate smarter (GNU) Makefiles nowadays, but since then we have started to use the nice ninja dependency tree for queries and such, so we are kind of stuck.)

There are other scenarios where we don't have control over the build system. For example calling 3pp build systems and legacy parts. It would be a lot of work trying to squeeze everything into a single CMake-generated ninja file and then tune pools. We haven't really had any big RAM issues so we haven't had any reason to restrict linking, but we have 384G on our build machines.

I'm just trying to argue that, for us, the make jobserver is the de facto standard that makes it possible to maintain a big, highly parallelized build system with lots of moving parts, some of which are out of our control. Jobserver support is one of the first things I look for in new tools that use multithreading. For a linker I also look at whether it supports GCC LTO.

For our scenario it looks like mold would fit very nicely! We rely heavily on ccache, so linking is most often the build bottleneck. But this is also the reason why we would not want to restrict linking too much; when ccache has a good hit rate we want to use the whole machine for linking. For example, during a build of the test platform we link ~1500 shared libraries and ~2200 executables (excluding 3pp:s), so linking performance is very important :)

@marxin
Contributor

marxin commented Aug 23, 2022

I would like to point out that GNU make has recently added a promising named-pipe jobserver. Please see more in the following ninja discussion: ninja-build/ninja#1139 (comment).
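
For context, a rough sketch of what a client has to look for in MAKEFLAGS: the classic `--jobserver-auth=R,W` file-descriptor pair, and (with GNU make 4.4+) the named-pipe form `--jobserver-auth=fifo:PATH`. This is a simplification; real code must also accept the older `--jobserver-fds=` spelling.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Extract the value of --jobserver-auth= from $MAKEFLAGS, if present.
// "fifo:/path" means open that named pipe; "R,W" means inherited pipe fds.
static bool get_jobserver_auth(std::string &out) {
  const char *flags = getenv("MAKEFLAGS");
  if (!flags)
    return false;
  const char *p = strstr(flags, "--jobserver-auth=");
  if (!p)
    return false;
  p += strlen("--jobserver-auth=");
  const char *end = strchr(p, ' ');
  out.assign(p, end ? size_t(end - p) : strlen(p));
  return true;
}
```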

@marxin
Contributor

marxin commented Aug 31, 2022

I've just implemented an experimental patch, but it has an unexpected implication: mold can run GCC in LTO mode, which then can't read any jobserver token, and we end up with a serial phase. That said, mold needs to actively return tokens while waiting for LTO :/

mold-jobserver.patch.txt

@bengtj

bengtj commented Aug 31, 2022

@marxin: I'm not sure I follow. Do you mean that job server tokens will be allocated during phases when mold does not need/use the threads?

Quickly looking at the patch, it looks like it grabs as many tokens as possible from the jobserver. I guess this approach can starve other processes in the build. I would kind of hope that you could affect the max threading somehow, preferably also by reading "-j" and "-l" from $MAKEFLAGS.

But to mimic make, I guess each started thread should probably grab a token by doing a blocking read from the token pipe, in practice hanging until a token is available. And then put it back when the thread exits. But this is without knowing how threading works in mold; I guess it could lock everything up if threads do not start until they get a token. It could possibly be even more fine-grained if a thread reads a token when it actually needs to do some processing and writes it back when done. But maybe the threading in mold is not some kind of message loop, so this model may not fit at all.

@marxin
Contributor

marxin commented Aug 31, 2022

@marxin: I'm not sure I follow. Do you mean that job server tokens will be allocated during phases when mold does not need/use the threads?

Yes, at least in my prototype implementation.

Quickly looking at the patch, it looks like it grabs as many tokens as possible from the jobserver. I guess this approach can starve other processes in the build. I would kind of hope that you could affect the max threading somehow, preferably also by reading "-j" and "-l" from $MAKEFLAGS.

Yes, my implementation is naive and uses a greedy approach :/ Note that -j from MAKEFLAGS would also be set to e.g. 16, which is a maximum value.

But to mimic make, I guess each started thread should probably grab a token by doing a blocking read from the token pipe, in practice hanging until a token is available.

It depends. It's likely better not to wait: start processing "jobs" in a single thread and ask for another token once there's another opportunity for threading.

And then put it back when the thread exits. But this is without knowing how threading works in mold; I guess it could lock everything up if threads do not start until they get a token. It could possibly be even more fine-grained if a thread reads a token when it actually needs to do some processing and writes it back when done. But maybe the threading in mold is not some kind of message loop, so this model may not fit at all.

Yep, it should likely return tokens when they are not needed. But I think the token manipulation should be integrated into the underlying TBB library, which knows more about thread spawning and can dynamically increase/decrease the thread count during a run of the mold linker.

Anyway, it's not as trivial as I thought.

@andrewchambers

andrewchambers commented Aug 31, 2022

If you plan to take this greedy approach you could probably write a helper program that simply grabs jobserver tokens and then executes another command (mold in this case) with the right flag for however many tokens it was able to grab.
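
A sketch of such a wrapper under a few assumptions: the jobserver fds are already known (the hard-coded 3/4 below are placeholders), the fds are non-blocking, and the wrapped linker accepts a thread-count flag (`--thread-count=N` is assumed here; substitute whatever flag your linker actually provides).

```cpp
#include <poll.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string>
#include <vector>

// jobserver-run: greedily grab the tokens that are available right now,
// run the given command with a matching thread count, then return them.
int main(int argc, char **argv) {
  int rfd = 3, wfd = 4; // placeholders; parse the real fds from MAKEFLAGS
  std::string tokens;
  char c;
  pollfd pfd = {rfd, POLLIN, 0};
  while (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) &&
         read(rfd, &c, 1) == 1)
    tokens.push_back(c);

  // One implicit slot from make, plus however many tokens we grabbed.
  std::string threads = "--thread-count=" + std::to_string(tokens.size() + 1);

  std::vector<char *> args(argv + 1, argv + argc);
  args.push_back(const_cast<char *>(threads.c_str()));
  args.push_back(nullptr);

  pid_t pid = fork();
  if (pid == 0) {
    execvp(args[0], args.data());
    _exit(127);
  }
  int status = 0;
  waitpid(pid, &status, 0);

  // Give every token back so other jobs can proceed.
  for (char t : tokens)
    while (write(wfd, &t, 1) != 1) {
    }
  return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```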

@bengtj

bengtj commented Sep 1, 2022

@andrewchambers: I agree, such a program could be useful in itself. However, in order to make it really useful for efficiently limiting total load during a build while not starving other jobs, a more dynamic approach is needed. But that could prove to be a bit too complex and maybe does not fit the mold threading model at all.

(We tried using ninja pools for linking in another case where we only have one ninja instance. It was hard to tune and was always slower than the "free threading" approach. The linking jobs vary a lot in time/size, so it feels clunky to tune as well.)

@orlitzky

orlitzky commented Nov 8, 2022

I'm glad to see someone else has already suggested the GNU jobserver here. We run into this problem in Gentoo because our package manager is source-based. To get a predictable level of CPU usage, users generally want to configure the number of simultaneous CPU-intensive jobs that can be run at any one time, regardless of the nature of those jobs (compile, link, whatever). Without coordination, you wind up in a situation where every tool wants to fire off N jobs, where N is something like the number of CPUs on the system. But if one of those N jobs launches another N jobs that themselves launch N jobs...

The GNU jobserver is the closest thing to a standard way to coordinate between the different processes, and hopefully ninja will finally adopt it soon.

@rui314
Owner

rui314 commented Nov 8, 2022

I'm thinking of something easier to implement and understand. That is, what if we allow only one instance of mold at a time? There's not much point in running multiple mold instances on a single machine simultaneously. So, if mold is invoked while another mold instance is running, we can just make it wait until the first process finishes.

@mathstuf

mathstuf commented Nov 8, 2022

How do you expect that to work when containers and multiple users are involved? That is, what mechanism will be used to determine "uniqueness"? And why is mold vs. mold the only concern? Are there no other tools that look at nproc to determine their task count? I think interacting with a jobserver that can handle -l (load average) flags and more than one process threatening an NxN problem is more robust than mold trying to solve the problem in a little sandbox, ignoring the state of the rest of the system.

@orlitzky

orlitzky commented Nov 8, 2022

I'm thinking of something easier to implement and understand. That is, what if we allow only one instance of mold at a time? There's not much point in running multiple mold instances on a single machine simultaneously. So, if mold is invoked while another mold instance is running, we can just make it wait until the first process finishes.

I consider this more a matter of predictability than anything else, and even one instance of mold is unpredictable at the moment.

If I have 8 cores, I might (for example) configure my package manager to launch at most two simultaneous jobs via MAKEOPTS, and then expect to be able to run two instances of the package manager without using more than half of my CPUs. That will be more or less true so long as the compiler, linker, and other build tools are single-threaded (or use the jobserver). But if I use ld.mold instead of ld.bfd, both package manager instances will eventually try to link something and mold will use up all of the available CPUs. Having one mold process stall while the other runs would do nothing to change that. What we'd really like is for each mold process to use either one or two threads, depending on how many other jobs are running.

That may not be the optimal use of my resources, but it matches most closely my intent when I set e.g. MAKEOPTS="-j2" before starting a build.

I can of course force mold to use only one thread, but that's overly pessimistic: if at any point linking is all that's happening, I want mold to be able to use as many threads as there are job slots available.

@mathstuf

mathstuf commented Nov 8, 2022

Note that there are also things that mold doesn't (and cannot) know. For example, maybe there's one really expensive link that holds up some large part of the build graph. If it stalls waiting for, say, dozens of test programs to link, you're really hampering the wall clock time of the build. I think any mold-specific solution would also need to consider starvation and unfairness; even if it did, it cannot know which link is "more important" to finish first for the entire task at hand. The jobserver should also be much less contentious, since any job mold does get won't end up "fighting" with other work the build tool thinks the machine still has room for.

@rui314
Owner

rui314 commented Nov 8, 2022

Keep in mind that there's no point in running multiple mold instances simultaneously unless your machine has a really large number of cores, like 64 or 128. If you share 16 cores between two mold instances, they will take 8 cores each, and each of them takes 2x more time than it would have taken with 16 cores. The peak memory usage is 2x because you are running two processes simultaneously. If you run the two processes serially, you can reduce the peak memory usage by half. In general, multi-threaded, scalable programs don't need to be invoked in parallel with others to shorten the overall latency.

@orlitzky

orlitzky commented Nov 8, 2022

Keep in mind that there's no point in running multiple mold instances simultaneously unless your machine has a really large number of cores, like 64 or 128. If you share 16 cores between two mold instances, they will take 8 cores each, and each of them takes 2x more time than it would have taken with 16 cores. The peak memory usage is 2x because you are running two processes simultaneously. If you run the two processes serially, you can reduce the peak memory usage by half. In general, multi-threaded, scalable programs don't need to be invoked in parallel with others to shorten the overall latency.

That's exactly the issue, but it's not just multiple mold processes that are a problem. If one mold process is using all of my available CPU, it's also increasingly pointless to launch further compilation processes. The jobserver protocol lets gcc, mold, etc. all pull from the same pool so that no more threads are used than would be useful (or than have been specified by the user; maybe I want to build two programs in two terminals and allocate to each of them half of my cores).

@rui314
Owner

rui314 commented Nov 8, 2022

So there are two types of processes: single-threaded and multi-threaded. The jobserver is designed only with the former in mind and doesn't work well to coordinate the latter. We for example don't want to run mold with 3 threads when the jobserver has three available cores and later invoke another mold instance with a single thread when one core becomes available. We want to run the first mold instance with all available cores instead so that it finishes as quickly as possible. This can be achieved by reserving as many cores as available in the mold startup routine, preventing other mold processes from running, and forgetting about the jobserver.

@rui314
Owner

rui314 commented Nov 8, 2022

So, if N cores are available, we want to run up to N single-threaded processes and zero or one multi-threaded process (which is not affected by how large N is). That should achieve the maximum CPU utilization while minimizing the peak memory usage.

@orlitzky

orlitzky commented Nov 8, 2022

The number of physical cores is a red herring I think. We want to run as many jobs as the user asks us to. If he has two terminals open and invokes one build system in each terminal with MAKEOPTS="-j2", he should get roughly two concurrent processes/threads in each. In no case do we want mold (or any other tool) to automatically use up as many cores as are available, because that hinders all of the other jobs, whether or not they're multi-threaded.

That may be "suboptimal," but only in one specific sense. For us (and in the original issue report), predictability is more important than maximal resource usage at all times.

We for example don't want to run mold with 3 threads when the jobserver has three available cores and later invoke another mold instance with a single thread when one core becomes available.

I think that's what should happen :)

If a core (job slot) becomes available and if a linking job is queued up next, we should start it. When the other cores become available, there will generally be other jobs waiting to eat them up. The machine may wind up idle at some point, but if the user asks for (say) two jobs, we shouldn't try to outsmart him.

@rui314
Owner

rui314 commented Nov 9, 2022

I think that the notion of "jobs" is actually a red herring. Build tools have long been single-threaded, and make runs multiple processes concurrently as a workaround for that. If each tool can saturate the computing power and IO isn't a bottleneck, make doesn't have to run them concurrently. Doing so would have a negative impact on performance, because the working set size increases as the number of concurrent processes increases.

So there are really two types of processes: the traditional, single-threaded kind and the modern, multi-threaded kind. The latter scales by itself and doesn't need the workaround.

We for example don't want to run mold with 3 threads when the jobserver has three available cores and later invoke another mold instance with a single thread when one core becomes available.

I think that's what should happen :)

I disagree. If you just apply the notion of "jobs" to multi-threaded, scalable processes, I understand that you would reach that conclusion. But the notion of jobs isn't directly applicable to such processes.

@orlitzky

orlitzky commented Nov 9, 2022

I think that the notion of "jobs" is actually a red herring. Build tools have long been single-threaded, and make runs multiple processes concurrently as a workaround for that. If each tool can saturate the computing power and IO isn't a bottleneck, make doesn't have to run them concurrently. Doing so would have a negative impact on performance, because the working set size increases as the number of concurrent processes increases.

We don't want to saturate the CPU! I cannot stress that enough. We want to do what the user asks. That may saturate the CPU or not.

I disagree. If you just apply the notion of "jobs" to multi-threaded, scalable processes, I understand that you would reach that conclusion. But the notion of jobs isn't directly applicable to such processes.

Everyone is aware of the difference. Quoting https://www.gnu.org/software/make/manual/html_node/Job-Slots.html,

Tools that make invokes which are also able to run multiple operations in parallel, either using multiple threads or multiple processes, can be enhanced to participate in GNU make’s job management facility to ensure that the total number of active threads/processes running on the system does not exceed the maximum number of slots provided to GNU make.

You're focused on squeezing the maximum performance out of the build. That's the right default, and it's why people want to use mold in the first place. But what we're asking for is something different: if the user explicitly asks (with -j2) to run two jobs, mold should be able to respect that request.

@rui314
Owner

rui314 commented Nov 9, 2022

Saturating the CPU is fine. Rather, we always want to saturate the CPU. Unlike memory, whose overallocation causes thrashing, having many threads waiting for CPU cycles is desirable, because otherwise we would just waste CPU cycles running idle loops.

If you want some processes to use fewer CPU resources so that more CPU resources go to other groups of processes, you should use nice.

@Alcaro

Alcaro commented Nov 9, 2022

No, it's not.

  • There could be a system watchdog that emails the administrator (who may or may not be the user) if the load average or cpu utilization goes too high
  • The system could be under power constraints; I've seen a few laptops whose maximum power use exceeds the charger's capacity
  • The system could be under thermal constraints, where maxing two cores is fine, but maxing all eight panics the kernel because the cooling is broken or insufficient (or just spins up the fans, which the user doesn't want to listen to)
  • Some other reason I didn't think of

If the user says the CPU should not be saturated, then the CPU should not be saturated.

@rui314
Owner

rui314 commented Nov 9, 2022

(1) may make sense, but (2) and (3) do not. If maximum CPU utilization is dictated by a hardware constraint, it really needs to be enforced by a kernel-level mechanism, such as simply downclocking the CPU. make -jN is extremely unreliable as a thermal throttling mechanism (like, what if someone accidentally runs two make -jN invocations?).

@mathstuf

mathstuf commented Nov 9, 2022

There is also make -lN to not start new work if the machine load gets too high. This is something that "leaks" across containers and allows for cooperative "backoff" for overloaded machines.

@samuelpmish

samuelpmish commented Mar 16, 2023

Hi, I've recently started using mold, and it's certainly a big win for incremental builds!

What's the current best practice for using mold + ninja on a project with many targets that require linking?

@mathstuf

What's the current best practice for using mold + ninja on a project with many targets requiring linking?

Put all mold-using rules into a job pool of size 1.

@rui314
Owner

rui314 commented Mar 17, 2023

In the above commit, I added an experimental feature to limit the number of active mold processes to 1. By setting the environment variable MOLD_JOBS to 1, you can tell mold to acquire a file lock (it creates ~/.mold-lock) before doing anything meaningful.

Setting this environment variable to 1 should reduce the peak memory usage.

Can you guys try this environment variable to see if it's useful?
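
For anyone curious how such serialization can work, here is a minimal sketch of the file-lock idea (not mold's actual implementation; the path handling and MOLD_JOBS parsing are simplified, and HOME is assumed to be set):

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>
#include <cstdlib>
#include <cstring>
#include <string>

// If MOLD_JOBS=1, take an exclusive advisory lock so that only one linker
// instance does real work at a time. The lock is released automatically
// when the process exits.
static void maybe_serialize() {
  const char *jobs = getenv("MOLD_JOBS");
  if (!jobs || strcmp(jobs, "1") != 0)
    return;
  std::string path = std::string(getenv("HOME")) + "/.mold-lock";
  int fd = open(path.c_str(), O_WRONLY | O_CREAT, 0600);
  if (fd < 0)
    return;             // couldn't create the lock file; run unserialized
  flock(fd, LOCK_EX);   // blocks until the previous linker instance exits
  // The fd is intentionally kept open: holding it holds the lock.
}
```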

@samuelpmish

I'm not sure if I'm using it right, but I built main in Release mode with g++ 12.1; here are some timings from a C++ project:

ld: 45s
mold: 42s
mold (new, with export MOLD_JOBS=1): 41s

So, slightly faster with the most recent commit (but that might be unrelated to the MOLD_JOBS part), but not a huge difference.

For reference, here's a zip file of the 3 traces (viewable on https://ui.perfetto.dev/) for the three builds above:
traces.zip
You can see the differences in concurrency / link time for each target at the end of the traces.

Anecdotally, a coworker of mine is seeing about a 50% slowdown using mold on a larger project with ninja.

@rui314
Owner

rui314 commented Mar 18, 2023

That change is not supposed to improve build speed, because it decreases peak memory usage by sacrificing a little parallelism. So if your system has enough RAM, it could even have a slightly negative impact on performance. However, if your build is under memory pressure, the situation might be different.

@rui314
Owner

rui314 commented Oct 20, 2023

I recently tried to build LLVM on Asahi Linux on my Mac Mini, which is equipped with an 8-core processor and 16 GiB of RAM. It's a relatively powerful machine but has limited memory. On that machine, I found that MOLD_JOBS works very well. Without it, compiler/linker processes would frequently get killed due to out-of-memory errors, but the build completes smoothly with MOLD_JOBS=1. So the benefit seems real.

Has anyone else had a similar experience?

@medhefgo

In the above commit, I added an experimental feature to limit the number of active mold processes to 1. By setting the environment variable MOLD_JOBS to 1, you can tell mold to acquire a file lock (it creates ~/.mold-lock) before doing anything meaningful.

Urgh. This kind of file belongs in $XDG_RUNTIME_DIR or at least /tmp, but definitely not in $HOME.

@rui314
Owner

rui314 commented Oct 22, 2023

Urgh. This kind of file belongs in $XDG_RUNTIME_DIR or at least /tmp, but definitely not in $HOME.

Then a user on the same system could mount a mild DoS attack against you by creating your lockfile in /tmp before you do and making it unreadable.

@NobodyXu

Then a user on the same system could mount a mild DoS attack against you by creating your lockfile in /tmp before you do and making it unreadable.

XDG_RUNTIME_DIR is usually set to /run/user/$UID on systemd systems, which only the current user has access to.

For the lockfile in /tmp, maybe putting the uid in the filename would be reasonable to mitigate the DoS attack?
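
A sketch of that suggestion (purely illustrative, not what mold ended up doing): prefer the per-user runtime directory and fall back to a uid-suffixed path in /tmp. Note that /tmp is world-writable, so the fallback only avoids accidental collisions; it does not fully prevent the pre-creation attack, which is why $XDG_RUNTIME_DIR is the better choice when it exists.

```cpp
#include <unistd.h>
#include <cstdlib>
#include <string>

// Pick a lock-file location: per-user runtime dir if available,
// otherwise /tmp with the uid embedded in the name.
static std::string lock_path() {
  if (const char *dir = getenv("XDG_RUNTIME_DIR"))
    return std::string(dir) + "/mold-lock";
  return "/tmp/mold-lock-" + std::to_string(getuid());
}
```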

@rui314
Owner

rui314 commented Oct 22, 2023

@NobodyXu Thanks for the info! I wasn't aware that XDG_RUNTIME_DIR is a per-user directory. That'll work for us.

@rui314
Owner

rui314 commented May 4, 2024

Now that we have the MOLD_JOBS environment variable to control concurrency, I think we can close the issue.
