Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process#run broken with defunct child process with preview_mt #12392

Open
phil294 opened this issue Aug 16, 2022 · 2 comments
Open

Process#run broken with defunct child process with preview_mt #12392

phil294 opened this issue Aug 16, 2022 · 2 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:concurrency

Comments

@phil294
Copy link

phil294 commented Aug 16, 2022

Bug Report

Hi,
this seems very similar to #12241, however that issue is about the interpreter. This one is not, and the setup is different.

Crystal 1.5.0 (2022-07-21). LLVM: 14.0.6. Default target: x86_64-pc-linux-gnu

This source

spawn do
	i = 0
	50_000_000_000_i64.times do
		i += 1
	end
end

150.times do
	stdout = IO::Memory.new
	puts "before run"
	Process.run("ls", output: stdout)
	puts "after run"
	sleep 10.milliseconds
end

With -Dpreview_mt, in development (without --release), this is the output:

 before run
 after run
 before run

And then it freezes. That is, until the top thread finishes, after which the remaining processes get created properly.

Sleep in the code is not necessary, but increasing sleep also does not help.

This does not occur without the upper blocking thread. Doesn't matter what it's doing, as long as it is blocking. In my case it's just some C event loop. Also does not occur without the process output option.

preview_mt is necessary - normal mode and replacing spawn with Thread.new instead works fine.

Increasing CRYSTAL_WORKERS leads more iterations. In other words, every Process.run occupies a worker thread but does not release it again.

ps shows a single [ls] <defunct>.

When you build with --release however, the blocking is gone and it executes as expected. (edit: Okay, I'm now also seeing this with release mode, but in a more complicated setup, leading me to believe it's not directly coupled to release)

@phil294 phil294 added the kind:bug A bug in the code. Does not apply to documentation, specs, etc. label Aug 16, 2022
@straight-shoota
Copy link
Member

straight-shoota commented Aug 16, 2022

When the stdout parameter is an IO, the process implementation creates a file system pipe, hands that to the forked process and then spawns a fiber to copy from the pipe into the IO.

That's the most likely reason for this to break.
I suspect one of the copy fibers gets scheduled on the same thread as the blocking work fiber. The mt_preview scheduler does not do work stealing, so it sits there until the blocking fiber finishes. And Process.run in the main fiber blocks until all copy fibers are completed.

For this reason, spawning a fiber with blocking code is not recommended. Fibers in Crystal are expected to be cooperative. The documentation requests to insert Fiber.yield into computation-heavy tasks (https://crystal-lang.org/api/1.5.0/Fiber.html#cooperative). A blocking C function cannot fulfill that.

We're currently missing a proper way to implement blocking function calls in a dedicated thread. This is discussed in #11778.

Using the undocumented Thread API is probably the best solution for your use case.
Alternatively, you could perhaps replace the IO for stdout with a pipe directly and read from that directly without the involvement of copy fibers.

@ysbaddaden
Copy link
Contributor

It can also block if the fiber that handles signals is on the same thread as the CPU-bound fiber, because Process.run waits for SIGCHLD to complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:concurrency
Projects
None yet
Development

No branches or pull requests

3 participants