BSP: importing occasionally gets stuck: mill seems deadlocked #3100
Thanks for reporting this. Yeah, a reproducer would be nice, even if it's not small. My suspicion is that, although the task graph is ordered, there is no guarantee that two different task graphs order all shared elements the same way.
---

Besides the BSP client sending requests in parallel, do you also have parallel task processing enabled? I suspect not. It could be that the locked tasks don't really depend on each other, and that enabling parallel processing would change the picture.

To understand the issue, we should concentrate on the ordering of the task execution plan. I think the ordering of tasks is not total, which means the ordering of tasks that do not depend on each other is not defined. The deadlock might be obvious if we could look at the execution plans.
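To illustrate why the ordering is not total: a topological order of a task DAG only constrains tasks that depend on each other, so independent tasks can legally appear in either order. A minimal sketch (hypothetical task names, not Mill's actual planner), where the tie-break order stands in for whatever order a given plan happens to use:

```scala
object TopoOrderSketch {
  // task -> direct dependencies (illustrative names)
  val deps: Map[String, Set[String]] = Map(
    "compile" -> Set("sources"),
    "resources" -> Set.empty,
    "sources" -> Set.empty,
    "assembly" -> Set("compile", "resources")
  )

  // Plain depth-first topological sort; `tieBreak` decides between
  // tasks that don't depend on each other.
  def plan(tieBreak: Ordering[String]): List[String] = {
    var visited = Set.empty[String]
    var out = List.empty[String]
    def visit(t: String): Unit = if (!visited(t)) {
      visited += t
      deps(t).toList.sorted(tieBreak).foreach(visit)
      out = t :: out
    }
    deps.keys.toList.sorted(tieBreak).foreach(visit)
    out.reverse
  }

  def main(args: Array[String]): Unit = {
    // Two different tie-breaks: both plans respect every dependency edge,
    // but the independent tasks "compile" and "resources" swap places.
    println(plan(Ordering.String))
    println(plan(Ordering.String.reverse))
  }
}
```

Both outputs are valid execution plans; a deadlock needs nothing more than two plans (or two concurrent requests) disagreeing about the relative order of independent tasks while holding locks.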
---

Still posting the stuff below in case I'm wrong, but I just now realized: isn't this stack trace from creating the actual task objects, rather than evaluating any tasks?

I can reliably make a deadlock with this example:

```scala
import mill._
import scalalib._

object left extends Module {
  def task = T { right.otherTask() }
  def otherTask = T { 0 }
}
object right extends Module {
  def task = T { left.otherTask() }
  def otherTask = T { 0 }
}
object bspDummy1 extends JavaModule {
  def sources = T {
    left.task()
    Seq.empty[PathRef]
  }
}
object bspDummy2 extends JavaModule {
  def resources = T {
    right.task()
    Seq.empty[PathRef]
  }
}
```

if I manually send the two corresponding BSP requests, with this patch applied to Mill:

```diff
diff --git a/main/define/src/mill/define/Task.scala b/main/define/src/mill/define/Task.scala
index c8c41d746e..338d5b54ce 100644
--- a/main/define/src/mill/define/Task.scala
+++ b/main/define/src/mill/define/Task.scala
@@ -356,16 +356,39 @@ object Target extends Applicative.Applyer[Task, Task, Result, mill.api.Ctx] {
     val lhs = Applicative.impl0[Task, T, mill.api.Ctx](c)(reify(Result.create(t.splice)).tree)
-    mill.moduledefs.Cacher.impl0[Target[T]](c)(
-      reify(
-        new TargetImpl[T](
-          lhs.splice,
-          ctx.splice,
-          rw.splice,
-          taskIsPrivate.splice
+    val taskName = c.enclosingDef.symbol.fullName
+    val slowTaskConstructions = Set(
+      "millbuild.build.right.otherTask",
+      "millbuild.build.left.task",
+      "millbuild.build.left.otherTask",
+      "millbuild.build.right.task",
+    )
+
+    if (slowTaskConstructions.contains(taskName)) {
+      mill.moduledefs.Cacher.impl0[Target[T]](c)(
+        reify({
+          Thread.sleep(500)
+          new TargetImpl[T](
+            lhs.splice,
+            ctx.splice,
+            rw.splice,
+            taskIsPrivate.splice
+          )
+        })
+      )
+    } else {
+      mill.moduledefs.Cacher.impl0[Target[T]](c)(
+        reify(
+          new TargetImpl[T](
+            lhs.splice,
+            ctx.splice,
+            rw.splice,
+            taskIsPrivate.splice
+          )
         )
       )
-    )
+    }
   }

   def targetResultImpl[T: c.WeakTypeTag](c: Context)(t: c.Expr[Result[T]])(
```

This sequences the construction of task objects such that the two concurrent requests interleave and the deadlock hits reliably.

I've reduced our build and it still reproduces about 1 in 5 times now, but it no longer needs sources / has long source generation tasks and whatnot, so I'd call it a net win :). One example of two task dependency chains (obtained from stack traces) that cause a deadlock in this example now is:
The former locks

---

No, just the default. `build.sc`:

```scala
import $ivy.`com.lihaoyi::mill-contrib-buildinfo:`
import os._
import mill._
import scalalib._
import contrib.buildinfo.BuildInfo

object viper extends ScalaModule {
  def scalaVersion = "2.13.12"
  object silver extends ScalaModule {
    override def scalaVersion = "2.13.10"
  }
  object silicon extends ScalaModule {
    object buildInfo extends BuildInfo with ScalaModule {
      def scalaVersion = "2.13.12"
      def buildInfoPackageName = "viper.silicon"
    }
    object common extends ScalaModule {
      override def scalaVersion = "2.13.10"
      override def moduleDeps = Seq(silver)
    }
    override def scalaVersion = "2.13.10"
    override def moduleDeps = Seq(silver, common, buildInfo)
  }
  object carbon extends ScalaModule {
    override def scalaVersion = "2.13.10"
  }
  override def moduleDeps = Seq(silver, silicon, carbon)
}

object vercors extends Module {
  object hre extends ScalaModule {
    def scalaVersion = "2.13.12"
    override def moduleDeps = Seq(pprofProto)
    object pprofProto extends ScalaModule {
      def scalaVersion = "2.13.12"
    }
  }
  object col extends ScalaModule {
    def scalaVersion = "2.13.12"
    override def moduleDeps = Seq(serialize)
  }
  object serialize extends ScalaModule {
    def scalaVersion = "2.13.12"
  }
  object main extends ScalaModule {
    def scalaVersion = "2.13.12"
    override def moduleDeps = Seq(col, buildInfo)
    object buildInfo extends BuildInfo with ScalaModule {
      def scalaVersion = "2.13.12"

      def callOrElse(command: Shellable*)(alt: => String): String =
        try {
          os.proc(command: _*).call().out.text().trim
        } catch {
          case _: SubprocessException => alt
        }

      def gitVersionTag = T.input {
        val tags = callOrElse("sleep", "0.001")("").split("\n")
        tags.collectFirst {
          case tag if tag.matches("^v[0-9].*") => tag
          case "dev-prerelease" => "dev-prerelease"
        }
      }

      def gitVersion = T.input {
        gitVersionTag() match {
          case Some(tag) if tag.startsWith("v") => tag.substring(1)
          case _ => "9999.9.9-SNAPSHOT"
        }
      }

      def buildInfoPackageName = "vct.main"

      override def buildInfoMembers = T {
        Seq(
          BuildInfo.Value("version", gitVersion()),
          BuildInfo.Value("scalaVersion", main.scalaVersion()),
        )
      }
    }
  }
}
```
---

Just hit this as well using IntelliJ import, on Mill's own codebase.
---

I think the issue is that concurrent lazy initialization of the module tree can cause deadlocks in a scenario like:

```scala
object Foo {
  val bar = Baz
}
object Baz {
  val qux = Foo
}
```

In this case, a thread accessing `Foo` while another thread accesses `Baz` can deadlock: each thread holds the initialization lock of the object it entered first and blocks waiting for the other's.
Not totally sure, but that's what the stack trace seems to be describing. In the case of Mill, the modules are lazily-initialized `object`s, so the same pattern would apply.
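The initialization-lock cycle above can be made concrete without actually hanging the JVM. A minimal sketch (illustrative names, not Mill code) that models the two per-object initialization locks with explicit `ReentrantLock`s and uses `tryLock` so the deadlocked state is detected instead of blocking forever:

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.locks.ReentrantLock

// Models the Foo/Baz scenario: fooLock/bazLock stand in for the JVM's
// per-object initialization locks. Latches force the interleaving where
// each thread holds its own lock and needs the other's.
object InitDeadlockSketch {
  def demo(): Boolean = {
    val fooLock = new ReentrantLock()
    val bazLock = new ReentrantLock()
    val t2HoldsBaz = new CountDownLatch(1)
    val t2Tried = new CountDownLatch(1)
    val t1Tried = new CountDownLatch(1)
    var t2GotFoo = true

    fooLock.lock() // thread 1 starts initializing Foo
    val t2 = new Thread(() => {
      bazLock.lock() // thread 2 starts initializing Baz
      t2HoldsBaz.countDown()
      t2GotFoo = fooLock.tryLock() // Baz's body references Foo
      if (t2GotFoo) fooLock.unlock()
      t2Tried.countDown()
      t1Tried.await() // keep holding bazLock until thread 1 has tried it
      bazLock.unlock()
    })
    t2.start()

    t2HoldsBaz.await()
    val t1GotBaz = bazLock.tryLock() // Foo's body references Baz
    if (t1GotBaz) bazLock.unlock()
    t2Tried.await() // keep holding fooLock until thread 2 has tried it
    t1Tried.countDown()
    fooLock.unlock()
    t2.join()

    // Neither thread could acquire the other's lock. With plain lock()
    // calls -- as in real object initialization -- both would block forever.
    !t1GotBaz && !t2GotFoo
  }
}
```

With real `object` initialization there is no `tryLock` escape hatch, which is why the stuck BSP threads just sit there until IntelliJ kills them.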
---

From the stack trace, it seems the issue is that IntelliJ is asking Mill to compute both `buildTargetDependencySources` and `buildTargetResources` concurrently.

Maybe we can put a big coarse-grained lock around BSP, so that Mill only serves one ongoing BSP request at a time? In general I don't think Mill's codebase is designed to handle concurrency at the moment, and running the requests sequentially should be fast enough due to caching that BSP-request-level parallelism isn't necessary. While we could put in the work to make the entire codebase threadsafe, it feels like just forcing a coarse-grained lock is good enough. That's what Bazel does, to an even greater extent (they disallow multiple concurrent processes as well).
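A rough sketch of what such a coarse-grained lock could look like (names and structure are hypothetical, not Mill's actual BSP worker): each request handler funnels its body through one shared lock, so concurrently arriving requests are still answered asynchronously but evaluated one at a time.

```scala
import java.util.concurrent.CompletableFuture

// Hypothetical sketch: serialize all BSP request handling through a single
// lock so only one request body runs at a time, no matter how many the
// client sends concurrently.
object BspRequestSerializer {
  private val lock = new Object
  private var active = 0                  // guarded by `lock`
  @volatile var maxObservedConcurrency = 0

  // Wrap each request handler body in this; callers stay async.
  def serialized[T](body: => T): CompletableFuture[T] =
    CompletableFuture.supplyAsync { () =>
      lock.synchronized {
        active += 1
        maxObservedConcurrency = math.max(maxObservedConcurrency, active)
        try body
        finally active -= 1
      }
    }
}
```

Even if a client fires several requests at once, at most one body executes at any moment (`maxObservedConcurrency` exists only to make that observable), so module initialization is never raced.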
---

Maybe we can lock the module-initialization phase and after that revert to the current behavior?
---

That works too. I guess it depends on how much we really want to support concurrent evaluation. My feeling is that the long tail of concurrency issues we're going to be dealing with would outweigh the marginal benefit of occasionally being able to run the evaluator in parallel in a single JVM, which basically only happens during BSP and nowhere else (since the CLI runs concurrent Mill commands in separate JVMs anyway).
---

One thing to consider: because we only run parallel Mill evaluations within the same JVM in BSP, we should expect BSP to continually surface bugs that don't get caught in interactive use. If we ran Mill parallel within the same JVM in normal usage, at least both BSP and the CLI would go through the same code paths, and bugs found and fixed in one would stay fixed in the other. But if we continue to have BSP go thread-parallel while the CLI goes process-parallel, the bugs in each case are independent, and we'd find it impossible to reproduce BSP concurrency issues via the CLI.

Given that, it seems pretty clear to me that putting a coarse-grained lock around the BSP code path is the right thing to do. Better one standardized, well-tested code path than two totally different concurrency models, especially given how tricky concurrency issues are to surface, reproduce, and fix. I think if people had to choose between slightly-slower-BSP-imports and nondeterministic deadlocks and race conditions, they'd pick the slightly-slower-BSP-imports every time.
---

Importing our project via BSP seems to get stuck. Various tasks like "sources", "dependency sources", etc. sometimes stay running forever, then get killed by IntelliJ after 10 minutes. I'd say this happens about 60% of the time, varying by machine / the weather.
Inspecting the mill process with VisualVM reports there is a deadlock:
(Full Thread Dump)
It reports:
These threads are rooted in:
```
mill.bsp.worker.MillBuildServer.$anonfun$buildTargetDependencySources$2(MillBuildServer.scala:346)
mill.bsp.worker.MillBuildServer.$anonfun$buildTargetResources$2(MillBuildServer.scala:404)
```
respectively, so indeed two different BSP requests. Here are some lines from the `dependencySources` stack trace from the full report above:

The other stack trace has a similar sequence of cache misses, and it so happens that the other one locks the `viper.silicon.buildInfo` module earlier in the stack, and then tries to lock `viper.silicon` (already locked here). I don't even really understand how it's possible that tasks have cache misses like this after they are ordered, but maybe my mental model is off.

I should say I'm happy to provide `build.sc` etc., but it's quite large, and I'm having trouble reproducing this outside IntelliJ, or with smaller examples. Maybe you already have some thoughts on the cause, or I can try a bit harder to reduce `build.sc`; feel free to let me know. Thanks!