Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul build file management with new build.mill/package.mill format #3426

Merged
merged 150 commits into from
Sep 3, 2024

Conversation

lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Aug 27, 2024

This PR overhauls how build files are handled, especially in subfolders. The goal is to come up with an approach that is (1) scalable to large builds and (2) works intuitively with how people expect things to work and (3) plays well with IDEs (4) converges with the Scala language to reduce the special casing we have to do.

There's a bunch of hackiness in the implementation, but the end result is that when I load the updated 10-multi-file-build project into IntelliJ, navigation around the multi-file project works seamlessly (after adding the .mill file association):

Screenshot 2024-08-31 at 4 35 16 PM

VSCode works as well, though it only understands .scala extensions and will likely need to be patched to work with .mill

Screenshot 2024-08-31 at 4 49 46 PM

This means that to begin with, all the IDE tooling around Mill just works, with the IDE being blissfully unaware of the nasty code transformations and other things Mill does. From that base, we can slowly look into removing boilerplate in an IDE/language-compliant way, and removing the backend hackiness while keeping the user experience mostly unchanged

Major user-facing changes:

  1. build.sc is now build.mill, sub-folders can define modules in package.mill files. Helper libraries live in files like foo/bar.mill

    • Long term Mill needs our own extension because .sc is overloaded to work with Ammonite/Scala-CLI scripts. That means IDEs like IntelliJ treat them specially, in ways that are incompatible with Mill (e.g. IDEs don't understand importing between script files). .sc is still supported for migration/compatibility reasons. Editors don't currently associate .mill with Mill Scala code, but it's a few clicks for the end user and a trivial PR to send to IntelliJ and Metals to support it
    • .scala as an extension could possibly work from a technical level, but there will still be user-facing confusion if we want Mill to be able to target non-Scala developers, and possibly technical confusion for tools that can't differentiate between Mill build files and application source files
  2. All .mill files need a package declaration at the top, e.g. build.mill needs package build, foo/bar/qux.sc needs package build.foo.bar. I left it optional for the root build.mill for migration purposes

    • This is necessary for full IDE support, as editors do not understand the "things in subfolders are automatically in packages" thing that Mill and Ammonite did in the past. We had hacks around import $file, but they never were super robust, and import $file is itself a hack we've discussed getting rid of
    • We can look into making package declarations optional again in future, but for now this gives us working IDE support with minimal effort, converging with vanilla Scala, at the cost of a little boilerplate per-file.
  3. No more import $file: any files with .mill (or .sc) extensions adjacent to a build.mill or package.mill are treated as Mill files

    • This simplifies file discovery considerably, v.s. the previous approach that required multiple-passes of parsing and traversing all imported scripts as a graph. It also aligns things more with how Scala and other JVM projects work, where files are typically picked up by folder regardless of imports

Major internal changes:

  1. We move the handling of RootModule from runtime reflection/classpath-scanning/custom-resolve-logic to code-generation logic that "unpacks" any object module, object build, or object foo extends RootModule into the enclosing wrapper class

    • We can do this as a compiler plugin, which would help preserve line numbers and avoid showing generated code, but it also makes things terribly hard to debug when things go wrong
    • This allows us to ensure RootModules are handled uniformly from Scala code and from the command line, where previously Scala code would need to write bar.qux.module.mytask suffix while the CLI would just write bar.qux.mytask
    • By making the unpacking name-based, rather than type-based, this removes the "what should we name the root module?" degree of freedom that really shouldn't be necessary and ensures everyone names them consistently
  2. Replace the package objects previously generated for root modules with normal objects named build or module (reflecting the file names)

    • package objects are on their way out in Scala 3 https://docs.scala-lang.org/scala3/reference/dropped-features/package-objects.html. No concrete timeline, but good to stop using them. They also are a common source of bugs around compilation, incremental compilation, etc. since even their present-day implementation isn't terribly robust
    • This also removes the need to refer to build as build.package since it is now a normal object
  3. Generate final def aliases in the generated objects in order to make code references and module discovery work

    • These aliases allow both Scala-code references to modules without needing this .build or .module suffix, as well as allow the Resolve.scala logic to work
    • We can thus eliminate a whole bunch of gnarly code plumbing multiple-root-modules all over the Mill codebase and gnarly classpath-scanning code to try and identify the sub-folder root modules and wire them up.
    • These aliases should more reliably generate conflicts than overlapping package object/objects, which seemed pretty flaky

Limitations:

  • Partially migrating subfolders out of a parent build.sc into a child module.sc no longer works. The aliases we generate for subfolder module.sc files always conflicts at compile time with any locally-defined modules

  • References to helper files e.g. foo/bar.sc with def qux would get referenced via build_.foo.bar.qux is kind of awkward. I expect this to go away once we get Scala 3 support, and we can use export clauses to allow referencing them via build.foo.qux, which is more in line with how Scala code normally works where different files in the same folder are all part of the same combined namespace.

  • To work around package object weirdness e.g. needing to reference things via foo.bar.package, we build our own parallel object hierarchy via codegen and provide that to the user. This is awkward, but it looks similar enough people shouldn't notice it, and we can look into somehow fixing the package object issues upstream in scala/scala3 in future

  • The package build prefix is kind of verbose, but it's necessary so we have some way of reliably referencing other files by their fully qualified names. e.g. a file in util/package.mill can be referenced by build.util.*, but cannot be referenced by util.* because import mill._ pulls in mill.util which shadows it.

@lihaoyi lihaoyi force-pushed the multi-build-package branch from 80ec89e to 5b46280 Compare August 29, 2024 00:53
@lihaoyi lihaoyi changed the title Overhaul multi-file builds Overhaul build file management with new build.mill/package.mill format Sep 3, 2024
@lihaoyi
Copy link
Member Author

lihaoyi commented Sep 3, 2024

Going to merge this and proceed with the dogfooding, rebootstrapping, and cutting of RC1. There's definitely issues we'll discover, and further things we need to do. But that can happen during the RC period

@lihaoyi lihaoyi marked this pull request as ready for review September 3, 2024 06:31
@lihaoyi lihaoyi merged commit 9baac5d into com-lihaoyi:main Sep 3, 2024
33 checks passed
@lefou lefou added this to the 0.12.0 milestone Sep 3, 2024
@nafg
Copy link
Contributor

nafg commented Sep 3, 2024

.scala as an extension could possibly work from a technical level, but there will still be user-facing confusion if we want Mill to be able to target non-Scala developers, and possibly technical confusion for tools that can't differentiate between Mill build files and application source files

Have you considered something like .mill.scala?

This way most tools (e.g. bat) will continue working, but I think that solves the concern in the quote?

@lolgab
Copy link
Member

lolgab commented Sep 3, 2024

.scala as an extension could possibly work from a technical level, but there will still be user-facing confusion if we want Mill to be able to target non-Scala developers, and possibly technical confusion for tools that can't differentiate between Mill build files and application source files

Have you considered something like .mill.scala?

This way most tools (e.g. bat) will continue working, but I think that solves the concern in the quote?

That's a great idea! Otherwise, even the most basic syntax highlighter needs to know what Mill is. This happens already for sbt, but .mill.scala could disambiguate Mill files while avoiding making all editors in the world that are aware of Scala, having to know that .mill is an extension for Scala.

I see the value of hiding Scala to Java developers, but it is no doubt that the build file is a Scala file and to master it you need to know some Scala :)

Gradle started out with build.gradle and now it is migrating to Kotlin and it's build file is called... build.gradle.kt, which is exactly equivalent to build.mill.scala. It's another example which makes me think it is the right thing to do.

@lihaoyi
Copy link
Member Author

lihaoyi commented Sep 3, 2024

I considered .mill.scala and .mill.sc, but decided to go with .mill

I think net-net, the cost of adding a new file extension is not that large. Sure, initially you get zero support and everything sucks, but most editors allow you to associate a file extension manually, and it's usually a trivial PR to each of the various editors or environments (e.g. Github linguist) to associate the new extension.

The major thing that I think we need to be more careful about is the semantics. Teaching an editor a a file extension is trivial; teaching an editor custom file-type identification logic or scoping/identifier-resolution logic (which is what would be needed to properly support the old import $file-based scripts) is much more difficult. There's a reason that after 7 years, IntelliJ still doesn't know whether something is an Mill build, Ammonite script, Scala-CLI script, or IntellIJ worksheet. And it's not for lack of trying: the IntelliJ folks implemented import $file support (kinda), let it bitrot, and had to be pestered to reverse engineer it later to bring it back (the original implementors had already left). Better to make things easier for them so they don't need to jump through so many hoops to support Mill.

Overall, a new file extension seems big, but I bet in the next week or so we'll be able to send PRs to most of the major ones, and that's that. By the time 0.12.0-RC1 is even cut it should already be a non-issue. Whether we use .mill or .mill.sc or .mill.scala then becomes immaterial. After that the benefits of conciseness (.mill.scala is quite a mouthful) and encapsulation (it's a Mill script, not a Scala application) win out

@lefou
Copy link
Member

lefou commented Sep 3, 2024

There's a reason that after 7 years, IntelliJ still doesn't know whether something is an Mill build, Ammonite script, Scala-CLI script, or IntellIJ worksheet.

I guess that's because build.sc files were already supported as Ammonite script very early. And even Ammonite re-used .sc from workbooks before.

You make it sound easy, but making a new file extension recognized is hard, esp. if there are already other file extensions for the same type. (I know many editors prefer to guide the user to use one standard instead of many, e.g .adoc vs .asciidoc.)

While *.sc files are scripts paired with some semantics, *.scala files are never standalone, they need a version and a classpath to have any support in IDEs. And any IDE will need to have this support for Mill anyways, otherwise it's just syntax highlighting, which would work reasonably better with .mill.scala.

// Use Fastparse's Instrument API to identify top-level `object`s during a parse
// and fish out the start/end indices and text for parts of the code that we need
// to mangle and replace
private class ObjectDataInstrument(scriptCode: String) extends fastparse.internal.Instrument {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @bishabosha this parsing logic may need to be ported to the Scala 3 port. Not sure how the Scala 3 import $ivy parser works, but I assume you have some kind of AST, which may make things a lot easier than the visitor-style parsing I have to do here using ScalaParse

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is probably next for me

@lefou
Copy link
Member

lefou commented Sep 4, 2024

I think net-net, the cost of adding a new file extension is not that large. Sure, initially you get zero support and everything sucks, but most editors allow you to associate a file extension manually, and it's usually a trivial PR to each of the various editors or environments (e.g. Github linguist) to associate the new extension.

I think this equation is wrong. What I know, avoiding the .mill extension on our side is rather cheap. Compared to the count of editors on the planet. And even if you get your PRs accepted eventually in five years to all these editors you think are relevant, you often can't use the latest version or need sometimes work with even another editor or a formatter or tools/libs like editorconfig. We even don't have an idea what editors are used, e.g. I often use a simple CLI editor like nano to make bumps in maintained projects. Some editors I use are implemented in languages I'm not proficient in. I don't think opening a PR or whatever the workflow needed for it is easy at all. Let alone the social factor. How to convince an arbitrary editor maintainer who has to maintain hundrets of languages, that she needs to add yet another extention for Scala, just for a little niche build tools she never has heard about? How to convince anyone, if we even don't agree in our team? You're choice seems pragmatic, but dealing with it for others is everything but easy. We loose what we already had.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants