Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new include() directive to MODULE.bazel files #21855

Closed
wants to merge 1 commit into from

Conversation

Wyverald
Copy link
Member

@Wyverald Wyverald commented Mar 29, 2024

This new directive allows the root module to divide its MODULE.bazel into multiple segments. This directive can only be used by root modules; only files in the main repo may be included; variable bindings are only visible in the file they occur in, not in any included or including files. See the docs for include() (in ModuleFileGlobals.java) for more details.

In follow-ups, we'll need to address:

  1. Enforcing the loaded files to have some sort of naming format (tentatively foo.MODULE.bazel where foo is anything)
  2. Making bazel mod tidy work with included files

RELNOTES: Added a new include() directive to MODULE.bazel files.

Fixes #17880.

@Wyverald Wyverald requested a review from meteorcloudy as a code owner March 29, 2024 02:14
@github-actions github-actions bot added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Mar 29, 2024
@Wyverald
Copy link
Member Author

Wyverald commented Mar 29, 2024

TODO:

  • Bikeshed. I'm not married to the name module_import. I'd almost prefer something like include, although this is more sanitary than C's #include.
  • Documentation, PR message, commit message, etc.
  • Figure out what to do with bazel mod tidy. We'll need to generate proper fixup events for imported/included segments, and then somehow get ModCommand to run a complex Skyframe evaluation without actually using Skyframe... ugh.
  • EDIT: just realized that I forgot about lockfile support (i.e. modifying an imported segment should trigger re-resolution). But hopefully with concurrent work on the new lockfile format, that won't be necessary.

@fmeum
Copy link
Collaborator

fmeum commented Mar 29, 2024

As this is our last chance to do so, are there any further restrictions we would want to apply? Some ideas:

  • module has to be called prior to the first module_import (in particular, not in an imported file).
  • can't use_extension an extension defined in a module that hasn't been added as a bazel_dep before or in the current file (in fact, we could enforce declaring the bazel_dep first in imported files without affecting backwards compatibility).

I'm also not sure whether BFS is the most intuitive ordering. If I branch out a cc.bzl with all my C++ setup and then further branch out embedded_toolchains.bzl and regular_toolchains.bzl from that, I wouldn't necessarily expect these toolchains to be registered after those in my top-level import java_native_toolchains.bzl.

Bikeshed. I'm not married to the name module_import. I'd almost prefer something like include, although this is more sanitary than C's #include.

I find module_import a bit too close to bazel_dep. In particular, it doesn't make it clear that you are importing a module file, not a module itself as an abstract concept.
I think that include does a good job at this and using a regular function call rather than some alien syntax gets the improved hygiene across.

We'll need to generate proper fixup events for imported/included segments

We have to make a choice here as to whether we want to build this logic into Bazel or buildozer. The latter doesn't operate on files that aren't explicitly mentioned as cli args yet and I don't know what it would take to change this. In any case, we'll have to deal with the complexity that the repo mapping for extension label resolution is spread across multiple files (see my second point on restrictions we may want to apply).

and then somehow get ModCommand to run a complex Skyframe evaluation without actually using Skyframe... ugh

Are you referring to the lockfile update logic? With the new lockfile format, use_repo changes and formatting would no longer affect the lockfile at all. I think that we could just get rid of the logic.

@fmeum
Copy link
Collaborator

fmeum commented Mar 29, 2024

More thoughts:

  • If we enforce that extension proxies are exported in imported files, this would greatly simplify the buildozer integration: We could just identify proxies by their name, which sidesteps the repo mapping complexity.
  • Buildifier has special formatting rules for files called MODULE.bazel. How about enforcing a certain naming scheme for the imported files, say MODULE.whatever.bazel? That's really the only easy way I see to ensure that the formatting is applied consistently.

@Wyverald
Copy link
Member Author

module has to be called prior to the first module_import (in particular, not in an imported file).

This is already enforced (module_import itself is a non-module call).

can't use_extension an extension defined in a module that hasn't been added as a bazel_dep before or in the current file (in fact, we could enforce declaring the bazel_dep first in imported files without affecting backwards compatibility).

Are you saying that, to use_extension("@rules_go..., a module file segment has to have bazel_dep(name="rules_go", first? That sounds overly restrictive -- since a bazel_dep on rules_go can essentially only appear once in all segments combined, this means that only one segment gets to use extensions from rules_go.

I'm also not sure whether BFS is the most intuitive ordering.

BFS is just used to load the segment files (and compile them). The actual ordering is "source order", so essentially DFS.

I think that include does a good job at this and using a regular function call rather than some alien syntax gets the improved hygiene across.

Agreed. If other people concur, I'll probably change it to something more like include. That does pose a bit of a backwards-compatibility concern -- right now I'm essentially treating module_import as a keyword, so it can't be used anywhere other than in a top-level call. But include is more likely to have been used as, say, a tag class name. Maybe include_segment? Or is that too unwieldy?

Are you referring to the lockfile update logic? With the new lockfile format, use_repo changes and formatting would no longer affect the lockfile at all. I think that we could just get rid of the logic.

Ah yeah, that would be great!

If we enforce that extension proxies are exported in imported files, this would greatly simplify the buildozer integration: We could just identify proxies by their name, which sidesteps the repo mapping complexity.

Could you clarify this a bit more? Why is this a new thing for imported files only?

Buildifier has special formatting rules for files called MODULE.bazel. How about enforcing a certain naming scheme for the imported files, say MODULE.whatever.bazel? That's really the only easy way I see to ensure that the formatting is applied consistently.

I don't know if enforcing a scheme is a good idea, especially if the scheme is anything more than a suffix check (MODULE\.\w+\.bazel is too weird to me). Formatting is kind of an ecosystem concern (as in, not enforced by Bazel itself), so IMO just having a convention should work fine enough.

@fmeum
Copy link
Collaborator

fmeum commented Apr 2, 2024

Are you saying that, to use_extension("@rules_go..., a module file segment has to have bazel_dep(name="rules_go", first? That sounds overly restrictive -- since a bazel_dep on rules_go can essentially only appear once in all segments combined, this means that only one segment gets to use extensions from rules_go.

Not necessarily in the same segment, but one that came before it. I.e., make sure that the repo mapping entries applied to labels are evident when reading the "concatenated" segments in order. But it's not a must if we can solve the bazel mod tidy in a different way, for example by going with exported proxies.

But include is more likely to have been used as, say, a tag class name. Maybe include_segment? Or is that too unwieldy?

I find include_segment reasonable too, but we should resolve the name clash issue separately. Could we make it so that include (or whatever we name it) is only treated as the special keyword until the first assignment statement involving include? That would give it the same semantics as any other global and avoid us having to use an unwieldy name just to make collisions less likely.

Could you clarify this a bit more? Why is this a new thing for imported files only?

In MODULE.bazel, you can currently do use_repo(use_extension(...), "foobar") just fine, so Bazel won't always have the exported name available to generate buildozer fixes with. We prohibited this for isolated usages and I think that doing so for module file segments would make sense. We could also enforce it for MODULE.bazel itself as unexported usages won't work with bazel mod tidy anyway, but that could be mildly breaking (I haven't seen this in practice yet though).

Formatting is kind of an ecosystem concern (as in, not enforced by Bazel itself), so IMO just having a convention should work fine enough.

It's unfortunately not that easy: bazel mod tidy applies buildifier formatting through buildozer, so Bazel does in fact enforce formatting (and there is not really any way around this since buildozer doesn't allow edits without formatting).

@Wyverald Wyverald force-pushed the wyv-module-import branch from daadadb to 0063cd4 Compare April 4, 2024 19:03
@Wyverald Wyverald changed the title Add a new module_import() directive to MODULE.bazel files Add a new include() directive to MODULE.bazel files Apr 4, 2024
@Wyverald
Copy link
Member Author

Wyverald commented Apr 4, 2024

Not necessarily in the same segment, but one that came before it. I.e., make sure that the repo mapping entries applied to labels are evident when reading the "concatenated" segments in order. But it's not a must if we can solve the bazel mod tidy in a different way, for example by going with exported proxies.

I think the "exported proxy" idea is probably better. We probably don't even need to enforce it -- if you use something like use_repo(use_extension(...)), mod tidy would just not work for you and you can't blame anyone else.

I find include_segment reasonable too, but we should resolve the name clash issue separately. Could we make it so that include (or whatever we name it) is only treated as the special keyword until the first assignment statement involving include? That would give it the same semantics as any other global and avoid us having to use an unwieldy name just to make collisions less likely.

I'm not worried about include = use_extension(...); that's very easy to fix. I'm worried about myext.include(...) or myext.mytag(include = ...), since these can't be changed without changing the API of the extension.

But anyway, I went with include for now and explicitly allowed include to be used as a tag name. This is kind of like Java making var a "contextual keyword"; include is basically a keyword in MODULE.bazel files except that it can be used as a tag class name or an attr name.

It's unfortunately not that easy: bazel mod tidy applies buildifier formatting through buildozer, so Bazel does in fact enforce formatting (and there is not really any way around this since buildozer doesn't allow edits without formatting).

IMO that's fine? Still doesn't necessarily mean we need to enforce a convention for segments.

@fmeum
Copy link
Collaborator

fmeum commented Apr 5, 2024

IMO that's fine? Still doesn't necessarily mean we need to enforce a convention for segments.

We might get away with no convention, but I'm not fully certain: In addition to conditional formatting (say spacing between extension usages and their use_repo), the file type also determines the types of commands (add_use_repo) and cleanups (removing empty use_repo statements that result from commands) that buildozer will run on a given file. We may have to remove these restrictions for bazel mod tidy to run on a segmented file, which could, at least in theory, have implications for the formatting of unrelated .bzl files.

Independent of the technical challenges, I'm also a strong proponent of first-class tool integration (think go fmt). We certainly don't have to make the tools behave perfectly on day 1 of merging this PR, but I feel bad about adding new "implicit" file types that have no clear path for tools support. For example, how could an LSP implementation know that the file being edited should have access to module file globals rather than build globals? By the time someone has implemented this in a Starlark LS, everyone will already have their own unenforced conventions (think .cpp/.cc/.cxx/...) and the LS will have a hard time.

How about a MODULE.bazel. prefix instead of a suffix? @vladmos Do you happen to have insights about tooling support for new file types that you could share?

@Wyverald
Copy link
Member Author

Hmm, I understand the concern better now. I think maybe acutally <whatever>.MODULE.bazel would work. It's very clear that it's some kind of a MODULE.bazel file, and the .bazel suffix suggests that it's a BUILD-like file (i.e. Starlark with no if, def, etc). And I'd be okay with enforcing that; if it SGTY, I can make the changes either in this PR or later (but before 7.2).

Anyway, I just added some more documentation, and the implementation should be ready for review.

if (node.getName().equals(INCLUDE_IDENTIFIER)) {
// If we somehow reach the `include` identifier but NOT as the other allowed cases above,
// cry foul.
error(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that extension proxies called include will break with this. I think that's fine, but technically not backwards compatible for 7.2.0. But maybe that's okay?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... that's indeed annoying. I kind of just brushed it off (because who names their extension/proxy "include"??) but it's indeed a hard breakage, and potentially non-addressable if the usage is in a dep (not your root module).

I suppose we could further soften the impact by allowing include to be the sole occupant of the left-hand side of an assignment. But with code like

my_ext = use_extension(...)
include = my_ext.include
include(attr = "abc")

we'll still incorrectly complain about keyword arguments being used with the include directive. I guess that's a much smaller blast radius.

This crucially still doesn't allow something like

blah = include
blah("//:abc.MODULE.bazel")

which would cause a runtime error because we don't pre-load //:abc.MODULE.bazel but will need it during evaluation. That was what I wanted to avoid the most.

Copy link
Collaborator

@fmeum fmeum Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we return from the visitor after the first assignment to include? That would be similar to how you can override global symbols in Starlark.

Edit: and don't handle any include from that point on in a special way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good idea -- basically removes the risk. done.

* TODO: This field is a hack. It's not needed by anything other than {@code ModCommand}, during
* the {@code bazel mod tidy} command. Doing it this way assumes that {@code bazel mod tidy}
* cannot touch any included segments. This is unsatisfactory; we should do it properly at some
* point, although that seems quite difficult.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do it as a follow-up. Should we file an issue for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filed #22063

public void addOverride(String moduleName, ModuleOverride override) throws EvalException {
ModuleOverride existingOverride = overrides.putIfAbsent(moduleName, override);
if (existingOverride != null) {
throw Starlark.errorf("multiple overrides for dep %s found", moduleName);
}
}

public InterimModule buildModule() throws EvalException {
public InterimModule buildModule(@Nullable Registry registry) throws EvalException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was just a mostly unrelated mini-refactor because the registry isn't used until the very end where the module is built. Before, we passed in registry to the constructor of the thread context, which is a little bit unnecessary.

@meteorcloudy
Copy link
Member

@Wyverald Is it the right time to update the PR description now? I have a rough understanding by reading the code, but want to better confirm the ideas in this PR.

@Wyverald
Copy link
Member Author

ah sorry, I updated the commit message but forgot about the PR description. Updated

This new directive allows the root module to divide its MODULE.bazel into multiple segments. This directive can only be used by root modules; only files in the main repo may be included; variable bindings are only visible in the file they occur in, not in any included or including files. See the docs for `include()` (in `ModuleFileGlobals.java`) for more details.

RELNOTES: Added a new `include()` directive to MODULE.bazel files.

Fixes #17880.
Copy link
Collaborator

@fmeum fmeum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with few comments

@Override
public void visit(DotExpression node) {
visit(node.getObject());
if (includeWasAssigned || !node.getField().getName().equals(INCLUDE_IDENTIFIER)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the includeWasAssigned needed here? If so, please update the comment.

@Test
public void checkSyntax_good_includeIdentifierReassigned() throws Exception {
String program = """
include = print
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add one legitimate call to include before this assignment?

scratch.overwriteFile(
rootDirectory.getRelative("MODULE.bazel").getPathString(),
"module(name='aaa')",
"include('@haha//java:MODULE.bazel.segment')");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could already update the naming scheme.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops sorry, I didn't realize you left new comments. I'll send a follow-up

[
'module(name="foo")',
'bazel_dep(name="bbb", version="1.0")',
'include("//java:MODULE.bazel.segment")',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Apr 22, 2024
@brentleyjones
Copy link
Contributor

Will this go into 7.2?

@Wyverald Wyverald deleted the wyv-module-import branch April 22, 2024 15:05
@Wyverald
Copy link
Member Author

Will this go into 7.2?

yes -- I'll prepare the cherry-pick PRs later

Wyverald added a commit that referenced this pull request Apr 30, 2024
This new directive allows the root module to divide its `MODULE.bazel` into multiple segments. This directive can only be used by root modules; only files in the main repo may be included; variable bindings are only visible in the file they occur in, not in any included or including files. See the docs for `include()` (in `ModuleFileGlobals.java`) for more details.

In follow-ups, we'll need to address:
1. Enforcing the loaded files to have some sort of naming format (tentatively `foo.MODULE.bazel` where `foo` is anything)
2. Making `bazel mod tidy` work with included files

RELNOTES: Added a new `include()` directive to `MODULE.bazel` files.

Fixes #17880.

Closes #21855.

PiperOrigin-RevId: 627034184
Change-Id: Ifc2f616cf0791445daeeac9ca5ec4478e83382aa
Kila2 pushed a commit to Kila2/bazel that referenced this pull request May 13, 2024
This new directive allows the root module to divide its `MODULE.bazel` into multiple segments. This directive can only be used by root modules; only files in the main repo may be included; variable bindings are only visible in the file they occur in, not in any included or including files. See the docs for `include()` (in `ModuleFileGlobals.java`) for more details.

In follow-ups, we'll need to address:
1. Enforcing the loaded files to have some sort of naming format (tentatively `foo.MODULE.bazel` where `foo` is anything)
2. Making `bazel mod tidy` work with included files

RELNOTES: Added a new `include()` directive to `MODULE.bazel` files.

Fixes bazelbuild#17880.

Closes bazelbuild#21855.

PiperOrigin-RevId: 627034184
Change-Id: Ifc2f616cf0791445daeeac9ca5ec4478e83382aa
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support (restrictive) load in MODULE.bazel
4 participants