Remove import-path guessing from ProtoCompileActionBuilder #10939

Closed

Yannic
Contributor

@Yannic Yannic commented Mar 10, 2020

Instead of trying to guess the correct source root and import-path of a proto file,
we save the real source root together with the .proto source when creating the
ProtoInfo provider and re-use that information for codegen actions.

@Yannic Yannic requested a review from lberki as a code owner March 10, 2020 21:37
@Yannic Yannic force-pushed the proto_deterministic_import_path branch from 11087ad to 2a94760 Compare March 10, 2020 23:23
@lberki
Contributor

lberki commented Mar 16, 2020

I seem to remember that my concern with this exact change was RAM use (that's why we use the otherwise-not-very-reasonable "null as import path" pattern). Let me run the numbers.

@Yannic
Contributor Author

Yannic commented Mar 17, 2020

Let me actually change this patch to always use null if source_root + import_path == artifact.exec_path and see whether there are cases left where that doesn't hold. If not, we can get rid of the map altogether.

@lberki
Contributor

lberki commented Mar 17, 2020

How would using null if source_root + import_path == exec_path help? Then you only have the exec path when you generate the command line, and you can't tell where the import path starts. You might be able to get away with some smarter guessing logic -- the important use case at Google is when the import root is the root of the main repository, so if you signal that with a null, then RAM use becomes much less.

@Yannic
Contributor Author

Yannic commented Mar 17, 2020

Let's go a step back and see what we want to achieve here.

The goal of this is to get the import paths of all protos without guessing it. I see two possible options here:

  • Compute them at proto_library level and save them in ProtoInfo, or
  • Make it possible to derive them from the artifact and its proto_source_root.

The first option would be the cleanest, but it seems like that's unfeasible because of memory consumption.
For the second option, we need to ensure that proto_source_root + import_path == exec_path and have a map proto_source_root -> files (Map<PathFragment, NestedSet<Artifact>> [1]) in ProtoInfo. That would give us access to everything needed to derive the import path and build the command-line.

[1] possibly implemented as NestedSet<Pair<PathFragment, NestedSet<Artifact>>> and only flattened at execution time to keep construction O(direct deps) instead of O(transitive deps). Actually, we may not even need to flatten it into a map at all.
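
As a rough illustration of the second option, the sketch below derives import paths from a source-root-to-files map and turns them into `-I` arguments. It is only a sketch under stated assumptions: plain strings and `java.util` collections stand in for Bazel's `PathFragment`, `Artifact`, and `NestedSet`, an empty source root is assumed to mean the exec root, and the names `ImportPathArgsSketch`, `importPath`, and `toImportArgs` are hypothetical, not Bazel APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; plain strings stand in for PathFragment and Artifact. */
final class ImportPathArgsSketch {
  private ImportPathArgsSketch() {}

  /** Returns execPath relative to sourceRoot; an empty root is assumed to be the exec root. */
  static String importPath(String sourceRoot, String execPath) {
    if (sourceRoot.isEmpty()) {
      return execPath;
    }
    String prefix = sourceRoot + "/";
    if (!execPath.startsWith(prefix)) {
      // The invariant source_root + import_path == exec_path does not hold.
      throw new IllegalArgumentException(execPath + " is not under " + sourceRoot);
    }
    return execPath.substring(prefix.length());
  }

  /** Flattens a source root -> files map into -I<import path>=<exec path> arguments. */
  static List<String> toImportArgs(Map<String, List<String>> filesBySourceRoot) {
    List<String> args = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : filesBySourceRoot.entrySet()) {
      for (String execPath : entry.getValue()) {
        args.add("-I" + importPath(entry.getKey(), execPath) + "=" + execPath);
      }
    }
    return args;
  }
}
```

Nothing is guessed here: the import path is fully determined by the pair of source root and exec path, which is the property the second option relies on.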

@lberki
Contributor

lberki commented Mar 17, 2020

Yeah, that NestedSet<Pair<PathFragment, NestedSet<Artifact>>> seems like a reasonable way to go. Then you'd still store duplicate information in ProtoInfo because e.g. transitiveProtoSources and transitiveProtoSourceRoots are derivable from it, but I think that's an acceptable amount of collateral damage if the memory use is not too large (the asymptotic memory use is certainly good, but the constant factor may still kill this)

@Yannic Yannic force-pushed the proto_deterministic_import_path branch from 3a14067 to cfbd0c6 Compare March 18, 2020 13:24
Instead of trying to guess the correct import-path of a proto file,
we save its associated source root in ProtoInfo and compute the
import path by taking the proto file's exec path relative to that
source root.
@Yannic Yannic force-pushed the proto_deterministic_import_path branch from cfbd0c6 to 7ab5bd3 Compare March 18, 2020 15:42
@Yannic
Contributor Author

Yannic commented Mar 18, 2020

Modified the change to use NestedSet<Pair<PathFragment, ImmutableList<Artifact>>> instead of guessing the import path. I was also able to remove some fields from ProtoInfo that are no longer needed (strictImportableProtoSources* also becomes obsolete, but these are a bit trickier to remove so I'll leave that for a follow-up). ptal.

@lberki
Contributor

lberki commented Mar 20, 2020

Good job!

I ran our internal benchmark and this results in a net improvement in memory use of 0.1%. That is much better than I expected; I was worried that any such change would have an unacceptable amount of memory overhead. Let me review the pull request in more detail.

private final String outputDirectory;
private final NestedSet<String> directProtoSourceRoots;
private final boolean siblingRepositoryLayout;
static final class ExpandProtosUnderSingleSourceRootToImportPathsArgsFn
Contributor

Is this unused now (or can I just not find the references to it)?

protoInfo.getStrictImportableProtoSourceRoots(),
protoInfo.getTransitiveProtoSources(),
siblingRepositoryLayout);
areDepsStrict ? protoInfo.getImportableProtos() : null,
Contributor

Mind removing the siblingRepositoryLayout variable from line 266? (it's unused now as far as I can tell)

private final Artifact directDescriptorSet;
private final NestedSet<Artifact> transitiveDescriptorSets;
private final NestedSet<Pair<PathFragment, ImmutableList<Artifact>>> importableProtos;
Contributor

nit: add back the strict prefix, i.e. call this strictImportableProtos

@@ -682,72 +617,52 @@ private static String guessProtoPathUnderRoot(
}
};

private static String computeImportPath(PathFragment protoSourceRoot, Artifact proto) {
PathFragment importPath = proto.getExecPath().relativeTo(protoSourceRoot);
Contributor

nit: why not inline the importPath variable?


@AutoCodec
@AutoCodec.VisibleForSerialization
static final class ExpandToImportPathsArgsFn
Contributor

opt: you can make these lambdas instead of inner classes if you want

NestedSet<Pair<PathFragment, ImmutableList<Artifact>>> exportedProtos) {
NestedSetBuilder<Pair<PathFragment, ImmutableList<Artifact>>> protos =
NestedSetBuilder.stableOrder();
protos.addTransitive(exportedProtos);
Contributor

This is inconsistent with how transitiveProtoSources (the NestedSet<Artifact>) is computed: that is only local sources + deps and this one is local sources + deps + exports. Make this consistent (or else explain in a comment why the discrepancy exists)

this.sourceRoot = sourceRoot;
this.importPathSourcePair = importPathSourcePair;
}

public ImmutableList<Artifact> getSources() {
return sources;
return ImmutableList.copyOf(Iterables.transform(importPathSourcePair, p -> p.first));
Contributor

nit, opt: you can replace p -> p.first with Pair::getFirst (I'm not sure if it's an improvement)

for (ProtoInfo info : ruleContext.getPrerequisites("exports", TARGET, ProtoInfo.PROVIDER)) {
protos.addTransitive(info.getExportedProtos());
}
if (library.getSources().isEmpty()) {
Contributor

It appears that computeExportedProtos at HEAD does not special-case rules without sources and this one does. Why is that?

Contributor Author

@lberki
Contributor

lberki commented Mar 20, 2020

I see a bunch of failures in our internal test battery (unfortunately, I am too pressed for time to investigate myself). There are two kinds:

  1. When a command line is expected to contain --allowed_public_imports= but instead contains --allowed_public_imports <list of .proto files>
  2. One where the command line is expected to contain -Ifoo/bar.proto=blaze-out/k8-fastbuild/bin/foo/bar.proto, but instead it contains -Iblaze-out/k8-fastbuild/bin/foo/bar.proto=blaze-out/k8-fastbuild/bin/foo/bar.proto

My guess is that the latter is related to buggy import path computation (I can fix it when I import it if you have no idea) and the former is related to the export set computation you changed (I'll take a look, but for now, I'd like to know why you changed it in the first place)

I was somewhat surprised, but then I realized that we don't have external tests for exports, so no surprise you broke it with this change :( Once I know why you changed the logic, I'll take a look myself.

@Yannic
Contributor Author

Yannic commented Mar 20, 2020

Argh, turns out I misunderstood exports to behave like java_library's exports, but it's only used to indicate that direct sources may public import it (iff it's also in deps). I can fix that.

For the second kind of failure you're seeing, I see two possible explanations:

  • The tests don't construct ProtoInfo correctly and blindly set proto_source_root to "" and only add the correct source root to transitive_proto_path. I had to fix that in ProtoCompileActionBuilderTest, so hopefully, that's what you're seeing as well.
  • There are cases inside Google where ProtoInfo.proto_source_root is not a prefix of (all) direct sources (e.g. because the files are generated). This is a precondition for this change and not fixable from my side.
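
The second failure above is what one would expect when the stored source root is `""` for a file that actually lives under the output tree. A minimal sketch follows, assuming plain strings in place of `PathFragment` and assuming the exec path is always under the given root; `WrongSourceRootSketch` and `importArg` are hypothetical names used only for illustration.

```java
/** Hypothetical sketch of how a wrong source root yields the second failure above. */
final class WrongSourceRootSketch {
  private WrongSourceRootSketch() {}

  /**
   * Builds -I<import path>=<exec path>, taking the import path relative to sourceRoot.
   * Assumes execPath is under sourceRoot (or that sourceRoot is empty).
   */
  static String importArg(String sourceRoot, String execPath) {
    String importPath =
        sourceRoot.isEmpty() ? execPath : execPath.substring(sourceRoot.length() + 1);
    return "-I" + importPath + "=" + execPath;
  }
}
```

With the root `blaze-out/k8-fastbuild/bin`, the exec path `blaze-out/k8-fastbuild/bin/foo/bar.proto` yields the expected `-Ifoo/bar.proto=...`; with the root `""`, the whole exec path ends up as the import path, which matches the argument seen in the failing tests.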

@lberki
Contributor

lberki commented Mar 20, 2020

I have a test case for the second failure (add this to BazelProtoLibraryTest.java):

  @Test
  public void testSourceAndGeneratedProtoFiles() throws Exception {
    scratch.file("a/BUILD",
        "genrule(name='g', srcs=[], outs=['g.proto'], cmd = '')",
        "proto_library(name='p', srcs=['s.proto', 'g.proto'])");

    Iterable<String> commandLine = paramFileArgsForAction(getDescriptorWriteAction("//a:p"));
    String genfiles = getTargetConfiguration().getGenfilesFragment().toString();
    assertThat(commandLine).containsAtLeast(
        "-Ia/s.proto=a/s.proto",
        "-Ia/g.proto=" + genfiles + "/a/g.proto");
  }

This is because if you have both source and generated .proto files, you need more than one source root for a single proto_library.

The trick is to fix this in such a way that one doesn't need to create virtual proto source roots for every proto_library rule with generated sources (so that RAM use is under control, although if you prefer, I can run the benchmarks that way, too, and see what happens)

I think what would work is to have multiple ProtoCommon.Library instances for proto_library rules that have both source and generated .proto files. This would make .direct_source_root a lie, but not more of a lie than it currently is.
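
To make the "more than one source root" point concrete, here is a small sketch that splits the sources of one rule by source root. It is an illustration only: strings stand in for artifacts, the genfiles root is passed in as a parameter, and `SplitBySourceRootSketch` and `groupBySourceRoot` are hypothetical names. Each resulting group corresponds roughly to what one `ProtoCommon.Library` instance would hold under the suggestion above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one group of sources per source root. */
final class SplitBySourceRootSketch {
  private SplitBySourceRootSketch() {}

  /** Groups exec paths by source root: genfilesRoot for generated files, "" for source files. */
  static Map<String, List<String>> groupBySourceRoot(List<String> execPaths, String genfilesRoot) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (String execPath : execPaths) {
      // Assumption: a file is "generated" exactly when it lives under genfilesRoot.
      String root = execPath.startsWith(genfilesRoot + "/") ? genfilesRoot : "";
      groups.computeIfAbsent(root, k -> new ArrayList<>()).add(execPath);
    }
    return groups;
  }
}
```

For the rule in the test case, `a/s.proto` lands in the `""` group and the generated `a/g.proto` in the genfiles group, so both can get the import path `a/<name>.proto` without a virtual source root.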

@lberki
Contributor

lberki commented Mar 20, 2020

Re, the first, it appears to be the case. The simplest test case appears to be to test whether ProtoInfo.getExportedProtos() returns the .proto files in the current rule or not.

Although you should probably not change any behavior in this change, I couldn't resist and tried to grok how the exports logic works.

It appears that the proto compile action has the strict importable protos of the rules in its exports in the --allowed_public_imports flag. And the strict importable protos in turn are computed as:

  • If the rule has sources, the sources of the rule and the strict importable protos for dependents of deps
  • If the rule has no sources, the strict importable protos of deps

And the strict importable protos for dependents are:

  • The sources if the rule has them
  • The strict importable protos for dependents of the deps if the rule has no sources

So it appears that sources of deps of exports are allowed public imports if we assume that every proto_library rule has sources? That doesn't make much sense... but that's what the methods computeExportedProtos, computeStrictImportableProtos and computeStrictImportableProtosImportPathsForDependents methods seem to do?
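
The two definitions above can be written down as a pair of mutually dependent functions. The sketch below encodes only the reading given in this comment, not the actual Bazel methods; `Lib`, `forDependents`, and `strictImportable` are hypothetical names, and file names stand in for artifacts.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the strict-importable rules as described in the comment above. */
final class StrictImportableSketch {
  private StrictImportableSketch() {}

  /** Stand-in for a proto_library rule: its srcs and its deps. */
  static final class Lib {
    final List<String> srcs;
    final List<Lib> deps;

    Lib(List<String> srcs, Lib... deps) {
      this.srcs = srcs;
      this.deps = Arrays.asList(deps);
    }
  }

  /** What dependents of lib may import: its srcs, or (if it has none) what its deps offer. */
  static Set<String> forDependents(Lib lib) {
    Set<String> result = new LinkedHashSet<>();
    if (!lib.srcs.isEmpty()) {
      result.addAll(lib.srcs);
    } else {
      for (Lib dep : lib.deps) {
        result.addAll(forDependents(dep));
      }
    }
    return result;
  }

  /** What lib itself may import, following the two cases listed above. */
  static Set<String> strictImportable(Lib lib) {
    Set<String> result = new LinkedHashSet<>();
    if (!lib.srcs.isEmpty()) {
      result.addAll(lib.srcs);
      for (Lib dep : lib.deps) {
        result.addAll(forDependents(dep));
      }
    } else {
      for (Lib dep : lib.deps) {
        result.addAll(strictImportable(dep));
      }
    }
    return result;
  }
}
```

A rule without sources simply forwards whatever its deps offer, which is why the result for an exported rule with sources also includes the sources of its deps.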

@Yannic
Contributor Author

Yannic commented Mar 20, 2020

I have a test case for the second failure (add this to BazelProtoLibraryTest.java):

  @Test
  public void testSourceAndGeneratedProtoFiles() throws Exception {
    scratch.file("a/BUILD",
        "genrule(name='g', srcs=[], outs=['g.proto'], cmd = '')",
        "proto_library(name='p', srcs=['s.proto', 'g.proto'])");

    Iterable<String> commandLine = paramFileArgsForAction(getDescriptorWriteAction("//a:p"));
    String genfiles = getTargetConfiguration().getGenfilesFragment().toString();
    assertThat(commandLine).containsAtLeast(
        "-Ia/s.proto=a/s.proto",
        "-Ia/g.proto=" + genfiles + "/a/g.proto");
  }

This is because if you have both source and generated .proto files, you need more than one source root for a single proto_library.

The trick is to fix this in such a way that one doesn't need to create virtual proto source roots for every proto_library rule with generated sources (so that RAM use is under control, although if you prefer, I can run the benchmarks that way, too, and see what happens)

I think what would work is to have multiple ProtoCommon.Library instances for proto_library rules that have both source and generated .proto files. This would make .direct_source_root a lie, but not more of a lie than it currently is.

Is .direct_source_root really a lie, though? At least Bazel seems to already put all proto_library targets with generated sources into a virtual proto source root directory [1,2].
I think having multiple (direct) source roots for a single proto_library target will make things way more complicated, so yes, please run the benchmark.

[1] #9215
[2] https://source.bazel.build/bazel/+/master:src/main/java/com/google/devtools/build/lib/rules/proto/ProtoCommon.java;l=276?q=ProtoCommon

@Yannic
Contributor Author

Yannic commented Mar 20, 2020

Re, the first, it appears to be the case. The simplest test case appears to be to test whether ProtoInfo.getExportedProtos() returns the .proto files in the current rule or not.

Although you should probably not change any behavior in this change, I couldn't resist and tried to grok how the exports logic works.

It appears that the proto compile action has the strict importable protos of the rules in its exports in the --allowed_public_imports flag. And the strict importable protos in turn are computed as:

  • If the rule has sources, the sources of the rule and the strict importable protos for dependents of deps
  • If the rule has no sources, the strict importable protos of deps

And the strict importable protos for dependents are:

  • The sources if the rule has them
  • The strict importable protos for dependents of the deps if the rule has no sources

So it appears that sources of deps of exports are allowed public imports if we assume that every proto_library rule has sources? That doesn't make much sense... but that's what the methods computeExportedProtos, computeStrictImportableProtos and computeStrictImportableProtosImportPathsForDependents methods seem to do?

Yes, that seems to be what those methods do.
I don't think my previous understanding of --direct_dependencies=srcs + exportedSources(deps) and --allowed_public_imports=srcs + exportedSources(exports) is necessarily a bad idea. It would make proto_library more consistent with other <lang>_library rules that allow exports. I agree that that change shouldn't be part of this PR, though.

I'll refrain from updating the PR for this until we've figured out the other issue.

@lberki
Contributor

lberki commented Mar 20, 2020

Yep, I agree on both counts :)

What's the "other issue"? The -I<something under bazel-out> I mentioned above?

I'll run the benchmark as soon as you update this change. What's the plan? As long as you don't create virtual proto source roots for every proto_library with generated sources, I'm fairly certain that RAM usage will be acceptable.

@lberki
Contributor

lberki commented Mar 20, 2020

/cc @cheister
This will probably conflict with #10966 (fortunately, it's only a textual conflict, no complex refactoring will be needed)

@Yannic
Contributor Author

Yannic commented Mar 20, 2020

Yes, the -I<something under bazel-out> is the "other issue".

Can you check what's the status-quo on creating virtual source roots for generated protos? Unless I'm missing something really obvious here, Bazel already creates virtual source roots for all targets that have at least one generated source. We're still putting the original source in the Pair<Artifact, ImportPath>, but that doesn't make a difference because the virtual file is still there. What am I missing?

@lberki
Contributor

lberki commented Mar 23, 2020

What you are missing is the very sad existence of the generatedProtosInVirtualImports argument of ProtoCommon.createProtoInfo(). There are two reasons for that. One is memory use of the extra symlink actions, which I hope is okay. It probably is.

The second, more hairy one is that of implicit outputs: if you write proto_library(srcs=["foo.proto"]) in e.g. a/BUILD, Blaze will automatically have a target //a:foo.pb.cc and the root-relative path of that file must be a/foo.pb.cc and thus the .proto file cannot be in a virtual imports directory.

@c-mita is working on removing this (in fact, I was hoping he'd be done with it last week), and then we can hopefully make Blaze and Bazel consistent.

@lberki
Contributor

lberki commented Mar 23, 2020

...and as to the fix, you can either wait until @c-mita is done, although that will take at least a week until he submits the change and it percolates through our release process, but more like two, or use the "two Library instances" approach.

WDYT?

@Yannic
Contributor Author

Yannic commented Mar 23, 2020

I'll wait until @c-mita is finished and we can try to put all generated .proto files into a virtual source root.

In the meantime, could we move forward with the "artifact -> artifact symlink PR" (#10695)?

@lberki
Contributor

lberki commented Apr 1, 2020

@c-mita is done with the removal of implicit outputs! (well, he submitted the change a week ago, but we had to wait for our internal release to make sure that it doesn't break anything). Your path is free!

@Yannic
Contributor Author

Yannic commented Apr 14, 2020

Found another neat solution: Instead of using Pair<Artifact, nullable_import_path>, we can save Pair<Artifact, real_source_root> (ProtoSource for readability) and always derive the import path. Memory usage should be the same. ProtoInfo.proto_source_root is still the same lie as before, but for all ProtoSource: ProtoSource.source.exec_path == ProtoSource.source_root + ProtoSource.import_path.
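
A minimal sketch of that idea, assuming strings in place of `Artifact` and `PathFragment`: the value object stores only the exec path and the real source root, checks the stated invariant on construction, and derives the import path on demand. The class name `ProtoSourceSketch` and its accessors are hypothetical and not the actual `ProtoSource` API.

```java
/** Hypothetical sketch of a (source, real source root) pair with a derived import path. */
final class ProtoSourceSketch {
  private final String execPath;
  private final String sourceRoot;

  ProtoSourceSketch(String execPath, String sourceRoot) {
    // Invariant from the comment above: exec_path == source_root + import_path.
    if (!sourceRoot.isEmpty() && !execPath.startsWith(sourceRoot + "/")) {
      throw new IllegalArgumentException(execPath + " is not under " + sourceRoot);
    }
    this.execPath = execPath;
    this.sourceRoot = sourceRoot;
  }

  String getExecPath() {
    return execPath;
  }

  String getSourceRoot() {
    return sourceRoot;
  }

  /** Derived rather than stored, so no separate import path string is kept per file. */
  String getImportPath() {
    return sourceRoot.isEmpty() ? execPath : execPath.substring(sourceRoot.length() + 1);
  }
}
```

Because many files share the same source root, the per-file cost is one extra reference, which is consistent with the expectation that memory usage stays the same.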

@Yannic
Contributor Author

Yannic commented May 10, 2020

@lberki ptal

@Yannic Yannic force-pushed the proto_deterministic_import_path branch from e0e9fec to c47521c Compare May 27, 2020 12:47
@Yannic
Contributor Author

Yannic commented May 27, 2020

pinging this again

@philwo
Member

philwo commented Oct 21, 2020

@lberki Can we merge this?

@Yannic
Contributor Author

Yannic commented Oct 26, 2020

(rebased to fix conflicts)

@lberki
Contributor

lberki commented Oct 26, 2020

Thanks for rebasing! I wanted to look at it today, but it's not a trivial change so I'll need to find a chunk of time in which to do it :( I do want to merge this, though, if at all possible.

@lberki
Contributor

lberki commented Oct 29, 2020

Update: I started reviewing this change, had to resolve conflicts when importing it, then found a bug in javac (!) while fixing an internal test case. Let's see if I will have enough time and brain to continue the review today.

Contributor

@lberki lberki left a comment

I also ran some memory benchmarks and there is even a slight improvement!

I'll need to take a closer look, but if you don't mind, I'll wait until you shuffle ProtoInfo back so that it's easier to see what's going on.

protoInfo.getExportedProtoSourcesImportPaths();
if (protosInExports.isEmpty()) {
NestedSet<ProtoSource> publicImportSources = protoInfo.getPublicImportSources();
if (publicImportSources.isEmpty()) {
// This line is necessary to trigger the check.
result.add("--allowed_public_imports=");
Contributor

A test case in our internal test battery fails. I can't export it at the moment, but the BUILD file it uses is as follows:

proto_library(
  name = 'myProto',
  srcs = ['myProto.proto'],
)

then verifies that its proto compile action has --allowed_public_imports=. This fails, and instead, it has --allowed_public_imports test/myProto.proto. I think I know why (see the other comment)
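
The two shapes of the flag can be sketched as below. This is an illustration under assumptions: import paths are plain strings, the non-empty list is assumed to be passed as a single `:`-joined argument (the thread only says "list of .proto files"), and `AllowedPublicImportsSketch` and `allowedPublicImportsArgs` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of emitting --allowed_public_imports in its two forms. */
final class AllowedPublicImportsSketch {
  private AllowedPublicImportsSketch() {}

  static List<String> allowedPublicImportsArgs(List<String> publicImportPaths) {
    List<String> args = new ArrayList<>();
    if (publicImportPaths.isEmpty()) {
      // An explicit empty value still triggers the check, with nothing allowed.
      args.add("--allowed_public_imports=");
    } else {
      args.add("--allowed_public_imports");
      // Assumption: the paths are joined with ':' into one argument.
      args.add(String.join(":", publicImportPaths));
    }
    return args;
  }
}
```

For the `myProto` rule above, the expected result corresponds to calling this with an empty list; the failing run corresponds to a list that wrongly contains the rule's own `test/myProto.proto`.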

Contributor Author

Thanks, I think you're right about the culprit of the failure which would allow public imports of srcs while they aren't allowed with the current behavior.

I had to update BazelProtoLibraryTest#testExportedStrippedImportPrefixes though because it checked that the direct proto source root was part of the exported source roots (but we don't rely on a set of proto source roots anymore for computing public imports, so it's not a behavioral change). You may have to do the same for internal tests :(.

Contributor Author

@Yannic Yannic left a comment

Sorry for the delay, took me some time to get this stuff back into memory. Tried my best to shuffle everything back. PTAL

I also ran some memory benchmarks and there is even a slight improvement!

I'm curious how much of an improvement this is. Any chance you can share the numbers?

@lberki
Contributor

lberki commented Nov 9, 2020

It was a ~0.1% improvement, if my memory serves well, so nothing very significant, but it was still a pleasant surprise.

I tried to import this, then SharedProtoLibraryTest failed because of some completely unnecessary complexity in our test setup. So I recommend moving those tests back to BazelProtoLibraryTest for the time being so that sorting out our test framework is not a dependency of submitting this change.

WDYT?

@Yannic
Contributor Author

Yannic commented Nov 9, 2020

Sure, merged them into BazelProtoLibraryTest and added a TODO for you ;).

@lberki
Contributor

lberki commented Nov 9, 2020

Wow, eight months already! I fixed some linter errors and I could nitpick a bit more, but instead let's submit this and see if it sticks. It should, but it's an intricate enough problem that I'm not sure.

@bazel-io bazel-io closed this in abc4984 Nov 10, 2020
@lberki
Contributor

lberki commented Nov 10, 2020

Submitted. Let's see whether this stands the test of our internal source tree! (we'll know tomorrow evening with reasonable certainty in Europe if all goes well, if not, a day or two after that)

Thanks for your work here, I really appreciate this :)

@Yannic Yannic deleted the proto_deterministic_import_path branch November 12, 2020 12:26
Yannic added a commit to Yannic/bazel that referenced this pull request Dec 17, 2020
The information is already part of another field.

Cleanup after bazelbuild#12431 and bazelbuild#10939
Yannic added a commit to Yannic/bazel that referenced this pull request Jul 21, 2021
The information is already part of another field.

Cleanup after bazelbuild#12431 and bazelbuild#10939
bazel-io pushed a commit that referenced this pull request Jul 22, 2021
The information is already part of another field.

Cleanup after #12431 and #10939

Closes #12727.

PiperOrigin-RevId: 386185741