Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teach the FilesystemValueChecker about remote outputs #7269

Closed
wants to merge 3 commits into from

Conversation

buchgr
Copy link
Contributor

@buchgr buchgr commented Jan 28, 2019

The FilesystemValueChecker is used by Bazel to detect modified
outputs before a command. This change teaches it about remote
outputs that don't exist in the output base. That is if
SkyFrame has metadata about a remote output file which does not
exist in the output base it will not invalidate the output. However,
if the file exists in the output base it will be taken as the source
of truth.

Progress towards #6862

for (Map.Entry<TreeFileArtifact, FileArtifactValue> e : childFileValues.entrySet()) {
isRemote = isRemote && e.getValue().isRemote();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like things get strange when some files within the tree are remote and others non-remote. can you add some comments about when that would happen and the expected semantics?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can that happen? A TreeArtifact can only ever be created by one piece by an action.

Can it maybe happen that only parts of it are materialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I ll add some comments (and checks maybe). I think it should be either remote or local and not a mix of both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@janakdr
Copy link
Contributor

janakdr commented Feb 5, 2019

Can you comment on the need for this? How often is the output tree modified inter-build? Inside google, it feels like we're moving towards a world where we implicitly assume the output tree is immutable.

Copy link
Contributor

@ericfelly ericfelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unused code at this point, right? there's nothing in bazel codebase creating these artifact values with isRemote() == true.

Copy link
Contributor

@janakdr janakdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make sure to add integration tests internally.

@@ -417,6 +422,14 @@ private boolean actionValueIsDirtyWithDirectSystemCalls(ActionExecutionValue act
try {
ArtifactFileMetadata fileMetadata =
ActionMetadataHandler.fileMetadataFromArtifact(file, null, tsgm);
FileArtifactValue fileValue = actionValue.getArtifactValue(file);
boolean lastSeenRemotely = fileValue != null && fileValue.isRemote();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is fileValue null here? Is my understanding correct that this is for cases when we don't actually need to call stat() to get the metadata of the output of an action?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. IIRC fileValue here is none null only for injected metadata.

for (Map.Entry<TreeFileArtifact, FileArtifactValue> e : childFileValues.entrySet()) {
isRemote = isRemote && e.getValue().isRemote();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can that happen? A TreeArtifact can only ever be created by one piece by an action.

Can it maybe happen that only parts of it are materialized?

@@ -417,6 +422,14 @@ private boolean actionValueIsDirtyWithDirectSystemCalls(ActionExecutionValue act
try {
ArtifactFileMetadata fileMetadata =
ActionMetadataHandler.fileMetadataFromArtifact(file, null, tsgm);
FileArtifactValue fileValue = actionValue.getArtifactValue(file);
boolean lastSeenRemotely = fileValue != null && fileValue.isRemote();
if (!fileMetadata.exists() && lastSeenRemotely) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the expected semantics when:

  1. A file is created remotely
  2. The file is later materialized
  3. The file is then deleted by rm

Is Bazel expected to notice the deletion in that case?

Copy link
Contributor Author

@buchgr buchgr Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Bazel expected to notice the deletion in that case?

It wouldn't notice it as an ActionExecutionValue is immutable and we can't and shouldn't modify it after action execution. It's a valid point you are raising though and I think there are two ways forwards:

  1. Output files that are materialized after action execution will remain in the output base and Bazel will think that they are actually stored remotely and won't notice the scenario you are describing.
  2. The remote module keeps track of all outputs it's materializing after action execution and deletes them again at the end of a build. That way Bazel's in memory state would match what's in the output base after the build.

@jin jin added the team-Core Skyframe, bazel query, BEP, options parsing, bazelrc label Feb 19, 2019
@buchgr
Copy link
Contributor Author

buchgr commented Feb 25, 2019

@janakdr

Apologies for the delay. I was out of office.

Can you comment on the need for this?

The need for this arrives from not download all build outputs for remote builds but instead injecting metadata about build outputs. In Bazel remote execution we'd like to have it so that some build outputs are downloaded locally (i.e. top level) and others are not and only live remotely.

How often is the output tree modified inter-build?

It's my understanding that this is not a supported scenario by Bazel aka undefined behavior. Why are you asking?

@buchgr
Copy link
Contributor Author

buchgr commented Feb 25, 2019

this is unused code at this point, right? there's nothing in bazel codebase creating these artifact values with isRemote() == true.

yep. I broke this change out from a larger patch set. I ll be injecting RemoteArtifactValues in the remote module in a follow up change.

@benjaminp
Copy link
Collaborator

How often is the output tree modified inter-build?

It's my understanding that this is not a supported scenario by Bazel aka undefined behavior. Why are you asking?

Isn't this code you're modifying in FilesystemValueChecker there explicitly to support that? It stats the outputs of every action in Skyframe before every build to see if the in-memory state is out of sync with reality. This behavior is controlled by --experimental_check_output_files. It's also not fool-proof. For example, runfiles trees aren't tracked.

@buchgr
Copy link
Contributor Author

buchgr commented Feb 28, 2019

@benjaminp you are right. I shall learn to read better. In my head I mapped inter-build to mean during the build.

@janakdr I wouldn't be able to quantify inter-build modifications to the output-tree. Ideally these don't happen, but as Benjamin correctly wrote it's hard a best effort mechanism that isn't fool proof.

@buchgr buchgr force-pushed the teach-remote-outputs branch from a714e8d to 481db5e Compare March 1, 2019 10:48
buchgr added 3 commits March 6, 2019 13:12
The FilesystemValueChecker is used by Bazel to detect modified
outputs before a command. This change teaches it about remote
outputs that don't exist in the output base. That is if
SkyFrame has metadata about a remote output file which does not
exist in the output base it will not invalidate the output. However,
if the file exists in the output base it will be taken as the source
of truth.

Progress towards bazelbuild#6862
@buchgr buchgr force-pushed the teach-remote-outputs branch from 481db5e to a38b6c2 Compare March 6, 2019 12:12
@bazel-io bazel-io closed this in 0a0c962 Mar 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes team-Core Skyframe, bazel query, BEP, options parsing, bazelrc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants