Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External repository fetching failure should be reported in BEP and as a root cause #6670

Closed
ittaiz opened this issue Nov 13, 2018 · 12 comments
Closed
Assignees
Labels
category: BEP P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: feature request

Comments

@ittaiz
Copy link
Member

ittaiz commented Nov 13, 2018

Description of the problem / feature request:

When a repository rule fetching fails I'd like the BEP to report it (with the relevant log part) and specify it as a root cause for other targets that depend on it.

Feature requests: what underlying problem are you trying to solve with this feature?

We use ResultStore (a google based service which is bazel target based) and when the build fails because of a repository rule fetching it's unclear what happened. We need to go and grep through the log. The root cause will help see this from dependencies.

cc @aehlig which I discussed this with

@aehlig
Copy link
Contributor

aehlig commented Nov 13, 2018

cc @lfpino @ulfjack who also work on the BEP.

@irengrig irengrig added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. untriaged labels Nov 13, 2018
@lfpino lfpino self-assigned this Nov 14, 2018
@lfpino
Copy link
Contributor

lfpino commented Nov 14, 2018

Hi @ittaiz, it'd be great if you could post an example repro here and the expected output in the BEP for future reference.

@dslomov dslomov added type: feature request P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Dec 3, 2018
@ittaiz
Copy link
Member Author

ittaiz commented Jan 16, 2019

@aehlig can you please post the expected output like we discuss?

@aehlig
Copy link
Contributor

aehlig commented Jan 22, 2019

@aehlig can you please post the expected output like we discuss?

Well, we expect fetching failures to follow the same pattern as other failures. That is, if fetching external repository foo fails, we expect an event reporting the failure. This should have an id that identifies the failed target, i.e., an UnconfiguredLabelId with label //external:foo and a payload Aborted (probably reason being LOADING_FAILURE, and description somewhat human readable). All user requested (configured) targets, we have a TargetComplete event with success = false and as child event the root cause, i.e., the mentioned event indicating the fetch failure.

In other words, the expectation is that failure to fetch a repository is just like any other root cause that causes a requested target to fail.

@ittaiz
Copy link
Member Author

ittaiz commented Jan 29, 2019

Thanks! @lfpino any update?

@lfpino
Copy link
Contributor

lfpino commented Jan 29, 2019

Hi Ittai, no updates so far and unfortunately. I don't have much bandwidth this quarter so I don't expect any changes in this area at least until early Q2.

@ulfjack
Copy link
Contributor

ulfjack commented Jan 29, 2019

We're focusing on stability (not crashing) and performance at this time, rather than new features. We want to migrate Google to use it so that we can delete a bunch of legacy code (that isn't open source).

We'll need to think carefully about how to integrate repository setup into the protocol - there are still some discussions ongoing about changing the current model.

@lfpino lfpino added team-Core Skyframe, bazel query, BEP, options parsing, bazelrc category: BEP and removed team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. labels Feb 7, 2019
@lfpino lfpino removed their assignment May 9, 2019
@ericfelly
Copy link
Contributor

Klaus, how should we move forward here? Is there a blaze event that BEP should be listening to to formulate the right BEP data?

@aehlig
Copy link
Contributor

aehlig commented May 29, 2019

Klaus, how should we move forward here?

I'm not sure what @ulfjack meant with with not focusing on new features, as the expectation is within the realm of the current protocol.

Is there a blaze event that BEP should be listening to to formulate the right BEP data?

There isn't one; but the hard thing is not to add an event, whenever a fetch fails; that could be done write where the error message is generated. However, the hard part is to get the root-cause association right. I'm still busy trying to find a good way to get the information recorded correctly in the whole chain of catch and rethrow as different exception within the bazel code.

@ulfjack
Copy link
Contributor

ulfjack commented May 29, 2019

@aehlig, at the time I wrote it, we were focusing on rolling out BEP to all Googlers and getting performance and stability up to par. We're basically there, so now is a good time to discuss this feature request.

@jin
Copy link
Member

jin commented Jun 7, 2019

9b0e64a, for some reason, suppresses the error message when the Android SDK is not configured correctly.

Previously, users will see this if android_sdk_repository is not configured with a path or the ANDROID_HOME environment variable isn't set:

ERROR: /usr/local/google/home/jingwen/code/rules_jvm_external/examples/android_instrumentation_test/WORKSPACE:3:1: no such package '@androidsdk//': Either the path attribute of android_sdk_repository or the ANDROID_HOME environment variable must be set. and referenced by '//external:android/sdk'

After 9b0e64a, that error message is suppressed and replaced with

ERROR: /usr/local/google/home/jingwen/code/rules_jvm_external/examples/android_instrumentation_test/WORKSPACE:3:1: //external:android/sdk depends on @androidsdk//:sdk in repository @androidsdk which failed to fetch

The new error message is a lot less descriptive than the original one, with no actionables.

The peculiar thing is that we do have tests in android_sdk_integration_test testing for this specific error message:

https://github.com/bazelbuild/bazel/blob/74bff3ae8f37c6cb081bbb79855835d93ae24624/src/test/shell/bazel/android/android_sdk_integration_test.sh#L51:L61

It's not clear to me yet why we didn't catch this error. The error message I get as an interactive user is the new error message ("..@AndroidSDK which failed to fetch"), but the test log still receives the old message ("..the ANDROID_HOME environment variable must be set").

By applying the following patch, we get both error messages back:

diff --git a/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeDependencyResolver.java b/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeDependencyResolver.java
index e67fbb5811..b9b01b71fa 100644
--- a/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeDependencyResolver.java
+++ b/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeDependencyResolver.java
@@ -147,9 +147,9 @@ public final class SkyframeDependencyResolver extends DependencyResolver {
                           fromTarget.getLabel(),
                           label,
                           label.getPackageIdentifier().getRepository())));
-          continue;
+        } else {
+          rootCauses.add(new LoadingFailedCause(label, e.getMessage()));
         }
-        rootCauses.add(new LoadingFailedCause(label, e.getMessage()));
         missingEdgeHook(fromTarget, entry.getKey(), label, e);
         continue;
       }
$ gbazel test //... 
ERROR: /usr/local/google/home/jingwen/code/rules_jvm_external/examples/android_instrumentation_test/WORKSPACE:3:1: //external:android/sdk depends on @androidsdk//:sdk in repository @androidsdk which failed to fetch
ERROR: /usr/local/google/home/jingwen/code/rules_jvm_external/examples/android_instrumentation_test/WORKSPACE:3:1: no such package '@androidsdk//': Either the path attribute of android_sdk_repository or the ANDROID_HOME environment variable must be set. and referenced by '//external:android/sdk'

@aehlig was omitting the missingEdgeHook during a RepositoryFetchException intentional?

cc @ahumesky

@aehlig
Copy link
Contributor

aehlig commented Jun 14, 2019

@aehlig was omitting the missingEdgeHook during a RepositoryFetchException intentional?

It was, because I believed that the new error message contained all the information needed. What I did miss, however, was adding e.getMessage() to the message string as well.

What about fixing it that way by extending this message by e.getMessage()?

bazel-io pushed a commit that referenced this issue Jun 17, 2019
…suppress contextual console output

See #6670 (comment)

Closes #8584.

Change-Id: I5d69d73ac41d71c4ee75351aa52a7c5e3bccd6b4
PiperOrigin-RevId: 253645460
irengrig pushed a commit to irengrig/bazel that referenced this issue Jun 18, 2019
We already have an internal event reporting the failure of a
fetch of an external repository. Make this a BuildEvent so
that it also gets reported in the Build Event Protocol; also,
refer to this event as root cause in case of fetch failures
rather than to any needed file of the external repository
individually.

Fixes bazelbuild#6670.

Change-Id: I4f31940fe0c5c709673a99e5657274e5db922cff
PiperOrigin-RevId: 251846972
irengrig pushed a commit to irengrig/bazel that referenced this issue Jun 18, 2019
…suppress contextual console output

See bazelbuild#6670 (comment)

Closes bazelbuild#8584.

Change-Id: I5d69d73ac41d71c4ee75351aa52a7c5e3bccd6b4
PiperOrigin-RevId: 253645460
siberex pushed a commit to siberex/bazel that referenced this issue Jul 4, 2019
…suppress contextual console output

See bazelbuild#6670 (comment)

Closes bazelbuild#8584.

Change-Id: I5d69d73ac41d71c4ee75351aa52a7c5e3bccd6b4
PiperOrigin-RevId: 253645460
irengrig pushed a commit to irengrig/bazel that referenced this issue Jul 15, 2019
We already have an internal event reporting the failure of a
fetch of an external repository. Make this a BuildEvent so
that it also gets reported in the Build Event Protocol; also,
refer to this event as root cause in case of fetch failures
rather than to any needed file of the external repository
individually.

Fixes bazelbuild#6670.

Change-Id: I4f31940fe0c5c709673a99e5657274e5db922cff
PiperOrigin-RevId: 251846972
irengrig pushed a commit to irengrig/bazel that referenced this issue Jul 15, 2019
…suppress contextual console output

See bazelbuild#6670 (comment)

Closes bazelbuild#8584.

Change-Id: I5d69d73ac41d71c4ee75351aa52a7c5e3bccd6b4
PiperOrigin-RevId: 253645460
luca-digrazia pushed a commit to luca-digrazia/DatasetCommitsDiffSearch that referenced this issue Sep 4, 2022
… suppress contextual console output

    See bazelbuild/bazel#6670 (comment)

    Closes #8584.

    Change-Id: I5d69d73ac41d71c4ee75351aa52a7c5e3bccd6b4
    PiperOrigin-RevId: 253645460
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: BEP P2 We'll consider working on this in future. (Assignee optional) team-Core Skyframe, bazel query, BEP, options parsing, bazelrc type: feature request
Projects
None yet
Development

No branches or pull requests

8 participants