From 636fb5e2dd6ca26bf86b959c4eadd4c60aa608bb Mon Sep 17 00:00:00 2001 From: nharmata Date: Fri, 24 Jul 2020 15:30:08 -0700 Subject: [PATCH] Fix massive performance problem in BzlLoadFunction's inlining implementation. The old code tried to avoid inlining the same bzl file more than once, but was able to do so profitably only when there were no missing Skyframe deps. Instead, when there were missing Skyframe deps we would crawl each unique path in the full load graph, and this was exacerbated by PackageFunction and BzlLoadFunction trying to do as much work as possible on missing Skyframe deps! To fix this performance bug, we partition the set of encountered bzls into 3 disjoint sets: - "loadStack": Bzls we're in the process of loading but haven't fully visited - "successfulLoads": Bzls we've fully visited and successfully loaded to a value - "unsuccessfulLoads": Bzls we've fully visited but did not successfully load to a value, due to either a Skyframe error or a missing Skyframe dep If we are about to visit a bzl that is already in "unsuccessfulLoads" we skip over it. We also have to slightly change PackageFunction's and BzlLoadFunction's aforementioned logic for handling missing Skyframe deps. Missing Skyframe deps are relatively the biggest problem in the "cold Skyframe" situation, so that situation is the worst-case and therefore the most interesting benchmark: A internal BUILD file with a modest 567 bzl files in its load graph and a "cold Skyframe" package load time of ~3.38s with inlining disabled took ~56.66s with inlining enabled before this change (~1575% worse!). After this change, it took ~7.99s (~136% worse). This is still really bad. The remaining performance problem is due to BzlLoadFunction inlining *entailing* lots of Skyframe restarts for PackageFunction. Also make a few minor comment improvements while I am here. RELNOTES: None PiperOrigin-RevId: 323084365 --- .../build/lib/skyframe/BzlLoadFunction.java | 150 ++++++++++++------ .../build/lib/skyframe/PackageFunction.java | 19 ++- .../lib/skylark/StarlarkIntegrationTest.java | 47 ++++++ 3 files changed, 160 insertions(+), 56 deletions(-) diff --git a/src/main/java/com/google/devtools/build/lib/skyframe/BzlLoadFunction.java b/src/main/java/com/google/devtools/build/lib/skyframe/BzlLoadFunction.java index c063b59e81c8eb..7a1474be6e406d 100644 --- a/src/main/java/com/google/devtools/build/lib/skyframe/BzlLoadFunction.java +++ b/src/main/java/com/google/devtools/build/lib/skyframe/BzlLoadFunction.java @@ -64,6 +64,7 @@ import com.google.devtools.build.skyframe.SkyValue; import com.google.devtools.build.skyframe.ValueOrException; import java.util.HashMap; +import java.util.HashSet; import java.util.LinkedHashSet; import java.util.List; import java.util.Map; @@ -213,7 +214,7 @@ public SkyValue compute(SkyKey skyKey, Environment env) * trying to resolve its top-level {@code load()} statements. Although this work proceeds in a * single thread, multiple calling contexts may evaluate .bzls in parallel. To avoid redundant * work, they share a single (global to this Skyfunction instance) cache in lieu of the regular - * Skyframe cache. + * Skyframe cache. Unlike the regular Skyframe cache, this cache stores only successes. * *

If two calling contexts race to compute the same .bzl, each one will see a different copy of * it, and only one will end up in the shared cache. This presents a hazard: Suppose A and B both @@ -224,24 +225,35 @@ public SkyValue compute(SkyKey skyKey, Environment env) * weren't concerned about racing, A may also reevaluate previously computed items due to cache * evictions. * - *

To solve this, we keep a second cache, {@link InliningState#visitedBzls}, that is local to - * the current calling context, and which never evicts entries. This cache is always checked in - * preference to the shared one; it may deviate from the shared one in some of its entries, but - * the calling context won't know the difference. (If bzl inlining is only used for the loading - * phase, we don't need to worry about Starlark values from different packages interacting.) The - * cache is stored as part of the {@code inliningState} passed in by the caller; the caller can - * obtain this object using {@link InliningState#create}. + *

To solve this, we keep a second cache, {@link InliningState#successfulLoads}, that is local + * to the current calling context, and which never evicts entries. Like the global cache discussed + * above, this cache stores only successes. This cache is always checked in preference to the + * shared one; it may deviate from the shared one in some of its entries, but the calling context + * won't know the difference. (Since bzl inlining is only used for the loading phase, we don't + * need to worry about Starlark values from different packages interacting.) The cache is stored + * as part of the {@code inliningState} passed in by the caller; the caller can obtain this object + * using {@link InliningState#create}. * - *

As an aside, note that we can't avoid having a second cache by simply naively blocking - * evaluation of .bzls on retrievals from the shared cache. This is because two contexts could - * deadlock while trying to evaluate an illegal {@code load()} cycle from opposite ends. It would - * be possible to construct a waits-for graph and perform cycle detection, or to monitor slow - * threads and do detection lazily, but these do not address the cache eviction issue. - * Alternatively, we could make Starlark tolerant of reloading, but that would be tantamount to - * implementing full Starlark serialization. + *

As an aside, note that we can't avoid having {@link InliningState#successfulLoads} by simply + * naively blocking evaluation of .bzls on retrievals from the shared cache. This is because two + * contexts could deadlock while trying to evaluate an illegal {@code load()} cycle from opposite + * ends. It would be possible to construct a waits-for graph and perform cycle detection, or to + * monitor slow threads and do detection lazily, but these do not address the cache eviction + * issue. Alternatively, we could make Starlark tolerant of reloading, but that would be + * tantamount to implementing full Starlark serialization. * - * @return the requested {@code BzlLoadValue}, or null if there is a Skyframe restart or error + *

Since our local {@link InliningState#successfulLoads} stores only successes, a separate + * concern is that we don't want to unsuccessfully visit the same .bzl more than once in the same + * context. (A visitation is unsuccessful if it fails due to an error or if it cannot complete + * because of a missing Skyframe dep.) To address this concern we maintain a separate {@link + * InliningState#unsuccessfulLoads} set, and use this set to return null instead of duplicating an + * unsuccessful visitation. + * + * @return the requested {@code BzlLoadValue}, or null if there was a missing Skyframe dep, an + * unspecified exception in a Skyframe dep request, or if this was a duplicate unsuccessful + * visitation */ + // TODO(brandjon): Pick one of the nouns "load" and "bzl" and use that term consistently. @Nullable BzlLoadValue computeInline(BzlLoadValue.Key key, Environment env, InliningState inliningState) throws InconsistentFilesystemException, BzlLoadFailedException, InterruptedException { @@ -253,15 +265,17 @@ BzlLoadValue computeInline(BzlLoadValue.Key key, Environment env, InliningState } /** - * Retrieves or creates the requested {@link CachedBzlLoadData} object, entering it into the - * local and shared caches. This is the entry point for recursive calls to the inline code path. + * Retrieves or creates the requested {@link CachedBzlLoadData} object for the given bzl, entering + * it into the local and shared caches. This is the entry point for recursive calls to the inline + * code path. * *

Skyframe calls made underneath this function will be logged in the resulting {@code * CachedBzlLoadData) (or its transitive dependencies). The given Skyframe environment must not * be a {@link RecordingSkyFunctionEnvironment}, since that would imply that calls are being * logged in both the returned value and the parent value. * - *

Returns null on Skyframe restart or error. + * @return null if there was a missing Skyframe dep, an unspecified exception in a Skyframe dep + * request, or if this was a duplicate unsuccessful visitation */ @Nullable private CachedBzlLoadData computeInlineCachedData( @@ -270,28 +284,42 @@ private CachedBzlLoadData computeInlineCachedData( // Note to refactorors: No Skyframe calls may be made before the RecordingSkyFunctionEnvironment // is set up below in computeInlineForCacheMiss. - // Try the caches. We must try the thread-local cache before the shared one. - CachedBzlLoadData cachedData = inliningState.visitedBzls.get(key); + // Try the caches of successful loads. We must try the thread-local cache before the shared, for + // consistency purposes (see the javadoc of #computeInline). + CachedBzlLoadData cachedData = inliningState.successfulLoads.get(key); if (cachedData == null) { cachedData = cachedBzlLoadDataManager.cache.getIfPresent(key); if (cachedData != null) { // Found a cache hit from another thread's computation; register the recorded deps as if our // thread required them for the current key. Incorporate into visitedBzls any transitive // cache hits it does not already contain. - cachedData.traverse(env::registerDependencies, inliningState.visitedBzls); + cachedData.traverse(env::registerDependencies, inliningState.successfulLoads); } } - // If that didn't work, compute it ourselves and add to the caches on success. + // See if we've already unsuccessfully visited the bzl. + if (inliningState.unsuccessfulLoads.contains(key)) { + return null; + } + + // If we're here, the bzl must have never been visited before in this calling context. Compute + // it ourselves, updating the other data structures as appropriate. if (cachedData == null) { - cachedData = computeInlineForCacheMiss(key, env, inliningState); - if (cachedData != null) { - inliningState.visitedBzls.put(key, cachedData); - cachedBzlLoadDataManager.cache.put(key, cachedData); + try { + cachedData = computeInlineForCacheMiss(key, env, inliningState); + } finally { + if (cachedData != null) { + inliningState.successfulLoads.put(key, cachedData); + cachedBzlLoadDataManager.cache.put(key, cachedData); + } else { + inliningState.unsuccessfulLoads.add(key); + // Either propagate an exception or fall through for null return. + } } } - // On success, notify the parent CachedBzlLoadData of its new child. + // On success (from cache hit or from scratch), notify the parent CachedBzlLoadData of its new + // child. if (cachedData != null) { inliningState.childCachedDataHandler.accept(cachedData); } @@ -429,28 +457,50 @@ private static ContainingPackageLookupValue getContainingPackageLookupValue( */ static class InliningState { /** - * The set of .bzls we're currently in the process of loading. This is used for cycle detection - * since we don't have the benefit of Skyframe's internal cycle detection. The set must use - * insertion order for correct error reporting. + * The set of bzls we're currently in the process of loading but haven't fully visited yet. This + * is used for cycle detection since we don't have the benefit of Skyframe's internal cycle + * detection. The set must use insertion order for correct error reporting. + * + *

This is disjoint with {@link #successfulLoads} and {@link #unsuccessfulLoads}. + * + *

This is local to current calling context. See {@link #computeInline}. */ // Keyed on the SkyKey, not the label, since label could theoretically be ambiguous, even though // in practice keys from BUILD / WORKSPACE / @builtins don't call each other. (Not sure if // WORKSPACE chunking can cause duplicate labels to appear, but we're robust regardless.) private final LinkedHashSet loadStack; + /** + * Cache of bzls that have been fully visited and successfully loaded to a value. + * + *

This and {@link #unsuccessfulLoads} partition the set of fully visited bzls. + * + *

This is local to current calling context. See {@link #computeInline}. + */ + private final Map successfulLoads; + + /** + * Set of bzls that have been fully visited, but were not successfully loaded to a value. + * + *

This and {@link #successfulLoads} partition the set of fully visited bzls, and is disjoint + * with {@link #loadStack}. + * + *

This is local to current calling context. See {@link #computeInline}. + */ + private final HashSet unsuccessfulLoads; + /** Called when a transitive {@code CachedBzlLoadData} is produced. */ private final Consumer childCachedDataHandler; - /** Cache local to current calling context. See {@link #computeInline}. */ - private final Map visitedBzls; - private InliningState( LinkedHashSet loadStack, - Consumer childCachedDataHandler, - Map visitedBzls) { + Map successfulLoads, + HashSet unsuccessfulLoads, + Consumer childCachedDataHandler) { this.loadStack = loadStack; + this.successfulLoads = successfulLoads; + this.unsuccessfulLoads = unsuccessfulLoads; this.childCachedDataHandler = childCachedDataHandler; - this.visitedBzls = visitedBzls; } /** @@ -460,12 +510,15 @@ private InliningState( static InliningState create() { return new InliningState( /*loadStack=*/ new LinkedHashSet<>(), - /*childCachedDataHandler=*/ x -> {}, // No parent value to mutate. - /*visitedBzls=*/ new HashMap<>()); + /*successfulLoads=*/ new HashMap<>(), + /*unsuccessfulLoads=*/ new HashSet<>(), + // No parent value to mutate + /*childCachedDataHandler=*/ x -> {}); } private InliningState createChildState(Consumer childCachedDataHandler) { - return new InliningState(loadStack, childCachedDataHandler, visitedBzls); + return new InliningState( + loadStack, successfulLoads, unsuccessfulLoads, childCachedDataHandler); } /** Records entry to a {@code load()}, throwing an exception if a cycle is detected. */ @@ -760,8 +813,10 @@ private static List computeBzlLoadsWithSkyframe( /** * Computes the BzlLoadValue for all given keys by reusing this instance of the BzlLoadFunction, - * bypassing traditional Skyframe evaluation, returning {@code null} if Skyframe deps were missing - * and have been requested. + * bypassing traditional Skyframe evaluation. + * + * @return null if there was a missing Skyframe dep, an unspecified exception in a Skyframe dep + * request, or if this was a duplicate unsuccessful visitation */ @Nullable private List computeBzlLoadsWithInlining( @@ -799,10 +854,13 @@ private List computeBzlLoadsWithInlining( continue; } if (cachedData == null) { - Preconditions.checkState(env.valuesMissing(), "no starlark load value for %s", key); - // We continue making inline calls even if some requested values are missing, to maximize - // the number of dependent (non-inlined) SkyFunctions that are requested, thus avoiding a - // quadratic number of restarts. + // A null value for `cachedData` can occur when it (or its transitive loads) has a Skyframe + // dep that is missing or in error. It can also occur if there's a transitive load on a bzl + // that was already seen by inliningState and which returned null (note: in this case, it's + // not necessarily true that there are missing Skyframe deps because this bzl could have + // already been visited unsuccessfully). In both these cases, we want to continue making our + // inline calls, so as to maximize the number of dependent (non-inlined) SkyFunctions that + // are requested and avoid a quadratic number of restarts. valuesMissing = true; } else { bzlLoads.add(cachedData.getValue()); diff --git a/src/main/java/com/google/devtools/build/lib/skyframe/PackageFunction.java b/src/main/java/com/google/devtools/build/lib/skyframe/PackageFunction.java index 6b7bbada206fbe..39d03f904c0b30 100644 --- a/src/main/java/com/google/devtools/build/lib/skyframe/PackageFunction.java +++ b/src/main/java/com/google/devtools/build/lib/skyframe/PackageFunction.java @@ -725,7 +725,6 @@ private static List computeBzlLoadsWithInlining( throws InterruptedException, BzlLoadFailedException, InconsistentFilesystemException { List bzlLoads = Lists.newArrayListWithExpectedSize(keys.size()); Exception deferredException = null; - boolean valuesMissing = false; // Compute BzlLoadValue for each key, sharing the same inlining state, i.e. cache of loaded // modules. This ensures that each .bzl is loaded only once, regardless of diamond dependencies // or cache eviction. (Multiple loads of the same .bzl would screw up identity equality of some @@ -734,7 +733,8 @@ private static List computeBzlLoadsWithInlining( for (BzlLoadValue.Key key : keys) { SkyValue skyValue; try { - // Will complete right away if it's already cached in inliningState. + // Will complete right away if this key has been seen before in inliningState -- regardless + // of whether it was evaluated successfully, had missing deps, or was found to be in error. skyValue = bzlLoadFunctionForInlining.computeInline(key, env, inliningState); } catch (BzlLoadFailedException | InconsistentFilesystemException e) { // For determinism's sake while inlining, preserve the first exception and continue to run @@ -742,15 +742,14 @@ private static List computeBzlLoadsWithInlining( deferredException = MoreObjects.firstNonNull(deferredException, e); continue; } - if (skyValue == null) { - checkState(env.valuesMissing(), "no starlark load value for %s", key); - // We continue making inline calls even if some requested values are missing, to - // maximize the number of dependent (non-inlined) SkyFunctions that are requested, thus - // avoiding a quadratic number of restarts. - valuesMissing = true; - } else { + if (skyValue != null) { bzlLoads.add((BzlLoadValue) skyValue); } + // A null value for `skyValue` can occur when it (or its transitive loads) has a Skyframe dep + // that is missing or in error. It can also occur if there's a transitive load on a bzl that + // was already seen by inliningState and which returned null. In both these cases, we want to + // continue making our inline calls, so as to maximize the number of dependent (non-inlined) + // SkyFunctions that are requested and avoid a quadratic number of restarts. } if (deferredException != null) { Throwables.throwIfInstanceOf(deferredException, BzlLoadFailedException.class); @@ -758,7 +757,7 @@ private static List computeBzlLoadsWithInlining( throw new IllegalStateException( "caught a checked exception of unexpected type", deferredException); } - return valuesMissing ? null : bzlLoads; + return env.valuesMissing() ? null : bzlLoads; } private static int getOriginalWorkspaceChunk( diff --git a/src/test/java/com/google/devtools/build/lib/skylark/StarlarkIntegrationTest.java b/src/test/java/com/google/devtools/build/lib/skylark/StarlarkIntegrationTest.java index f2063988d1c53a..eeda43080a28a0 100644 --- a/src/test/java/com/google/devtools/build/lib/skylark/StarlarkIntegrationTest.java +++ b/src/test/java/com/google/devtools/build/lib/skylark/StarlarkIntegrationTest.java @@ -3435,4 +3435,51 @@ private void setUpCustomMallocRule() throws IOException { "", "malloc_rule(name = 'malloc')"); } + + // Test for an interesting situation for the inlining implementation's attempt to process + // subsequent load statements even when an earlier one has a missing Skyframe dep. + @Test + public void bzlFileWithErrorsLoadedThroughMultipleLoadPathsWithTheLatterOneHavingMissingDeps() + throws Exception { + scratch.file("test/starlark/error.bzl", "nope"); + scratch.file("test/starlark/ok.bzl", "ok = 42"); + scratch.file( + "test/starlark/loads-error-and-has-missing-deps.bzl", + "load('//test/starlark:error.bzl', 'doesntmatter')", + "load('//test/starlark:ok.bzl', 'ok')"); + scratch.file( + "test/starlark/BUILD", + "load('//test/starlark:error.bzl', 'doesntmatter')", + "load('//test/starlark:loads-error-and-has-missing-deps.bzl', 'doesntmatter')"); + + reporter.removeHandler(failFastHandler); + BuildFileContainsErrorsException e = + assertThrows( + BuildFileContainsErrorsException.class, () -> getTarget("//test/starlark:BUILD")); + assertThat(e).hasMessageThat().contains("Extension 'test/starlark/error.bzl' has errors"); + } + + // Test for an interesting situation for the inlining implementation's attempt to process + // subsequent load statements even when an earlier one has a missing Skyframe dep. + @Test + public void bzlFileWithErrorsLoadedThroughMultipleLoadPathsWithTheLatterOneNotHavingMissingDeps() + throws Exception { + scratch.file("test/starlark/error.bzl", "nope"); + scratch.file("test/starlark/ok.bzl", "ok = 42"); + scratch.file( + "test/starlark/loads-error-and-has-missing-deps.bzl", + "load('//test/starlark:error.bzl', 'doesntmatter')", + "load('//test/starlark:ok.bzl', 'ok')"); + scratch.file( + "test/starlark/BUILD", + "load('//test/starlark:ok.bzl', 'ok')", + "load('//test/starlark:error.bzl', 'doesntmatter')", + "load('//test/starlark:loads-error-and-has-missing-deps.bzl', 'doesntmatter')"); + + reporter.removeHandler(failFastHandler); + BuildFileContainsErrorsException e = + assertThrows( + BuildFileContainsErrorsException.class, () -> getTarget("//test/starlark:BUILD")); + assertThat(e).hasMessageThat().contains("Extension 'test/starlark/error.bzl' has errors"); + } }