Understanding overhead showing in Chrome trace #10702

kastiglione · 2020-02-03T22:53:51Z

Description of the problem request:

We have a long final action at the end of our build that we'd like to speed up. When looking at a Chrome trace of our build, that final action has much more Bazel overhead than other actions in the build.

My reading of this is that the action (BundleTreeApp a python script in rules_apple) is ~6s, and then after that Bazel does some post-processing in ActionContinuation.execute and then yet more post-processing in the top level invocation (skyframe?).

In the above screenshot, ActionContinuation.execute appears to have 2s of post-processing "overhead", and "Bundling, processing and signing" has another 2s of post-processing "overhead".

Feature requests: what underlying problem are you trying to solve with this feature?

When looking at a Chrome trace: How can we understand how and why Bazel overhead exists, and what can be done about it.

We see overhead at the beginning of a build as well, but this issue is focused on how to understand performance factors for poorly performing actions like this.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

You'd need a sufficiently large iOS project that has a non-trivial .app bundle.

What operating system are you running Bazel on?

macOS

What's the output of `bazel info release`?

release 2.0.0

The text was updated successfully, but these errors were encountered:

ghvg1313 · 2020-02-05T22:09:33Z

Don't know if this is related, but we are also experiencing similar overhead when calling build and test action separately (bazel build <target> then bazel test <target>) with tree-artifacts turned on.

Traces:
test trace with tree-artifacts turned off
test trace with tree-artifacts turned on

The overhead is not too bad considering this is only one target, but you can clearly see multiple action dependency checking distributed across threads before test runner executes, which, you don't see when tree artifacts turned off.
The performance of dependency checking gets significantly worse when there're multiple targets under test

brentleyjones · 2020-05-06T15:56:57Z

Looking into this, it seems the post processing is Bazel collecting the contents of the declare_directory?

BalestraPatrick · 2021-12-16T21:24:53Z

This is still a valid issue. Our iOS app bundle with 16K files takes over 75s to complete this action, even though the rules_apple BundleTreeApp action was optimized to only take 10s, we have no easy way to optimize the rest of time that Bazel spends in collecting the contents of all directories and files.

chiragramani · 2022-10-01T00:52:13Z

Yeah, we are seeing this issue as well.

Linking took ~8.5 seconds but actuallyCompleteAction took 6.4 seconds. (this makes the overall linking action take ~15 seconds)
Symlink creation takes significant time. (~12.5 seconds)

Any optimizations here would give us really good wins, especially for the incremental builds.

meisterT · 2022-10-06T13:13:24Z

@chiragramani do you have an example repository that we can try this with?

brentleyjones · 2022-10-06T14:45:41Z

git clone https://github.com/buildbuddy-io/rules_xcodeproj.git
cd rules_xcodeproj
git checkout bj/overhead-demo
cd examples/integration

bazel build //iOSApp

You can then look at the Linking iOSApp/Source/iOSApp_bin, Creating symlink bazel-out/applebin_ios-ios_sim_arm64-fastbuild-ST-4e6c2a19403f/bin/iOSApp/Source/iOSApp_lipobin, and Creating symlink bazel-out/applebin_ios-ios_sim_arm64-fastbuild-ST-4e6c2a19403f/bin/iOSApp/Source/iOSApp-intermediates/iOSApp_ExecutableName actions:

The blank space at the end of all of them scale with the size of the linked binary (so this example will be milliseconds, 73 in my case on each action, but on real projects it grows to multiple seconds, and is unnecessary overhead).

wilwell · 2022-10-20T12:13:52Z

Thanks @brentleyjones for reproduction code.
I was able to reproduce your example only on M1 machine.
I investigated Linking iOSApp... action. On that case the difference between ActionContinuation.execute and subprocess.run is sandbox.delete and sandbox.CopyOutputs, which is reasonable.

But actuallyCompleteAction following ActionContinuation.execute is more interesting. I'm not sure what exactly going there, but I suppose that this is some manipulations with output files, e.g. flush of ActionFileSysyem.

The profile is here.

brentleyjones · 2022-10-20T12:16:18Z

My concern is more the symlinks actions taking the same amount of extra time. Those shouldn't be sandbox copy and delete times, since it's just a symlinks. So I agree, the more concerning part isn't the sandbox but the actuallyCompleteAction portion.

wilwell · 2022-10-20T12:28:20Z

@tjgq maybe you know more about of symbolic links sandbox manipulations during action execution/action completion.

I will try to add more detailed partition on actuallyCompleteAction, to understand what is going on inside that part.

chiragramani · 2022-10-24T21:41:40Z

Came across this issue - #14125 that also talks about some overhead involved withactuallyCompleteAction

Creating the output takes .75s, but postprocessing (checksumming?) via actuallyCompleteAction takes 5s

tjgq · 2022-10-25T12:24:48Z

Sadly, ctx.actions.symlink with a declare_file or declare_directory output parameter doesn't actually optimize out the copy. The action itself does create a symlink in the filesystem, but Skyframe dereferences it and loses track of the fact that it was a symlink when the action outputs are collected (which explains the additional checksumming and copying overhead).

One possible solution is to generalize 32b0f5a to also short-circuit the filesystem traversal for symlinks in the local execution case. I know that this isn't trivial to do because I tried (and failed) to do that in an earlier iteration of that CL.

tjgq · 2022-12-13T12:45:24Z

I'm going to close this as a duplicate of #14125.

larsrc-google · 2023-02-24T10:08:56Z

If you don't already, set --reuse_sandbox_directories to cut down on the sandboxing overhead.

brentleyjones · 2023-02-24T14:02:03Z

@larsrc-google Should the default on that be changed?

larsrc-google · 2023-02-24T14:36:14Z

There has been a problem with what seems to be bad interactions with async deletes that has made me hold off on flipping it.

brentleyjones · 2023-02-24T21:27:23Z

Would those errors look like this 😅?

ERROR: /Users/brentley/Developer/rules_xcodeproj/tools/generator/BUILD:20:14: Linking tools/generator/libgenerator.library.a [for tool] failed: I/O exception during sandboxed execution: /Users/brentley/Developer/rules_xcodeproj/bazel-output-base/rules_xcodeproj/build_output_base/sandbox/darwin-sandbox/43/execroot/com_github_buildbuddy_io_rules_xcodeproj/bazel-out (No such file or directory)

larsrc-google · 2023-02-27T08:03:22Z

Yes, that looks similar. I've spent some time looking into it but not found a viable hypothesis yet.

brentleyjones · 2023-02-27T14:57:52Z

I could very easily reproduce it. Would a repro be of value to you?

larsrc-google · 2023-02-27T15:02:38Z

Yes please, I've had trouble getting a stable repro.

brentleyjones · 2023-02-27T19:11:57Z

https://github.com/buildbuddy-io/rules_xcodeproj/tree/repro/reuse-sandbox

bazel clean --expunge && bazel run --config=cache //test/fixtures:update && bazel run --config=cache //tools/generator:xcodeproj

The last command for me consistently fails. If it doesn't fail, rerun the whole thing (to get the expunge in).

After it fails, just run bazel run --config=cache //tools/generator:xcodeproj to repro it again.

Just an FYI, the executable that is run for bazel run invokes a nested bazel build, which is the one that fails. Its output base will be at $WORKSPACE/bazel-output-base/rules_xcodeproj/build_output_base, and you can modify shared.bazelrc to have settings affect the inner invocation.

kastiglione mentioned this issue Feb 3, 2020

Use multiprocessing to speed up bundletool bazelbuild/rules_apple#698

Closed

irengrig added team-Performance Issues for Performance teams untriaged labels Feb 4, 2020

meisterT added P3 We're not considering working on this, but happy to review a PR. (No assignee) type: documentation (cleanup) and removed untriaged labels Apr 29, 2020

larsrc-google assigned wilwell Sep 28, 2021

tjgq mentioned this issue Oct 25, 2022

FR?: ctx.actions.symlink action time scales w/ size of input; should be constant #14125

Closed

tjgq closed this as not planned Won't fix, can't repro, duplicate, stale Dec 13, 2022

brentleyjones mentioned this issue Feb 24, 2023

Use the --reuse_sandbox_directories flag MobileNativeFoundation/rules_xcodeproj#1784

Merged

chiragramani mentioned this issue Jun 8, 2023

Enhance "actuallyCompleteAction" algorithm for better build performance #18614

Open

brentleyjones mentioned this issue Aug 9, 2023

--experimental_reuse_sandbox_directories causes missing execution root when switching between bzlmod disabled and enabled #19217

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Understanding overhead showing in Chrome trace #10702

Understanding overhead showing in Chrome trace #10702

kastiglione commented Feb 3, 2020 •

edited

Loading

ghvg1313 commented Feb 5, 2020

brentleyjones commented May 6, 2020

BalestraPatrick commented Dec 16, 2021

chiragramani commented Oct 1, 2022

meisterT commented Oct 6, 2022

brentleyjones commented Oct 6, 2022

wilwell commented Oct 20, 2022

brentleyjones commented Oct 20, 2022 •

edited

Loading

wilwell commented Oct 20, 2022

chiragramani commented Oct 24, 2022

tjgq commented Oct 25, 2022 •

edited

Loading

tjgq commented Dec 13, 2022

larsrc-google commented Feb 24, 2023

brentleyjones commented Feb 24, 2023

larsrc-google commented Feb 24, 2023

brentleyjones commented Feb 24, 2023

larsrc-google commented Feb 27, 2023

brentleyjones commented Feb 27, 2023

larsrc-google commented Feb 27, 2023

brentleyjones commented Feb 27, 2023

Understanding overhead showing in Chrome trace #10702

Understanding overhead showing in Chrome trace #10702

Comments

kastiglione commented Feb 3, 2020 • edited Loading

Description of the problem request:

Feature requests: what underlying problem are you trying to solve with this feature?

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

What operating system are you running Bazel on?

What's the output of bazel info release?

ghvg1313 commented Feb 5, 2020

brentleyjones commented May 6, 2020

BalestraPatrick commented Dec 16, 2021

chiragramani commented Oct 1, 2022

meisterT commented Oct 6, 2022

brentleyjones commented Oct 6, 2022

wilwell commented Oct 20, 2022

brentleyjones commented Oct 20, 2022 • edited Loading

wilwell commented Oct 20, 2022

chiragramani commented Oct 24, 2022

tjgq commented Oct 25, 2022 • edited Loading

tjgq commented Dec 13, 2022

larsrc-google commented Feb 24, 2023

brentleyjones commented Feb 24, 2023

larsrc-google commented Feb 24, 2023

brentleyjones commented Feb 24, 2023

larsrc-google commented Feb 27, 2023

brentleyjones commented Feb 27, 2023

larsrc-google commented Feb 27, 2023

brentleyjones commented Feb 27, 2023

kastiglione commented Feb 3, 2020 •

edited

Loading

What's the output of `bazel info release`?

brentleyjones commented Oct 20, 2022 •

edited

Loading

tjgq commented Oct 25, 2022 •

edited

Loading