Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understanding overhead showing in Chrome trace #10702

Closed
kastiglione opened this issue Feb 3, 2020 · 20 comments
Closed

Understanding overhead showing in Chrome trace #10702

kastiglione opened this issue Feb 3, 2020 · 20 comments
Assignees
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Performance Issues for Performance teams type: documentation (cleanup)

Comments

@kastiglione
Copy link
Contributor

kastiglione commented Feb 3, 2020

Description of the problem request:

We have a long final action at the end of our build that we'd like to speed up. When looking at a Chrome trace of our build, that final action has much more Bazel overhead than other actions in the build.

Screen Shot 2020-02-03 at 2 43 37 PM

My reading of this is that the action (BundleTreeApp a python script in rules_apple) is ~6s, and then after that Bazel does some post-processing in ActionContinuation.execute and then yet more post-processing in the top level invocation (skyframe?).

In the above screenshot, ActionContinuation.execute appears to have 2s of post-processing "overhead", and "Bundling, processing and signing" has another 2s of post-processing "overhead".

Feature requests: what underlying problem are you trying to solve with this feature?

When looking at a Chrome trace: How can we understand how and why Bazel overhead exists, and what can be done about it.

We see overhead at the beginning of a build as well, but this issue is focused on how to understand performance factors for poorly performing actions like this.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

You'd need a sufficiently large iOS project that has a non-trivial .app bundle.

What operating system are you running Bazel on?

macOS

What's the output of bazel info release?

release 2.0.0

@ghvg1313
Copy link

ghvg1313 commented Feb 5, 2020

Don't know if this is related, but we are also experiencing similar overhead when calling build and test action separately (bazel build <target> then bazel test <target>) with tree-artifacts turned on.

Traces:
test trace with tree-artifacts turned off
test trace with tree-artifacts turned on

The overhead is not too bad considering this is only one target, but you can clearly see multiple action dependency checking distributed across threads before test runner executes, which, you don't see when tree artifacts turned off.
The performance of dependency checking gets significantly worse when there're multiple targets under test

@meisterT meisterT added P3 We're not considering working on this, but happy to review a PR. (No assignee) type: documentation (cleanup) and removed untriaged labels Apr 29, 2020
@brentleyjones
Copy link
Contributor

Looking into this, it seems the post processing is Bazel collecting the contents of the declare_directory?

@BalestraPatrick
Copy link
Member

This is still a valid issue. Our iOS app bundle with 16K files takes over 75s to complete this action, even though the rules_apple BundleTreeApp action was optimized to only take 10s, we have no easy way to optimize the rest of time that Bazel spends in collecting the contents of all directories and files.
Screenshot 2021-12-16 at 10 23 03 PM

@chiragramani
Copy link
Contributor

Yeah, we are seeing this issue as well.

image

  • Linking took ~8.5 seconds but actuallyCompleteAction took 6.4 seconds. (this makes the overall linking action take ~15 seconds)
  • Symlink creation takes significant time. (~12.5 seconds)
    image

Any optimizations here would give us really good wins, especially for the incremental builds.

@meisterT
Copy link
Member

meisterT commented Oct 6, 2022

@chiragramani do you have an example repository that we can try this with?

@brentleyjones
Copy link
Contributor

git clone https://github.com/buildbuddy-io/rules_xcodeproj.git
cd rules_xcodeproj
git checkout bj/overhead-demo
cd examples/integration

bazel build //iOSApp

You can then look at the Linking iOSApp/Source/iOSApp_bin, Creating symlink bazel-out/applebin_ios-ios_sim_arm64-fastbuild-ST-4e6c2a19403f/bin/iOSApp/Source/iOSApp_lipobin, and Creating symlink bazel-out/applebin_ios-ios_sim_arm64-fastbuild-ST-4e6c2a19403f/bin/iOSApp/Source/iOSApp-intermediates/iOSApp_ExecutableName actions:

CleanShot 2022-10-06 at 09 41 57@2x

The blank space at the end of all of them scale with the size of the linked binary (so this example will be milliseconds, 73 in my case on each action, but on real projects it grows to multiple seconds, and is unnecessary overhead).

@wilwell
Copy link

wilwell commented Oct 20, 2022

Thanks @brentleyjones for reproduction code.
I was able to reproduce your example only on M1 machine.
I investigated Linking iOSApp... action. On that case the difference between ActionContinuation.execute and subprocess.run is sandbox.delete and sandbox.CopyOutputs, which is reasonable.

But actuallyCompleteAction following ActionContinuation.execute is more interesting. I'm not sure what exactly going there, but I suppose that this is some manipulations with output files, e.g. flush of ActionFileSysyem.

The profile is here.

@brentleyjones
Copy link
Contributor

brentleyjones commented Oct 20, 2022

My concern is more the symlinks actions taking the same amount of extra time. Those shouldn't be sandbox copy and delete times, since it's just a symlinks. So I agree, the more concerning part isn't the sandbox but the actuallyCompleteAction portion.

@wilwell
Copy link

wilwell commented Oct 20, 2022

@tjgq maybe you know more about of symbolic links sandbox manipulations during action execution/action completion.

I will try to add more detailed partition on actuallyCompleteAction, to understand what is going on inside that part.

@chiragramani
Copy link
Contributor

Came across this issue - #14125 that also talks about some overhead involved withactuallyCompleteAction

Creating the output takes .75s, but postprocessing (checksumming?) via actuallyCompleteAction takes 5s

@tjgq
Copy link
Contributor

tjgq commented Oct 25, 2022

Sadly, ctx.actions.symlink with a declare_file or declare_directory output parameter doesn't actually optimize out the copy. The action itself does create a symlink in the filesystem, but Skyframe dereferences it and loses track of the fact that it was a symlink when the action outputs are collected (which explains the additional checksumming and copying overhead).

One possible solution is to generalize 32b0f5a to also short-circuit the filesystem traversal for symlinks in the local execution case. I know that this isn't trivial to do because I tried (and failed) to do that in an earlier iteration of that CL.

@tjgq
Copy link
Contributor

tjgq commented Dec 13, 2022

I'm going to close this as a duplicate of #14125.

@tjgq tjgq closed this as not planned Won't fix, can't repro, duplicate, stale Dec 13, 2022
@larsrc-google
Copy link
Contributor

If you don't already, set --reuse_sandbox_directories to cut down on the sandboxing overhead.

@brentleyjones
Copy link
Contributor

@larsrc-google Should the default on that be changed?

@larsrc-google
Copy link
Contributor

There has been a problem with what seems to be bad interactions with async deletes that has made me hold off on flipping it.

@brentleyjones
Copy link
Contributor

Would those errors look like this 😅?

ERROR: /Users/brentley/Developer/rules_xcodeproj/tools/generator/BUILD:20:14: Linking tools/generator/libgenerator.library.a [for tool] failed: I/O exception during sandboxed execution: /Users/brentley/Developer/rules_xcodeproj/bazel-output-base/rules_xcodeproj/build_output_base/sandbox/darwin-sandbox/43/execroot/com_github_buildbuddy_io_rules_xcodeproj/bazel-out (No such file or directory)

@larsrc-google
Copy link
Contributor

Yes, that looks similar. I've spent some time looking into it but not found a viable hypothesis yet.

@brentleyjones
Copy link
Contributor

I could very easily reproduce it. Would a repro be of value to you?

@larsrc-google
Copy link
Contributor

Yes please, I've had trouble getting a stable repro.

@brentleyjones
Copy link
Contributor

https://github.com/buildbuddy-io/rules_xcodeproj/tree/repro/reuse-sandbox

bazel clean --expunge && bazel run --config=cache //test/fixtures:update && bazel run --config=cache //tools/generator:xcodeproj

The last command for me consistently fails. If it doesn't fail, rerun the whole thing (to get the expunge in).

After it fails, just run bazel run --config=cache //tools/generator:xcodeproj to repro it again.

Just an FYI, the executable that is run for bazel run invokes a nested bazel build, which is the one that fails. Its output base will be at $WORKSPACE/bazel-output-base/rules_xcodeproj/build_output_base, and you can modify shared.bazelrc to have settings affect the inner invocation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Performance Issues for Performance teams type: documentation (cleanup)
Projects
None yet
Development

No branches or pull requests

10 participants