Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel performance regression caused by b321444 #9270

Closed
joeleba opened this issue Aug 28, 2019 · 31 comments
Closed

Bazel performance regression caused by b321444 #9270

joeleba opened this issue Aug 28, 2019 · 31 comments
Assignees
Labels
P1 I'll work on this now. (Assignee required) team-Rules-Java Issues for Java rules

Comments

@joeleba
Copy link
Member

joeleba commented Aug 28, 2019

Description of the problem / feature request:

The nightly benchmarking today showed a regression in MacOS env from commit e2a9e4a to 242acd6:
https://perf.bazel.build/bazel/2019/08/27_orig/report.html

After running the benchmarking again for the range of commits that potentially regressed Bazel performance, there's significant evidence to suggest that b321444 caused a regression of 4-7% when doing build //src:bazel:

https://perf.bazel.build/bazel/2019/08/27/report.html

(it's clearest for Mac environment).

The run on buildkite: https://buildkite.com/bazel-trusted/bazel-bench-bisect/builds/1#9f5c44f7-4955-4bd5-a8e4-704e8a34ccc3
I've extracted the result tables here:

ubuntu16
 
RESULTS:
Bazel commit: e2a9e4a3fc99d018e04753f702a6498a4099d2ae, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  127.753s             127.437s               1.742s
  system:    9.230s               9.200s               0.231s
  memory:   88.500s              89.000s               1.628s
     cpu:   88.381s              88.375s               3.245s
 
Bazel commit: 177731fe7cee35bf86f047af4f49042ad176962a, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  128.879s ( +0.88%)   129.864s ( +1.90%)     3.043s    0.68715
  system:    9.355s ( +1.35%)     9.455s ( +2.77%)     0.323s    0.88916
  memory:   89.100s ( +0.68%)    89.000s ( +0.00%)     0.300s    0.32492
     cpu:   88.834s ( +0.51%)    88.800s ( +0.48%)     3.881s    0.02521
 
Bazel commit: b321444237467f8730d01db634d43c8b0ea8d91b, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  132.352s ( +2.69%)   132.454s ( +1.99%)     2.899s    0.88916
  system:    9.260s ( -1.02%)     9.225s ( -2.43%)     0.268s    0.68715
  memory:   89.000s ( -0.11%)    89.000s ( +0.00%)     0.632s    0.02521
     cpu:   88.941s ( +0.12%)    88.250s ( -0.62%)     5.643s    0.32492
 
Bazel commit: 242acd6c07ccfe380bddf8d5a719c1cdfd6fdfb9, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  131.656s ( -0.53%)   131.805s ( -0.49%)     1.609s    0.02521
  system:    9.445s ( +2.00%)     9.425s ( +2.17%)     0.290s    0.68715
  memory:   89.000s ( +0.00%)    89.000s ( +0.00%)     0.000s    0.02521
     cpu:   88.999s ( +0.07%)    88.865s ( +0.70%)     3.346s    0.32492
================================================================================================
 
macos
 
RESULTS:
Bazel commit: e2a9e4a3fc99d018e04753f702a6498a4099d2ae, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  309.548s             309.460s               1.494s
  system:   71.081s              71.339s               0.566s
  memory:   87.100s              87.000s               0.300s
     cpu:   82.248s              81.194s               4.033s
 
Bazel commit: 177731fe7cee35bf86f047af4f49042ad176962a, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  310.194s ( +0.21%)   310.045s ( +0.19%)     1.742s    0.32492
  system:   70.743s ( -0.48%)    70.582s ( -1.06%)     0.652s    0.88916
  memory:   87.000s ( -0.11%)    87.000s ( +0.00%)     0.000s    0.00000
     cpu:   80.573s ( -2.04%)    81.925s ( +0.90%)     2.718s    0.68715
 
Bazel commit: b321444237467f8730d01db634d43c8b0ea8d91b, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  332.208s ( +7.10%)   332.197s ( +7.14%)     1.000s    0.99998
  system:   70.697s ( -0.07%)    70.609s ( +0.04%)     0.610s    0.02521
  memory:   87.000s ( +0.00%)    87.000s ( +0.00%)     0.000s    0.00000
     cpu:   81.668s ( +1.36%)    81.578s ( -0.42%)     2.602s    0.32492
 
Bazel commit: 242acd6c07ccfe380bddf8d5a719c1cdfd6fdfb9, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  331.374s ( -0.25%)   331.298s ( -0.27%)     2.149s    0.68715
  system:   70.640s ( -0.08%)    70.630s ( +0.03%)     0.405s    0.32492
  memory:   87.000s ( +0.00%)    87.000s ( +0.00%)     0.000s    0.00000
     cpu:   82.442s ( +0.95%)    81.965s ( +0.47%)     4.983s    0.32492
 
================================================================================================
 
RBE (ubuntu16, JDK8)
 
RESULTS:
Bazel commit: e2a9e4a3fc99d018e04753f702a6498a4099d2ae, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  130.368s             129.868s               2.859s
  system:    9.331s               9.255s               0.262s
  memory:   89.100s              89.000s               0.300s
     cpu:   97.250s              97.645s               2.393s
 
Bazel commit: 177731fe7cee35bf86f047af4f49042ad176962a, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  129.073s ( -0.99%)   128.737s ( -0.87%)     1.623s    0.32492
  system:    9.305s ( -0.28%)     9.270s ( +0.16%)     0.433s    0.32492
  memory:   88.900s ( -0.22%)    89.000s ( +0.00%)     0.300s    0.00000
     cpu:   93.273s ( -4.09%)    92.355s ( -5.42%)     2.708s    0.96895
 
Bazel commit: b321444237467f8730d01db634d43c8b0ea8d91b, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  134.438s ( +4.16%)   133.944s ( +4.04%)     2.091s    0.99998
  system:    9.376s ( +0.76%)     9.250s ( -0.22%)     0.394s    0.02521
  memory:   88.800s ( -0.11%)    89.000s ( +0.00%)     0.400s    0.00000
     cpu:   95.020s ( +1.87%)    96.560s ( +4.55%)     5.693s    0.68715
 
Bazel commit: 242acd6c07ccfe380bddf8d5a719c1cdfd6fdfb9, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  133.231s ( -0.90%)   133.382s ( -0.42%)     1.327s    0.32492
  system:    9.533s ( +1.67%)     9.420s ( +1.84%)     0.376s    0.88916
  memory:   88.900s ( +0.11%)    89.000s ( +0.00%)     0.700s    0.02521
     cpu:   94.191s ( -0.87%)    95.625s ( -0.97%)     3.352s    0.68715
 
================================================================================================
 
ubuntu18
 
RESULTS:
Bazel commit: e2a9e4a3fc99d018e04753f702a6498a4099d2ae, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  136.049s             137.951s               6.202s
  system:   10.050s              10.050s               0.538s
  memory:   88.300s              89.000s               1.487s
     cpu:   96.344s              94.405s               6.372s
 
Bazel commit: 177731fe7cee35bf86f047af4f49042ad176962a, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  127.523s ( -6.27%)   126.273s ( -8.47%)     4.430s    0.99878
  system:    9.095s ( -9.50%)     8.865s (-11.79%)     0.580s    0.99310
  memory:   88.200s ( -0.11%)    89.000s ( +0.00%)     1.470s    0.00000
     cpu:   91.227s ( -5.31%)    89.835s ( -4.84%)     4.968s    0.88916
 
Bazel commit: b321444237467f8730d01db634d43c8b0ea8d91b, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  136.405s ( +6.96%)   134.654s ( +6.64%)     4.864s    0.99983
  system:    9.467s ( +4.09%)     9.320s ( +5.13%)     0.550s    0.96895
  memory:   88.800s ( +0.68%)    89.000s ( +0.00%)     0.400s    0.02521
     cpu:   92.182s ( +1.05%)    89.760s ( -0.08%)     7.173s    0.02521
 
Bazel commit: 242acd6c07ccfe380bddf8d5a719c1cdfd6fdfb9, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  134.061s ( -1.72%)   133.913s ( -0.55%)     2.584s    0.68715
  system:    9.441s ( -0.27%)     9.365s ( +0.48%)     0.317s    0.02521
  memory:   88.700s ( -0.11%)    89.000s ( +0.00%)     0.458s    0.00000
     cpu:   91.768s ( -0.45%)    91.045s ( +1.43%)     5.100s    0.32492

A high pval suggests strong statistical evidence that there's a regression. From the report seems like most of the increase in wall time comes from the Execution phase.

@joeleba
Copy link
Member Author

joeleba commented Aug 28, 2019

@iirina could you help triage this?
cc @meisterT

@iirina
Copy link
Contributor

iirina commented Aug 28, 2019

Thanks for the report @joeleba. I can think of either #8964 or 10db812#diff-15230e5dad94ffaefa7b526c0c1e89e5 that could cause performance regressions from java_tools. How can I reproduce this? Do you think this is a release blocker for bazel 1.0?

cc @benjaminp @cushon

@iirina iirina added P2 We'll consider working on this in future. (Assignee optional) team-Rules-Java Issues for Java rules labels Aug 28, 2019
@joeleba
Copy link
Member Author

joeleba commented Aug 29, 2019

To reproduce, you first need to clone the benchmarking tool: https://github.com/bazelbuild/bazel-bench.
Then from the bazel-bench project's root, run the following:

bazel run :benchmark \
--  \
--bazel_source=https://github.com/bazelbuild/bazel.git \
--project_source=https://github.com/bazelbuild/bazel.git \
--collect_memory \
--data_directory=/tmp/.bazel-bench/out \
--csv_file_name=perf_data.csv \
--runs=10 \
--bazel_commits=e2a9e4a3fc99d018e04753f702a6498a4099d2ae,177731fe7cee35bf86f047af4f49042ad176962a,b321444237467f8730d01db634d43c8b0ea8d91b,242acd6c07ccfe380bddf8d5a719c1cdfd6fdfb9 \
--project_commits=756096de03e165ef95b56728ae31afa2833caf5c \
-- build //src:bazel

The tool builds a bazel binary for each bazel commit, then performs bazel build //src:bazel with those binaries.

@iirina
Copy link
Contributor

iirina commented Aug 29, 2019

Thanks! Is there a way I can point to my local bazel repository? And what's the difference between project_source and bazel_source?

@joeleba
Copy link
Member Author

joeleba commented Aug 29, 2019

bazel_source can either be a local bazel repo or a remote one, from which you build your bazel binaries.
project_source is a local/remote repo on which you run benchmarking. In this case, project_source=https://github.com/bazelbuild/bazel.git because I'm benchmarking the performance of build //src:bazel.

@meisterT
Copy link
Member

meisterT commented Sep 2, 2019

I think this is a release blocker for Bazel 1.0. cc @dslomov as the release manager to decide.

@dslomov
Copy link
Contributor

dslomov commented Sep 2, 2019

I would like to understand what exactly caused the regression before making a call on 1.0 blocking

@iirina
Copy link
Contributor

iirina commented Sep 2, 2019

I'm running the performance script. It takes a while. I'll post the findings when it's done.

@meisterT
Copy link
Member

meisterT commented Sep 2, 2019

I did look at the action times in the JSON profile (on Mac). Some actions are faster, some are slower, but the sum of differences is amounts to ~250s compared to the faster build. Divided by the number of threads (12) this is roughly the slowdown that we see.

I agree with Irina that #8964 is the most likely culprit.

@iirina
Copy link
Contributor

iirina commented Sep 3, 2019

The script finished but it doesn't show me the results for all the commits I wanted.

Here is the command I ran:

bazel run :benchmark -- --bazel_source=/usr/local/google/home/elenairina/bazel --project_source=https://github.com/bazelbuild/bazel.git  --collect_memory --data_directory=/tmp/.bazel-bench/out --csv_file_name=perf_data.csv --runs=10 --bazel_commits=580038ec43b52c00ec704b1ff139924a6575ba20,403496540806f3f5404e20919202e28fb4b68f18,23f6295eaa724f5ce32dcce2f749978c0afd7bfd,523183e183b1e26b2e8e5c898ebd79949ae2f846 --project_commits=756096de03e165ef95b56728ae31afa2833caf5c -- build //src:bazel

I actually wanted results for the last 2 commits I specified, which are my local commits using two different java_tools versions. Is there a way I can debug and see what went wrong?

@iirina
Copy link
Contributor

iirina commented Sep 3, 2019

@joeleba helped me to debug the script and it doesn't work with branches other than master. I'll rebase into my local master and re-run the script. I hope we'll get results by the end of day.

@iirina
Copy link
Contributor

iirina commented Sep 3, 2019

I managed to get some results \o/

RESULTS:
Bazel commit: e6f2f5b17c85956db9c2318788b4e715767b7aca, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  286.434s             285.822s               6.220s            
  system:   19.618s              19.580s               0.222s            
  memory:   85.000s              85.000s               0.000s            
     cpu:   59.353s              59.395s               2.665s            


Bazel commit: 0bd7273c74b0385baee503a0199e4fac70df8468, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  262.399s ( -8.39%)   260.001s ( -9.03%)     7.569s    0.99983 
  system:   19.155s ( -2.36%)    19.005s ( -2.94%)     0.352s    0.99310 
  memory:   85.000s ( +0.00%)    85.000s ( +0.00%)     0.000s    0.00000 
     cpu:   57.724s ( -2.74%)    58.345s ( -1.77%)     2.109s    0.32492 


Bazel commit: 79892e8a6aba1a880e68c214edce628fb63dfd13, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  255.801s ( -2.51%)   255.778s ( -1.62%)     2.652s    0.96895 
  system:   19.110s ( -0.23%)    19.105s ( +0.53%)     0.206s    0.32492 
  memory:   85.000s ( +0.00%)    85.000s ( +0.00%)     0.000s    0.00000 
     cpu:   58.186s ( +0.80%)    58.165s ( -0.31%)     2.890s    0.02521 

0bd7273c74b0385baee503a0199e4fac70df8468 is the commit with java_tools built without Turbine update.
79892e8a6aba1a880e68c214edce628fb63dfd13 is the commit with java_tools built without G1 change.
I'm not sure the report is helpful 😕

@iirina
Copy link
Contributor

iirina commented Sep 3, 2019

There's a high pval for both commits.

@iirina
Copy link
Contributor

iirina commented Sep 3, 2019

I think I should have compared them individually with a baseline.

@iirina
Copy link
Contributor

iirina commented Sep 3, 2019

The commits are pushed here #9318

@joeleba
Copy link
Member Author

joeleba commented Sep 3, 2019

What is e6f2f5b17c85956db9c2318788b4e715767b7aca?

A high pval (0.99983) means there's strong evidence that there's a difference in the distribution of the wall time measured between 0bd7273c74b0385baee503a0199e4fac70df8468 and e6f2f5b17c85956db9c2318788b4e715767b7aca. In this case, 0bd7273c74b0385baee503a0199e4fac70df8468 is ~9.03% faster (median).

So you can say that removing the Turbine update made bazel build //src:bazel faster by ~9.03%.

Similarly, removing G1 and Turbine made bazel build //src:bazel faster by ~1.62% compared to just removing Turbine.

Also note that running benchmarking locally brings about a certain degree of noise to the data, as it is not an isolated environment.

@iirina
Copy link
Contributor

iirina commented Sep 3, 2019

e6f2f5b17c85956db9c2318788b4e715767b7aca is the commit that was at head when I started the script (e6f2f5b)

@iirina
Copy link
Contributor

iirina commented Sep 3, 2019

cc @cushon

Thanks Joe for the details! If I didn't mess anything up when performing the benchmark, the Turbine update in 10db812 made building bazel slower. Liam, do you know of any potential reasons why that might happen?

@joeleba
Copy link
Member Author

joeleba commented Sep 4, 2019

Btw, with bazelbuild/bazel-bench#66, you can now specify pre-built bazel binaries to benchmark (as an alternative to the rebasing-your-local-change-onto-master workflow). Example:

bazel run :benchmark \
-- \
--bazel_source=/usr/local/google/home/elenairina/bazel \
--project_source=https://github.com/bazelbuild/bazel.git  \
--collect_memory \
--data_directory=/tmp/.bazel-bench/out \
--csv_file_name=perf_data.csv \
--runs=10 \
--bazel_commits=580038ec43b52c00ec704b1ff139924a6575ba20,403496540806f3f5404e20919202e28fb4b68f18 \
--bazel_binaries=/path/to/custom/binary/23f6295eaa724f5ce32dcce2f749978c0afd7bfd,/path/to/custom/binary/523183e183b1e26b2e8e5c898ebd79949ae2f846 \
--project_commits=756096de03e165ef95b56728ae31afa2833caf5c \
-- build //src:bazel

@iirina
Copy link
Contributor

iirina commented Sep 4, 2019

That sounds great @joeleba! That should also make the script faster. Do you still need to specify bazel_commits when bazel_binaries are set?

@joeleba
Copy link
Member Author

joeleba commented Sep 4, 2019

In general:

  • --bazel_commits is for commits available in branch master of whichever repo you specified in --bazel_source. The script takes care of building the binaries from those commits.
  • --bazel_binaries can be used for any pre-built binaries (it could very well be a WIP, doesn't have to be associated with any commit). This provides more flexibility, but requires you to build the binaries yourself beforehand.

In this case, you can use a combination of these 2 flags: --bazel_commits for commits available on master, and --bazel_binaries for binaries built from your local changes.

Note that since bazelbuild/bazel-bench#66, if a commit is not available on master (like what happened in #9270 (comment)), the benchmarking script would fail.

@iirina
Copy link
Contributor

iirina commented Sep 4, 2019

Nice! Thanks for making these improvements!

@meisterT
Copy link
Member

872c11a is the actual culprit.

on a Mac (with 6 cores + HT):

  • before: 327s
  • after: 351s (+7.3%)

on a beefy Linux workstation (with 36 cores + HT):

  • before: 69s
  • after: 70s (+1.4%, but statistically insignificant)

on my specialist workstation (restricting to 6 virtual cores):

  • before: 247s
  • after: 257s (+4.0%)

@dslomov
Copy link
Contributor

dslomov commented Sep 27, 2019

Potential release blocker for 1.0 (#8573)

bazel-io pushed a commit that referenced this issue Oct 1, 2019
*** Reason for rollback ***

This change introduced a performance regression in bazel.

More details in #9270.

*** Original change description ***

Allow G1 in tools/jdk/BUILD.java_tools.

This is a followup to 01e6e05. (I think BUILD.java_tools was introduced in between the creation of the PR for 01e6e05 and its landing.)

Closes #8964.

PiperOrigin-RevId: 272156477
@dslomov dslomov added P1 I'll work on this now. (Assignee required) and removed P2 We'll consider working on this in future. (Assignee optional) labels Oct 1, 2019
@dslomov
Copy link
Contributor

dslomov commented Oct 1, 2019

Should I pick 872c11a for 1.0? (#8573)

@dslomov
Copy link
Contributor

dslomov commented Oct 1, 2019

(Oh I see it is not complete yet...)

@iirina
Copy link
Contributor

iirina commented Oct 1, 2019

@dslomov I'm starting a java_tools patch release. I'll let you know when the cherry pick is ready.

@dslomov
Copy link
Contributor

dslomov commented Oct 1, 2019

@dslomov I'm starting a java_tools patch release. I'll let you know when the cherry pick is ready.

thanks!

iirina added a commit to iirina/bazel that referenced this issue Oct 1, 2019
*** Reason for rollback ***

This change introduced a performance regression in bazel.

More details in bazelbuild#9270.

*** Original change description ***

Allow G1 in tools/jdk/BUILD.java_tools.

This is a followup to 01e6e05. (I think BUILD.java_tools was introduced in between the creation of the PR for 01e6e05 and its landing.)

Closes bazelbuild#8964.

PiperOrigin-RevId: 272156477
@iirina
Copy link
Contributor

iirina commented Oct 2, 2019

Bazel was updated with a java_tools patch release that disables G1. @meisterT @joeleba Can you please confirm the performance is improved?

@meisterT
Copy link
Member

meisterT commented Oct 2, 2019

Thanks Irina!

It looks much better again:
tPQP9S1xTiN

@iirina
Copy link
Contributor

iirina commented Oct 2, 2019

That's great, thanks for checking!

dslomov pushed a commit that referenced this issue Oct 2, 2019
Fixes #9270.

Closes #9457.

PiperOrigin-RevId: 272210868
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) team-Rules-Java Issues for Java rules
Projects
None yet
Development

No branches or pull requests

4 participants