Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python zip action is memory hungry even if zipping not used - 80% of heap in our case #14890

Open
glukasiknuro opened this issue Feb 23, 2022 · 9 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-Python Native rules for Python type: bug

Comments

@glukasiknuro
Copy link
Contributor

glukasiknuro commented Feb 23, 2022

Description of the problem / feature request:

While investigating memory usage during analysis phase noticed that py_binary and py_test rules were responsible for majority of memory usage in analysis phase. Tracked it down to zip action that is always created - irrespective whether --build_python_zip is used, or what platform it runs on.

Looks like the below code iterates over all runfiles and creates strings for those, if using a lot of dependencies this adds on pretty quickly:

for (Artifact artifact : runfilesSupport.getRunfilesArtifacts().toList()) {

Commenting out the above code caused 80% reduction of heap usage in our case as reported by bazel info used-heap-size-after-gc after building codebase with --nobuild option. Even simply calling .intern() on the created String reduced heap usage by 70%, but it maybe very code-base dependent.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

See first comment.

What operating system are you running Bazel on?

Ubuntu 20.04

What's the output of bazel info release?

bazel 5

@glukasiknuro
Copy link
Contributor Author

glukasiknuro commented Feb 23, 2022

I was able to reproduce the problem on part of tensorflow codebase, but there are additional factors to it.

Setup:

git clone [email protected]:tensorflow/tensorflow.git

Using https://bazel.build/rules/performance#enabling-memory-tracking

bazel ${STARTUP_FLAGS} build --nobuild //tensorflow/python/...

Released bazel 5

$ bazel ${STARTUP_FLAGS} dump --rules | grep py_
py_test                              1,654      14,967    380,780,088      230,217  
py_library                           1,085           0      1,048,712          966  
py_binary                               32         192      6,030,736      188,460  

$ bazel ${STARTUP_FLAGS} info used-heap-size-after-gc
626MB

Bazel 5 with change to intern strings

$ bazel ${STARTUP_FLAGS} dump --rules | grep py_
py_test                              1,654      14,967     65,358,744       39,515  
py_library                           1,085           0      1,572,952        1,449  
py_binary                               32         192        786,480       24,577  

$ bazel ${STARTUP_FLAGS} info used-heap-size-after-gc
289MB

It looks it gives some savings here.

Interesting part is that enabling Java 11 features in bazel (0c65082) makes the numbers different.

master

py_test                              1,654      14,967     74,761,696       45,200  
py_library                           2,018           0      2,097,168        1,039  
py_binary                               37         222      1,048,472       28,337  

669MB

master with patch

py_test                              1,654      14,967     30,975,608       18,727  
py_library                           2,018           0      1,573,152          779  
py_binary                               37         222      1,311,704       35,451  

335MB

I suspect it's caused by changes to string concatenation that may use less intermediate memory - https://openjdk.java.net/jeps/280. It also may mean that the reported values by dump --rules contain temporary memory that was allocated but possibly GC-ed, making those a little less useful.

EDIT: Using VisualVM confirmed that after migrating to Java 11 strings are still created and occupy space, those are just not counted by memory allocator - possibly due to concat optimizations in java, which call methods not instrumented by allocator (http://hg.openjdk.java.net/jdk/jdk11/file/jdk-11+28/src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java#l1633)

However the used heap space is still 2x lower afterwards, which seems suspicious, not sure how to explain 300M less memory usage in the case of code at master.

I'll send a PR shortly, maybe it would be useful to check how interning would work on Google's codebase (if the python rule is used there)?

@glukasiknuro
Copy link
Contributor Author

The --host_jvm_args=-Djava.lang.invoke.stringConcat=MH_SB_SIZED option can be used to get more accurate reading from memory tracker in java 11 (see http://hg.openjdk.java.net/jdk/jdk11/file/jdk-11+28/src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java#l128)

This supports the need for a fix - with the simple above patch py_tests goes from 396M to 60M, and heap as a whole from 669M to 335M.

Bazel at current master

$ bazel ${STARTUP_FLAGS} dump --rules | grep py_
py_test                              1,654      14,967    396,030,656      239,438  
py_library                           2,018           0      1,835,384          909  
py_binary                               37         222      6,822,776      184,399  

$ bazel ${STARTUP_FLAGS} info used-heap-size-after-gc
669MB

Bazel at current master with #14892

$ bazel ${STARTUP_FLAGS} dump --rules | grep py_
py_test                              1,654      14,967     60,102,504       36,337  
py_library                           2,018           0      1,310,616          649  
py_binary                               37         222      1,573,048       42,514  

$ bazel ${STARTUP_FLAGS} info used-heap-size-after-gc
335MB

@aiuto aiuto added team-Rules-Python Native rules for Python untriaged labels Feb 26, 2022
@comius comius added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Mar 10, 2022
@comius
Copy link
Contributor

comius commented Mar 10, 2022

cc @rickeylev

@rickeylev
Copy link
Contributor

What would probably work better is to use the Java equivalent of the "map_each" arg that Args.add_all has. This defers building any values until the command line is actually needed. It'd also avoid materializing the depset until later, too. I'm not familiar enough with the Bazel Java APIs to say how to do that in Java, though

@alex-torok
Copy link
Contributor

I've opened a PR on top of 5.0.0 that fixes this. Unfortunately it relies on code since removed on master. I'm hoping someone who has more experience than me (none) contributing to bazel can weigh in to help figure out how it should be done to land on master.

#15037

@rickeylev
Copy link
Contributor

FWIW, there are probably a couple more changes that would benefit memory usage. The key thing to remember is that, even if the output isn't used, the analysis-time work to set it up in case it is used is still done.

The biggest would be eliminating the remaining toList() calls.

  • The getRunfilesArtifacts().toList() is the biggest one -- that is still flattening the underlying depsets.
  • The getEmptyFilenames().toList() is probably much smaller -- max of about 1 per directory. One note here is I think getEmptyFilenames is actually lazily computed, so calling it at all ends up doing work even if toList() doesn't happen. Not 100% sure on that.

Omitting the executable and zipFile is probably best done by preventing those from being in the input depset to begin with. i.e., restructure the code so that it's something like:

base_files = depset(<everything but executable/zipfile>)
createPythonZipAction(base_files)
createExecutable(runfiles(files=base_files + [executable, zipfile]))

Otherwise, "input manifests" (RunfilesSuppliers class) maybe can be used to trim the set of inputs. Someone from Bazel should check me on this; I just remember seeing a comment somewhere saying that's what input manifests were for.

HTH and happy optimizing!

bazel-io pushed a commit that referenced this issue Apr 12, 2022
### Summary

This PR attempts to address #14890.

This updates the PythonZipper action to use a `CommandLineItem.CapturingMapFn` to defer expanding arguments until after analysis. I used `CapturingMapFn`, as it seemed like the only way to go about implementing the arg filtering that is done by the current code to exclude the `executable` and `zipFile` artifacts.

### Test Plan

Use the example from #14890 of looking at memory usage in tensorflow before/after this patch is applied on top of version `5.0.0`

Initial Setup
```
git clone [email protected]:tensorflow/tensorflow.git
cd tensorflow
python3 -m venv venv
source venv/bin/activate
pip install numpy
```

View memory usage at 5.0.0:
```bash
# STARTUP_FLAGS from https://bazel.build/rules/performance#enabling-memory-tracking
$ bazel ${STARTUP_FLAGS[@]} build --nobuild //tensorflow/python/...
$ bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390    413,338,632      241,154
py_library                           1,102           0      2,097,152        1,903
py_binary                               33         198      8,914,840      270,146
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
635MB
```

View memory usage at 5.0.0 with this patch applied:
```bash
$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390     65,323,312       38,111
py_library                           1,102           0      2,359,576        2,141
py_binary                               33         198        524,400       15,890
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
283MB
```

Ensure that the generated actions have not changed:
```bash
$ bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_out
$ ~/code/bazel/bazel-bin/src/bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_patched_out
$ diff /tmp/bazel_5_out /tmp/bazel_5_patched_out
# note: no diff output
```

Closes #15037.

PiperOrigin-RevId: 441257695
philsc pushed a commit to philsc/bazel that referenced this issue May 12, 2022
### Summary

This PR attempts to address bazelbuild#14890.

This updates the PythonZipper action to use a `CommandLineItem.CapturingMapFn` to defer expanding arguments until after analysis. I used `CapturingMapFn`, as it seemed like the only way to go about implementing the arg filtering that is done by the current code to exclude the `executable` and `zipFile` artifacts.

### Test Plan

Use the example from bazelbuild#14890 of looking at memory usage in tensorflow before/after this patch is applied on top of version `5.0.0`

Initial Setup
```
git clone [email protected]:tensorflow/tensorflow.git
cd tensorflow
python3 -m venv venv
source venv/bin/activate
pip install numpy
```

View memory usage at 5.0.0:
```bash
# STARTUP_FLAGS from https://bazel.build/rules/performance#enabling-memory-tracking
$ bazel ${STARTUP_FLAGS[@]} build --nobuild //tensorflow/python/...
$ bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390    413,338,632      241,154
py_library                           1,102           0      2,097,152        1,903
py_binary                               33         198      8,914,840      270,146
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
635MB
```

View memory usage at 5.0.0 with this patch applied:
```bash
$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390     65,323,312       38,111
py_library                           1,102           0      2,359,576        2,141
py_binary                               33         198        524,400       15,890
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
283MB
```

Ensure that the generated actions have not changed:
```bash
$ bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_out
$ ~/code/bazel/bazel-bin/src/bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_patched_out
$ diff /tmp/bazel_5_out /tmp/bazel_5_patched_out
# note: no diff output
```

Closes bazelbuild#15037.

PiperOrigin-RevId: 441257695
(cherry picked from commit 9610ae8)
philsc pushed a commit to philsc/bazel that referenced this issue May 18, 2022
### Summary

This PR attempts to address bazelbuild#14890.

This updates the PythonZipper action to use a `CommandLineItem.CapturingMapFn` to defer expanding arguments until after analysis. I used `CapturingMapFn`, as it seemed like the only way to go about implementing the arg filtering that is done by the current code to exclude the `executable` and `zipFile` artifacts.

### Test Plan

Use the example from bazelbuild#14890 of looking at memory usage in tensorflow before/after this patch is applied on top of version `5.0.0`

Initial Setup
```
git clone [email protected]:tensorflow/tensorflow.git
cd tensorflow
python3 -m venv venv
source venv/bin/activate
pip install numpy
```

View memory usage at 5.0.0:
```bash
# STARTUP_FLAGS from https://bazel.build/rules/performance#enabling-memory-tracking
$ bazel ${STARTUP_FLAGS[@]} build --nobuild //tensorflow/python/...
$ bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390    413,338,632      241,154
py_library                           1,102           0      2,097,152        1,903
py_binary                               33         198      8,914,840      270,146
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
635MB
```

View memory usage at 5.0.0 with this patch applied:
```bash
$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390     65,323,312       38,111
py_library                           1,102           0      2,359,576        2,141
py_binary                               33         198        524,400       15,890
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
283MB
```

Ensure that the generated actions have not changed:
```bash
$ bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_out
$ ~/code/bazel/bazel-bin/src/bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_patched_out
$ diff /tmp/bazel_5_out /tmp/bazel_5_patched_out
# note: no diff output
```

Closes bazelbuild#15037.

PiperOrigin-RevId: 441257695
(cherry picked from commit 9610ae8)
ckolli5 pushed a commit that referenced this issue May 19, 2022
)

### Summary

This PR attempts to address #14890.

This updates the PythonZipper action to use a `CommandLineItem.CapturingMapFn` to defer expanding arguments until after analysis. I used `CapturingMapFn`, as it seemed like the only way to go about implementing the arg filtering that is done by the current code to exclude the `executable` and `zipFile` artifacts.

### Test Plan

Use the example from #14890 of looking at memory usage in tensorflow before/after this patch is applied on top of version `5.0.0`

Initial Setup
```
git clone [email protected]:tensorflow/tensorflow.git
cd tensorflow
python3 -m venv venv
source venv/bin/activate
pip install numpy
```

View memory usage at 5.0.0:
```bash
# STARTUP_FLAGS from https://bazel.build/rules/performance#enabling-memory-tracking
$ bazel ${STARTUP_FLAGS[@]} build --nobuild //tensorflow/python/...
$ bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390    413,338,632      241,154
py_library                           1,102           0      2,097,152        1,903
py_binary                               33         198      8,914,840      270,146
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
635MB
```

View memory usage at 5.0.0 with this patch applied:
```bash
$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} dump --rules | egrep '(RULE|py_)'
RULE                                 COUNT     ACTIONS          BYTES         EACH
py_test                              1,714      15,390     65,323,312       38,111
py_library                           1,102           0      2,359,576        2,141
py_binary                               33         198        524,400       15,890
py_runtime                               6           0              0            0
py_runtime_pair                          3           0              0            0
_concat_flatbuffer_py_srcs               2           2              0            0

$ ~/code/bazel/bazel-bin/src/bazel ${STARTUP_FLAGS[@]} info used-heap-size-after-gc
283MB
```

Ensure that the generated actions have not changed:
```bash
$ bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_out
$ ~/code/bazel/bazel-bin/src/bazel aquery --build_python_zip 'mnemonic("PythonZipper", //tensorflow/python/...)' > /tmp/bazel_5_patched_out
$ diff /tmp/bazel_5_out /tmp/bazel_5_patched_out
# note: no diff output
```

Closes #15037.

PiperOrigin-RevId: 441257695
(cherry picked from commit 9610ae8)

Co-authored-by: Alex Torok <[email protected]>
Copy link

github-actions bot commented Feb 7, 2024

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Feb 7, 2024
@fmeum
Copy link
Collaborator

fmeum commented Feb 12, 2024

@rickeylev Do you think that the Starlark rewrite could have fixed this?

@rickeylev
Copy link
Contributor

Yes.

@github-actions github-actions bot removed the stale Issues or PRs that are stale (no activity for 30 days) label Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-Python Native rules for Python type: bug
Projects
None yet
Development

No branches or pull requests

7 participants