Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bevy_gizmos doc tests hang compute when run locally #12207

Open
alice-i-cecile opened this issue Feb 29, 2024 · 6 comments
Open

bevy_gizmos doc tests hang compute when run locally #12207

alice-i-cecile opened this issue Feb 29, 2024 · 6 comments
Labels
A-Build-System Related to build systems or continuous integration C-Bug An unexpected or incorrect behavior S-Blocked This cannot move forward until something else changes

Comments

@alice-i-cecile
Copy link
Member

Bevy version

b24ab2e

[Optional] Relevant system information

Windows

What you did

Run cargo run -p ci --doc.

What went wrong

Doc tests for initial crates pass, and then a dozen or so doc tests within bevy_gizmos get stuck, producing a warning that they've been running for more than 60 seconds.

My computer then runs out of RAM (over 12 GB!), freezes, and I have to hard reboot.

Additional information

Running with fewer threads via e.g. -j 4 improves responsiveness and slows down time-to-hang, but does not resolve the problem.

This works fine in CI, so it may be OS specific.

@alice-i-cecile alice-i-cecile added C-Bug An unexpected or incorrect behavior A-Build-System Related to build systems or continuous integration S-Needs-Investigation This issue requires detective work to figure out what's going wrong A-Gizmos Visual editor and debug gizmos O-Windows Specific to the Windows desktop operating system labels Feb 29, 2024
@alice-i-cecile alice-i-cecile changed the title bevy_gizmo doc tests hang compute when run locally bevy_gizmos doc tests hang compute when run locally Feb 29, 2024
@lynn-lumen
Copy link
Contributor

lynn-lumen commented Apr 17, 2024

This issue is related to 'Microsoft® Incremental Linker' - the linker used by the rust compiler on some (most?) windows machines.
The linkers in the following image use 18.8 GB of RAM!
gizmos-doc-test-task-manager

The one thing that sets bevy_gizmos apart from other crates like bevy_ecs is the increased "usage" of 'Microsoft® Incremental Linker'. Most other doc-tests also use 'Microsoft® Incremental Linker' but it does not use alot of memory there. Instead 'link.exe' uses more memory but remains within acceptable bounds (<1GB).
In addition, bevy_gizmos uses bevy_ecs::system::assert_is_system(system); alot in doc-tests. However, since other crates like bevy_ecs make excessive use of this assertion aswell and do not have this issue, I am not sure this is the cause of the problem.

No single doc-test in particular seems to be responsible for the increased memory usage, as removing any given set of doc-tests results in a decrease in memory usage more or less proportional to the amount of tests removed. Instead, running doc-tests for (very) large projects on windows machines seems to be the issue.

I was unable to find an "in code" fix for this, but there is a discussion on the rust forums that appears to suggest building for a different but compatible target or using rust-lld as the linker.

However, this problem does not seem to be limited to Windows only, as #11034 details a very similar (the same?) issue on a Linux machine.

@alice-i-cecile
Copy link
Member Author

Further discussion on Discord.

For some reason --jobs 1 doesn't seem to actually limit the amount of tests being compiled, but -- --test-threads=1 does. It also seems that the linker is invoked for each test, and for bevy_gizmos tests it takes almost 2GB per invocation. By default cargo runs 16 of them (one per thread), making me quickly run out of RAM. I wonder why it doesn't happen with the earlier tests though

General conclusions seem to be the same as above, although it looks like rust-lang/rust#123974 should resolve this.

@alice-i-cecile alice-i-cecile added S-Blocked This cannot move forward until something else changes and removed O-Windows Specific to the Windows desktop operating system S-Needs-Investigation This issue requires detective work to figure out what's going wrong A-Gizmos Visual editor and debug gizmos labels May 28, 2024
@SkiFire13
Copy link
Contributor

SkiFire13 commented May 28, 2024

In addition to the previous observation, rustdoc doesn't respect the linker option in Cargo's config.toml, you need to explicitly set it in the rustdocflags:

[target.x86_64-pc-windows-msvc]
rustdocflags = ["-Clinker=rust-lld.exe"]

This helps reduce the memory usage to 1/2 or 1/3, which is a huge improvement if it leads to avoiding swapping (depends on how much RAM your system has).

github-merge-queue bot pushed a commit that referenced this issue Jul 8, 2024
# Objective

- Rustdoc doesn't seem to follow cargo's `linker` setting
- Improves the situation in #12207

## Solution

- Explicitly set the linker in rustdoc flags

## Testing

- I tested this change on Windows and it significantly improves testing
performances (can't give an exact estimate since they got stuck before
this change)

---

Note: I avoided changing the settings on Linux and MacOS because I can't
test on those platforms. It would be nice if someone could test similar
changes there and report so they can be done on all major platforms.
@SkiFire13
Copy link
Contributor

rust-lang/rust#126245 has been merged, though unfortunately it will be enabled only with the 2024 edition.

@roblesch
Copy link
Contributor

Same issue on Fedora 40. Running cargo ci creates many instances of ld which consume all available memory and cause a system crash. Setting rustdocflags to use lld fixed the problem. The nightly release also runs successfully since it uses rust-lld.

Rust version info

rustc 1.80.1 (3f5fd8dd4 2024-08-06)
binary: rustc
commit-hash: 3f5fd8dd41153bc5fdca9427e9e05be2c767ba23
commit-date: 2024-08-06
host: x86_64-unknown-linux-gnu
release: 1.80.1
LLVM version: 18.1.7

config.toml

[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-Clink-arg=-fuse-ld=lld"]
rustdocflags = ["-Clink-arg=-fuse-ld=lld"]

github-merge-queue bot pushed a commit that referenced this issue Aug 22, 2024
# Objective

Running `cargo ci` on Fedora 40 causes a system crash due to many `ld`
processes consuming all available memory when performing doc tests.

## Solution

This PR extends #13553.

- Add reference to #12207 in the CI sections of `CONTRIBUTING.md` and
`config_fast_builds.toml`.
- Add sample `rustdocflags` configurations for `lld` and `mold` to
`config_fast_builds.toml` for Linux.

## Testing

Crashing configuration

- config.toml
  ```
  [target.x86_64-unknown-linux-gnu]
  linker = "clang"
  rustflags = ["-Clink-arg=-fuse-ld=lld"]
  
  [alias]
  ci = "run --package ci --"
  ```
- Test command
  `cargo ci`

Working configuration

- config.toml
  ```
  [target.x86_64-unknown-linux-gnu]
  linker = "clang"
  rustflags = ["-Clink-arg=-fuse-ld=lld"]
  rustdocflags = ["-Clink-arg=-fuse-ld=lld"]
  
  [alias]
  ci = "run --package ci --"
  ```
- Test command
  `cargo ci`
@BenjaminBrienen
Copy link
Contributor

12th Gen Intel(R) Core(TM) i7-12700H

cargo test --doc

test result: ok. 45 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 17.96s

With cargo run -p ci -- doc there is an entirely different story. A whole bunch of crates pass and then...

   Doc-tests bevy_gizmos

running 45 tests
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::arc_2d (line 34) has been running for over 60 seconds
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::arc_3d (line 154) has been running for over 60 seconds
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::long_arc_2d_between (line 386) has been running for over 60 seconds
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::long_arc_3d_between (line 253) has been running for over 60 seconds
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::short_arc_2d_between (line 340) has been running for over 60 seconds
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::short_arc_3d_between (line 207) has been running for over 60 seconds
test crates\bevy_gizmos\src\arrows.rs - arrows::ArrowBuilder<'_,'_,'_,Config,Clear>::with_tip_length (line 37) has been running for over 60 seconds
test crates\bevy_gizmos\src\arrows.rs - arrows::Gizmos<'w,'s,Config,Clear>::arrow (line 114) has been running for over 60 seconds
test crates\bevy_gizmos\src\arrows.rs - arrows::Gizmos<'w,'s,Config,Clear>::arrow_2d (line 145) has been running for over 60 seconds
test crates\bevy_gizmos\src\arrows.rs - arrows::Gizmos<'w,'s,Config,Clear>::axes (line 175) has been running for over 60 seconds
test crates\bevy_gizmos\src\arrows.rs - arrows::Gizmos<'w,'s,Config,Clear>::axes_2d (line 208) has been running for over 60 seconds
test crates\bevy_gizmos\src\circles.rs - circles::Gizmos<'w,'s,Config,Clear>::circle (line 116) has been running for over 60 seconds
test crates\bevy_gizmos\src\circles.rs - circles::Gizmos<'w,'s,Config,Clear>::circle_2d (line 157) has been running for over 60 seconds
test crates\bevy_gizmos\src\circles.rs - circles::Gizmos<'w,'s,Config,Clear>::ellipse (line 36) has been running for over 60 seconds
test crates\bevy_gizmos\src\circles.rs - circles::Gizmos<'w,'s,Config,Clear>::ellipse_2d (line 77) has been running for over 60 seconds
test crates\bevy_gizmos\src\circles.rs - circles::Gizmos<'w,'s,Config,Clear>::sphere (line 199) has been running for over 60 seconds
test crates\bevy_gizmos\src\cross.rs - cross::Gizmos<'_,'_,Config>::cross (line 24) has been running for over 60 seconds
test crates\bevy_gizmos\src\cross.rs - cross::Gizmos<'_,'_,Config>::cross_2d (line 59) has been running for over 60 seconds
test crates\bevy_gizmos\src\curves.rs - curves::Gizmos<'w,'s,Config,Clear>::curve_2d (line 30) has been running for over 60 seconds
test crates\bevy_gizmos\src\curves.rs - curves::Gizmos<'w,'s,Config,Clear>::curve_3d (line 64) has been running for over 60 seconds

It uses all 32 GB of RAM and causes the system to freeze up/soft crash after about 5-10 minutes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Build-System Related to build systems or continuous integration C-Bug An unexpected or incorrect behavior S-Blocked This cannot move forward until something else changes
Projects
None yet
Development

No branches or pull requests

5 participants