
32-bit ARM builds fail as single process uses >3 GiB memory #4320

Closed · MichaIng opened this issue Feb 5, 2024 · 48 comments

@MichaIng commented Feb 5, 2024

Subject of the issue

Building vaultwarden on 32-bit ARM systems fails at the last compilation step, when the final vaultwarden binary is assembled/linked. I first noticed that our GitHub Actions workflow failed to build vaultwarden v1.30.3. It compiles on the public GitHub Actions runners within a QEMU-emulated container and throws the following errors:

fatal runtime error: Rust cannot catch foreign exceptions
qemu: uncaught target signal 6 (Aborted) - core dumped
error: could not compile `vaultwarden` (bin "vaultwarden")

Caused by:
  process didn't exit successfully: `/root/.rustup/toolchains/1.75.0-armv7-unknown-linux-gnueabihf/bin/rustc --crate-name vaultwarden --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C opt-level=3 -C lto=fat -C codegen-units=1 --cfg 'feature="libsqlite3-sys"' --cfg 'feature="sqlite"' -C metadata=1d58e77e878ebff3 -C extra-filename=-1d58e77e878ebff3 --out-dir /tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps -C strip=debuginfo -L dependency=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps --extern argon2=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libargon2-b3c0e2fb66f02f2d.rlib --extern bigdecimal=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libbigdecimal-5c86038a6cda2357.rlib --extern bytes=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libbytes-96d5c91a8970d3e0.rlib --extern cached=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libcached-821a263ab8ba999e.rlib --extern chrono=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libchrono-15dfa97106ea9ebc.rlib --extern chrono_tz=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libchrono_tz-78b03a250f1270bd.rlib --extern cookie=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libcookie-043627deeb63f29d.rlib --extern cookie_store=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libcookie_store-669046e1dd142dcd.rlib --extern dashmap=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libdashmap-4e900a06ccdaf881.rlib --extern data_encoding=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libdata_encoding-2d1c20ef5ab53b9f.rlib --extern data_url=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libdata_url-b52f35a304c403e4.rlib --extern diesel=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libdiesel-ca0c77a5a320c634.rlib --extern diesel_migrations=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libdiesel_migrations-16e3489a14ca359f.rlib --extern dotenvy=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libdotenvy-0639e1593ada03c1.rlib --extern email_address=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libemail_address-ef15593c949ba84d.rlib --extern fern=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libfern-a911570f27da85c4.rlib --extern futures=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libfutures-e89e3ce9357840c0.rlib --extern governor=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libgovernor-4f4ba3b235183b38.rlib --extern handlebars=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libhandlebars-c7fd7cd5e66214d0.rlib --extern html5gum=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libhtml5gum-f30a2504d52ce515.rlib --extern job_scheduler_ng=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libjob_scheduler_ng-9ad45ce09bc25744.rlib --extern jsonwebtoken=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libjsonwebtoken-450434a1f9394291.rlib --extern lettre=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/liblettre-452bf5ebf537ccf1.rlib --extern libsqlite3_sys=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/liblibsqlite3_sys-31047204d554a120.rlib --extern log=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/liblog-546ced36359b5ce9.rlib --extern 
num_derive=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libnum_derive-5e15818039b17888.so --extern num_traits=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libnum_traits-92bc955e64bc391c.rlib --extern once_cell=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libonce_cell-49d951a6756b3fe0.rlib --extern openssl=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libopenssl-f81ff30d71ee3f77.rlib --extern paste=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libpaste-a79e4674c1034386.so --extern percent_encoding=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libpercent_encoding-39b0e9fee14dda0f.rlib --extern pico_args=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libpico_args-1ee48336305d6717.rlib --extern rand=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/librand-f0cff6ea6faef3d8.rlib --extern regex=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libregex-527e30f8b0362c21.rlib --extern reqwest=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libreqwest-ea7db8ba6bf40739.rlib --extern ring=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libring-777c51b0c3d288c0.rlib --extern rmpv=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/librmpv-fdaa6bd5760011c9.rlib --extern rocket=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/librocket-4ee1c8e28bd14c24.rlib --extern rocket_ws=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/librocket_ws-2f177dcbabdf5516.rlib --extern rpassword=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/librpassword-c8818deb34b76ea8.rlib --extern semver=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libsemver-60d3d6cb5a7e3505.rlib --extern serde=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libserde-9cc93a7189128f25.rlib --extern serde_json=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libserde_json-86d2120e19427cc1.rlib --extern syslog=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libsyslog-5c152346c1ce8d61.rlib --extern time=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libtime-56441102515dd14f.rlib --extern tokio=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libtokio-5dfae61c8444c1a0.rlib --extern tokio_tungstenite=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libtokio_tungstenite-e58dabcf1dceb921.rlib --extern totp_lite=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libtotp_lite-2df5b536107c4b4f.rlib --extern tracing=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libtracing-ceeaccc151504516.rlib --extern url=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/liburl-16c84f7f7f3fb00c.rlib --extern uuid=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libuuid-73a3a4756b8fa3b6.rlib --extern webauthn_rs=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libwebauthn_rs-df1e3b8e026399eb.rlib --extern which=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libwhich-19dbf4ddca88d868.rlib --extern yubico=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/deps/libyubico-72696a30719f44e6.rlib -L native=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/build/libsqlite3-sys-23d986867762c654/out -L native=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/build/ring-74ae6fe0630f3cbf/out -L native=/tmp/DietPi-Software/vaultwarden-1.30.3/target/release/build/psm-af7ee5a81c4a19fd/out --cfg sqlite` (signal: 6, SIGABRT: process abort signal)

I then tested it natively on an Odroid XU4, which fails with:

LLVM ERROR: out of memory
Allocation failed
error: could not compile `vaultwarden` (bin "vaultwarden")

Caused by:
  process didn't exit successfully: `/root/.rustup/toolchains/1.75.0-armv7-unknown-linux-gnueabihf/bin/rustc --crate-name vaultwarden --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C opt-level=3 -C lto=fat -C codegen-units=1 --cfg 'feature="libsqlite3-sys"' --cfg 'feature="sqlite"' -C metadata=1d58e77e878ebff3 -C extra-filename=-1d58e77e878ebff3 --out-dir /root/vaultwarden-1.30.3/target/release/deps -C strip=debuginfo -L dependency=/root/vaultwarden-1.30.3/target/release/deps --extern argon2=/root/vaultwarden-1.30.3/target/release/deps/libargon2-b3c0e2fb66f02f2d.rlib --extern bigdecimal=/root/vaultwarden-1.30.3/target/release/deps/libbigdecimal-5c86038a6cda2357.rlib --extern bytes=/root/vaultwarden-1.30.3/target/release/deps/libbytes-96d5c91a8970d3e0.rlib --extern cached=/root/vaultwarden-1.30.3/target/release/deps/libcached-821a263ab8ba999e.rlib --extern chrono=/root/vaultwarden-1.30.3/target/release/deps/libchrono-15dfa97106ea9ebc.rlib --extern chrono_tz=/root/vaultwarden-1.30.3/target/release/deps/libchrono_tz-78b03a250f1270bd.rlib --extern cookie=/root/vaultwarden-1.30.3/target/release/deps/libcookie-043627deeb63f29d.rlib --extern cookie_store=/root/vaultwarden-1.30.3/target/release/deps/libcookie_store-669046e1dd142dcd.rlib --extern dashmap=/root/vaultwarden-1.30.3/target/release/deps/libdashmap-4e900a06ccdaf881.rlib --extern data_encoding=/root/vaultwarden-1.30.3/target/release/deps/libdata_encoding-2d1c20ef5ab53b9f.rlib --extern data_url=/root/vaultwarden-1.30.3/target/release/deps/libdata_url-b52f35a304c403e4.rlib --extern diesel=/root/vaultwarden-1.30.3/target/release/deps/libdiesel-ca0c77a5a320c634.rlib --extern diesel_migrations=/root/vaultwarden-1.30.3/target/release/deps/libdiesel_migrations-16e3489a14ca359f.rlib --extern dotenvy=/root/vaultwarden-1.30.3/target/release/deps/libdotenvy-0639e1593ada03c1.rlib --extern email_address=/root/vaultwarden-1.30.3/target/release/deps/libemail_address-ef15593c949ba84d.rlib --extern fern=/root/vaultwarden-1.30.3/target/release/deps/libfern-a911570f27da85c4.rlib --extern futures=/root/vaultwarden-1.30.3/target/release/deps/libfutures-e89e3ce9357840c0.rlib --extern governor=/root/vaultwarden-1.30.3/target/release/deps/libgovernor-4f4ba3b235183b38.rlib --extern handlebars=/root/vaultwarden-1.30.3/target/release/deps/libhandlebars-c7fd7cd5e66214d0.rlib --extern html5gum=/root/vaultwarden-1.30.3/target/release/deps/libhtml5gum-f30a2504d52ce515.rlib --extern job_scheduler_ng=/root/vaultwarden-1.30.3/target/release/deps/libjob_scheduler_ng-9ad45ce09bc25744.rlib --extern jsonwebtoken=/root/vaultwarden-1.30.3/target/release/deps/libjsonwebtoken-450434a1f9394291.rlib --extern lettre=/root/vaultwarden-1.30.3/target/release/deps/liblettre-452bf5ebf537ccf1.rlib --extern libsqlite3_sys=/root/vaultwarden-1.30.3/target/release/deps/liblibsqlite3_sys-31047204d554a120.rlib --extern log=/root/vaultwarden-1.30.3/target/release/deps/liblog-546ced36359b5ce9.rlib --extern num_derive=/root/vaultwarden-1.30.3/target/release/deps/libnum_derive-5e15818039b17888.so --extern num_traits=/root/vaultwarden-1.30.3/target/release/deps/libnum_traits-92bc955e64bc391c.rlib --extern once_cell=/root/vaultwarden-1.30.3/target/release/deps/libonce_cell-49d951a6756b3fe0.rlib --extern openssl=/root/vaultwarden-1.30.3/target/release/deps/libopenssl-f81ff30d71ee3f77.rlib --extern paste=/root/vaultwarden-1.30.3/target/release/deps/libpaste-a79e4674c1034386.so --extern 
percent_encoding=/root/vaultwarden-1.30.3/target/release/deps/libpercent_encoding-39b0e9fee14dda0f.rlib --extern pico_args=/root/vaultwarden-1.30.3/target/release/deps/libpico_args-1ee48336305d6717.rlib --extern rand=/root/vaultwarden-1.30.3/target/release/deps/librand-f0cff6ea6faef3d8.rlib --extern regex=/root/vaultwarden-1.30.3/target/release/deps/libregex-527e30f8b0362c21.rlib --extern reqwest=/root/vaultwarden-1.30.3/target/release/deps/libreqwest-ea7db8ba6bf40739.rlib --extern ring=/root/vaultwarden-1.30.3/target/release/deps/libring-777c51b0c3d288c0.rlib --extern rmpv=/root/vaultwarden-1.30.3/target/release/deps/librmpv-fdaa6bd5760011c9.rlib --extern rocket=/root/vaultwarden-1.30.3/target/release/deps/librocket-4ee1c8e28bd14c24.rlib --extern rocket_ws=/root/vaultwarden-1.30.3/target/release/deps/librocket_ws-2f177dcbabdf5516.rlib --extern rpassword=/root/vaultwarden-1.30.3/target/release/deps/librpassword-c8818deb34b76ea8.rlib --extern semver=/root/vaultwarden-1.30.3/target/release/deps/libsemver-60d3d6cb5a7e3505.rlib --extern serde=/root/vaultwarden-1.30.3/target/release/deps/libserde-9cc93a7189128f25.rlib --extern serde_json=/root/vaultwarden-1.30.3/target/release/deps/libserde_json-86d2120e19427cc1.rlib --extern syslog=/root/vaultwarden-1.30.3/target/release/deps/libsyslog-5c152346c1ce8d61.rlib --extern time=/root/vaultwarden-1.30.3/target/release/deps/libtime-56441102515dd14f.rlib --extern tokio=/root/vaultwarden-1.30.3/target/release/deps/libtokio-5dfae61c8444c1a0.rlib --extern tokio_tungstenite=/root/vaultwarden-1.30.3/target/release/deps/libtokio_tungstenite-e58dabcf1dceb921.rlib --extern totp_lite=/root/vaultwarden-1.30.3/target/release/deps/libtotp_lite-2df5b536107c4b4f.rlib --extern tracing=/root/vaultwarden-1.30.3/target/release/deps/libtracing-ceeaccc151504516.rlib --extern url=/root/vaultwarden-1.30.3/target/release/deps/liburl-16c84f7f7f3fb00c.rlib --extern uuid=/root/vaultwarden-1.30.3/target/release/deps/libuuid-73a3a4756b8fa3b6.rlib --extern webauthn_rs=/root/vaultwarden-1.30.3/target/release/deps/libwebauthn_rs-df1e3b8e026399eb.rlib --extern which=/root/vaultwarden-1.30.3/target/release/deps/libwhich-19dbf4ddca88d868.rlib --extern yubico=/root/vaultwarden-1.30.3/target/release/deps/libyubico-72696a30719f44e6.rlib -L native=/root/vaultwarden-1.30.3/target/release/build/libsqlite3-sys-23d986867762c654/out -L native=/root/vaultwarden-1.30.3/target/release/build/ring-74ae6fe0630f3cbf/out -L native=/root/vaultwarden-1.30.3/target/release/build/psm-af7ee5a81c4a19fd/out --cfg sqlite` (signal: 6, SIGABRT: process abort signal)

I suspect both have the same underlying issue, but the build inside the container is probably aborted by the host/container engine.

The same build works fine for x86_64 and aarch64, both natively and within the same QEMU container setup.

I monitored some stats during the build on the Odroid XU4:

6832 | CPU: 19.2% 56°C | RAM: 99% | Swap: 13%
6833 | CPU: 18.0% 56°C | RAM: 99% | Swap: 13%
6834 | CPU: 14.8% 56°C | RAM: 99% | Swap: 13%
6835 | CPU: 17.5% 57°C | RAM: 99% | Swap: 13%
6836 | CPU: 14.7% 55°C | RAM: 74% | Swap: 8%
6837 | CPU: 16.3% 55°C | RAM: 33% | Swap: 1%
6838 | CPU: 2.9% 55°C | RAM: 7% | Swap: 1%
6839 | CPU: 4.6% 54°C | RAM: 7% | Swap: 0%
6840 | CPU: 8.7% 54°C | RAM: 6% | Swap: 0%
6841 | CPU: 6.4% 54°C | RAM: 6% | Swap: 0%

These are the seconds around the failure. The board has 2 GiB RAM, and I created an 8 GiB swap space. The last build step of the vaultwarden crate/binary uses a single process on a single CPU core (the XU4 has 8 cores, so one maxed core shows as 12.5% CPU the way I measured it above), and its RAM + swap usage seems to crack the 3 GiB per-process limit, which would explain the issue. LPAE allows a larger overall memory size/usage, but the address space of a single process is still limited.

I verified this again by monitoring the process resident and virtual memory usage in htop:
[screenshot: htop showing the rustc process's resident and virtual memory usage]

There is one rustc process during the last build step, with 4 threads. The build fails when its virtual memory usage crosses 3052 MiB, i.e. quite precisely the 32-bit per-process address space limit.
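
As a side note, the same can be watched without htop; a minimal sketch, assuming a single rustc process is running (pgrep -n just picks the newest one):

# VmPeak/VmSize/VmRSS are standard fields in /proc/<pid>/status on Linux.
pid=$(pgrep -n rustc)
grep -E 'VmPeak|VmSize|VmRSS' "/proc/$pid/status"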

Since we have successful builds/packages with vaultwarden v1.30.1, I tried building v1.30.2, which, as expected, fails the same way, since it differs from v1.30.3 by just two tiny, surely unrelated commits. v1.30.1 still builds fine, so the culprit is to be found between v1.30.1 and v1.30.2, probably just dependency crates that grew in size.

Not sure whether there is an easy solution/workaround. Of course we could try cross-compiling, but I would actually like to avoid that, as it is difficult to ensure that the correct shared libraries are linked, especially when building for ARMv6 Raspbian systems.

Deployment environment

  • Install method: source build

Steps to reproduce

On Debian (any version):

apt install curl gcc libc6-dev pkg-config libssl-dev git
curl -sSfo rustup-init.sh 'https://sh.rustup.rs'
chmod +x rustup-init.sh
# ARMv7: Workaround for failing crates index update in emulated 32-bit ARM environments: https://github.com/rust-lang/cargo/issues/8719#issuecomment-1516492970
# ARMv8: Workaround for increased cargo fetch RAM usage: https://github.com/rust-lang/cargo/issues/10583
export CARGO_REGISTRIES_CRATES_IO_PROTOCOL='sparse' CARGO_NET_GIT_FETCH_WITH_CLI='true'
./rustup-init.sh -y --profile minimal --default-toolchain none
rm rustup-init.sh
export PATH="$HOME/.cargo/bin:$PATH"
curl -fLO 'https://github.com/dani-garcia/vaultwarden/archive/1.30.3.tar.gz'
tar xf '1.30.3.tar.gz'
rm '1.30.3.tar.gz'
cd 'vaultwarden-1.30.3'
cargo build --features sqlite --release


@BlackDex (Collaborator) commented Feb 5, 2024

Just a quick note: what do you expect us to do about how Rust builds the binaries? Or about how library crates are built? There must be something in there if it suddenly happened.

Also, all the building of the binaries does work on GitHub at least, so it's not broken per se, I think.

@dani-garcia (Owner) commented:

If the failure point is at the linking step, maybe disabling LTO can help? Does compiling in debug mode finish correctly at least?

@BlackDex (Collaborator) commented Feb 5, 2024

Also, note #4308, which isn't in 1.30.3.

@BlackDex (Collaborator) commented Feb 5, 2024

What happens if you use main?

@MichaIng (Author) commented Feb 5, 2024

> What do you expect us to do about how Rust builds the binaries?

I asked this myself. At least I wanted to make you aware of the issue, as this should affect others as well. And probably someone with more Rust build/cargo knowledge has an idea how to work around the issue.

> Also, all the building of the binaries does work on GitHub at least.

I see you use Docker buildx. I guess it does something differently. It would of course be good to have someone replicate this with cargo directly, so we know it is not just me, or the Debian systems we use for building.

> If the failure point is at the linking step, maybe disabling LTO can help? Does compiling in debug mode finish correctly at least?

I am not 100% sure whether it is the linking step, but it is at least the rustc --crate-name vaultwarden process, i.e. all dependencies have been compiled already.

LTO is disabled when e.g. using --profile dev, right? I'll redo a build, also adding -v, and see what happens.

> What happens if you use main?

Will try it as well.

@BlackDex (Collaborator) commented Feb 5, 2024

>> What do you expect us to do about how Rust builds the binaries?

> I asked this myself. At least I wanted to make you aware of the issue, as this should affect others as well. And probably someone with more Rust build/cargo knowledge has an idea how to work around the issue.

It's always good to point this out, of course. I just wondered, since not much changed in Vaultwarden itself except the crates.

>> Also, all the building of the binaries does work on GitHub at least.

> I see you use Docker buildx. I guess it does something differently. It would of course be good to have someone replicate this with cargo directly, so we know it is not just me, or the Debian systems we use for building.

buildx is not that different from any other container or basic system, so that shouldn't affect the building of course. It's more likely the libraries or tool versions which could make a difference per platform. It might even be a recent openssl update causing issues right now, since there were some deb updates the past few days.

>> If the failure point is at the linking step, maybe disabling LTO can help? Does compiling in debug mode finish correctly at least?

> I am not 100% sure whether it is the linking step, but it is at least the rustc --crate-name vaultwarden process, i.e. all dependencies have been compiled already.

> LTO is disabled when e.g. using --profile dev, right? I'll redo a build, also adding -v, and see what happens.

It should indeed.

>> What happens if you use main?

> Will try it as well.

👍🏻

@FlakyPi commented Feb 6, 2024

The same happened to me. Apart from the problem with Handlebars, which I fixed by using version 5.1.0, I could compile Vaultwarden with --profile release-micro.

@MichaIng (Author) commented Feb 6, 2024

> It might even be a recent openssl update causing issues right now, since there were some deb updates the past few days.

Although, it affects Debian Bullseye, Bookworm and Trixie alike (Bullseye with LibSSL1.1, the others with LibSSL3), with quite different toolchain (C) versions as well. Rust itself is installed via rustup instead of Debian packages. And vaultwarden v1.30.1 still builds fine, so to me it looks more like crate dependency versions making the difference. And this is of course nasty to track down.

I should have tested main directly. I saw #4308, but thought the build would fail right at the start with dependency conflicts if this were the issue, like in the issue linked to that PR. But it could of course indirectly fix the increased memory usage as well.

... okay, the --profile dev -v build failed as well. This time I see the LLVM memory allocation error in the container build on GitHub as well:

$ cargo build --features sqlite --profile dev -v
...
     Running `/root/.rustup/toolchains/1.75.0-armv7-unknown-linux-gnueabihf/bin/rustc --crate-name vaultwarden --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 -C split-debuginfo=unpacked --cfg 'feature="libsqlite3-sys"' --cfg 'feature="sqlite"' -C metadata=fb3c9153b48300ba -C extra-filename=-fb3c9153b48300ba --out-dir /tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/deps -C incremental=/tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/incremental -L dependency=/tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/deps --extern argon2=/tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/deps/libargon2-eaf0ef43f3639341.rlib --extern bigdecimal=/tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/deps/libbigdecimal-d30d51c91093e1ec.rlib --extern bytes=/tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/deps/libbytes-c66db0f2d7c27463.rlib --extern cached=/tmp/DietPi-Software/vau
LLVM ERROR: out of memory
Allocation failed
qemu: uncaught target signal 6 (Aborted) - core dumped
error: could not compile `vaultwarden` (bin "vaultwarden")
Caused by:
  process didn't exit successfully: `/root/.rustup/toolchains/1.75.0-armv7-unknown-linux-gnueabihf/bin/rustc --crate-name vaultwarden --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 -C split-debuginfo=unpacked --cfg 'feature="libsqlite3-sys"' --cfg 'feature="sqlite"' -C metadata=fb3c9153b48300ba -C extra-filename=-fb3c9153b48300ba --out-dir /tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/deps -C incremental=/tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/incremental -L dependency=/tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/deps --extern argon2=/tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/deps/libargon2-eaf0ef43f3639341.rlib --extern bigdecimal=/tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/deps/libbigdecimal-d30d51c91093e1ec.rlib --extern bytes=/tmp/DietPi-Software/vaultwarden-1.30.3/target/debug/deps/libbytes-c66db0f2d7c27463.rlib --extern cached=/

Is there a way to verify that LTO is disabled? I see the -C lto=fat option is now missing, but might there be a (faster) default, which needs to be disabled explicitly? The docs do not give a hint: https://doc.rust-lang.org/cargo/reference/profiles.html#lto
But I'll try to set it to off explicitly in another build.
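
For the record, a way to force it off without editing Cargo.toml would be Cargo's profile env-var overrides (the same mechanism suggested further down in this thread); a sketch, assuming the dev profile:

# Force LTO off for the dev profile; the -v output then shows the full rustc
# invocation, where the absence of any -C lto=... flag can be verified.
CARGO_PROFILE_DEV_LTO=off cargo build --features sqlite --profile dev -v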

Building with main branch (but release target) fails the same way.

> The same happened to me. Apart from the problem with Handlebars, which I fixed by using version 5.1.0, I could compile Vaultwarden with --profile release-micro.

That is interesting. I saw that this new profile was recently added and thought about trying it, though I do not expect it to produce much smaller binaries in our case, since we run strip on the resulting binary anyway. Interesting that this solves it for you (I started a build just now), as it applies even heavier optimisations, doesn't it? Probably what helps is that the dependencies are size-optimised as well, so they take less memory when they are finally assembled into the vaultwarden binary?
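
For readers following along: judging from the option combinations measured later in this thread, the release-micro profile amounts to roughly the following (a sketch reconstructed from those measurements, not copied from the repository's Cargo.toml):

[profile.release-micro]
inherits = "release"
opt-level = "z"    # optimise for size rather than speed
strip = "symbols"
lto = "fat"
codegen-units = 1
panic = "abort"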

@BlackDex (Collaborator) commented Feb 6, 2024

Could you provide a bit more detail on the hosts you build on? Which distro/version, how much memory, and how many CPU cores? You mentioned Bookworm, Bullseye and Trixie; if you can provide their details, I might be able to reproduce the setup.

And you also mentioned QEMU and different architectures. If you could describe those setups, that would be nice 🙂

@BlackDex (Collaborator) commented Feb 6, 2024

I also wonder what happens if you try a cargo update. That will update all crates to the latest available working version. I tested it locally, and it should work fine.

@MichaIng (Author) commented Feb 6, 2024

I use two different hosts:

On the Odroid XU4, I am using this image. It has some scripts for system setup purposes, but at its core it is a minimal Debian Bookworm armhf image. On GitHub, I use the container images from here, which have the same userland setup, and boot them from the Ubuntu GitHub runner with systemd-container + qemu-user-static + binfmt-support via systemd-nspawn -bD path/to/loop/mount, then trigger the vaultwarden build within this container via a systemd unit or autologin script. As we support Debian oldstable, stable and testing, we do builds for all these Debian versions: Debian Bullseye/oldstable from 2021, Debian Bookworm/stable from 2023 and Debian Trixie/testing, which will be released in 2025. The container images (as well as all our other images) are initially generated via debootstrap, hence the tool Debian itself uses for its images as well. Ah, and the same issue happens on the ARMv6 container images, which are not based on Debian but use the Raspbian repository instead, more or less a Debian clone for the armv6hf architecture used by the first Raspberry Pi models, which is not supported by Debian. Otherwise the setup is identical, and since QEMU has no ARMv6hf emulator, these boot with ARMv7 emulation as well.

I can actually try to replicate it on any other Debian-based live image via GitHub Actions, or even Ubuntu (which is 99% the same in relevant regards). But Debian does not seem to offer those for ARM: https://www.debian.org/CD/live/
The regular (installer) images require interactive input, hence are not suitable for GitHub Actions. Another approach is a container based on a Debian-slim Docker armv7 image, not using buildx but doing rustup install and cargo build "manually" via a Dockerfile. I am just not 100% sure yet how to invoke QEMU there, so that this can run on an x86_64 host.

I will try a build with a preceding cargo update. The release-micro profile btw also works here. I tested it only on GitHub so far; I will do the same on the XU4 and monitor/compare memory usage.

@BlackDex (Collaborator) commented Feb 6, 2024

If you want Docker to be able to run armxxx images locally, you need binfmt support on your host.
It is all explained here: https://github.com/dani-garcia/vaultwarden/blob/main/docker/README.md

We use that same principle to create the final containers per architecture. We just pull in the armv6, armv7 or aarch64 container and run apt-get update etc. as if it were an armxxx system.

Technically you can do the same with a Docker image.
As an example below: I have binfmt installed, so this works just fine for me.

docker run --rm -it -e QEMU_CPU=arm1176 --platform=linux/arm/v5 debian:bookworm-slim bash

root@bc44eb3f2c25:/# uname -a
Linux bc44eb3f2c25 6.7.3-zen1-2-zen #1 ZEN SMP PREEMPT_DYNAMIC Fri, 02 Feb 2024 17:03:56 +0000 armv6l GNU/Linux
root@bc44eb3f2c25:/#

That will use QEMU emulation for all binaries within that container.
So you can install Rust, use apt, etc. as if it were that architecture.

The same happens on GitHub for us in the workflows.

- name: Initialize QEMU binfmt support
  uses: docker/setup-qemu-action@68827325e0b33c7199eb31dd4e31fbe9023e06e3 # v3.0.0
  with:
    platforms: "arm64,arm"

There we load binfmt support, so we can run that architecture the same way.

@MichaIng (Author) commented Feb 6, 2024

Okay great. Installing the binfmt-support package on the Ubuntu host, like we do in our workflows, should then work as well, instead of docker/setup-qemu-action. Your command seems to go one step further, emulating a particular CPU instead of generic user-mode emulation, akin to qemu-system vs qemu-user-static. However, this should not make a difference.
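
A minimal sketch of that host setup, assuming a Debian/Ubuntu host with Docker already installed:

# Register QEMU user-mode emulators as binfmt handlers on the host ...
sudo apt-get update
sudo apt-get install -y qemu-user-static binfmt-support
# ... after which 32-bit ARM images run transparently under emulation:
docker run --rm --platform=linux/arm/v7 debian:bookworm-slim uname -m  # prints: armv7l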

@BlackDex (Collaborator) commented Feb 6, 2024

Looking at your Odroid, it should be QEMU_CPU=cortex-a7, now that I think of it :).

@BlackDex (Collaborator) commented Feb 6, 2024

and probably also linux/arm/v7

@MichaIng (Author) commented Feb 6, 2024

Yes, but it does not matter, as it fails on all 32-bit ARM systems the same way (when using Debian and the same set of commands).

@FlakyPi commented Feb 6, 2024

Have you tried the cross-compiler arm-linux-gnueabihf on an amd64 machine to build the armv7 binary (armhf on Debian)? It should be a lot faster than QEMU and not have any memory limitations.

For the Raspberry Pi ARMv6 I haven't found a better solution than the qemu-builder for the moment, and I think sooner or later it will be impossible to build Vaultwarden on 32-bit machines, as the crates will keep growing and growing.

@BlackDex (Collaborator) commented Feb 6, 2024

That is how we do it: we cross-compile for the target architecture, and only the final image is handled via QEMU.

@MichaIng (Author) commented Feb 6, 2024

First of all: cargo update caused the handlebars error with 1.30.3, but worked with main, as expected. However, doing this with main does not solve the build issue with the release target.

Cross-compiling is of course an option. But as said, to rule out surprises and ensure that the linked and available shared libraries match 100%, also on Raspbian systems, I prefer to build within the target userland. But indeed, if there is no way to reduce rustc memory consumption without changing optimisation options, sooner or later there won't be another way, I'm afraid.

Currently running the test with the Debian Bookworm Docker container:

apt update
apt install qemu-user-static binfmt-support
cat << '_EOF_' > vaultwarden.sh
#!/usr/bin/env sh
set -e
apt-get update
apt-get -y install curl gcc libc6-dev pkg-config libssl-dev git
curl -sSfo rustup-init.sh 'https://sh.rustup.rs'
chmod +x rustup-init.sh
# ARMv7: Workaround for failing crates index update in emulated 32-bit ARM environments: https://github.com/rust-lang/cargo/issues/8719#issuecomment-1516492970
# ARMv8: Workaround for increased cargo fetch RAM usage: https://github.com/rust-lang/cargo/issues/10583
export CARGO_REGISTRIES_CRATES_IO_PROTOCOL='sparse' CARGO_NET_GIT_FETCH_WITH_CLI='true'
./rustup-init.sh -y --profile minimal --default-toolchain none
rm rustup-init.sh
export PATH="$HOME/.cargo/bin:$PATH"
curl -fLO 'https://github.com/dani-garcia/vaultwarden/archive/1.30.3.tar.gz'
tar xf '1.30.3.tar.gz'
rm '1.30.3.tar.gz'
cd 'vaultwarden-1.30.3'
cargo build --features sqlite --release
_EOF_
docker run --platform=linux/arm/v7 debian:bookworm-slim sh -c "$(<vaultwarden.sh)"

... I did this within a VirtualBox VM running Debian Bookworm. I should have enabled nested virtualization first (which requires disabling the core isolation > memory integrity security feature on Windows 11) to speed things up; it is running, but quite slowly ...
I forgot to add a workaround for rust-lang/cargo#8719. But aside from some warnings that temporary files could not be removed, the crates index update ran through, so I hope this does not lead to an abort later, before the build step of interest.


Btw, the release-micro profile roughly halved the binary size here, so quite a significant difference compared to removing symbols afterwards via strip only. @FlakyPi you did not do performance comparisons, did you?

@BlackDex (Collaborator) commented Feb 6, 2024

@MichaIng we do cross-compiling too. Building via QEMU takes a very long time; while not measured, it was certainly more than double the time.

Also, QEMU can sometimes cause strange issues which are hard to debug. Most of the time it works, but much slower.

@MichaIng (Author) commented Feb 6, 2024

Sure, emulation is slow, especially without nested virtualization support. It usually does not play a role when things run on GitHub. And yes, there are SO many issues I needed to work around: in build scripts, in testing scripts, etc. You see already the cargo issue I linked, and the wrapping container setup scripts for testing and builds are full of workarounds like that, including many for individual software titles. However, since we also test our software install options on GitHub, there is no way around fiddling with QEMU emulation, as long as there are no native ARM runners available, or we start financing a battery of SBCs as self-hosted runners. For true ARMv6 tests, it is even harder, since there are no fast ARMv6-only SBCs, and testing on a real RPi 1 or Zero is slower than emulating on a GitHub runner 😄. And ARMv6 + Raspbian tests are pretty important, since binaries provided by software developers often suddenly lose ARMv6 compatibility, or it is intentionally dropped from release assets. But we still have about 9% ARMv6-only RPi model systems among our user base 🙈.

So applying workarounds for known QEMU issues to the build scripts/workflows as well does not really increase the trouble, while cross-compiling at times adds trouble. Probably not with vaultwarden, but in other cases we had issues with mismatching shared libraries, and the container setup code is shared between all build scripts.

... btw, surprising situation with the Docker build:
[screenshot: host htop showing the emulated rustc process's memory usage]
A little small to see; however, from the host's end, the rustc process already takes 4.5 GiB virtual memory. But this could be due to QEMU overhead. The build is still running ...

@MichaIng (Author) commented Feb 6, 2024

And there it failed the same way within the Docker container:

fatal runtime error: Rust cannot catch foreign exceptions
error: could not compile `vaultwarden` (bin "vaultwarden")

Caused by:
  process didn't exit successfully: `/root/.rustup/toolchains/1.75.0-armv7-unknown-linux-gnueabihf/bin/rustc --crate-name vaultwarden --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C opt-level=3 -C lto=fat -C codegen-units=1 --cfg 'feature="libsqlite3-sys"' --cfg 'feature="sqlite"' -C metadata=1d58e77e878ebff3 -C extra-filename=-1d58e77e878ebff3 --out-dir /vaultwarden-1.30.3/target/release/deps -C strip=debuginfo -L dependency=/vaultwarden-1.30.3/target/release/deps --extern argon2=/vaultwarden-1.30.3/target/release/deps/libargon2-b3c0e2fb66f02f2d.rlib --extern bigdecimal=/vaultwarden-1.30.3/target/release/deps/libbigdecimal-5c86038a6cda2357.rlib --extern bytes=/vaultwarden-1.30.3/target/release/deps/libbytes-96d5c91a8970d3e0.rlib --extern cached=/vaultwarden-1.30.3/target/release/deps/libcached-821a263ab8ba999e.rlib --extern chrono=/vaultwarden-1.30.3/target/release/deps/libchrono-15dfa97106ea9ebc.rlib --extern chrono_tz=/vaultwarden-1.30.3/target/release/deps/libchrono_tz-78b03a250f1270bd.rlib --extern cookie=/vaultwarden-1.30.3/target/release/deps/libcookie-043627deeb63f29d.rlib --extern cookie_store=/vaultwarden-1.30.3/target/release/deps/libcookie_store-669046e1dd142dcd.rlib --extern dashmap=/vaultwarden-1.30.3/target/release/deps/libdashmap-4e900a06ccdaf881.rlib --extern data_encoding=/vaultwarden-1.30.3/target/release/deps/libdata_encoding-2d1c20ef5ab53b9f.rlib --extern data_url=/vaultwarden-1.30.3/target/release/deps/libdata_url-b52f35a304c403e4.rlib --extern diesel=/vaultwarden-1.30.3/target/release/deps/libdiesel-ca0c77a5a320c634.rlib --extern diesel_migrations=/vaultwarden-1.30.3/target/release/deps/libdiesel_migrations-16e3489a14ca359f.rlib --extern dotenvy=/vaultwarden-1.30.3/target/release/deps/libdotenvy-0639e1593ada03c1.rlib --extern email_address=/vaultwarden-1.30.3/target/release/deps/libemail_address-ef15593c949ba84d.rlib --extern fern=/vaultwarden-1.30.3/target/release/deps/libfern-a911570f27da85c4.rlib --extern futures=/vaultwarden-1.30.3/target/release/deps/libfutures-e89e3ce9357840c0.rlib --extern governor=/vaultwarden-1.30.3/target/release/deps/libgovernor-4f4ba3b235183b38.rlib --extern handlebars=/vaultwarden-1.30.3/target/release/deps/libhandlebars-c7fd7cd5e66214d0.rlib --extern html5gum=/vaultwarden-1.30.3/target/release/deps/libhtml5gum-f30a2504d52ce515.rlib --extern job_scheduler_ng=/vaultwarden-1.30.3/target/release/deps/libjob_scheduler_ng-9ad45ce09bc25744.rlib --extern jsonwebtoken=/vaultwarden-1.30.3/target/release/deps/libjsonwebtoken-450434a1f9394291.rlib --extern lettre=/vaultwarden-1.30.3/target/release/deps/liblettre-452bf5ebf537ccf1.rlib --extern libsqlite3_sys=/vaultwarden-1.30.3/target/release/deps/liblibsqlite3_sys-31047204d554a120.rlib --extern log=/vaultwarden-1.30.3/target/release/deps/liblog-546ced36359b5ce9.rlib --extern num_derive=/vaultwarden-1.30.3/target/release/deps/libnum_derive-5e15818039b17888.so --extern num_traits=/vaultwarden-1.30.3/target/release/deps/libnum_traits-92bc955e64bc391c.rlib --extern once_cell=/vaultwarden-1.30.3/target/release/deps/libonce_cell-49d951a6756b3fe0.rlib --extern openssl=/vaultwarden-1.30.3/target/release/deps/libopenssl-f81ff30d71ee3f77.rlib --extern paste=/vaultwarden-1.30.3/target/release/deps/libpaste-a79e4674c1034386.so --extern percent_encoding=/vaultwarden-1.30.3/target/release/deps/libpercent_encoding-39b0e9fee14dda0f.rlib --extern 
pico_args=/vaultwarden-1.30.3/target/release/deps/libpico_args-1ee48336305d6717.rlib --extern rand=/vaultwarden-1.30.3/target/release/deps/librand-f0cff6ea6faef3d8.rlib --extern regex=/vaultwarden-1.30.3/target/release/deps/libregex-527e30f8b0362c21.rlib --extern reqwest=/vaultwarden-1.30.3/target/release/deps/libreqwest-ea7db8ba6bf40739.rlib --extern ring=/vaultwarden-1.30.3/target/release/deps/libring-777c51b0c3d288c0.rlib --extern rmpv=/vaultwarden-1.30.3/target/release/deps/librmpv-fdaa6bd5760011c9.rlib --extern rocket=/vaultwarden-1.30.3/target/release/deps/librocket-4ee1c8e28bd14c24.rlib --extern rocket_ws=/vaultwarden-1.30.3/target/release/deps/librocket_ws-2f177dcbabdf5516.rlib --extern rpassword=/vaultwarden-1.30.3/target/release/deps/librpassword-c8818deb34b76ea8.rlib --extern semver=/vaultwarden-1.30.3/target/release/deps/libsemver-60d3d6cb5a7e3505.rlib --extern serde=/vaultwarden-1.30.3/target/release/deps/libserde-9cc93a7189128f25.rlib --extern serde_json=/vaultwarden-1.30.3/target/release/deps/libserde_json-86d2120e19427cc1.rlib --extern syslog=/vaultwarden-1.30.3/target/release/deps/libsyslog-5c152346c1ce8d61.rlib --extern time=/vaultwarden-1.30.3/target/release/deps/libtime-56441102515dd14f.rlib --extern tokio=/vaultwarden-1.30.3/target/release/deps/libtokio-5dfae61c8444c1a0.rlib --extern tokio_tungstenite=/vaultwarden-1.30.3/target/release/deps/libtokio_tungstenite-e58dabcf1dceb921.rlib --extern totp_lite=/vaultwarden-1.30.3/target/release/deps/libtotp_lite-2df5b536107c4b4f.rlib --extern tracing=/vaultwarden-1.30.3/target/release/deps/libtracing-ceeaccc151504516.rlib --extern url=/vaultwarden-1.30.3/target/release/deps/liburl-16c84f7f7f3fb00c.rlib --extern uuid=/vaultwarden-1.30.3/target/release/deps/libuuid-73a3a4756b8fa3b6.rlib --extern webauthn_rs=/vaultwarden-1.30.3/target/release/deps/libwebauthn_rs-df1e3b8e026399eb.rlib --extern which=/vaultwarden-1.30.3/target/release/deps/libwhich-19dbf4ddca88d868.rlib --extern yubico=/vaultwarden-1.30.3/target/release/deps/libyubico-72696a30719f44e6.rlib -L native=/vaultwarden-1.30.3/target/release/build/libsqlite3-sys-23d986867762c654/out -L native=/vaultwarden-1.30.3/target/release/build/ring-74ae6fe0630f3cbf/out -L native=/vaultwarden-1.30.3/target/release/build/psm-af7ee5a81c4a19fd/out --cfg sqlite` (signal: 6, SIGABRT: process abort signal)

If buildx works for compiling the binaries, then it does something differently. But if I understand correctly, you do not build vaultwarden in an emulated container, but via cross-compiling, and only the final Docker images are (logically) built via emulation? That would of course explain why your builds are not affected by the 3 GiB limit.

@BlackDex (Collaborator) commented Feb 6, 2024

But what is the reason for not cross-compiling?

@MichaIng (Author) commented Feb 6, 2024

Quoting myself:

> while cross-compiling at times adds trouble. Probably not with vaultwarden, but in other cases we had issues with mismatching shared libraries, and the container setup code is shared between all build scripts.

Basically, to ensure that the userland on the build system, hence the linked libraries on the build host, exactly matches the one on the target system. And depending on the toolchain, it is also much easier to set up, compared to installing a cross-compiler and multiarch libraries and ensuring they are used throughout the toolchain. E.g. Python builds with Rust code have an issue of losing architecture information along the way: 32-bit ARM wheels compiled on a 32-bit userland/OS with a 64-bit kernel (the default since Raspberry Pi 4 and 5, even on a 32-bit userland/OS) are strangely marked as aarch64, while they (of course) are in fact 32-bit wheels running on 32-bit ARM systems.

@FlakyPi commented Feb 7, 2024

> Btw, the release-micro profile roughly halved the binary size here, so quite a significant difference compared to removing symbols afterwards via strip only. @FlakyPi you did not do performance comparisons, did you?

Nothing very thorough. On my old RPi 1 everything is as slow as geology, so I didn't notice a very significant drop in performance.

@BlackDex (Collaborator) commented Feb 7, 2024

Another item: why not use the pre-compiled MUSL binaries? Those are distro-independent.

@MichaIng (Author) commented Feb 7, 2024

Where do you provide those? Or do you mean to extract them from the Docker images? However, as we already have our own build scripts and GitHub workflows, it feels better to use them and control the builds, in terms of flags, profiles etc. And I guess you do not provide e.g. RISC-V and armv6l binaries?
EDIT: I see linux/arm/v6 containers are there 👍.

@BlackDex (Collaborator) commented Feb 7, 2024

@MichaIng could you try the following please?

Replace the following part in Cargo.toml:

[profile.release]
strip = "debuginfo"
lto = "fat"
codegen-units = 1

With:

[profile.release]
strip = "debuginfo"
lto = "thin"

And test again?

@BlackDex (Collaborator) commented Feb 7, 2024

Actually, this might be better for your use case; run this before you run the cargo build:

export CARGO_PROFILE_RELEASE_LTO=thin CARGO_PROFILE_RELEASE_CODEGEN_UNITS=16 CARGO_PROFILE_RELEASE_STRIP=symbols

@BlackDex (Collaborator) commented Feb 7, 2024

I actually think that my previous post will help you.

I was looking at the diff between 1.30.1 and 1.30.2, checking some of the crates we updated for similar issues, until I found the changes we made to the release profile.

The main benefit will be the CARGO_PROFILE_RELEASE_LTO env var, since that is probably what is eating your memory, given you mention it is the last step of the build process.

I also set CARGO_PROFILE_RELEASE_CODEGEN_UNITS to its default of 16, but you could set that to 8 or 6. The results will probably differ depending on which system you run it on, but on 1.30.1 it was 16.

I also added CARGO_PROFILE_RELEASE_STRIP=symbols, since I saw you mention running strip on the resulting binary; this would make that step unnecessary.

I tested this myself on my system via a Docker container, and it looked like it didn't go above 4 GiB.
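
For reference, those CARGO_PROFILE_RELEASE_* env vars map one-to-one onto profile keys, so the equivalent Cargo.toml form would be:

[profile.release]
strip = "symbols"
lto = "thin"
codegen-units = 16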

@FlakyPi commented Feb 7, 2024

> Actually, this might be better for your use case; run this before you run the cargo build:
>
> export CARGO_PROFILE_RELEASE_LTO=thin CARGO_PROFILE_RELEASE_CODEGEN_UNITS=16 CARGO_PROFILE_RELEASE_STRIP=symbols

That works for me.

@BlackDex (Collaborator) commented Feb 8, 2024

I also added a new release profile release-low to this PR: #4328

That might be useful too once merged.

@FlakyPi commented Feb 9, 2024

Thank you for that, it's going to be very useful for me.

@BlackDex (Collaborator) commented Feb 9, 2024

@FlakyPi can you verify if this works?

export CARGO_PROFILE_RELEASE_LTO=thin CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 CARGO_PROFILE_RELEASE_STRIP=symbols

If so, then we can close this issue, since the PR for using --profile release-low will be in the next version and is already in main.

@FlakyPi commented Feb 9, 2024

It crashes with CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1

It worked with CARGO_PROFILE_RELEASE_CODEGEN_UNITS=16

@BlackDex (Collaborator) commented Feb 9, 2024

Hmm, then I'll have to change the profile.

BlackDex added a commit to BlackDex/vaultwarden that referenced this issue Feb 9, 2024, and dani-garcia pushed it Feb 10, 2024:

It seems (as discussed here dani-garcia#4320) a single codegen unit makes it still
crash. This sets it to the default 16 Rust uses for the release profile.
@BlackDex (Collaborator) commented:

OK, that is merged. Since that seems to solve the issue, I'm going to close this one.

If not, please reopen.

@MichaIng (Author) commented Feb 11, 2024

Many thanks guys, and sorry for my late reply. Since >1 codegen units and thin LTO can both potentially worsen performance, and neither is used by the release-micro profile, I wonder which would perform better, and hence whether the new profile has any benefit over release-micro:

  • release-micro: 1 codegen unit and fat LTO, but opt-level = "z"
  • release-low: 16 codegen units, thin LTO, but opt-level = 3

Although, the docs say that thin achieves performance results similar to fat 🤔: https://doc.rust-lang.org/cargo/reference/profiles.html#lto

I am also confused why more parallelisation (codegen units) uses less memory, while I would usually expect it to consume more. Did someone test fat LTO with 16 codegen units? Out of interest, I think I'll try all combinations and see which results in which memory usage, so that we get a better idea of the effect of each option.

@BlackDex (Collaborator) commented:

@MichaIng it's difficult for me to really test it, actually. I do not have the same hardware; I could try mimicking GitHub.

The main thing is, the profile changed right before you noticed compilation going wrong: we changed from thin to fat LTO. The main difference is that fat will try a bit harder to optimise across link units. While in theory both should not make a difference with one codegen unit, I think fat will still use more resources.

Also, 16 codegen units release memory as they are done, and it's not one process; that might help on low-end systems, maybe?

I think thin with 16 is the best bet, since that was the previous default.

@MichaIng (Author) commented Feb 29, 2024

I tested the release target with thin LTO but 1 codegen unit, and it worked. Max memory usage during the build was 2.15 GiB.

Then I tested with fat LTO but 16 codegen units, and it worked as well, with max memory usage at 2.11 GiB.

... currently running with neither setting changed, and afterwards with both, just to have the full picture.

EDIT: Okay, now I am confused, as the build went through without either setting changed, using max 2.14 GiB memory, hence even a little lower than with thin LTO. Probably the reason for the high memory usage has been fixed among the dependencies, or even in the Rust toolchain itself 🤔. Currently running the build with thin LTO and 16 codegen units.

EDIT2: thin + 16 codegen units results in 2.08 GiB max memory usage.

@polyzen commented Mar 7, 2024

On my Odroid XU4, I get:

  • terminate called after throwing an instance of 'std::bad_alloc' with --profile release-low
  • LLVM ERROR: out of memory with --profile release-micro

This is on Arch Linux ARM armv7 with https://gitlab.archlinux.org/archlinux/packaging/packages/vaultwarden/-/blob/cb935a55918ef8cace6455426f9c68b7687dd29d/PKGBUILD, but modified for 1.30.5 and with build() edited, e.g.:

# Workaround for 32-bit systems
# https://github.com/dani-garcia/vaultwarden/issues/4320
if [[ $(getconf LONG_BIT) -eq 32 ]]; then
  VW_VERSION="$pkgver" cargo build --profile release-low --frozen --features sqlite,mysql,postgresql
else
  VW_VERSION="$pkgver" cargo build --release --frozen --features sqlite,mysql,postgresql
fi

@polyzen commented Mar 7, 2024

Builds pass with --profile release-low when using just --features sqlite, --features mysql, --features postgresql, --features sqlite,mysql, --features sqlite,postgresql, or --features mysql,postgresql.

@BlackDex (Collaborator) commented Mar 7, 2024

@polyzen probably because there is a lot of extra code per database feature, and it also needs to link against one extra library.

@MichaIng (Author) commented:

I accidentally built an older version when doing the above tests, which explains why it succeeded with the release profile. I have now let my Odroid XU4 run through a bunch of optimisation option combinations with the latest vaultwarden 1.30.5, all of them with --features sqlite only:

  • release: LTO=fat, codegen units=1, opt-level=3, strip=debuginfo
    • max memory usage=3104 MiB => LLVM ERROR: out of memory!
  • release-micro: LTO=fat, codegen units=1, opt-level=z, strip=symbols
    • max memory usage=2832 MiB => success
  • release-low: LTO=thin, codegen units=16, opt-level=3, strip=symbols
    • max memory usage=2094 MiB => success
  • release + CARGO_PROFILE_RELEASE_LTO=thin
    • max memory usage=2935 MiB => LLVM ERROR: out of memory!
  • release + CARGO_PROFILE_RELEASE_CODEGEN_UNITS=16
    • max memory usage=2806 MiB => LLVM ERROR: out of memory!
  • release + CARGO_PROFILE_RELEASE_STRIP=symbols
    • max memory usage=3029 MiB => LLVM ERROR: out of memory!
  • release + CARGO_PROFILE_RELEASE_OPT_LEVEL=z
    • max memory usage=3040 MiB => fatal runtime error: Rust cannot catch foreign exceptions!
  • release + CARGO_PROFILE_RELEASE_LTO=thin + CARGO_PROFILE_RELEASE_CODEGEN_UNITS=16
    • max memory usage=2023 MiB => success
  • release + CARGO_PROFILE_RELEASE_OPT_LEVEL=z + CARGO_PROFILE_RELEASE_STRIP=symbols
    • max memory usage=3044 MiB => fatal runtime error: Rust cannot catch foreign exceptions
  • release + CARGO_PROFILE_RELEASE_OPT_LEVEL=z + CARGO_PROFILE_RELEASE_STRIP=symbols + CARGO_PROFILE_RELEASE_PANIC=abort (equals release-micro)
    • max memory usage=2772 MiB => success
  • release + CARGO_PROFILE_RELEASE_LTO=thin + CARGO_PROFILE_RELEASE_STRIP=symbols + CARGO_PROFILE_RELEASE_PANIC=abort
    • max memory usage=2657 MiB => success
  • release + CARGO_PROFILE_RELEASE_CODEGEN_UNITS=16 + CARGO_PROFILE_RELEASE_STRIP=symbols + CARGO_PROFILE_RELEASE_PANIC=abort
    • max memory usage=2727 MiB => LLVM ERROR: out of memory
  • release + CARGO_PROFILE_RELEASE_LTO=thin + CARGO_PROFILE_RELEASE_CODEGEN_UNITS=16 + CARGO_PROFILE_RELEASE_STRIP=symbols + CARGO_PROFILE_RELEASE_PANIC=abort
    • max memory usage=1810 MiB => success
  • release + CARGO_PROFILE_RELEASE_LTO=thin + CARGO_PROFILE_RELEASE_CODEGEN_UNITS=16 + CARGO_PROFILE_RELEASE_OPT_LEVEL=z + CARGO_PROFILE_RELEASE_STRIP=symbols + CARGO_PROFILE_RELEASE_PANIC=abort
    • max memory usage=1527 MiB => success

Max memory was btw obtained in a loop which checked free -tb every 0.5 seconds (the total memory usage, hence RAM + swap). That some crashing runs (and some other results) show <3 GiB indicates that memory usage ramps up and drops back quickly at a certain stage, so the peak cannot be caught reliably; take the numbers with a grain of salt.
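
A minimal sketch of that sampling loop (assumed form; the exact script was not posted):

# Sample total (RAM + swap) usage every 0.5 s and keep the peak.
# 'free -tb' adds a Total: row in bytes; column 3 is the used amount.
max=0
while sleep 0.5; do
    used=$(free -tb | awk '/^Total:/ {print $3}')
    [ "$used" -gt "$max" ] && max=$used && echo "new peak: $max bytes"
done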

What I take from this is that the max memory usage with the release profile must be significantly above 3 GiB, since no single optimisation change prevents the crash, yet applying at least two of them can suddenly drop it to around 2 GiB. It also shows that the release-micro profile, while it currently works, is quite close to 3 GiB, so it likely won't work forever; hence the release-low profile has indeed some value. Stripping all symbols (instead of only debuginfo) seems to have only a small effect, unless combined with panic=abort (the symbols enhance the panic stack trace). Since the removal of symbols has no negative effect on performance, a positive effect on size, and a negative effect only for debugging, I will personally prefer this for our builds.

Does someone know whether the panic stack trace gives any meaningful information when all symbols are removed? If not, I suggest adding panic=abort to the release-low profile as well, or removing strip=symbols, which alone has no significant effect on memory usage.
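
For illustration, that suggestion would amount to something like this (a sketch only; the other release-low settings are taken from the measurements above, not from the repository):

[profile.release-low]
inherits = "release"
lto = "thin"
codegen-units = 16
strip = "symbols"
panic = "abort"    # the suggested addition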

@BlackDex (Collaborator) commented:

Thanks for all the testing.
Just a quick question: did you do a cargo clean before every test?

@MichaIng (Author) commented:

> Just a quick question: did you do a cargo clean before every test?

Sure 🙂. I guess otherwise the differences would have been smaller.

@BlackDex (Collaborator) commented:

>> Just a quick question: did you do a cargo clean before every test?

> Sure 🙂. I guess otherwise the differences would have been smaller.

Not per se, since you changed build parameters.

@MichaIng (Author) commented:

Err, right: it depends on which flags the dependencies were compiled with, respectively (as far as we know) simply what size they have. However, everything was recompiled on every build, so we now have an idea which flag/option has which effect.
