ELF load command address/offset not properly aligned #492

Open
ilya-lavrenov opened this issue Apr 24, 2023 · 18 comments
@ilya-lavrenov

Describe the bug

Since the 0.18 release came out, our build process has been failing with "ELF load command address/offset not properly aligned". We use patchelf internally to add RPATH on Linux systems.

On CentOS, loading any of our libraries now fails with this error. On Ubuntu 18.04 the issue reproduces only in some cases (we don't have more details). On Ubuntu 20.04 we have not seen any regressions.

Expected behavior

Works as before

patchelf --version output

0.18

@vsuorant

Same here with ManyLinux containers that are based on AlmaLinux. 0.17.2.1 works, but 0.18 causes "ELF load command address/offset not properly aligned"

@Mic92
Member

Mic92 commented Apr 24, 2023

We changed some alignment handling in #475 to fix alignment on arm. Looks like this can cause regressions with older glibc versions?
Can you be more precise about when this happens and how to reproduce it, e.g. using Docker? cc @brenoguim

@mathstuf

VTK's CI has been affected by this. It's not trivial to reproduce, but everything is in CI here: https://gitlab.kitware.com/vtk/vtk/-/jobs/8115134. The only difference that is meaningful to the error (to a first approximation) is a patchelf bump (see the issue I had filed above).

IIUC, patchelf is used to stuff non-blessed libraries into Python wheels so that they work "everywhere" given the limited set of libraries/ABIs PyPI can expect to exist on arbitrary Linux machines. DT_SONAME, DT_RUNPATH, and DT_NEEDED entries are all affected (the last to stay in sync with changes to the first) before copying into the wheel. This may change section sizes.

I suspect just getting any old project that compiles C or C++ code, uses some "weird" external library, and puts that into a wheel using auditwheel will show this problem when trying to use said wheel.

@svenevs

svenevs commented Apr 24, 2023

I suspect just getting any old project that compiles C or C++ code, uses some "weird" external library, and puts that into a wheel using auditwheel will show this problem when trying to use said wheel.

We ran into this with drake. I restored our old functionality of building patchelf from source in RobotLocomotion/drake#19265 so that I can help test if desired. There are instructions on the PR for how to do the build, but I do not think drake will be a convenient codebase for identifying what needs to be fixed, since development iterations will be very slow. That said, if you think you have something working and a commit is pushed somewhere, I can fairly easily run a canary build to see if the new change is working as desired. Hope that helps some!

@Mic92
Member

Mic92 commented Apr 29, 2023

It looks like in #494 it's only breaking on arm64/s390x for CentOS. What CPU architecture are you on?

@mayeut
Contributor

mayeut commented Apr 29, 2023

@Mic92, I saw the issue on x86_64. #494 just shows that no existing test exposes this specific issue right now (or that, for some reason, the tests only expose it on arm64/s390x). #494 is meant to prevent regressions from being introduced once this issue is fixed (and a test added), since it only happens with some distros.

@mayeut
Contributor

mayeut commented Apr 30, 2023

Ubuntu 18.04 x86_64 fails with the same message in multiple tests: https://github.com/NixOS/patchelf/actions/runs/4845550763/jobs/8634531910

@jacobwilliams

This issue has broken conda-build for me, which I guess is calling patchelf? I'm on a CentOS x86_64 machine.

@mzjp2

mzjp2 commented May 13, 2023

Yep, we're also seeing failures here in conda-build and mamba-build. Have pinned to a lower version of patchelf for now.

@isuruf

isuruf commented May 24, 2023

It's hard to reproduce this issue, but I have seen the "ELF load command address/offset not properly aligned" error at random in our builds. Here's a way to reproduce a broken library, though it is not necessarily the same issue.

mkdir tmp && cd tmp
wget https://anaconda.org/conda-forge/cuda-cudart_linux-64/12.0.107/download/noarch/cuda-cudart_linux-64-12.0.107-h59595ed_4.conda
unzip cuda-cudart_linux-64-12.0.107-h59595ed_4.conda
rm -rf targets
tar -xvf pkg-cuda-cudart_linux-64-12.0.107-h59595ed_4.tar.zst

for i in 1 2 3 4 5 6 7 8 9 10; do
  patchelf --add-rpath '$ORIGIN../'"$i" ./targets/x86_64-linux/lib/libcudart.so.12
  patchelf --print-rpath ./targets/x86_64-linux/lib/libcudart.so.12
  python -c "import ctypes; ctypes.CDLL('./targets/x86_64-linux/lib/libcudart.so.12.0.107')"
done
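
The loop above can also be driven from Python so that the first failing round is reported explicitly. This is only a sketch: the helper names (`patch_rpath`, `loads_ok`, `first_failure`) are hypothetical, it assumes `patchelf` is on PATH, and unlike the shell loop it patches and loads the same path rather than going through the `libcudart.so.12` / `libcudart.so.12.0.107` pair.

```python
import subprocess
import sys


def patch_rpath(lib, i):
    # Same invocation as the shell loop: append one more RPATH entry.
    # No shell is involved, so "$ORIGIN" is passed through literally.
    subprocess.run(["patchelf", "--add-rpath", f"$ORIGIN../{i}", lib], check=True)


def loads_ok(lib):
    # Load in a child interpreter so a crash in the dynamic loader
    # cannot take this script down with it.
    # The path should contain a "/" so it is opened as a path, not searched for.
    code = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"
    result = subprocess.run([sys.executable, "-c", code, lib], capture_output=True)
    return result.returncode == 0


def first_failure(lib, rounds=10, patch=patch_rpath, check=loads_ok):
    """Return the 1-based round after which `lib` stops loading, or None."""
    for i in range(1, rounds + 1):
        patch(lib, i)
        if not check(lib):
            return i
    return None
```

Calling `first_failure("./targets/x86_64-linux/lib/libcudart.so.12")` would then give the number of `--add-rpath` rounds it took to break the library, or `None` if it still loads after ten rounds. The `patch` and `check` parameters exist only so the loop can be exercised without patchelf installed.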

@rcoup

rcoup commented May 24, 2023

To add some complexity, I think Apple's Rosetta 2 has some differences from Linux/glibc with respect to ELF interpretation. So if you're executing patchelf'd amd64 Linux binaries under Docker on macOS on Apple Silicon (where amd64 ELF binaries are executed/translated by Rosetta 2 using binfmts), you may see different behaviour than on a native amd64 Linux OS.

In particular, we've seen some weird issues where patchelf is involved: segfaults, or mangled dynamic library names (trying to load lib instead of libwhatever). I don't have a simple reproducer yet, and I don't think it's necessarily related to this issue. It's more a heads-up that if someone is trying to reproduce amd64 ELF issues under Docker on macOS on Apple Silicon, they may get very different results.

@da-x

da-x commented Jun 18, 2023

@Mic92

The issue is 100% reproducible in the unit tests when running in a Rocky Linux 8 Docker container:

docker run -it --rm -w $(pwd) -v $(pwd):$(pwd) rockylinux:8.8.20230518 bash -c 'dnf install -y gcc gcc-c++ make autoconf automake libacl-devel libattr-devel diffutils chrpath && ./bootstrap.sh && cd build && make check || (cat tests/*.log; exit 1)'

Example output (partial):

# Run the patched tool and libraries
./many-syms-main: error while loading shared libraries: libmany-syms.so: ELF load command address/offset not properly aligned
FAIL rename-dynamic-symbols.sh (exit status: 127)
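
To inspect a patched file without going through the loader, the PT_LOAD program headers can be read directly. The sketch below checks the ELF specification's requirement that, for loadable segments, `p_vaddr` and `p_offset` are congruent modulo `p_align`. That this is exactly the condition behind the loader message above on every glibc version is an assumption, the function name `misaligned_load_segments` is hypothetical, and only 64-bit ELF files are handled.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def misaligned_load_segments(path):
    """Return (index, p_offset, p_vaddr, p_align) for every PT_LOAD segment
    whose file offset and virtual address differ modulo p_align."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("only 64-bit ELF is handled in this sketch")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 means little-endian

    # Elf64_Ehdr: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)

    bad = []
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
        #             p_filesz, p_memsz, p_align
        fields = struct.unpack_from(endian + "IIQQQQQQ", data, e_phoff + i * e_phentsize)
        p_type, p_offset, p_vaddr, p_align = fields[0], fields[2], fields[3], fields[7]
        if p_type != PT_LOAD or p_align <= 1:
            continue  # an alignment of 0 or 1 means no constraint
        if (p_vaddr - p_offset) % p_align != 0:
            bad.append((i, p_offset, p_vaddr, p_align))
    return bad
```

An empty list means no PT_LOAD segment violates the congruence. Running it on the same library before and after patching shows whether the patch step introduced the mismatch.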

@shr-project

shr-project commented Jun 22, 2023

I'm also seeing mkfs.ext4 segfault after patchelf --set-interpreter is called multiple times with version 0.18.0 (version 0.17.2 worked fine). I've uploaded a simple reproducer test here:
https://github.com/shr-project/patchelf/commits/jansa/mkfs.ext4.segfaults

Reverting 65cdee9 fixes this test.

@brenoguim
Collaborator

I'll be able to look into these next week. With a reproducer it should be quick to debug!

bcumming added a commit to eth-cscs/alps-cluster-config that referenced this issue Aug 23, 2023
When used to set RPATHS (e.g. installing nvhpc, intel oneapi,
cray-mpich):
  ELF load command address/offset not properly aligned
c.f.  NixOS/patchelf#492
ywbyun0815 pushed a commit to webosose/meta-webosose that referenced this issue Sep 7, 2023
:Release Notes:
mke2fs.real, mkfs.ext2.real, mkfs.ext3.real, mkfs.ext4.real are one identical
binary with multiple hardlinks and we end up calling patchelf-uninative 4
times even when the interpreter is already set correctly from the build

:Detailed Notes:
To avoid corrupted binaries created on Ubuntu 18.04, avoid calling
patchelf-uninative multiple times, and in this case don't call it at all.

It might be related to:
NixOS/patchelf#492
or
NixOS/patchelf#446
but the latter was already included in patchelf-0.17.2 used in uninative-3.9

This was submitted to upstream in:
https://lists.openembedded.org/g/openembedded-core/message/183314
but wasn't merged yet (so it cannot be in meta-webos-backports-* layer)
and it might take a while until it's backported to kirkstone.

:Testing Performed:
Only build tested.

:QA Notes:
No change to image.

:Issues Addressed:
[WRP-19053] CCC: Various build fixes
[WRP-17893] mkfs.ext4 segfaults with uninative 3.10 and newer
[WRP-6209] Update jenkins slaves to use Ubuntu 20.04 or 22.04

Cherry-picked-from-commit: d3e0606
Cherry-picked-from-branch:
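
The commit message above describes one binary reachable through several hard links being patched once per name. A generic way to avoid that is to deduplicate candidate paths by device and inode number before invoking patchelf. The sketch below illustrates that idea only: it is not the OE-Core change, and the helper name `unique_files` is hypothetical.

```python
import os


def unique_files(paths):
    """Yield one path per underlying file, skipping additional hard links."""
    seen = set()
    for path in paths:
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)  # identifies the file regardless of its name
        if key in seen:
            continue
        seen.add(key)
        yield path
```

Iterating over `unique_files([...])` instead of the raw list means each binary is handed to patchelf at most once, however many names point at it.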
RikardO-HM pushed a commit to hostmobility/mobility-poky-platform that referenced this issue Jan 3, 2024
Updating due to a segmentation fault in mkfs.ext4 caused by the patchelf in ubuntu 18.04.

Freescale/meta-freescale#1593
NixOS/patchelf#492 (comment)
conda-forge/admin-requests#746
daregit pushed a commit to daregit/yocto-combined that referenced this issue May 22, 2024
* uninative-3.10 and 4.0 don't work on e.g. ubuntu-18.04, because patchelf-uninative
  makes the binaries unusable and e.g. mkfs.ext4 segfaults in loader, see:
  NixOS/patchelf#492

* mke2fs.real, mkfs.ext2.real, mkfs.ext3.real, mkfs.ext4.real are one identical
  binary with multiple hardlinks and we end up calling patchelf-uninative 4
  times even when the interpreter is already set correctly from the build

  The issue was reported upstream with mkfs.ext4.real as possible reproducer:
  NixOS/patchelf#492 (comment)

  To fix uninative we need to first release new uninative tarball and
  then upgrade it in master, mickledore, kirkstone, dunfell

* originally reported in:
  https://lists.openembedded.org/g/openembedded-core/message/182862
  with temporary work around (applicable locally without waiting for
  new uninative release):
  https://lists.openembedded.org/g/openembedded-core/message/183314

(From OE-Core rev: f0499b58d1dd149300a349dde8f6664679df13e6)

Signed-off-by: Martin Jansa <[email protected]>
Signed-off-by: Alexandre Belloni <[email protected]>
Signed-off-by: Richard Purdie <[email protected]>
mistafunk added a commit to Esri/pyprt that referenced this issue Jul 18, 2024
Workaround for:
"error while loading library 'pyprt/lib/libcom.esri.prt.codecs.so': liblzma-51a76f52.so.5.2.4: ELF load command address/offset not properly aligned"

Also see NixOS/patchelf#492
@satmandu
Contributor

satmandu commented Sep 9, 2024

Hello all, I see various suggested patches for this issue. Any chance of some combo of them getting merged and a new release being cut, so downstream packagers don't have to worry about this?

Issam-b added a commit to Issam-b/android_openssl that referenced this issue Sep 16, 2024
OpenSSL has the configuration option shlib_variant, so we can use that instead.

This works for version 1.x and 3.x, so it would make the build script more similar between the two versions.

Also, this avoids issues that can come from patchelf, as this patch comes after a bug found in patchelf 0.18 that created wrongly aligned libraries. See NixOS/patchelf#492.
Issam-b added a commit to Issam-b/android_openssl that referenced this issue Sep 20, 2024
OpenSSL has the configuration option shlib_variant, so we can use that instead.

This works for version 1.x and 3.x, so it would make the build script more similar between the two versions.

Also, this avoids issues that can come from patchelf, as this patch comes after a bug found in patchelf 0.18 that created wrongly aligned libraries. See NixOS/patchelf#492.