Make Ray wheels manylinux2014 compatible #18506

Closed
pcmoritz opened this issue Sep 10, 2021 · 16 comments
Labels
core Issues that should be addressed in Ray Core dependencies Pull requests that update a dependency file Devprod enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks

Comments

@pcmoritz
Contributor

pcmoritz commented Sep 10, 2021

What is the problem?

Currently we are not running auditwheel on our wheels (see the build code starting with "# Rename the wheels so that they can be uploaded to PyPI. TODO(rkn): This is a ..."). It turns out they are actually not manylinux2014 compatible.

The first problem when running auditwheel on them is that all the binaries are in the purelib folder instead of the platlib folder. After unzipping the wheel, moving purelib to platlib, and adapting ray-1.6.0.dist-info/RECORD to point to the new files, I get this message from auditwheel:

ray-1.6.0-cp37-cp37m-linux_x86_64.whl is consistent with the
following platform tag: "manylinux_2_17_x86_64".

The wheel references external versioned symbols in these
system-provided shared libraries: libc.so.6 with versions
{'GLIBC_2.4', 'GLIBC_2.6', 'GLIBC_2.3.4', 'GLIBC_2.17', 'GLIBC_2.14',
'GLIBC_2.7', 'GLIBC_2.15', 'GLIBC_2.3.2', 'GLIBC_2.9', 'GLIBC_2.16',
'GLIBC_2.3', 'GLIBC_2.2.5', 'GLIBC_2.8', 'GLIBC_2.11', 'GLIBC_2.10'},
libstdc++.so.6 with versions {'GLIBCXX_3.4.18', 'CXXABI_1.3.5',
'CXXABI_1.3.7', 'GLIBCXX_3.4.14', 'GLIBCXX_3.4.19', 'GLIBCXX_3.4',
'GLIBCXX_3.4.11', 'CXXABI_1.3.3', 'CXXABI_1.3', 'CXXABI_1.3.2',
'GLIBCXX_3.4.17', 'GLIBCXX_3.4.15', 'GLIBCXX_3.4.9'}, libgcc_s.so.1
with versions {'GCC_3.0'}, libpthread.so.0 with versions
{'GLIBC_2.2.5', 'GLIBC_2.12', 'GLIBC_2.3.3', 'GLIBC_2.3.2'}, libm.so.6
with versions {'GLIBC_2.2.5'}, librt.so.1 with versions
{'GLIBC_2.2.5'}, libdl.so.2 with versions {'GLIBC_2.2.5'}

This constrains the platform tag to "manylinux_2_17_x86_64". In order
to achieve a more compatible tag, you would need to recompile a new
wheel from source on a system with earlier versions of these
libraries, such as a recent manylinux image.
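The purelib-versus-platlib problem described above can be spotted before auditwheel runs by looking inside the wheel archive. The sketch below is hypothetical (the helper names and the list of binary suffixes are assumptions, not part of Ray's build). It flags native binaries stored under `<dist>-<version>.data/purelib/`, and also root-level binaries when the wheel's `WHEEL` metadata says `Root-Is-Purelib: true` (the PEP 427 field that sends root files to purelib).

```python
import zipfile
from typing import List

# Suffixes treated as native code; an assumption, extend as needed.
BINARY_SUFFIXES = (".so", ".pyd", ".dylib")


def is_binary(name: str) -> bool:
    """True for extension modules and versioned shared libraries like libfoo.so.1."""
    base = name.rsplit("/", 1)[-1]
    return base.endswith(BINARY_SUFFIXES) or ".so." in base


def root_is_purelib(wheel: zipfile.ZipFile) -> bool:
    """Read the Root-Is-Purelib field from <dist>.dist-info/WHEEL (PEP 427)."""
    for name in wheel.namelist():
        parts = name.split("/")
        if len(parts) == 2 and parts[0].endswith(".dist-info") and parts[1] == "WHEEL":
            for line in wheel.read(name).decode("utf-8").splitlines():
                key, _, value = line.partition(":")
                if key.strip() == "Root-Is-Purelib":
                    return value.strip().lower() == "true"
    return False


def binaries_in_purelib(wheel: zipfile.ZipFile) -> List[str]:
    """List native binaries that the wheel installs into purelib instead of platlib."""
    root_pure = root_is_purelib(wheel)
    hits = []
    for name in wheel.namelist():
        if not is_binary(name):
            continue
        top = name.split("/", 1)[0]
        # Explicit "<dist>-<version>.data/purelib/..." entries always go to purelib.
        in_data_purelib = top.endswith(".data") and name.split("/")[1:2] == ["purelib"]
        # Files at the wheel root go to purelib when Root-Is-Purelib is true.
        at_root = root_pure and not top.endswith((".data", ".dist-info"))
        if in_data_purelib or at_root:
            hits.append(name)
    return hits
```

Opening a built wheel with `zipfile.ZipFile(path)` and passing it to `binaries_in_purelib` returns an empty list when every native file is routed to platlib; a non-empty result corresponds to the situation reported here.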

It is not clear that this is currently causing problems, but it certainly has the potential to (in the past we have seen very subtle segfaults from non-conforming wheels). While the manylinux2014 standard is compatible with all older standards (manylinux1, manylinux2010), I'm not sure whether this is also the case for manylinux_2_17_x86_64.

We should make sure that our wheels have the right tags and that we run auditwheel on them. Preferably, we should produce manylinux2014 wheels.

@pcmoritz pcmoritz added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Sep 10, 2021
@mwtian
Member

mwtian commented Sep 11, 2021

Great find. I agree that the wheels should either conform to manylinux2014 or be built with a newer image to conform to manylinux_2_y. The current situation has the disadvantages of both: we use an old image (past EOL) for building wheels, which limits our ability to, e.g., install newer tools, yet the wheels are not actually compatible with older environments. cc @scv119

@duburcqa
Contributor

duburcqa commented Sep 11, 2021

To be conformant with manylinux2014, exported dynamic libraries must depend on at most the following symbol versions, according to PEP 599:

GLIBC_2.17
CXXABI_1.3.7
GLIBCXX_3.4.19

which appears to be the case here, so no segfaults should be expected from this in the first place.
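The check described in this comment can be sketched as a comparison of each versioned symbol against the PEP 599 maxima. This is a minimal illustration with hypothetical helper names; it encodes only the three limits quoted above, and symbol families without a listed limit (such as `GCC_3.0`) are simply skipped.

```python
from typing import Iterable, List, Optional, Tuple

# manylinux2014 maxima as quoted above from PEP 599 (only these three are encoded).
MANYLINUX2014_MAX = {
    "GLIBC": (2, 17),
    "CXXABI": (1, 3, 7),
    "GLIBCXX": (3, 4, 19),
}


def parse_symbol_version(tag: str) -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Split 'GLIBCXX_3.4.19' into ('GLIBCXX', (3, 4, 19)); None if not numeric."""
    prefix, _, version = tag.rpartition("_")
    parts = version.split(".")
    if not prefix or not all(part.isdigit() for part in parts):
        return None
    return prefix, tuple(int(part) for part in parts)


def exceeds_manylinux2014(tags: Iterable[str]) -> List[str]:
    """Return the versioned symbols that are newer than the manylinux2014 maxima."""
    too_new = []
    for tag in tags:
        parsed = parse_symbol_version(tag)
        if parsed is None:
            continue
        prefix, version = parsed
        limit = MANYLINUX2014_MAX.get(prefix)
        # Tuples compare element-wise, so (2, 3, 4) < (2, 17) as intended.
        if limit is not None and version > limit:
            too_new.append(tag)
    return too_new
```

Fed the version sets printed by auditwheel in the issue description, this returns an empty list, which is consistent with the conclusion of this comment.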

According to PEP 600, manylinux2014 is precisely an alias for manylinux_2_17.
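The aliasing can be made concrete with a small normalizer. This sketch (helper names are hypothetical) maps the three legacy names to the glibc versions PEP 600 assigns them (manylinux1 to 2.5, manylinux2010 to 2.12, manylinux2014 to 2.17) and extracts the minimum glibc a tag implies, ignoring architecture checks.

```python
import re
from typing import Tuple

# Legacy manylinux names and the glibc version each one aliases, per PEP 600.
LEGACY_ALIASES = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}


def to_pep600(tag: str) -> str:
    """Rewrite e.g. 'manylinux2014_x86_64' as 'manylinux_2_17_x86_64'."""
    legacy, _, arch = tag.partition("_")
    if legacy in LEGACY_ALIASES and arch:
        major, minor = LEGACY_ALIASES[legacy]
        return f"manylinux_{major}_{minor}_{arch}"
    return tag  # already in PEP 600 form (or not a manylinux tag)


def required_glibc(tag: str) -> Tuple[int, int]:
    """Minimum glibc (major, minor) a system needs for a wheel with this tag."""
    match = re.fullmatch(r"manylinux_(\d+)_(\d+)_(\w+)", to_pep600(tag))
    if match is None:
        raise ValueError(f"not a manylinux tag: {tag}")
    return int(match.group(1)), int(match.group(2))
```

Both spellings reduce to the same requirement, glibc 2.17 or newer, which is why a manylinux_2_17_x86_64 wheel should install wherever a manylinux2014_x86_64 wheel would.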

@scv119 scv119 added the P1 Issue that should be fixed within a few weeks label Sep 11, 2021
@scv119 scv119 added this to the Core Backlog milestone Sep 11, 2021
@scv119 scv119 removed the triage Needs triage (eg: priority, bug/not-bug, and owning component) label Sep 11, 2021
@duburcqa
Contributor

duburcqa commented Sep 11, 2021

Apparently, it is possible to explicitly check compliance with manylinux2014_x86_64 by running auditwheel repair --plat manylinux2014_x86_64 [...] if necessary. Hope it helps!
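As a sketch of how this could be wired into a build step, the hypothetical helper below assembles the `auditwheel repair` invocation with an explicit `--plat` policy and an output directory (`-w`). The `wheelhouse` directory name and the function names are assumptions. auditwheel is expected to exit with an error when the wheel cannot satisfy the requested policy, and `check=True` turns that into a Python exception.

```python
import shlex
import subprocess
import sys
from typing import List


def auditwheel_repair_cmd(
    wheel_path: str,
    plat: str = "manylinux2014_x86_64",
    wheel_dir: str = "wheelhouse",
) -> List[str]:
    """Build the argv for `auditwheel repair`, targeting an explicit policy."""
    return ["auditwheel", "repair", "--plat", plat, "-w", wheel_dir, wheel_path]


def repair(wheel_path: str, plat: str = "manylinux2014_x86_64") -> None:
    cmd = auditwheel_repair_cmd(wheel_path, plat)
    print("+", shlex.join(cmd))
    # check=True raises CalledProcessError if auditwheel exits non-zero,
    # e.g. when the wheel cannot be made to satisfy the requested policy.
    subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    repair(sys.argv[1])
```

Keeping the command construction separate from execution makes the exact flags easy to log and review in CI.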

@scv119
Contributor

scv119 commented Sep 11, 2021

Thanks for reporting; we will look into this. Also, thanks @duburcqa for the pointer. It seems we might just need to run auditwheel as @duburcqa suggested.

@SongGuyang
Contributor

Do we have a plan to fix this? @scv119 @mwtian

@mwtian
Member

mwtian commented Oct 19, 2021

We definitely will when there is bandwidth. Is there a specific problem you are seeing? @SongGuyang

@SongGuyang
Contributor

@mwtian We still haven't determined the root cause of why we must set "ABI=0" in the C++ example to make it work. Can you help find it? cc @qicosmos

@ericl ericl added enhancement Request for new feature and/or capability and removed bug Something that is supposed to be working; but isn't labels Jan 25, 2022
@scv119
Contributor

scv119 commented Apr 20, 2022

@mwtian do you know the status of this one?

@mwtian
Member

mwtian commented Apr 20, 2022

This has been dropped for a while. Related questions are whether and when Ray should move to the newer manylinux standard, and what kind of ABI Ray should guarantee. Maybe we can start a doc or a REP.

@SongGuyang
Contributor

Would love to see a REP. 😊

@rkooo567 rkooo567 added the core Issues that should be addressed in Ray Core label Dec 9, 2022
@scv119 scv119 added the dependencies Pull requests that update a dependency file label Feb 16, 2023
@jjyao jjyao added the Devprod label Sep 25, 2023
@jjyao
Collaborator

jjyao commented Sep 25, 2023

@aslonnie @can-anyscale is this issue still relevant now?

@aslonnie
Collaborator

@aslonnie @can-anyscale is this issue still relevant now?

Probably still relevant. We never really audit the wheel, and the TODO comment was removed by @can-anyscale (it probably should have been kept):

https://github.com/ray-project/ray/pull/38415/files#diff-38e85fefc7a419849845aad70765dc1f50055cd84a65f175374cf634ef6543b1L122

which might explain the various segfaults, as we build the wheels in manylinux but run them on Ubuntu.

@can-anyscale
Collaborator

Yes, likely still an issue, and the TODO was moved into the ci/build/build-manylinux-wheel.sh file ;)

@poodlewars

poodlewars commented May 1, 2024

Hi @aslonnie @can-anyscale, we've been having trouble with Ray segfaults in our project, arcticdb, since we upgraded it to C++20. arcticdb is manylinux2014 compatible.

import ray
import arcticdb

segfaults, whereas,

import arcticdb
import ray

is fine.
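One way to reproduce an import-order crash like this without taking down the checking process is to run each ordering in a fresh interpreter and inspect the exit status. This is a hypothetical sketch (helper names are assumptions): on POSIX, a negative return code -N from `subprocess.run` means the child was killed by signal N, so a segfault shows up as SIGSEGV.

```python
import signal
import subprocess
import sys
from typing import Sequence


def import_exit_code(modules: Sequence[str]) -> int:
    """Import the modules in order in a fresh interpreter and return its exit code.

    A separate process per ordering keeps one crash from killing the checker.
    On POSIX, a negative code -N means the child was killed by signal N.
    """
    source = "\n".join(f"import {name}" for name in modules)
    result = subprocess.run([sys.executable, "-c", source], capture_output=True)
    return result.returncode


def describe(code: int) -> str:
    if code < 0:
        return f"killed by {signal.Signals(-code).name}"
    return "ok" if code == 0 else f"exited with {code}"


if __name__ == "__main__":
    for order in (["ray", "arcticdb"], ["arcticdb", "ray"]):
        print(" then ".join(order), "->", describe(import_exit_code(order)))
```

If either package is missing, the child simply exits with code 1 (an ImportError), which is distinguishable from a signal-induced crash.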

It seems likely to us that this is a packaging / manylinux issue.

I'm curious about the remark above,

which might explain the various segfaults

Have other projects reported issues with this?

@aslonnie
Collaborator

Hi,

Have other projects reported issues with this?

Not recently, at least.

Where does it segfault? With which versions of Ray and arcticdb, and on which platform and Python environment, can we reproduce this?

@jjerphan

jjerphan commented May 20, 2024

Hi all,

The segmentation fault can be reproduced with the PyPI wheels of arcticdb==4.4.2 and ray-core=2.11.0, at least on Linux. This has nothing to do with Ray; the problem comes from the PyPI wheels of ArcticDB.

Full explanation

The build system of ArcticDB currently links the C++ standard library statically into the shared objects it ships in its wheels. man-group/ArcticDB#1572 modifies the linkage so that the C++ standard library is loaded dynamically, making the wheels comply with the manylinux2014 policy.

This solves the problem for me locally, but it needs extensive testing. To me, man-group/ArcticDB#1572 is likely only a partial solution to this kind of issue. I believe we should make sure the wheels of both ArcticDB and Ray comply with the manylinux2014 policy (which was designed to share common libraries, thereby preventing this kind of problem).

ArcticDB uses auditwheel before publishing the wheels, but this does not make them manylinux compliant.
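Whether a shared object loads the C++ standard library dynamically, as man-group/ArcticDB#1572 aims for, can be checked from the NEEDED entries of its dynamic section. The sketch below parses the output of GNU `readelf -d` (binutils must be installed for `readelf_needed`); the helper names are hypothetical. For a C++ extension, the absence of libstdc++ from NEEDED suggests, but does not prove, that it was linked statically.

```python
import re
import subprocess
from typing import List

# Matches dynamic-section lines printed by GNU `readelf -d`, e.g.
#  0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_output: str) -> List[str]:
    """Extract the DT_NEEDED entries from `readelf -d` output, in order."""
    return NEEDED_RE.findall(readelf_output)


def needs_shared_libstdcxx(readelf_output: str) -> bool:
    """True if libstdc++ appears in the object's NEEDED list."""
    return any(lib.startswith("libstdc++.so") for lib in needed_libraries(readelf_output))


def readelf_needed(shared_object: str) -> List[str]:
    """Run `readelf -d` on a file and return its NEEDED libraries."""
    out = subprocess.run(
        ["readelf", "-d", shared_object], capture_output=True, text=True, check=True
    ).stdout
    return needed_libraries(out)
```

Running this over every `.so` in a wheel gives a quick inventory of which objects depend on the system libstdc++ and which do not.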

Note that these segmentation faults cannot be reproduced with the conda-forge packages of arcticdb and ray-core, thanks to the conda packaging model, in which common libraries are distributed as shared dependencies in dedicated packages.

In any case, I agree that Ray's wheels must also be manylinux compliant rather than relying on custom, erroneous adaptations.
