Add a manylinux2014 compatible release with only LLVM-based backends #4395

Closed
k-ye opened this issue Feb 25, 2022 · 17 comments · Fixed by #4550
Labels
build (Build related issues), feature request (Suggest an idea on this project), release

Comments

@k-ye
Member

k-ye commented Feb 25, 2022

This will include both CPU and CUDA backends, which could be useful for cloud users. See #4377, #3332

@k-ye added the feature request and build labels Feb 25, 2022
@k-ye added the release label Feb 25, 2022
@bobcao3
Collaborator

bobcao3 commented Feb 26, 2022

If we directly build on CentOS 7 nvidia docker, can we get a manylinux2014 wheel with proper cuda / vulkan?

@k-ye
Member Author

k-ye commented Feb 27, 2022

Maybe? Not sure either..

My reasoning is that, if people care about manylinux2014 compatibility, they're definitely running Taichi on a Linux server, so it's fine to limit the provided backends in this case. Instead of uploading these as PyPI packages, we can probably just build a few such special wheels and place them on the GitHub release page for users who need them. This simplifies our PyPI mgmt.

@bobcao3
Collaborator

bobcao3 commented Feb 27, 2022

Linux GPU servers are quite common these days tho. At least we should provide CUDA.

@qiao-bo self-assigned this Feb 28, 2022
@qiao-bo
Contributor

qiao-bo commented Feb 28, 2022

@k-ye @bobcao3, let me recap some of our previous efforts here. The problem with CentOS 7 is that it comes with GCC 4.8, which is too old to compile Taichi (not impossible, but there are too many errors to dig through). The same is true of the Nvidia CentOS 7 image. We worked around this with a magic docker image, quay.io/pypa/manylinux2014_x86_64, which is based on CentOS 7 but ships gcc 10 through Red Hat devtoolset-10 (https://git.centos.org/rpms/devtoolset-10-gcc). With this image we still need to recompile clang and llvm, but the build is possible.

Now to support CUDA, I see three options:
[a] Use the Nvidia CentOS 7 image and fix the gcc 4.8 problem. Difficulty is high.
[b] Use the CentOS 7 based image with gcc 10, then install and build the Nvidia/CUDA tools. Should be possible.
[c] Keep using our Ubuntu 18.04 image, but pin the glibc version to 2.2.5 (we already know how to do it) and ship Taichi without Vulkan. Difficulty is low.

I vote for option [c], WDYT?

@k-ye
Member Author

k-ye commented Feb 28, 2022

[c] sounds good to me!

@bobcao3
Collaborator

bobcao3 commented Feb 28, 2022

[c] sounds like the best option, but I do think we should attempt [a] or [b] so that we can at least confirm Vulkan works on the manylinux2014 target (glibc 2.17, which is a lot higher than 2.2.5).
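
To make the version comparison concrete, here is a minimal sketch. It assumes the usual rule that a wheel loads on a host whose glibc is at least as new as the newest glibc symbol version the wheel requires. The 2.17 floor for manylinux2014 comes from PEP 599; the helper names are hypothetical and not part of Taichi or auditwheel.

```python
# glibc version that a manylinux2014 host is guaranteed to provide (PEP 599).
MANYLINUX2014_GLIBC = "2.17"


def parse_version(text):
    """Turn a dotted version such as '2.2.5' into a comparable tuple (2, 2, 5)."""
    return tuple(int(part) for part in text.split("."))


def host_can_load(required_glibc, host_glibc=MANYLINUX2014_GLIBC):
    """Return True if a binary needing `required_glibc` can load on `host_glibc`."""
    return parse_version(host_glibc) >= parse_version(required_glibc)
```

With these helpers, `host_can_load("2.2.5")` holds for a wheel pinned as in option [c], while `host_can_load("2.27")` does not, which is why an unpinned Ubuntu 18.04 build (glibc 2.27) would not qualify. This covers glibc only; libstdc++ symbol versions need a separate check.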

@k-ye
Member Author

k-ye commented Feb 28, 2022

I think our top priority in this issue is to unblock CUDA users.. Vulkan will be a nice-to-have, but it's not that urgent :-)

@strongoier
Contributor

Where can we upload such a wheel without Vulkan?

@k-ye
Member Author

k-ye commented Feb 28, 2022

> Where can we upload such a wheel without Vulkan?

I believe GH provides API to upload additional artifacts to release pages like this (https://github.com/taichi-dev/taichi/releases/tag/v0.9.0)

@strongoier
Contributor

strongoier commented Feb 28, 2022

Yes, that's definitely possible. However, the updated version is no longer Taichi v0.9.0. Are we going to do so only for future releases?

Next steps in my thoughts:

  1. Update https://github.com/taichi-dev/taichi/blob/master/.github/workflows/release.yml to build a new wheel without Vulkan
  2. In our release workflow, add a step which asks the person responsible for a release to manually update that wheel to the GH release page

@k-ye
Member Author

k-ye commented Feb 28, 2022

> Are we going to do so only for future releases?

Yep

> to manually update that wheel to the GH release page

I think this can be automated with actions like https://github.com/marketplace/actions/gh-release
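
For reference, the same upload can also be scripted outside of that Action. The sketch below is not the linked Action; it swaps in the GitHub CLI (`gh release upload`) and assumes `gh` is installed and authenticated on the machine running it. The function names, tag, and wheel path are hypothetical.

```python
import subprocess


def build_upload_command(tag, wheel_paths, clobber=False):
    """Build the `gh release upload` argument list for an existing release tag."""
    command = ["gh", "release", "upload", tag, *wheel_paths]
    if clobber:
        # --clobber overwrites an asset of the same name if it already exists.
        command.append("--clobber")
    return command


def upload_wheels(tag, wheel_paths, clobber=False):
    """Run the upload; raises CalledProcessError if `gh` exits with an error."""
    subprocess.run(build_upload_command(tag, wheel_paths, clobber), check=True)
```

A release job could then call something like `upload_wheels("v0.9.2", ["dist/example-manylinux2014.whl"])` after the wheel is built.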

@strongoier
Contributor

strongoier commented Mar 7, 2022

I thought following route [c] would completely solve the problem and was about to add the no-Vulkan wheel to our release workflow. However, when I tried the real wheel taichi-nightly-py3.8-manylinux2014.whl on a CentOS 7 image, it failed with "CXXABI_1.3.11 not found", which means route [c] does not work. We still have to build Taichi on a CentOS 7 image instead of our existing Ubuntu images. I will try route [b] next.

@qiao-bo
Contributor

qiao-bo commented Mar 8, 2022

GCC 4.8 comes with CXXABI 1.3.7. Maybe first check whether the wheel works with this image (quay.io/pypa/manylinux2014_x86_64)? If it does, option [b] will probably give the same result.

@strongoier
Contributor

Same error appears on quay.io/pypa/manylinux2014_x86_64. The magic image works by updating GCC without updating libstdc++, so we can still only use CXXABI 1.3.7.
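
A rough way to spot this kind of problem before trying the wheel on CentOS 7 is to look for versioned symbol names inside the shared library. The sketch below is only a heuristic that scans the raw bytes with a regex; it is not how auditwheel works (auditwheel parses the ELF file properly). The ceilings are the ones listed for manylinux2014 in PEP 599, and the function name is hypothetical.

```python
import re

# Highest symbol versions permitted by the manylinux2014 policy (PEP 599).
MANYLINUX2014_LIMITS = {
    "GLIBC": (2, 17),
    "GLIBCXX": (3, 4, 19),
    "CXXABI": (1, 3, 7),
}

# GLIBCXX is listed before GLIBC so the longer prefix is tried first.
_VERSION_RE = re.compile(rb"(GLIBCXX|GLIBC|CXXABI)_(\d+(?:\.\d+)+)")


def find_violations(data, limits=MANYLINUX2014_LIMITS):
    """Return the sorted symbol versions in `data` that exceed `limits`."""
    violations = set()
    for name, version in _VERSION_RE.findall(data):
        key = name.decode()
        parsed = tuple(int(part) for part in version.decode().split("."))
        if parsed > limits[key]:
            violations.add(f"{key}_{version.decode()}")
    return sorted(violations)
```

Usage would be along the lines of `find_violations(open("taichi/_lib/core/taichi_core.so", "rb").read())` on the library extracted from the wheel; a library that needs CXXABI_1.3.11 would show up in the returned list.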

@qiao-bo
Contributor

qiao-bo commented Mar 8, 2022

Turns out we do need the magic image to be strictly manylinux compliant. Thanks for confirming this. I wonder which versions of gcc and libstdc++ cloud development environments typically use; let's see whether the previous wheel could still be useful.

@qiao-bo
Contributor

qiao-bo commented Mar 8, 2022

This dockerfile builds Taichi on CPU using the image. It is useful as a starting point.

@strongoier
Contributor

strongoier commented Mar 9, 2022

I finally chose path [a], customizing an image for our purpose. After tons of trials, ti.cuda now works properly and:

$ auditwheel-symbols -m 2014 taichi-0.9.2-cp39-cp39-linux_x86_64.whl 
taichi/_lib/core/taichi_core.so is manylinux_2_17(aka manylinux2014) compliant.

Next steps:

  • Upload the Dockerfile.
  • Add the image to our local registry. (registry.taichigraphics.com/taichidev-manylinux2014-cuda:v0.0.0)
  • Add the generation of the wheel into the release workflow.
