Installing PYG with conda is very buggy #4386

Closed
andrei-rusu opened this issue Mar 31, 2022 · 12 comments · Fixed by #4400

Comments

@andrei-rusu

andrei-rusu commented Mar 31, 2022

😵 Describe the installation problem

This is a report concerning an environment built from scratch from an environment.yml file.
My first attempt was a PyTorch 1.11 environment, which hit the following CUDA bug: rusty1s/pytorch_scatter#248.
My second attempt was a PyTorch 1.10 environment, which resulted in the following error: #3593. I was able to work around this by uninstalling torch_spline_conv, but that was only a temporary fix. As soon as I tried installing a package with CUDA, pyg==2.0.4 reported a conflict that nothing I tried could resolve. The conda resolver tried in vain to fix it, reporting countless versioning issues related to PyG, including the GLIBC version problem again (which IPython apparently also runs into, although it stays silent until the full conda check is performed). It is worth mentioning that this is a remote server, so I cannot change the GLIBC version anyway.

This is getting frustrating, and I'd hate to have to rebuild the environment from scratch every time I need a new package. Something is clearly broken with the PyG versioning on conda, since uninstalling pyg fixed everything...
Below is the environment.yml in question:

name: graph
channels:
  - pytorch
  - pyg
  - conda-forge
  - defaults
dependencies:
  - python=3.9
  - conda
  - pytorch=1.10.1
  - torchvision==0.11.2
  - cudatoolkit=11.3
  - pyg
  - networkx
  - numpy
  - matplotlib
  - plotly
  - pandas
  - tqdm
  - dill
  - scikit-learn
  - jupyterlab
  - torchvision
  - pytorch-lightning
  - neptune-client

Environment

  • PyG version: 2.0.4
  • PyTorch version: 1.10
  • OS: Red Hat Enterprise Linux 7
  • Python version: 3.9.12
  • CUDA/cuDNN version: 11.3
  • How you installed PyTorch and PyG (conda, pip, source): conda
  • Any other relevant information (e.g., version of torch-scatter):
@NucciTheBoss
Contributor

Hmm. Maybe an update to the conda package is in order.

A GLIBC version error typically occurs when an executable compiled on a newer Linux-based OS (e.g. RHEL 8) is run on an older one (e.g. RHEL 7). I have run into it a few times with certain conda packages on my RHEL 7 cluster. It is a drawback of conda shipping precompiled packages. There are usually only two ways around it:

  1. Install from source.
  2. Use the conda package inside a container whose base image comes with the necessary GLIBC version.

Since installing from source is not the preferable option, do you have Singularity installed on your remote server?
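
To confirm that the system GLIBC is the limiting factor, something like the following Python sketch may help. It reads the C library version via `platform.libc_ver()` and compares it against a required version. `REQUIRED_GLIBC` and the helper names are placeholders chosen for illustration; the version a given package actually needs is usually named in its error message.

```python
import platform

# Placeholder value: replace with the version named in the GLIBC error message.
REQUIRED_GLIBC = "2.27"


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_is_sufficient(system_version, required_version):
    """Return True if the system version is at least the required version."""
    return parse_version(system_version) >= parse_version(required_version)


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        ok = glibc_is_sufficient(version, REQUIRED_GLIBC)
        print(f"System glibc {version}; needs {REQUIRED_GLIBC}; sufficient: {ok}")
    else:
        # libc_ver() returns empty strings when it cannot detect glibc.
        print("Could not detect a glibc version on this system.")
```

Comparing tuples of integers rather than raw strings avoids ranking "2.9" above "2.17".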

@andrei-rusu
Author

Thanks for the reply! The server uses Slurm, and browsing through the modules I found one that does have Singularity, but I never used that so I am not sure how that would help. I tried searching for a module that actually comes preloaded with another GLIBC version, but no luck there (I found 2021 versions of GCC, but no newer GLIBC...).

@NucciTheBoss
Contributor

Yeah... you won't find a module that updates the GLIBC version on your remote server. If the Linux kernel is the brain of a computer, GLIBC is the spine. Swapping out versions of GLIBC on a Linux system can cause the entire system to break since many applications rely on it. Here's a graphic from Wikipedia that shows how it fits into the Linux system:
[Image: Linux kernel System Call Interface and glibc]

Singularity would help by containerizing PyG so that it can still run on your system even though the system has an older GLIBC version. A recent pull request of mine (#4376) fixed issues with building the Singularity image; however, I am still working on updating the image to a newer version of PyG. If you have sudo privileges on a Linux system, you can build the image yourself:

git clone https://github.com/pyg-team/pytorch_geometric.git
sudo singularity build pyg.sif pytorch_geometric/docker/singularity

This would be your best workaround while the conda package and container recipes are updated.

@andrei-rusu
Author

I see, thank you! Unfortunately I do not have sudo there, so I think I'll stick to using the "dirty" pip installs for now. But I'll keep this issue open, as it would be nice to have conda behave nicely on remote servers (a lot of which use an outdated GLIBC, unfortunately).

@rusty1s
Member

rusty1s commented Apr 1, 2022

As far as I see, a simple fix would be to exclude the pytorch-spline-conv dependency from pyg. Would that work for you?

@andrei-rusu
Author

andrei-rusu commented Apr 1, 2022

Yes, maybe that would work. I don't fully understand the pyg conflicts that conda was complaining about: the environment was set up successfully from the environment.yml, and the conflicts only appeared after I uninstalled pytorch-spline-conv (which I did in order to import torch_geometric.nn without the GLIBC error). Maybe excluding that dependency altogether would resolve the reported conflicts. I suspect my questionable pip uninstall was enough to run my code, but not enough for conda to stop complaining.

@rusty1s
Member

rusty1s commented Apr 2, 2022

This is fixed in #4400.

rusty1s closed this as completed on Apr 2, 2022
@rperera12

Hi,
I am facing the same issue on a Slurm cluster that has GLIBC 2.17.
I can import torch_geometric with CUDA without any problem, but importing MessagePassing (from torch_geometric.nn.conv import MessagePassing) raises the GLIBC error.
This might be a silly question, but is there a way to import MessagePassing without needing pytorch-spline-conv?
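
One way to see which compiled extension demands a newer GLIBC than the cluster provides is to scan the shared library for the `GLIBC_x.y` version tags it references. The sketch below is a rough heuristic, similar in spirit to running `strings` on the file and searching for `GLIBC_`; the function name `required_glibc` is made up for illustration, and the path to the `.so` file has to be supplied by the reader.

```python
import re
from pathlib import Path

# Matches symbol-version tags such as GLIBC_2.17 or GLIBC_2.2.5 in raw bytes.
_GLIBC_TAG = re.compile(rb"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc(path):
    """Return the highest GLIBC version tag found in a binary file, or None.

    This is a heuristic byte scan, not a real ELF parser.
    """
    data = Path(path).read_bytes()
    versions = {
        tuple(int(part) for part in match.decode().split("."))
        for match in _GLIBC_TAG.findall(data)
    }
    return max(versions) if versions else None
```

If the returned tuple is greater than the system version (for example `(2, 17)` on this cluster), that library is a likely source of the import error.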

@rusty1s
Member

rusty1s commented Apr 30, 2022

torch-spline-conv is an optional dependency. If you do not need this operator, you can simply choose not to install it.

@rperera12

Thank you!
I am using one of the PyG models that needs MessagePassing.
I am currently importing it using:

from torch_geometric.nn.conv import MessagePassing

This line is the one that causes the GLIBC error, since it calls torch-spline-conv at some point.
Is there a way to import MessagePassing without needing pytorch-spline-conv?

@rusty1s
Member

rusty1s commented May 2, 2022

Can you try running the following?

pip uninstall torch-spline-conv
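
To check whether the optional package is still present after uninstalling, without triggering the failing import, a small Python sketch along these lines could be used. The helper name `is_installed` is illustrative only; `importlib.util.find_spec` locates a top-level module without executing it, so it should not hit the GLIBC error.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # torch_spline_conv is the import name of the torch-spline-conv package.
    if is_installed("torch_spline_conv"):
        print("torch_spline_conv is still installed.")
    else:
        print("torch_spline_conv is not installed.")
```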

@rperera12

Perfect!
This did the trick: I can now use MessagePassing on the Slurm cluster without any issues.
Thank you so much!
