Struggling to switch users and maintain full cuda support #108
Comments
@njacobson-nci The problem is that And the default FYI @mathbunnyru Footnotes
@benz0li as far as I understand, in docker-stacks images we do not rely on LD_LIBRARY_PATH. https://www.sudo.ws/docs/man/sudoers.man/
@mathbunnyru For the jupyter/docker-stacks you are not supposed to.
But if someone builds the jupyter/docker-stacks on top of nvidia/cuda images,
Thanks @benz0li. It makes sense to me 👍
Setting LD_LIBRARY_PATH in the Jupyter notebook does resolve the issue in notebooks. It doesn't appear to be required in a terminal after sourcing the jovyan .bashrc, and it isn't set in that terminal session either. Changing start.sh so that it no longer checks for an existing /home/{$NB_USER} does allow the script to copy over the jovyan directory correctly, and new terminals are then set up properly, but it doesn't fix notebooks. I'll try preserving LD_LIBRARY_PATH and see how that does.
import os
os.environ["LD_LIBRARY_PATH"] = "/opt/conda/lib/:/usr/local/cuda/lib64/lib"
import bitsandbytes
@benz0li Applying the fix you provided does copy LD_LIBRARY_PATH over to the environment of the Jupyter notebooks, but libcudart.so is still not found.
'LD_LIBRARY_PATH': '/usr/local/nvidia/lib:/usr/local/nvidia/lib64',
This path doesn't exist in this image or in any of the other ones I've used recently. It is the default LD_LIBRARY_PATH on the 11.6.2-cudnn-runtime base image, but those folders don't exist on that image either.
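The observation above (entries in LD_LIBRARY_PATH that point at directories which do not exist) can be checked from a notebook cell or a terminal. The following is a minimal sketch, not part of bitsandbytes or docker-stacks; the function name `scan_library_path` is hypothetical, and it assumes a Linux-style colon-separated value.

```python
import glob
import os


def scan_library_path(path_value, pattern="libcudart.so*"):
    """Return (directory, exists, matches) for each entry of a colon-separated path list."""
    report = []
    for entry in path_value.split(":"):
        if not entry:
            continue  # skip empty entries such as a trailing colon
        exists = os.path.isdir(entry)
        # Only search directories that are actually present on this image.
        matches = sorted(glob.glob(os.path.join(entry, pattern))) if exists else []
        report.append((entry, exists, matches))
    return report


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for entry, exists, matches in scan_library_path(current):
        print(entry, "exists" if exists else "missing", matches)
```

Running it once in a terminal where bitsandbytes works and once in a notebook where it fails should show whether any listed directory actually contains the library.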
That is correct. See also jupyter/docker-stacks#1792 (comment).
You need to set/extend the path to Footnotes
Updating start.sh to set LD_LIBRARY_PATH as you called out here still has issues, but that might be a bitsandbytes thing. Appending /opt/conda/lib/ does let notebooks find libcudart.so. This is what I added to the start.sh to fix it now
It still seems like there is a deeper issue with switching users in this manner, and I wonder whether further bugs will show up when applying this fix. In a notebook, if I run "! ll", the alias is not found despite being defined in the jovyan/njacobson .bashrc. Is that expected?
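The idea of appending /opt/conda/lib/ only when it is not already listed can be sketched in Python as follows. This is an illustration under assumptions, not the start.sh change described above; the helper name `append_path_entry` is hypothetical.

```python
import os


def append_path_entry(current, extra):
    """Append `extra` to a colon-separated path list unless it is already present."""
    entries = [e for e in current.split(":") if e]
    # Compare without trailing slashes so /opt/conda/lib and /opt/conda/lib/ count as equal.
    normalized = [e.rstrip("/") for e in entries]
    if extra.rstrip("/") not in normalized:
        entries.append(extra)
    return ":".join(entries)


if __name__ == "__main__":
    # /opt/conda/lib is the directory named in the thread; other images may differ.
    updated = append_path_entry(os.environ.get("LD_LIBRARY_PATH", ""), "/opt/conda/lib")
    print(updated)
```

Note that the dynamic loader generally reads LD_LIBRARY_PATH when a process starts, so a value computed this way is most reliable when it is exported before the Jupyter server is launched rather than set inside an already running kernel.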
@njacobson-nci I can't help you any further as my images only use Python – and don't have Conda / Mamba installed.
Understood, I appreciate the help very much!
P.S.: You can always install Conda / Mamba on user level in my images.
I'm trying to run this stack for a few different users and want to be able to switch the username of the notebook user when I stand up the image.
When I do this, the terminals and notebooks spawned under the switched user don't correctly source the jovyan .bashrc, and running bitsandbytes fails to find libcudart.so.
The command I'm using (the most basic version, to eliminate any variables in my custom install and deployment):
docker run --gpus all -it -p 8848:8888 --user root -e NB_USER="njacobson" -e CHOWN_HOME=yes -w "/home/njacobson" cschranz/gpu-jupyter:v1.5_cuda-11.6_ubuntu-20.04_python-only
I attach to the container as root and run
mamba install cudatoolkit -y
python -m pip install bitsandbytes
python -m bitsandbytes
This fails to initialize and can't find libcudart.so.
If I source /home/jovyan/.bashrc,
python -m bitsandbytes works.
Running a Jupyter notebook via JupyterLab and importing bitsandbytes also fails.
So does a terminal spawned via JupyterLab, unless I source the jovyan .bashrc.
I've tried copying the jovyan .bashrc into my home directory and chowning it; this fixes new terminals, but new notebooks still won't import bitsandbytes properly.
nvidia-smi, nvcc, and torch.cuda.is_available() all work in notebooks.
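Because a terminal works after sourcing the .bashrc while a notebook does not, comparing the two environments may narrow down which variable makes the difference. A minimal sketch, assuming snapshots of `os.environ` can be taken in both places; the function name `diff_env` is hypothetical.

```python
import json
import os


def diff_env(working, failing):
    """Map each variable whose value differs to a (working, failing) pair; None means unset."""
    keys = sorted(set(working) | set(failing))
    return {
        k: (working.get(k), failing.get(k))
        for k in keys
        if working.get(k) != failing.get(k)
    }


if __name__ == "__main__":
    # Hypothetical workflow: run this in the working terminal to save a snapshot,
    # then load it in a notebook cell and call diff_env(snapshot, dict(os.environ)).
    print(json.dumps(dict(os.environ), indent=2, sort_keys=True))
```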
Not sure if this belongs here or with docker-stacks, but figured I'd start here.
Thanks!