Request to install Tensorflow and Keras for Python and R #6

Closed
damianavila opened this issue Jan 11, 2022 · 23 comments

@damianavila
Contributor

This is a request coming from a Freshdesk support ticket: https://2i2c.freshdesk.com/a/tickets/65
We had a conversation in Slack about this one and decided that we will implement the update ourselves (rather than leaving it to the users) because of the inherent complexity of installing those libraries.
The timeline and specific versions are not clear yet, but I have already requested that information.

@damianavila
Contributor Author

The latest versions are fine according to the requester.

@damianavila
Contributor Author

if it could be installed by the end of this week that would really help these instructors kick off the term

That is the time expectation I got from them.

@damianavila
Contributor Author

Copy-pasting what I have written on Slack:

If we install TensorFlow with pip or conda, it should also bring in Keras, according to https://keras.io/getting_started/

Then it seems to be just a matter of using the Python installation from the R side: https://tensorflow.rstudio.com/installation/custom/#locating-tensorflow

We could eventually ship a .Rprofile file

You could also add the RETICULATE_PYTHON environment variable to your .RProfile.

Or ask them to reference the python installation with use_python() or use_condaenv()

@damianavila
Contributor Author

Opened an exploratory PR at #7 to build things in several steps.

@damianavila
Contributor Author

Update: I have tested the image I built in #7 in the staging hub under the 2i2c cluster (because I cannot easily log in to the UoT staging hub) and it seems to be working as expected from the Python side of things: I have built and run a notebook with some basic TF and Keras examples without issue (although with several warnings... it seems both TF and Keras are pretty verbose because you can actually install them in different configurations, i.e. CPU vs GPU).
I also tested installing the R counterpart on the fly and they seem to work OK as well (so you can also use them from RStudio). I still need to add them to the #7 R requirements, though.

@damianavila
Contributor Author

OK, I tried installing the R requirements (the R-based versions of tensorflow and keras... you actually need those counterparts, which use the Python stack under the hood) but now I am facing a failing pattern: I need to bump the versions of other R packages (see my latest commits in #7). This seems to be an "endless" pattern after several iterations, so I am wondering if others faced the same problem when dealing with this image.
@GeorgianaElena @yuvipanda, did you find the same pattern? If that is the case, did you keep bumping versions until no errors were found? Did you try something else?
I am actually quite surprised you cannot target older versions of packages coming from CRAN. Is this something specific to the R package management space? Or am I missing something big because I am not knowledgeable enough in that space?

@damianavila
Contributor Author

Doing some research I found this article: https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages, which seems to indicate we would need to pass a URL to hit the old packages:

$ git diff
diff --git a/install.R b/install.R
index 78bf356..77d5b8d 100755
--- a/install.R
+++ b/install.R
@@ -116,7 +116,8 @@ github_packages <- c(
 for (i in seq(1, length(cran_packages), 2)) {
   devtools::install_version(
     cran_packages[i],
-    version = cran_packages[i + 1]
+    version = cran_packages[i + 1],
+    repos = "http://cran.us.r-project.org"
   )
 }

But in that case, I would be worried about future incompatibilities (as the article indicated at the end):

Potential issues
There are a few potential issues that may arise with installing older versions of packages:

  • You may be losing functionality or bug fixes that are only present in the newer versions of the packages.
  • The older package version needed may not be compatible with the version of R you have installed. In this case, you will either need to downgrade R to a compatible version or update your R code to work with a newer version of the package.

@GeorgianaElena
Member

@damianavila, yes, I believe I kept bumping versions until it worked. Although I don't remember it taking this many steps :(

version = cran_packages[i + 1],
repos = "http://cran.us.r-project.org"

So the place where the R pkgs get installed from is the Rstudio pkg manager:

# Use binary packages!
r-cran-repos=https://packagemanager.rstudio.com/all/__linux__/focal/latest

And I believe that's because it can provide binary pkgs rather than just from source files.

@damianavila
Contributor Author

Thanks for the additional context, @GeorgianaElena!

It seems the RStudio package manager actually offers a way to "freeze" the package set you can fetch from, but that would not play well with new libraries (when we need to install new ones).

I will try to sync the package versions we currently have in the file with the latest ones from the RStudio package manager and see how that goes...

@damianavila
Contributor Author

Independently of how it goes, I think we need to rethink how we are creating and maintaining the R environment. Otherwise, we are going to have these version jumps in multiple packages any time we need to add new dependencies, and that could be unstable territory from the user's perspective, IMHO.

@damianavila damianavila changed the title Request to install Tensorflow and Keras for R Request to install Tensorflow and Keras for Python and R Jan 19, 2022
@damianavila
Contributor Author

Btw, for future readers, these are the related PRs working around the problems we hit when testing in Binder (not possible) and the repo2docker timeout (I have already merged both along the way):

@damianavila
Contributor Author

damianavila commented Jan 20, 2022

The last try (syncing all the versions) seemed to work. I was able to build a test image and tested some basic TF and Keras commands and it seems to work.
I have asked @GeorgianaElena to deploy the test image in UoT staging and I will ask the requester to test it there and provide feedback before promoting it to production through the UoT hub config file (once the PR is merged and successfully built).

@yuvipanda
Member

Done! Thanks a lot, @damianavila!

@damianavila
Contributor Author

Thanks for the merge, @yuvipanda.
Additional context: the requesters provided positive feedback about the image on staging (https://2i2c.freshdesk.com/a/tickets/57).

@yuvipanda yuvipanda reopened this Jan 25, 2022
@yuvipanda
Member

Unfortunately actually attempting to use these libraries doesn't work:

image

If you set Sys.setenv(RETICULATE_PYTHON="/opt/conda/bin/python") temporarily, you now get a different error:

 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  Exception: URL fetch failure on https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz: None -- unknown url type: https

which makes me suspect and fear that the issue is that R and python are using different openssl libraries...
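
For readers hitting the same error: in CPython, `urllib.request` only defines its HTTPS handler when the `ssl` module imports successfully, so "unknown url type: https" generally means `import ssl` failed in that Python process. A minimal diagnostic sketch, assuming it is run in the same Python that reticulate starts inside the failing RStudio session (the helper name `https_diagnostics` is made up for illustration):

```python
import urllib.request


def https_diagnostics():
    """Report whether this Python process can handle https:// URLs."""
    report = {}
    try:
        import ssl
        report["ssl_import_ok"] = True
        # Version string of the OpenSSL library loaded by this interpreter.
        report["openssl_version"] = ssl.OPENSSL_VERSION
    except ImportError as exc:
        report["ssl_import_ok"] = False
        report["ssl_import_error"] = str(exc)
    # urllib.request only defines HTTPSHandler when http.client has HTTPS
    # support, which in turn requires a working ssl module.
    report["https_handler_available"] = hasattr(urllib.request, "HTTPSHandler")
    return report


if __name__ == "__main__":
    for key, value in https_diagnostics().items():
        print(f"{key}: {value}")
```

If `ssl_import_ok` comes back false, the `ssl_import_error` text often names the shared library or symbol that could not be loaded, which is a useful starting point.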

@damianavila
Contributor Author

I was able to build a test image and tested some basic TF and Keras commands and it seems to work.

I did not need to add the RETICULATE_PYTHON before... not sure why this is not working now, although I suspect the image is actually a little bit different now since the last time I tested.

@damianavila
Contributor Author

damianavila commented Apr 7, 2022

Update of the whole situation (and also draft to be posted to the Jupyter discourse forum):

Openssl mismatch between RStudio and conda environments

tl;dr

RStudio comes bundled with its own system version of OpenSSL. Conda also installs OpenSSL via some packages. If you use RStudio to run a conda-installed package that calls OpenSSL, there is a good chance that it won't work due to an OpenSSL version mismatch. This is because RStudio forces the use of a system version of OpenSSL, while conda expects its own version of OpenSSL. To fix it, either call the function that requires OpenSSL from a Jupyter interface, or separate your conda and RStudio environments entirely.

Introduction

Recently, 2i2c received a request to install Tensorflow and Keras in an image containing conda environments along with several R packages, including RStudio: https://github.com/2i2c-org/utoronto-image.

We were able to install the python TensorFlow package and the R counterparts as instructed by the corresponding documentation.

We also needed to set up the RETICULATE_PYTHON environment variable so the R packages could properly find the python ones: https://github.com/2i2c-org/utoronto-image/blob/main/Rprofile.site#L11

Problem

Our users began reporting issues when trying to download example datasets from within RStudio. For example:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  Exception: URL fetch failure on https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz: None -- unknown url type: https

which suggested some underlying openssl-related issues.

Investigation

Upon several rounds of debugging sessions, we have found that RStudio seems to load the "system" OpenSSL libraries when it is opened. For example:

$ ldd /usr/lib/rstudio-server/bin/rserver | grep ssl
        libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007f0413833000)

and the "system" version is 1.1.1f

$ dpkg -l | grep openssl
ii  libcurl4-openssl-dev:amd64           7.68.0-1ubuntu2.7                   amd64        development files and documentation for libcurl (OpenSSL flavour)
ii  openssl                              1.1.1f-1ubuntu2.12                  amd64        Secure Sockets Layer toolkit - cryptographic utility

But the conda main environment actually has another version, 1.1.1l:

$ conda list openssl
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
openssl                   1.1.1l               h7f98852_0    conda-forge
pyopenssl                 19.1.0                     py_1    conda-forge
$ openssl version
OpenSSL 1.1.1l  24 Aug 2021

Our hypothesis for what is happening:

  • when RStudio starts the session, it loads the OpenSSL 1.1.1f "system" components
  • when you try to load a dataset, the TensorFlow + Keras libraries (and underlying packages) try to use this RStudio-loaded "system" version
  • but the conda (python) libraries are (somehow) expecting the conda-installed OpenSSL version (1.1.1l)
  • This then fails!
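
One way to probe this hypothesis is to list which `libssl`/`libcrypto` files are actually mapped into the process that fails. A rough sketch, assuming a Linux system with `/proc` and assuming the code runs inside the process of interest (for example the Python started by reticulate in the RStudio session); `loaded_ssl_libraries` is a hypothetical helper, not part of any library:

```python
from pathlib import Path


def loaded_ssl_libraries(maps_text):
    """Return the distinct libssl/libcrypto paths found in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            name = path.rsplit("/", 1)[-1]
            if name.startswith(("libssl", "libcrypto")):
                paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    maps_file = Path("/proc/self/maps")  # Linux only
    if maps_file.exists():
        # Import ssl first so its shared libraries get mapped into this process.
        try:
            import ssl  # noqa: F401
        except ImportError:
            pass
        for path in loaded_ssl_libraries(maps_file.read_text()):
            print(path)
```

Paths under `/usr/lib/x86_64-linux-gnu` would point at the "system" copy, and paths under `/opt/conda/lib` at the conda one.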

To test if this specific mismatch was causing the problem, we tried with a symbolic link hack:

mv /opt/conda/lib/libssl.so.1.1 /opt/conda/lib/libssl.so.1.1.backup
ln -s /usr/lib/x86_64-linux-gnu/libssl.so.1.1 /opt/conda/lib/libssl.so.1.1

and then, it worked!

Screen Shot 2022-03-22 at 12 01 33

This confirmed our suspicion, but is likely not a long-term solution because it is probably a very brittle fix that will break in unexpected ways.
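
Since the hack replaces a file inside the conda environment, it helps to be able to undo it. The same swap as a reversible sketch in Python, with the paths passed in by the caller; the function names are hypothetical and this is meant for throwaway test images only:

```python
import os


def swap_in_symlink(path, replacement):
    """Move `path` aside to `path + '.backup'` and symlink `path` to `replacement`."""
    backup = path + ".backup"
    if os.path.lexists(backup):
        raise FileExistsError(f"{backup} already exists; restore it first")
    os.rename(path, backup)
    os.symlink(replacement, path)
    return backup


def restore_original(path):
    """Undo swap_in_symlink: drop the symlink and move the backup back."""
    backup = path + ".backup"
    if not os.path.islink(path):
        raise RuntimeError(f"{path} is not a symlink; nothing to restore")
    os.remove(path)
    os.rename(backup, path)
```

Calling `swap_in_symlink("/opt/conda/lib/libssl.so.1.1", "/usr/lib/x86_64-linux-gnu/libssl.so.1.1")` mirrors the `mv`/`ln -s` pair above, and `restore_original("/opt/conda/lib/libssl.so.1.1")` puts the conda library back.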

Things we have tried (and it did not work!)

First, we thought about syncing the openssl versions in both environments ("system" and conda): #28. But that approach did not work!

Then we tried pointing the LD_LIBRARY_PATH environment variable to the conda-specific openssl-related paths to "force" RStudio to load the expected openssl libraries, but that approach also failed: #29 and triggered other potential issues!

Possible workarounds

After spending a lot of hours on this issue we finally decided to stop trying, look for reasonable workarounds, and post here to disseminate the information we collected.

We have verified the openssl mismatch does NOT happen when you use the Jupyter Notebook application with the R-kernel. So the problem seems to be an RStudio-specific issue when you have multiple co-existing environments (most likely caused by RStudio somehow loading the "system" openssl libraries instead of the conda one). Hence, an immediate workaround is to use a Jupyter interface to download the dataset and then return to RStudio for the rest of your task.

Another alternative would be to create a different image without a conda environment to run your RStudio workflows, so that any python package (including TF or Keras) actually uses the "system" openssl library instead of a conflicting one.

There might be other options involving fixes/enhancements at the RStudio level, but this is outside of our expertise to fix. If others have experience with RStudio and an idea for how to resolve this, please share your ideas!

Eager to hear from you if you have any thoughts or if you faced this very same problem (even if you did not solve it ;-).

Hopefully, all this information is useful for future readers!

@damianavila
Contributor Author

@choldgraf, this is the draft for the Jupyter discourse post we talked about yesterday.
Feel free to edit it as you wish!

@2i2c-org/tech-team, feel free to comment about the draft as well!

@choldgraf
Member

@damianavila I added a few quick edits above. I think in general it looks good to me! Quick thought on structure, but I think we can probably send it off quickly after that:

I think that the post should go from "most general" to "most technical". Most people are going to start reading at the top and lose steam by the time they read 100-150 words, unless they are very motivated. So I think we should put the most actionable and important stuff at the top. To that extent, I'd structure it like:

  • tl;dr: A 1-3 sentence explanation of what happens and how to fix it. Very general and point below for any details.
  • Context of the environment setup
  • Description of the problem and what it created from a user-facing perspective.
  • Say what we think the problem is.
  • Say what people can do to avoid it.
  • As an appendix: include any information about what we tried, error messages, code snippets, etc.

I think the stuff like "confirming which version of OpenSSL" is really nice, but doesn't answer the question people will have of "ok but what do I actually do about this?". So I think we could put that information at the bottom for people who really want to learn more.

@damianavila
Contributor Author

Thanks for the feedback, @choldgraf!

I'd structure it like

OK, I will try your structure and ping you back again when it is ready so you can quickly look at it before posting it.

@damianavila
Contributor Author

@choldgraf, I think this new layout adheres to your last request.
Can you take another look? (and feel free to make edits).


Openssl mismatch between RStudio and conda environments

tl;dr

RStudio uses the "system" version of OpenSSL. Conda also installs OpenSSL. If you use RStudio to run a conda-installed package that calls OpenSSL, there is a good chance that it won't work due to an OpenSSL "mismatch". This is because RStudio forces the use of a system version of OpenSSL, while conda expects its own version of OpenSSL. To fix it, either call the function that requires OpenSSL from a Jupyter interface, or separate your conda and RStudio environments entirely.

Introduction

Recently, 2i2c received a request to install Tensorflow and Keras in an image containing conda environments along with several R packages, including RStudio: https://github.com/2i2c-org/utoronto-image.

We were able to install the python TensorFlow package and the R counterparts as instructed by the corresponding documentation.

We also needed to set up the RETICULATE_PYTHON environment variable so the R packages could properly find the python ones: https://github.com/2i2c-org/utoronto-image/blob/main/Rprofile.site#L11

Problem

Our users began reporting issues when trying to download example datasets from within RStudio. For example:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  Exception: URL fetch failure on https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz: None -- unknown url type: https

which suggested some underlying OpenSSL-related issues.

Investigation

Upon several rounds of debugging sessions, we have found that RStudio seems to load the "system" OpenSSL libraries when it is opened. Our hypothesis for what is happening:

  • when RStudio starts the session, it loads the OpenSSL "system" components
  • when you try to load a dataset, the TensorFlow + Keras libraries (and underlying packages) try to use this OpenSSL "system" version
  • but the conda (python) libraries are (somehow) expecting the conda-installed OpenSSL version
  • This then fails!

To test if this specific mismatch was causing the problem, we tried a symbolic link hack:

mv /opt/conda/lib/libssl.so.1.1 /opt/conda/lib/libssl.so.1.1.backup
ln -s /usr/lib/x86_64-linux-gnu/libssl.so.1.1 /opt/conda/lib/libssl.so.1.1

and then, it worked!

Screen Shot 2022-03-22 at 12 01 33

This confirmed our suspicion but is likely not a long-term solution because it is probably a very brittle fix that will break in unexpected ways.

Possible workarounds

After spending a lot of hours on this issue we finally decided to stop trying, look for reasonable workarounds, and post here to disseminate the information we collected.

We have verified the OpenSSL mismatch does NOT happen when you use the Jupyter Notebook application with the R-kernel. So the problem seems to be an RStudio-specific issue when you have multiple co-existing environments (most likely caused by RStudio somehow loading the "system" OpenSSL libraries instead of the conda one). Hence, an immediate workaround is to use a Jupyter interface to download the dataset and then return to RStudio for the rest of your task.
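
A sketch of that workaround from a Jupyter Python kernel. It assumes Keras's default dataset cache directory, `~/.keras/datasets`, and assumes that the R `dataset_mnist()` call goes through the same Python loader and therefore reuses a file already present there. The helper name `prefetch_dataset` is hypothetical; the URL is the one from the error message.

```python
import os
import urllib.request

MNIST_URL = "https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz"


def prefetch_dataset(url=MNIST_URL, cache_dir=None, fetch=urllib.request.urlretrieve):
    """Download `url` into the Keras dataset cache unless it is already there."""
    if cache_dir is None:
        # Assumed default Keras cache location; adjust if your setup differs.
        cache_dir = os.path.join(os.path.expanduser("~"), ".keras", "datasets")
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(target):
        # Runs in the Jupyter kernel, where https downloads work.
        fetch(url, target)
    return target
```

Whether RStudio then picks up the cached copy depends on it sharing the same home directory and Keras cache location, so it is worth checking once on your own hub.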

Another alternative would be to create a different image without a conda environment to run your RStudio workflows, so that any python package (including TF or Keras) actually uses the "system" OpenSSL library instead of a conflicting one.

There might be other options involving fixes/enhancements at the RStudio level, but this is outside of our expertise to fix. If others have experience with RStudio and an idea for how to resolve this, please share your ideas!

Hopefully, all this information is useful for future readers!

Appendix

Things we have tried (and it did not work!)

First, we thought about syncing the OpenSSL versions in both environments ("system" and conda): #28. But that approach did NOT work!

Then we tried pointing the LD_LIBRARY_PATH environment variable to conda-specific OpenSSL-related paths (to "force" RStudio to load the expected OpenSSL libraries) but that approach also failed: #29 and triggered other potential issues!

Checking OpenSSL versions

To check which OpenSSL library RStudio is linked against, we used the ldd command:

$ ldd /usr/lib/rstudio-server/bin/rserver | grep ssl
        libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007f0413833000)

and the "system" version was 1.1.1f. We also confirmed the installed version with the dpkg -l command:

$ dpkg -l | grep openssl
ii  libcurl4-openssl-dev:amd64           7.68.0-1ubuntu2.7                   amd64        development files and documentation for libcurl (OpenSSL flavour)
ii  openssl                              1.1.1f-1ubuntu2.12                  amd64        Secure Sockets Layer toolkit - cryptographic utility

To check the OpenSSL conda-associated version, we listed the openssl conda package and also directly checked the version:

$ conda list openssl
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
openssl                   1.1.1l               h7f98852_0    conda-forge
pyopenssl                 19.1.0                     py_1    conda-forge
$ openssl version
OpenSSL 1.1.1l  24 Aug 2021

and the conda main environment had version 1.1.1l.
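
The two version strings above come in different formats (`openssl version` output vs. a Debian package version). A small sketch for comparing them programmatically, for example in an image build check; the function names are hypothetical:

```python
import re

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)([a-z]*)")


def parse_openssl_version(text):
    """Extract (major, minor, patch, letter) from strings such as
    'OpenSSL 1.1.1l  24 Aug 2021' or '1.1.1f-1ubuntu2.12'."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no OpenSSL version found in {text!r}")
    major, minor, patch, letter = match.groups()
    return int(major), int(minor), int(patch), letter


def same_openssl_version(a, b):
    """True when both strings name the same upstream OpenSSL release."""
    return parse_openssl_version(a) == parse_openssl_version(b)
```

With the values from this image, `same_openssl_version("OpenSSL 1.1.1l  24 Aug 2021", "1.1.1f-1ubuntu2.12")` is false, matching the 1.1.1l vs 1.1.1f difference described above.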

@damianavila damianavila moved this to Review / QA 👀 in Sprint Board Apr 20, 2022
@damianavila
Contributor Author

Jupyter Discourse post was published here: https://discourse.jupyter.org/t/openssl-mismatch-between-rstudio-and-conda-environments/14123

@damianavila
Contributor Author

Pinged Nathan on the ticket: https://2i2c.freshdesk.com/a/tickets/65?note=80136454078.

Closing this one now.

Repository owner moved this from Review / QA 👀 to Done 🎉 in Sprint Board May 5, 2022
Repository owner moved this from In progress to Complete in DEPRECATED Engineering and Product Backlog May 5, 2022