terra-jupyter-r question about multi-core operations #210

Open
sjfleming opened this issue Apr 19, 2021 · 13 comments

Comments

@sjfleming

sjfleming commented Apr 19, 2021

Hello! This is a question that's going to be a little bit incomplete... but here it goes:

I have been basing an image off of the terra-jupyter-r:1.0.4 base image for use in Terra notebooks.

I just recently realized that, when I try to run commands like lmFit() in the limma package (for differential expression analysis), only one processor is being used. When I have set up my own R installations in the past, limma's differential expression testing has always automatically parallelized itself over all the available cores. Unfortunately I do not understand much about multi-core computing in R... but I have come to rely on that factor of 16 speedup I get on a 16-cpu machine.

My question:
Is there something about the R installation in terra-jupyter-r that changes a default somewhere for how multi-core operations happen? Something that might have broken the default behavior, which is to parallelize operations over all available cores?

(A bit more information: when I watched processes in top in the past, there were two steps where I saw multi-core operations. One showed a single process using 1600% CPU, and the other showed 16 processes each using 100% CPU. I am guessing they used different means of parallelizing their operations. Both work on my own R installation, and both fail on terra-jupyter-r, where they show one process using 100% CPU.
Working install shows

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS/LAPACK: /home/sfleming/miniconda3/envs/scanpy15/lib/libopenblasp-r0.3.7.so

terra-jupyter-r:1.0.4 shows

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

)
I am not sure what else to look for!

Any tips would be much appreciated!

@sjfleming
Author

sjfleming commented May 4, 2021

It looks like this could be addressed in a number of ways (either directly in R, or using conda), by installing OpenBLAS and pointing R to that installation.

@rtitle
Collaborator

rtitle commented May 4, 2021

Hey @sjfleming!

Sorry I didn't see this until now. I am not sure of the answer, but I am curious. Do you have any example R code that demonstrates the issue?

@rtitle
Collaborator

rtitle commented May 4, 2021

Yeah installing an optimized BLAS sounds like it could be a good idea: https://csantill.github.io/RPerformanceWBLAS/

@rtitle
Collaborator

rtitle commented May 4, 2021

I simply ran in my runtime:

sudo apt-get install libopenblas-base

and I now see:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

Compared to previous:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

So @sjfleming if you add that apt-get line to your custom image, does it cause R to use all cores? If that fixes it, we can add that to the terra-jupyter-r image I think.
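
One way to confirm which library the system BLAS and LAPACK entry points resolve to, without starting R, is to follow the symlinks. This is a minimal sketch, assuming the top-level Ubuntu 18.04 x86_64 paths `/usr/lib/x86_64-linux-gnu/libblas.so.3` and `liblapack.so.3` (an assumption based on the directories shown in the sessionInfo() output above); `resolve_lib` is a hypothetical helper name, not part of any package.

```shell
#!/bin/sh
# Print the final target of a library symlink, or note that the path is missing.
# resolve_lib is a hypothetical helper; readlink -f comes from GNU coreutils.
resolve_lib() {
    if [ -e "$1" ]; then
        printf '%s -> %s\n' "$1" "$(readlink -f "$1")"
    else
        printf '%s -> (not found)\n' "$1"
    fi
}

# Assumed locations of the system-wide BLAS/LAPACK entry points.
resolve_lib /usr/lib/x86_64-linux-gnu/libblas.so.3
resolve_lib /usr/lib/x86_64-linux-gnu/liblapack.so.3
```

If the targets contain `openblas`, the optimized library is the one being picked up; a target under `blas/` or `lapack/` would suggest the reference implementation is still selected.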

@rtitle
Collaborator

rtitle commented May 4, 2021

Also (this is a long shot), we recently changed the default notebook runtime in the UI to a 1 CPU machine. So just double-check your runtime actually has >1 CPUs. :)

@nturaga
Collaborator

nturaga commented May 4, 2021

You can try parallel::detectCores()
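
The same check can be run from a terminal in the runtime. This is a rough sketch: `nproc` (GNU coreutils) reports the CPUs available to the current process, with `getconf _NPROCESSORS_ONLN` as a fallback; the warning message is only illustrative.

```shell
#!/bin/sh
# Count the CPUs available to this process; fall back to getconf if nproc is missing.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "CPUs available: ${cores}"

# With a single CPU, a multithreaded BLAS has nothing to spread work across.
if [ "${cores}" -le 1 ]; then
    echo "Only one CPU visible: check the runtime's machine type."
fi
```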

@sjfleming
Author

sjfleming commented May 4, 2021

Hi @rtitle thanks for your response! Yes, I think your apt-get solution might work, and I will let you know if it works when included in a custom docker image built on terra-jupyter-r.

Here's an example of a minimal piece of code that demonstrates the difference:

m <- matrix(rnorm(9000000), nrow=3000)
d <- solve(m)

In my little experiments, this takes about 30 seconds with the current setup, but on a 16-cpu machine with OpenBLAS installed, it runs in about 1 second. If you watch top while the code runs, you can see that all the cores blast off when using OpenBLAS (though it's all finished very quickly anyway).
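
To put numbers on that comparison, the snippet above can be timed under different thread caps. This is a sketch under assumptions: `time_solve` is a hypothetical helper, `Rscript` is assumed to be on the PATH, and `OPENBLAS_NUM_THREADS` is OpenBLAS's own environment variable, so it should make no difference when R is linked against the reference BLAS.

```shell
#!/bin/sh
# Time the 3000 x 3000 solve() from the comment above with a given OpenBLAS thread cap.
# time_solve is a hypothetical helper name used only for this illustration.
time_solve() {
    threads="$1"
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found; skipping"
        return 0
    fi
    # OPENBLAS_NUM_THREADS is read by OpenBLAS when the library is loaded.
    OPENBLAS_NUM_THREADS="$threads" Rscript -e \
        'm <- matrix(rnorm(9000000), nrow = 3000); print(system.time(d <- solve(m)))'
}
```

Running `time_solve 1` and then `time_solve "$(nproc)"` gives the two timings to compare. If the elapsed time is about the same in both runs, that would be consistent with R not using a multithreaded BLAS.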

@sjfleming
Author

[Screenshot: image built on top of terra-jupyter-r]

[Screenshot: an install with OpenBLAS on a 16 CPU machine]

@rtitle
Collaborator

rtitle commented May 5, 2021

Cool, let us know how it goes @sjfleming. I also created a JIRA to track the work if we decide to do this on the Terra side: https://broadworkbench.atlassian.net/browse/IA-2736

@sjfleming
Author

sjfleming commented May 18, 2021

Alright @rtitle ... I tried interactively running

sudo apt-get update
sudo apt-get install -yq --no-install-recommends libopenblas-base

inside my docker image (built on top of terra-jupyter-r:1.0.4). This installation succeeded (if I ran as root), and when I ran R, it seemed like the above test passed. So that's good news, and it seems promising.

The following lines in my Dockerfile seem to do the trick (and this time I built the image on top of the current terra-jupyter-r:1.0.14)

USER root
RUN sudo apt-get update \
 && sudo apt-get install -yq --no-install-recommends libopenblas-base
USER $USER

I assume the same could be done as part of the terra-jupyter-r image.

@kyuksel
Contributor

kyuksel commented May 19, 2021

@sjfleming I'm curious if including the line

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

or the line

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -

before the command

sudo apt-get update

would fix the issue.

@rtitle
Collaborator

rtitle commented May 19, 2021

Maybe there was an edit.. sounds like @sjfleming is saying the above lines worked. 🎉

I agree we should fold that into the terra-jupyter-r image, and perhaps also the Bioconductor RStudio image (which is a separate Docker hierarchy).

@sjfleming
Author

Thanks @kyuksel , I realized I had screwed with the key I was using for R later on in my image... but if I put the necessary lines before the part where I was messing around later, then it worked alright. I'm grateful for the suggestion though if I run into that again, which I probably will. I never know what to do in those cases, so that's very helpful.

@rtitle sounds great!
