
modifying environmental variable OPENBLAS_NUM_THREADS #62

Closed
gfudenberg opened this issue Mar 29, 2023 · 4 comments

@gfudenberg
Collaborator

probably not ideal to modify environment variables inside of scripts

we should trace back why this was introduced & probably remove it across the library
cc @PSmaruj @Kamulegeya-Fahad

@gfudenberg changed the title "modifying environmental variables in scripts" to "modifying environmental variable OPENBLAS_NUM_THREADS in scripts" Mar 29, 2023
@gfudenberg changed the title "modifying environmental variable OPENBLAS_NUM_THREADS in scripts" to "modifying environmental variable OPENBLAS_NUM_THREADS" Mar 29, 2023
@PSmaruj
Collaborator

PSmaruj commented May 15, 2023

I found this (my chat with Geoff from September 17th):

I’ve modified the multi-GPU script, although there are some problems. I got a weird error at the beginning:

```
(basenji) [smaruj@discovery2 insert_virtual_flanks_experiment]$ python multiGPU-virtual_symmetric_experiment.py /project/fudenber_735/tensorflow_models/akita/v2/models/f0c0/train/params.json /project/fudenber_735/tensorflow_models/akita/v2/models/f0c0/train/model1_best.h5 out.tsv -f /project/fudenber_735/genomes/mm10/mm10.fa --head-index 1 --model-index 1 --batch-size 4  --stats SCD,INS-16 -p 4
OpenBLAS blas_thread_init: pthread_create failed for thread 31 of 32: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1031243 max
Traceback (most recent call last):
  File "/home1/smaruj/miniconda3/envs/basenji/lib/python3.8/site-packages/numpy/core/__init__.py", line 22, in <module>
    from . import multiarray
  File "/home1/smaruj/miniconda3/envs/basenji/lib/python3.8/site-packages/numpy/core/multiarray.py", line 12, in <module>
    from . import overrides
  File "/home1/smaruj/miniconda3/envs/basenji/lib/python3.8/site-packages/numpy/core/overrides.py", line 7, in <module>
    from numpy.core._multiarray_umath import (
ImportError: PyCapsule_Import could not import module "datetime"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "multiGPU-virtual_symmetric_experiment.py", line 36, in <module>
    import h5py
  File "/home1/smaruj/miniconda3/envs/basenji/lib/python3.8/site-packages/h5py/__init__.py", line 34, in <module>
    from . import version
  File "/home1/smaruj/miniconda3/envs/basenji/lib/python3.8/site-packages/h5py/version.py", line 19, in <module>
    import numpy
  File "/home1/smaruj/miniconda3/envs/basenji/lib/python3.8/site-packages/numpy/__init__.py", line 145, in <module>
    from . import core
  File "/home1/smaruj/miniconda3/envs/basenji/lib/python3.8/site-packages/numpy/core/__init__.py", line 48, in <module>
    raise ImportError(msg)
ImportError:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.8 from "/home1/smaruj/miniconda3/envs/basenji/bin/python"
  * The NumPy version is: "1.20.3"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: PyCapsule_Import could not import module "datetime"
```

I’ve found some guidance here: numpy/numpy#14474
I’ve added the line `os.environ['OPENBLAS_NUM_THREADS'] = '1'` right after importing os, and the problem disappeared.
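This fix can only work because the assignment runs before `numpy` (here pulled in indirectly by `import h5py`) is first imported. OpenBLAS reads `OPENBLAS_NUM_THREADS` once, when the shared library is loaded, so setting it after that point does not change the thread pool it creates. A minimal sketch of the required ordering, assuming an OpenBLAS-backed numpy build; the use of `os.environ.setdefault` (so a value exported by the user or the scheduler is not overwritten) is a suggestion, not what the scripts currently do:

```python
import os

# Must run before numpy (or anything that imports it, such as h5py) is first
# imported: OpenBLAS reads OPENBLAS_NUM_THREADS once, when the library loads.
# setdefault keeps a value already exported by the user or the scheduler.
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")

import numpy as np  # noqa: E402  (deliberately imported after the env setup)

print("OPENBLAS_NUM_THREADS =", os.environ["OPENBLAS_NUM_THREADS"])
print("sanity check:", np.dot(np.ones(3), np.ones(3)))
```

Moving the `os.environ` line below `import h5py` would leave the original crash in place, since by then OpenBLAS has already tried to start its threads.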

@PSmaruj
Collaborator

PSmaruj commented May 15, 2023

I've just checked, and both my experiments work well when the line `os.environ["OPENBLAS_NUM_THREADS"] = "1"` is removed. Perhaps the CARC job setup/limits have changed, so it now works without this line (?).
@gfudenberg Can I assume that now I can remove this line from all my scripts?
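The `RLIMIT_NPROC 4096 current, 1031243 max` line in the original log is the soft/hard per-user limit on processes; on Linux this limit actually counts threads, summed over all of the user's processes, and hitting it is the likely reason `pthread_create` failed. One way to check whether the CARC limits changed is to print the same numbers on a head node and inside a job and compare them. A minimal sketch using only the standard library (the `resource` module is Unix-only; the function name `thread_limits` is hypothetical):

```python
import os
import resource


def thread_limits():
    """Return the per-user task limits and the CPU counts visible to this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "nproc_soft": soft,  # equals resource.RLIM_INFINITY when unlimited
        "nproc_hard": hard,
        "cpu_count": os.cpu_count(),  # CPUs on the machine
        # CPUs this process may run on (a job allocation may restrict this);
        # sched_getaffinity is only available on some platforms, e.g. Linux.
        "affinity": (
            len(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else None
        ),
    }


if __name__ == "__main__":
    for key, value in thread_limits().items():
        print(f"{key}: {value}")
```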

@gfudenberg
Collaborator Author

my guess is that CARC raised errors if scripts were run on head nodes w/o setting this environment variable (otherwise the script would grab all the available threads & cause issues with login etc.)

so yes, my guess is that it is good to remove it from all scripts (b/c on compute nodes you might want to grab all the threads allocated!)
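One way to follow this suggestion without any script modifying `os.environ` is to leave the choice to whoever launches the script (e.g. `OPENBLAS_NUM_THREADS=1 python script.py` on a head node) and have the script at most print a hint. A sketch under stated assumptions: `SLURM_JOB_ID` is a standard variable that Slurm sets inside jobs, but using its absence to detect a head node, and the function name `warn_if_uncapped_outside_job`, are assumptions for illustration:

```python
import os
import sys


def warn_if_uncapped_outside_job():
    """Print a hint to stderr, without touching os.environ, when OpenBLAS
    threads are uncapped and the process is not running inside a Slurm job.

    Returns True if the hint was printed.
    """
    in_slurm_job = "SLURM_JOB_ID" in os.environ  # set by Slurm inside jobs
    capped = "OPENBLAS_NUM_THREADS" in os.environ
    if in_slurm_job or capped:
        return False
    print(
        "warning: OPENBLAS_NUM_THREADS is unset outside a Slurm job; on a "
        "head node consider launching with OPENBLAS_NUM_THREADS=1",
        file=sys.stderr,
    )
    return True


# Call before importing numpy/h5py so the hint appears even if that import fails.
warn_if_uncapped_outside_job()
```

Inside a job the script then runs with whatever OpenBLAS picks by default, which keeps compute-node behaviour under the control of the job submission rather than the script.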

@PSmaruj
Collaborator

PSmaruj commented May 16, 2023

I think this CARC limit still exists: I tried running a multi-GPU script without the OpenBLAS line on the home node and reproduced exactly the same error. The error doesn't show up when I run the same script on the GPU setup (partition debug).
I usually run multi-GPU scripts on the home node, since they only submit the GPU Slurm jobs and collect the data at the end. I agree that the regular scripts that run on GPUs should not have the line `os.environ["OPENBLAS_NUM_THREADS"] = "1"`, although I think it can be left in the multi-GPU scripts as long as they are run on the home node (?). I've preferred this option so far since it lets me run them under screen.
@Kamulegeya-Fahad How do you run multi-GPU scripts?
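If the line stays in the multi-GPU launcher, note that child processes inherit `os.environ`, and `sbatch` by default exports the submitting environment into the job, so the GPU jobs would likely also run with single-threaded OpenBLAS. A sketch that keeps the cap in the launcher but hands submitted jobs a cleaned copy of the environment; `job_env`, `submit`, and `job.sh` are hypothetical names, and whether the real launcher calls `sbatch` through `subprocess` is an assumption:

```python
import os
import subprocess

# Variables meant only for the launcher process on the head node.
LAUNCHER_ONLY_VARS = ("OPENBLAS_NUM_THREADS",)


def job_env(drop=LAUNCHER_ONLY_VARS):
    """Copy of the current environment without launcher-only settings."""
    env = dict(os.environ)
    for name in drop:
        env.pop(name, None)
    return env


def submit(script_path):
    # Hypothetical submission helper: pass the cleaned copy explicitly instead
    # of letting sbatch inherit (and export into the job) the launcher's env.
    return subprocess.run(["sbatch", script_path], env=job_env(), check=True)


if __name__ == "__main__":
    os.environ["OPENBLAS_NUM_THREADS"] = "1"  # launcher keeps its own cap
    print("launcher:", os.environ.get("OPENBLAS_NUM_THREADS"))
    print("jobs:", job_env().get("OPENBLAS_NUM_THREADS"))
    # submit("job.sh")  # would call sbatch on a Slurm cluster
```

Passing `env=` explicitly keeps the head-node workaround from leaking into compute-node jobs, which matches the suggestion above that GPU scripts should not carry the cap.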
