GPU Support #9

Open

mdbarnesUCSD opened this issue Nov 13, 2020 · 7 comments
@mdbarnesUCSD

Hello,

I am looking to run tensorsignatures on an AWS g3 instance. I was hoping to make use of the GPU support but am receiving an error. The AWS conda environment that I am using is tensorflow_p36 and comes with tensorflow-gpu version 1.15.3 installed. After running 'pip install tensorsignatures' the packages are:

tensorboard 1.15.0
tensorboard-plugin-wit 1.7.0
tensorflow 1.15.0
tensorflow-estimator 1.15.1
tensorflow-gpu 1.15.3
tensorflow-serving-api 1.15.0
tensorsignatures 0.5.0

The code runs when tensorflow 1.15.0 is installed, but with only tensorflow-gpu 1.15.3 it does not (because tensorflow cannot be imported).

Is there a way to verify that GPU is working?

Thank you!
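As a quick first check, it can help to verify which TensorFlow package is actually importable in the active environment. A minimal sketch using the standard library (the function name `is_importable` is my own; the demo call uses a stdlib module so it runs anywhere, but on the AWS box you would pass "tensorflow"):

```python
import importlib.util

def is_importable(name):
    """Return True if a module named `name` can be found on the current path."""
    return importlib.util.find_spec(name) is not None

# On the AWS instance this would be is_importable("tensorflow");
# the stdlib module below just demonstrates the call.
print(is_importable("json"))  # prints True
```

If this returns False for tensorflow inside the tensorflow_p36 environment, the conda environment is likely not activated.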

@sagar87
Owner

sagar87 commented Nov 16, 2020

Hi mdbarnesUCSD,

Interesting; I haven't tried running tensorsignatures in an AWS environment myself, but since tensorflow cannot be imported, there might be something wrong with the Python installation, or the respective conda environment may not be active. Could you open a Python shell on the AWS machine and test whether the package can be imported? That should look somewhat like this:

$ python
Python 3.6.1 (default, Sep 22 2017, 15:04:10)
[GCC 5.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
/home/hsv23/tensorflow/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
>>> tf.__version__
'1.5.0'

Otherwise it might help to pin tensorflow to 1.15.0. Perhaps running conda install tensorflow=1.15.0 works?

To assess generally whether the GPU is working, you can run $ nvidia-smi or $ nvcc --version, which should report the installed driver and CUDA versions on the AWS machine. Let me know if that helps.

@mdbarnesUCSD
Author

Thanks for your response. I ran the $ nvidia-smi command and can see that the GPU is being used. I was initially concerned that the CPU was being used instead, but now see that it is functioning as expected.

Thanks!

@mdbarnesUCSD
Author

Hello,

I am reopening this issue because, when I run the GPU version of the code, GPU-Util stays at 0% while tensorsignatures train is running. I installed the GPU version of tensorsignatures by running:
pip install --upgrade pip setuptools wheel && pip install -r requirements-gpu.txt

I am using an AWS g3 instance with the following 'tensor' packages:
tensorboard 1.15.0
tensorboard-plugin-wit 1.7.0
tensorflow-estimator 1.15.1
tensorflow-gpu 1.15.0
tensorflow-serving-api 1.15.0
tensorsignatures 0.5.0

Here is the output from running $ nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   40C    P0    37W / 150W |     70MiB /  7618MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     26132      C   ...ensorflow2_p36/bin/python       67MiB |
+-----------------------------------------------------------------------------+

Also, this is the output from $ nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
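For monitoring during training, the 0% GPU-Util figure above can also be pulled out of nvidia-smi's output programmatically. A minimal sketch (the function name `gpu_utilisation` is my own, and the regex assumes the v450-era table layout shown above):

```python
import re

def gpu_utilisation(smi_output):
    """Extract GPU-Util percentages from nvidia-smi tabular output.

    Returns a list of ints, one per GPU row; assumes the classic
    `|  70MiB / 7618MiB |  0%  Default |` row layout.
    """
    return [int(m.group(1)) for m in re.finditer(r"(\d+)%\s+Default", smi_output)]

sample = "| N/A   40C    P0    37W / 150W |     70MiB /  7618MiB |      0%      Default |"
print(gpu_utilisation(sample))  # prints [0]
```

In real use you would feed it the output of subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout; a training run that actually hits the GPU should show a non-zero figure.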

Please let me know if I can provide any additional information. Thank you!

@mdbarnesUCSD mdbarnesUCSD reopened this Feb 2, 2021
@sagar87
Owner

sagar87 commented Feb 3, 2021

Hi mdbarnesUCSD,

this certainly looks wrong. It is hard to diagnose the problem remotely... Could you paste the output of pip freeze? Also, what happens if you execute this test script (taken from https://stackoverflow.com/questions/55691174/check-whether-tensorflow-is-running-on-gpu)?

import tensorflow as tf
print(tf.__version__)
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

@mdbarnesUCSD
Author

Here is the output from running the test script:

1.15.0
Please install GPU version of TF

Additionally, I ran this (from this stack overflow post):
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

and got this output:

Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device

Also, from the same post:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

producing this output:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 2796909284998702720
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 9824271307598288295
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 6226426787505711414
physical_device_desc: "device: XLA_GPU device"
]
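One detail worth noting in that device listing: only XLA_CPU and XLA_GPU entries appear, with no plain /device:GPU:0. In TF1 that is usually the sign that standard ops will stay on the CPU, which would match the 0% GPU-Util. A small helper encoding that check (the name `has_real_gpu` is my own; in real use you would pass [d.name for d in device_lib.list_local_devices()]):

```python
def has_real_gpu(device_names):
    """True only if a non-XLA GPU device is present.

    In TF1, an XLA_GPU entry alone is generally not enough for normal
    op placement; standard ops need a plain /device:GPU:N entry.
    """
    return any(name.startswith("/device:GPU:") for name in device_names)

# Device names as reported above: no plain GPU device, so this is False.
print(has_real_gpu(["/device:CPU:0", "/device:XLA_CPU:0", "/device:XLA_GPU:0"]))
```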

Here is the output from pip freeze:

absl-py==0.11.0
alabaster==0.7.12
anaconda-client==1.7.2
anaconda-project==0.8.3
argh==0.26.2
asn1crypto==1.3.0
astor==0.8.1
astroid==2.4.2
astropy==4.0
astunparse==1.6.3
atomicwrites==1.3.0
attrs==19.3.0
autopep8==1.4.4
autovizwidget==0.16.0
Babel==2.8.0
backcall==0.1.0
backports.shutil-get-terminal-size==1.0.0
beautifulsoup4==4.8.2
bitarray==1.2.1
bkcharts==0.2
bleach==1.5.0
bokeh==1.4.0
boto==2.49.0
boto3==1.16.9
botocore==1.19.9
Bottleneck==1.3.2
cachetools==4.1.1
certifi==2020.6.20
cffi==1.14.0
chardet==3.0.4
Click==7.0
cloudpickle==1.3.0
clyent==1.2.2
colorama==0.4.3
contextlib2==0.6.0.post1
cryptography==2.8
cycler==0.10.0
Cython==0.29.15
cytoolz==0.10.1
dask==2.11.0
decorator==4.4.1
defusedxml==0.6.0
diff-match-patch==20181111
distributed==2.11.0
docutils==0.16
entrypoints==0.3
environment-kernels==1.1.1
et-xmlfile==1.0.1
fastcache==1.1.0
filelock==3.0.12
flake8==3.7.9
Flask==1.1.1
flatbuffers==1.12
fsspec==0.6.2
future==0.18.2
gast==0.2.2
gevent==1.4.0
glob2==0.7
gmpy2==2.0.8
google-auth==1.23.0
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
greenlet==0.4.15
grpcio==1.32.0
h5py==2.10.0
hdijupyterutils==0.16.0
HeapDict==1.0.1
horovod==0.19.5
html5lib==0.9999999
hypothesis==5.5.4
idna==2.8
imageio==2.6.1
imagesize==1.2.0
importlib-metadata==1.5.0
intervaltree==3.0.2
ipykernel==5.1.4
ipyparallel @ file:///tmp/build/80754af9/ipyparallel_1593440601845/work
ipython==7.12.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
isort==4.3.21
itsdangerous==1.1.0
jdcal==1.4.1
jedi==0.14.1
jeepney==0.4.2
Jinja2==2.11.1
jmespath @ file:///tmp/build/80754af9/jmespath_1594304593830/work
joblib==0.14.1
json5==0.9.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.1.0
jupyter-core==4.6.1
jupyterlab==1.2.6
jupyterlab-server==1.0.6
Keras==2.3.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
keyring==21.1.0
kiwisolver==1.1.0
lazy-object-proxy==1.4.3
libarchive-c==2.8
lief==0.9.0
llvmlite==0.31.0
locket==0.2.0
lxml==4.5.0
Markdown==3.3.3
MarkupSafe==1.1.1
matplotlib==3.1.2
mccabe==0.6.1
mistune==0.8.4
mkl-fft==1.0.15
mkl-random==1.1.0
mkl-service==2.3.0
mock==4.0.1
more-itertools==8.2.0
mpi4py==3.0.3
mpmath==1.1.0
msgpack==0.6.1
multipledispatch==0.6.0
nb-conda==2.2.1
nb-conda-kernels @ file:///tmp/build/80754af9/nb_conda_kernels_1598624781735/work
nbconvert==5.6.1
nbformat==5.0.4
networkx==2.4
nltk==3.4.5
nose==1.3.7
notebook==6.0.3
numba==0.48.0
numexpr==2.7.1
numpy==1.16.1
numpydoc==0.9.2
oauthlib==3.1.0
olefile==0.46
opencv-python==4.2.0.32
openpyxl==3.0.3
opt-einsum==3.3.0
packaging==20.1
pandas==0.25.3
pandocfilters==1.4.2
parso==0.5.2
partd==1.1.0
path==13.1.0
pathlib2==2.3.5
pathtools==0.1.2
patsy==0.5.1
pep8==1.7.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.0.0
pkginfo==1.5.0.1
plotly==4.12.0
pluggy==0.13.1
ply==3.11
prometheus-client==0.7.1
prompt-toolkit==3.0.3
protobuf==3.14.0
protobuf3-to-dict==0.1.5
psutil==5.6.7
psycopg2==2.7.5
PTable==0.9.2
ptyprocess==0.6.0
py==1.8.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.5.0
pycosat==0.6.3
pycparser==2.19
pycrypto==2.6.1
pycurl==7.43.0.5
pydocstyle==4.0.1
pyflakes==2.1.1
pygal==2.4.0
Pygments==2.5.2
pykerberos==1.2.1
pylint==2.5.3
pyodbc===4.0.0-unsupported
pyOpenSSL==19.1.0
pyparsing==2.4.6
pyrsistent==0.15.7
PySocks==1.7.1
pytest==5.3.5
pytest-arraydiff==0.3
pytest-astropy==0.8.0
pytest-astropy-header==0.1.2
pytest-doctestplus==0.5.0
pytest-openfiles==0.4.0
pytest-remotedata==0.3.2
python-dateutil==2.8.1
python-jsonrpc-server==0.3.4
python-language-server==0.31.7
pytz==2019.3
PyWavelets==1.1.1
pyxdg==0.26
PyYAML==5.3.1
pyzmq==18.1.1
QDarkStyle==2.8
QtAwesome==0.6.1
qtconsole==4.6.0
QtPy==1.9.0
requests==2.22.0
requests-kerberos==0.12.0
requests-oauthlib==1.3.0
retrying==1.3.3
rope==0.16.0
rsa==4.6
Rtree==0.9.3
ruamel-yaml==0.15.87
s3fs==0.4.2
s3transfer==0.3.3
sagemaker==2.16.1
scikit-image==0.16.2
scikit-learn==0.21.3
scipy==1.3.2
seaborn==0.10.0
SecretStorage==3.1.2
Send2Trash==1.5.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.15.0
smdebug-rulesconfig==0.1.5
snowballstemmer==2.0.0
sortedcollections==1.1.2
sortedcontainers==2.1.0
soupsieve==1.9.5
sparkmagic==0.15.0
Sphinx==2.4.0
sphinxcontrib-applehelp==1.0.1
sphinxcontrib-devhelp==1.0.1
sphinxcontrib-htmlhelp==1.0.2
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.2
sphinxcontrib-serializinghtml==1.1.3
sphinxcontrib-websupport==1.2.0
spyder==4.0.1
spyder-kernels==1.8.1
SQLAlchemy==1.3.13
statsmodels==0.11.0
sympy==1.5.1
tables==3.6.1
tblib==1.6.0
tensorboard==1.15.0
tensorboard-plugin-wit==1.7.0
tensorflow-estimator==1.15.1
tensorflow-gpu==1.15.0
tensorflow-serving-api==1.15.0
tensorsignatures==0.5.0
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
toml==0.10.1
toolz==0.10.0
tornado==6.0.3
tqdm==4.39.0
traitlets==4.3.3
typed-ast==1.4.1
typing-extensions==3.7.4.3
ujson==1.35
unicodecsv==0.14.1
urllib3==1.25.10
watchdog==0.10.2
wcwidth==0.1.8
webencodings==0.5.1
Werkzeug==1.0.0
widgetsnbextension==3.5.1
wrapt==1.12.1
wurlitzer==2.0.0
xlrd==1.2.0
XlsxWriter==1.2.7
xlwt==1.3.0
yapf==0.28.0
zict==1.0.0
zipp==2.2.0

Thanks for the help. Please let me know if I can provide any more information.

@mdbarnesUCSD
Author

mdbarnesUCSD commented Feb 6, 2021

The issue was resolved when I switched from the tensorflow2_p36 to the tensorflow_p36 AWS environment and ran the installation commands from the README:

pip install --upgrade pip setuptools wheel && pip install -r requirements-gpu.txt
python setup.py install

The resulting version is tensorflow-gpu==1.15.3.
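For anyone hitting the same issue, the pinned version can be confirmed from pip freeze output without eyeballing the full list. A hypothetical helper (`find_version` is my own name, not part of any library):

```python
def find_version(freeze_lines, package):
    """Return the pinned version of `package` from pip freeze-style lines, or None."""
    for line in freeze_lines:
        name, sep, version = line.partition("==")
        if sep and name == package:
            return version
    return None

freeze = ["tensorboard==1.15.0", "tensorflow-gpu==1.15.3", "tensorsignatures==0.5.0"]
print(find_version(freeze, "tensorflow-gpu"))  # prints 1.15.3
```

In practice you would build freeze from subprocess output or from open("requirements-gpu.txt") lines.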

@sagar87
Owner

sagar87 commented Feb 6, 2021

That's great! Thanks for letting me know. I am thinking about porting the code to TF2 or PyTorch; TF1 is indeed a pain in the neck.
