GPU Support #9

Open

mdbarnesUCSD opened this issue Nov 13, 2020 · 7 comments
@mdbarnesUCSD

Hello,

I am looking to run tensorsignatures on an AWS g3 instance. I was hoping to make use of the GPU support but am receiving an error. The AWS conda environment that I am using is tensorflow_p36 and comes with tensorflow-gpu version 1.15.3 installed. After running 'pip install tensorsignatures' the packages are:

tensorboard 1.15.0
tensorboard-plugin-wit 1.7.0
tensorflow 1.15.0
tensorflow-estimator 1.15.1
tensorflow-gpu 1.15.3
tensorflow-serving-api 1.15.0
tensorsignatures 0.5.0

The code runs when tensorflow 1.15.0 is installed, but with only tensorflow-gpu 1.15.3 it does not (because tensorflow cannot be imported).

Is there a way to verify that GPU is working?

Thank you!
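As a quick first check, it can help to verify which TensorFlow package is actually importable in the active environment. A minimal sketch using the standard library (the function name `is_importable` is my own; the demo call uses a stdlib module so it runs anywhere, but on the AWS box you would pass "tensorflow"):

```python
import importlib.util

def is_importable(name):
    """Return True if a module named `name` can be found on the current path."""
    return importlib.util.find_spec(name) is not None

# On the AWS instance this would be is_importable("tensorflow");
# the stdlib module below just demonstrates the call.
print(is_importable("json"))  # prints True
```

If this returns False for tensorflow inside the tensorflow_p36 environment, the conda environment is likely not activated.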

@sagar87
Owner

sagar87 commented Nov 16, 2020

Hi mdbarnesUCSD,

Interesting; I haven't tried running tensorsignatures in an AWS environment myself, but since tensorflow cannot be imported, there might be something wrong with the Python installation, or the respective conda environment may not be active. Could you open a Python shell on the AWS machine and test whether the package can be imported? That should look somewhat like this:

$ python
Python 3.6.1 (default, Sep 22 2017, 15:04:10)
[GCC 5.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
/home/hsv23/tensorflow/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
>>> tf.__version__
'1.5.0'

Otherwise it might help to pin tensorflow to 1.15.0. Perhaps running conda install tensorflow=1.15.0 works?

To assess generally whether the GPU is working, you can run $ nvidia-smi or $ nvcc --version, which should report the installed driver and CUDA versions on the AWS machine. Let me know if that helps.

@mdbarnesUCSD
Author

Thanks for your response. I ran the $ nvidia-smi command and can see that the GPU is being used. I was initially concerned that the CPU was being used instead, but now see that it is functioning as expected.

Thanks!

@mdbarnesUCSD
Author

Hello,

I am reopening this issue because, when I run the GPU version of the code, GPU-Util stays at 0% while tensorsignatures train is running. I installed the GPU version of tensorsignatures by running:
pip install --upgrade pip setuptools wheel && pip install -r requirements-gpu.txt

I am using an AWS g3 instance with the following 'tensor' packages:
tensorboard 1.15.0
tensorboard-plugin-wit 1.7.0
tensorflow-estimator 1.15.1
tensorflow-gpu 1.15.0
tensorflow-serving-api 1.15.0
tensorsignatures 0.5.0

Here is the output from running $ nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   40C    P0    37W / 150W |     70MiB /  7618MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     26132      C   ...ensorflow2_p36/bin/python       67MiB |
+-----------------------------------------------------------------------------+

Also, this is the output from $ nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
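For monitoring during training, the 0% GPU-Util figure above can also be pulled out of nvidia-smi's output programmatically. A minimal sketch (the function name `gpu_utilisation` is my own, and the regex assumes the v450-era table layout shown above):

```python
import re

def gpu_utilisation(smi_output):
    """Extract GPU-Util percentages from nvidia-smi tabular output.

    Returns a list of ints, one per GPU row; assumes the classic
    `|  70MiB / 7618MiB |  0%  Default |` row layout.
    """
    return [int(m.group(1)) for m in re.finditer(r"(\d+)%\s+Default", smi_output)]

sample = "| N/A   40C    P0    37W / 150W |     70MiB /  7618MiB |      0%      Default |"
print(gpu_utilisation(sample))  # prints [0]
```

In real use you would feed it the output of subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout; a training run that actually hits the GPU should show a non-zero figure.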

Please let me know if I can provide any additional information. Thank you!

@mdbarnesUCSD mdbarnesUCSD reopened this Feb 2, 2021
@sagar87
Owner

sagar87 commented Feb 3, 2021

Hi mdbarnesUCSD,

this certainly looks wrong. It is hard to diagnose the problem remotely... Could you paste the output of pip freeze? Also, what happens if you execute this test script (taken from https://stackoverflow.com/questions/55691174/check-whether-tensorflow-is-running-on-gpu)?

import tensorflow as tf
print(tf.__version__)
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

@mdbarnesUCSD
Author

Here is the output from running the test script:

1.15.0
Please install GPU version of TF

Additionally, I ran this (from this stack overflow post):
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

and got this output:

Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device

Also, from the same post:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

producing this output:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 2796909284998702720
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 9824271307598288295
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 6226426787505711414
physical_device_desc: "device: XLA_GPU device"
]
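One detail worth noting in that device listing: only XLA_CPU and XLA_GPU entries appear, with no plain /device:GPU:0. In TF1 that is usually the sign that standard ops will stay on the CPU, which would match the 0% GPU-Util. A small helper encoding that check (the name `has_real_gpu` is my own; in real use you would pass [d.name for d in device_lib.list_local_devices()]):

```python
def has_real_gpu(device_names):
    """True only if a non-XLA GPU device is present.

    In TF1, an XLA_GPU entry alone is generally not enough for normal
    op placement; standard ops need a plain /device:GPU:N entry.
    """
    return any(name.startswith("/device:GPU:") for name in device_names)

# Device names as reported above: no plain GPU device, so this is False.
print(has_real_gpu(["/device:CPU:0", "/device:XLA_CPU:0", "/device:XLA_GPU:0"]))
```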

Here is the output from pip freeze:

absl-py==0.11.0
alabaster==0.7.12
anaconda-client==1.7.2
anaconda-project==0.8.3
argh==0.26.2
asn1crypto==1.3.0
astor==0.8.1
astroid==2.4.2
astropy==4.0
astunparse==1.6.3
atomicwrites==1.3.0
attrs==19.3.0
autopep8==1.4.4
autovizwidget==0.16.0
Babel==2.8.0
backcall==0.1.0
backports.shutil-get-terminal-size==1.0.0
beautifulsoup4==4.8.2
bitarray==1.2.1
bkcharts==0.2
bleach==1.5.0
bokeh==1.4.0
boto==2.49.0
boto3==1.16.9
botocore==1.19.9
Bottleneck==1.3.2
cachetools==4.1.1
certifi==2020.6.20
cffi==1.14.0
chardet==3.0.4
Click==7.0
cloudpickle==1.3.0
clyent==1.2.2
colorama==0.4.3
contextlib2==0.6.0.post1
cryptography==2.8
cycler==0.10.0
Cython==0.29.15
cytoolz==0.10.1
dask==2.11.0
decorator==4.4.1
defusedxml==0.6.0
diff-match-patch==20181111
distributed==2.11.0
docutils==0.16
entrypoints==0.3
environment-kernels==1.1.1
et-xmlfile==1.0.1
fastcache==1.1.0
filelock==3.0.12
flake8==3.7.9
Flask==1.1.1
flatbuffers==1.12
fsspec==0.6.2
future==0.18.2
gast==0.2.2
gevent==1.4.0
glob2==0.7
gmpy2==2.0.8
google-auth==1.23.0
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
greenlet==0.4.15
grpcio==1.32.0
h5py==2.10.0
hdijupyterutils==0.16.0
HeapDict==1.0.1
horovod==0.19.5
html5lib==0.9999999
hypothesis==5.5.4
idna==2.8
imageio==2.6.1
imagesize==1.2.0
importlib-metadata==1.5.0
intervaltree==3.0.2
ipykernel==5.1.4
ipyparallel @ file:///tmp/build/80754af9/ipyparallel_1593440601845/work
ipython==7.12.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
isort==4.3.21
itsdangerous==1.1.0
jdcal==1.4.1
jedi==0.14.1
jeepney==0.4.2
Jinja2==2.11.1
jmespath @ file:///tmp/build/80754af9/jmespath_1594304593830/work
joblib==0.14.1
json5==0.9.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.1.0
jupyter-core==4.6.1
jupyterlab==1.2.6
jupyterlab-server==1.0.6
Keras==2.3.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
keyring==21.1.0
kiwisolver==1.1.0
lazy-object-proxy==1.4.3
libarchive-c==2.8
lief==0.9.0
llvmlite==0.31.0
locket==0.2.0
lxml==4.5.0
Markdown==3.3.3
MarkupSafe==1.1.1
matplotlib==3.1.2
mccabe==0.6.1
mistune==0.8.4
mkl-fft==1.0.15
mkl-random==1.1.0
mkl-service==2.3.0
mock==4.0.1
more-itertools==8.2.0
mpi4py==3.0.3
mpmath==1.1.0
msgpack==0.6.1
multipledispatch==0.6.0
nb-conda==2.2.1
nb-conda-kernels @ file:///tmp/build/80754af9/nb_conda_kernels_1598624781735/work
nbconvert==5.6.1
nbformat==5.0.4
networkx==2.4
nltk==3.4.5
nose==1.3.7
notebook==6.0.3
numba==0.48.0
numexpr==2.7.1
numpy==1.16.1
numpydoc==0.9.2
oauthlib==3.1.0
olefile==0.46
opencv-python==4.2.0.32
openpyxl==3.0.3
opt-einsum==3.3.0
packaging==20.1
pandas==0.25.3
pandocfilters==1.4.2
parso==0.5.2
partd==1.1.0
path==13.1.0
pathlib2==2.3.5
pathtools==0.1.2
patsy==0.5.1
pep8==1.7.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.0.0
pkginfo==1.5.0.1
plotly==4.12.0
pluggy==0.13.1
ply==3.11
prometheus-client==0.7.1
prompt-toolkit==3.0.3
protobuf==3.14.0
protobuf3-to-dict==0.1.5
psutil==5.6.7
psycopg2==2.7.5
PTable==0.9.2
ptyprocess==0.6.0
py==1.8.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.5.0
pycosat==0.6.3
pycparser==2.19
pycrypto==2.6.1
pycurl==7.43.0.5
pydocstyle==4.0.1
pyflakes==2.1.1
pygal==2.4.0
Pygments==2.5.2
pykerberos==1.2.1
pylint==2.5.3
pyodbc===4.0.0-unsupported
pyOpenSSL==19.1.0
pyparsing==2.4.6
pyrsistent==0.15.7
PySocks==1.7.1
pytest==5.3.5
pytest-arraydiff==0.3
pytest-astropy==0.8.0
pytest-astropy-header==0.1.2
pytest-doctestplus==0.5.0
pytest-openfiles==0.4.0
pytest-remotedata==0.3.2
python-dateutil==2.8.1
python-jsonrpc-server==0.3.4
python-language-server==0.31.7
pytz==2019.3
PyWavelets==1.1.1
pyxdg==0.26
PyYAML==5.3.1
pyzmq==18.1.1
QDarkStyle==2.8
QtAwesome==0.6.1
qtconsole==4.6.0
QtPy==1.9.0
requests==2.22.0
requests-kerberos==0.12.0
requests-oauthlib==1.3.0
retrying==1.3.3
rope==0.16.0
rsa==4.6
Rtree==0.9.3
ruamel-yaml==0.15.87
s3fs==0.4.2
s3transfer==0.3.3
sagemaker==2.16.1
scikit-image==0.16.2
scikit-learn==0.21.3
scipy==1.3.2
seaborn==0.10.0
SecretStorage==3.1.2
Send2Trash==1.5.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.15.0
smdebug-rulesconfig==0.1.5
snowballstemmer==2.0.0
sortedcollections==1.1.2
sortedcontainers==2.1.0
soupsieve==1.9.5
sparkmagic==0.15.0
Sphinx==2.4.0
sphinxcontrib-applehelp==1.0.1
sphinxcontrib-devhelp==1.0.1
sphinxcontrib-htmlhelp==1.0.2
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.2
sphinxcontrib-serializinghtml==1.1.3
sphinxcontrib-websupport==1.2.0
spyder==4.0.1
spyder-kernels==1.8.1
SQLAlchemy==1.3.13
statsmodels==0.11.0
sympy==1.5.1
tables==3.6.1
tblib==1.6.0
tensorboard==1.15.0
tensorboard-plugin-wit==1.7.0
tensorflow-estimator==1.15.1
tensorflow-gpu==1.15.0
tensorflow-serving-api==1.15.0
tensorsignatures==0.5.0
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
toml==0.10.1
toolz==0.10.0
tornado==6.0.3
tqdm==4.39.0
traitlets==4.3.3
typed-ast==1.4.1
typing-extensions==3.7.4.3
ujson==1.35
unicodecsv==0.14.1
urllib3==1.25.10
watchdog==0.10.2
wcwidth==0.1.8
webencodings==0.5.1
Werkzeug==1.0.0
widgetsnbextension==3.5.1
wrapt==1.12.1
wurlitzer==2.0.0
xlrd==1.2.0
XlsxWriter==1.2.7
xlwt==1.3.0
yapf==0.28.0
zict==1.0.0
zipp==2.2.0

Thanks for the help. Please let me know if I can provide any more information.

@mdbarnesUCSD
Author

mdbarnesUCSD commented Feb 6, 2021

The issue was resolved when I switched from the tensorflow2_p36 to the tensorflow_p36 AWS environment and ran the installation commands from the README:

pip install --upgrade pip setuptools wheel && pip install -r requirements-gpu.txt
python setup.py install

The resulting version is tensorflow-gpu==1.15.3.
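For anyone hitting the same issue, the pinned version can be confirmed from pip freeze output without eyeballing the full list. A hypothetical helper (`find_version` is my own name, not part of any library):

```python
def find_version(freeze_lines, package):
    """Return the pinned version of `package` from pip freeze-style lines, or None."""
    for line in freeze_lines:
        name, sep, version = line.partition("==")
        if sep and name == package:
            return version
    return None

freeze = ["tensorboard==1.15.0", "tensorflow-gpu==1.15.3", "tensorsignatures==0.5.0"]
print(find_version(freeze, "tensorflow-gpu"))  # prints 1.15.3
```

In practice you would build freeze from subprocess output or from open("requirements-gpu.txt") lines.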

@sagar87
Owner

sagar87 commented Feb 6, 2021

That's great! Thanks for letting me know. I am thinking about porting the code to TF2 or PyTorch; TF1 is indeed a pain in the neck.
