NaN values #33

ShengyuH · 2019-10-22T14:01:52Z

Hi @HuguesTHOMAS,
Thanks for your work and open-source code!

Under CUDA 10.2, Ubuntu 18.04.3, TensorFlow 1.12.0, and a GeForce GTX 1080 Ti, I successfully compiled the cpp wrappers and tf_ops by removing the tag as mentioned. However, when I run train_ModelNet.py, everything goes well for roughly the first two epochs; then, after around 2000 steps, the loss and accuracy become NaN. I compiled TensorFlow from source, and in the same environment I have compiled other TensorFlow user ops without any problem.

HuguesTHOMAS · 2019-10-22T14:37:08Z

Hi @HenrryBryant,

As explained here, it seems that CUDA 10 has internal issues that lead to the appearance of NaN values. Although these issues have so far only appeared when using an RTX 2080 Ti, I don't recommend using this version of CUDA.

Best,
Hugues

HuguesTHOMAS closed this as completed Oct 22, 2019
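
For readers hitting the same symptom, it can help to pin down the exact step at which the loss first becomes non-finite before comparing CUDA versions. The following is only a minimal sketch, not part of the repository: the helper name `first_nonfinite_step` and the sample loss history are hypothetical, and it assumes the per-step loss values are available as plain Python floats.

```python
import math


def first_nonfinite_step(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


# Hypothetical loss history; in practice these would be the values logged per training step.
history = [0.92, 0.71, 0.64, float("nan"), float("nan")]
print(first_nonfinite_step(history))
```

Recording this step across runs shows whether the NaN values appear at a reproducible point or at random, which is one way to tell a data or model problem apart from a platform issue.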
