Loss NaN, incorrect value Nan or Inf in input Tensor #27

Open
wangq95 opened this issue Mar 18, 2020 · 6 comments

wangq95 commented Mar 18, 2020

Hi @huaifeng1993, I tried to train DFANet on the Cityscapes dataset, but the loss quickly became NaN, like this:
Epoch 0/1499
step: 0/298 | loss: 910.1328 | IoU of batch: 0.0249
step: 1/298 | loss: 2907000799232.0000 | IoU of batch: 0.0004
step: 2/298 | loss: nan | IoU of batch: 0.0303
step: 3/298 | loss: nan | IoU of batch: 0.0233

I wonder whether the training data needs special preprocessing, or whether there is another way to solve this problem.
Thanks a lot.
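
A minimal sketch of one way to catch this kind of blow-up early, assuming a standard PyTorch training loop; the model, batch, and loss below are placeholders rather than the repository's actual training code, and anomaly detection plus gradient clipping are generic debugging aids, not a confirmed fix:

```python
import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)            # report the op that produced a NaN/Inf in backward

model = nn.Conv2d(3, 19, kernel_size=3, padding=1)  # placeholder standing in for DFANet
criterion = nn.CrossEntropyLoss(ignore_index=255)   # 255 is the usual Cityscapes ignore label
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for step in range(4):
    images = torch.randn(2, 3, 128, 128)            # placeholder batch
    labels = torch.randint(0, 19, (2, 128, 128))
    loss = criterion(model(images), labels)
    if not torch.isfinite(loss):                    # stop at the first nan/inf loss
        print(f"non-finite loss at step {step}: {loss.item()}")
        break
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)  # keep gradients bounded
    optimizer.step()
```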

huaifeng1993 (Owner) commented

Try a smaller learning rate. The new model has not been tested yet.
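
A minimal sketch of that suggestion, with a placeholder module standing in for DFANet; the actual optimizer settings in the repository may differ:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 19, kernel_size=3, padding=1)   # placeholder for DFANet
# e.g. drop the base learning rate by an order of magnitude before retrying
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=1e-4)
```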


wangq95 commented Mar 19, 2020

@huaifeng1993 Hi, I use the default XceptionA backbone without pre-trained weights, since pre-training costs too much time. I tried decreasing the learning rate to 1e-2, 1e-3, and 1e-4, but nothing changed. I also printed out the input images and labels, which contained no Inf or NaN values, but the output of DFANet did.
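
A minimal sketch of how to narrow down which layer first produces non-finite values, assuming a standard nn.Module; the model below is a placeholder, not the actual DFANet class:

```python
import torch
import torch.nn as nn

model = nn.Sequential(                                   # placeholder for DFANet
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

def make_nan_hook(name):
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
            print(f"non-finite output from layer {name} ({module.__class__.__name__})")
    return hook

for name, module in model.named_modules():
    if name:                                             # skip the top-level container
        module.register_forward_hook(make_nan_hook(name))

_ = model(torch.randn(1, 3, 64, 64))                     # run one batch and watch the prints
```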

huaifeng1993 (Owner) commented

When did you download dfanet.py? There is a version that has the problem you met. Check dfanet.py to see whether the code is the same as the one you downloaded.


wangq95 commented Mar 21, 2020

@huaifeng1993 Yes, the loss now decreases as expected. But I find that GPU utilization is very low, and inference speed is only 21 fps at the default resolution on a Tesla V100. What do you think the key limitation is: CPU I/O or the number of workers?
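
For reference, a minimal sketch of the loader-side settings usually involved in this trade-off (worker count, pinned memory, non-blocking copies), with a placeholder dataset instead of the Cityscapes pipeline; the repository's defaults may differ:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(16, 3, 512, 512))    # placeholder for the Cityscapes dataset
loader = DataLoader(dataset, batch_size=4, shuffle=True,
                    num_workers=4,                        # more CPU workers for decoding/augmentation
                    pin_memory=True)                      # enables fast, non-blocking host-to-device copies

device = "cuda" if torch.cuda.is_available() else "cpu"
for (batch,) in loader:
    batch = batch.to(device, non_blocking=True)           # overlap the copy with GPU compute
    break
```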

huaifeng1993 (Owner) commented

I think I/O takes a lot of the time. But you can try loading the data onto the GPU beforehand when you test the model's inference speed.
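
A minimal sketch of that kind of measurement, with a placeholder module standing in for DFANet and the input created directly on the GPU so data loading is excluded from the timing (assumes a CUDA device is available):

```python
import time
import torch
import torch.nn as nn

model = nn.Conv2d(3, 19, kernel_size=3, padding=1).cuda().eval()   # placeholder for DFANet
x = torch.randn(1, 3, 1024, 1024, device="cuda")                   # batch already resident on the GPU

with torch.no_grad():
    for _ in range(10):                      # warm-up iterations
        model(x)
    torch.cuda.synchronize()                 # CUDA launches are asynchronous; sync before timing
    start = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()                 # and after, so the timer covers the real work
print(f"{100 / (time.time() - start):.1f} FPS (forward pass only)")
```

If the FPS measured this way is much higher than 21, the bottleneck is on the data-loading side rather than in the network itself.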

xhding1997 commented

> @huaifeng1993 Yes, the loss now decreases as expected. But I find that GPU utilization is very low, and inference speed is only 21 fps at the default resolution on a Tesla V100. What do you think the key limitation is: CPU I/O or the number of workers?

Hello, can you share your XceptionA backbone? I can't find it anywhere.
Thanks!
