Thanks for your code. I have studied it, and it looks like the training process has two stages: first, the teacher and student networks are trained separately on the same dataset; then distillation is performed using the pre-trained teacher and student networks.
Could you tell me if this is right?
Thank you.
First, the teacher network is trained on the dataset. Then its weights are frozen, and finally a randomly initialized student is trained using both the ground truth (GT) and the teacher's outputs.
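The second stage described above can be sketched as a combined loss over the frozen teacher's soft targets and the GT labels. This is a minimal numpy sketch assuming the standard Hinton-style knowledge distillation objective (temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values taken from this repository):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """KD loss: alpha * KL(teacher || student) * T^2 + (1-alpha) * CE(GT).

    The teacher is frozen, so only its logits (not gradients) are used.
    """
    p_t = softmax(teacher_logits, T)  # soft targets from frozen teacher
    p_s = softmax(student_logits, T)
    # KL divergence between softened distributions, scaled by T^2
    kd = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean() * T**2
    # Standard cross-entropy against the ground-truth labels
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * kd + (1 - alpha) * ce

# Toy batch: 2 examples, 3 classes
student = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
teacher = np.array([[3.0, 0.2, 0.1], [0.1, 2.5, 0.2]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

In an actual training loop this loss would be minimized with respect to the student's parameters only, matching the "freeze the teacher, train the student from GT and teacher" procedure described in the reply.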