Reproduce the results of the paper #4

Open
Goingqs opened this issue Oct 1, 2018 · 6 comments
Comments

@Goingqs

Goingqs commented Oct 1, 2018

Can this code reproduce the results of the paper? I got 24.61% top-1 error.

@Goingqs
Author

Goingqs commented Oct 1, 2018

I set the target rate to 0.7 and followed the standard ResNet training procedure.

@andreasveit
Owner

Yes, the code should be able to produce the results from the paper. I assume you trained a model based on ResNet-50? Can you please provide more details? For example, what is the average execution rate of your trained model?
Further, what batch size did you use? I suggest training with a batch size of 256 (that's the standard for ResNets and is also used in the paper) or even larger, since the effective batch size per layer is lower with low execution rates.
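
To make the batch-size point concrete, here is a rough back-of-the-envelope sketch. It assumes that a gated layer only sees the samples whose gate is open, so the expected number of samples per batch reaching that layer is roughly the batch size times the layer's execution rate. The function name `effective_batch_size` is made up for illustration and is not part of this repository.

```python
def effective_batch_size(batch_size: int, execution_rate: float) -> float:
    """Expected number of samples per batch that pass through a gated layer.

    Assumption: each sample executes the layer independently with
    probability `execution_rate`.
    """
    if not 0.0 <= execution_rate <= 1.0:
        raise ValueError("execution_rate must be in [0, 1]")
    return batch_size * execution_rate


# Compare a few execution rates at the batch size suggested above.
for rate in (1.0, 0.7, 0.4):
    print(f"rate={rate}: ~{effective_batch_size(256, rate):.1f} samples per batch")
```

Under this assumption, a layer with a low execution rate computes its batch statistics from noticeably fewer samples than the nominal batch size.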

@Goingqs
Author

Goingqs commented Oct 2, 2018

The average execution rate is 0.8585 and my batch size is 2048. I removed all the fc1bn layers.

I find that fc1bn degrades the result: the top-1 error is 25.324% with fc1bn.

@Goingqs
Author

Goingqs commented Oct 2, 2018

I get 25.32% top-1 error and the average execution rate is 0.8452. The batch size is 512, without fc1bn.
I thought a larger batch size would be better. So what's the problem? Please help me ~~

@PerdonLiu

I cannot reproduce the result either.
On CIFAR-10, I used exactly the same settings as the paper (batch size 256, 350 epochs, target rate 0.7) but got 6.68% top-1 error.

@adrianloy

@Goingqs @PerdonLiu the README says "Specifically, for the results in the paper the following target rate schedules are used for ResNet 50: [1, 1, 0.8, 1, t, t, t, 1, t, t, t, t, t, 1, 0.7, 1] for t in [0.4, 0.5, 0.6, 0.7] ". Did you do that, or did you use a target rate of 0.7 for all gates? I do not understand how this code supports different target rates per layer: the arg parser expects a single float, and I also can't see any adjustment for layer-specific target rates in the other parts of the code where I would expect it.
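
For illustration, a minimal sketch of what per-layer target rates could look like. The schedule is the one quoted from the README; everything else (the helper `resnet50_target_rates`, the `--target-rates` flag, and `build_parser`) is hypothetical and does not exist in this repository as far as the thread shows.

```python
import argparse
from typing import List


def resnet50_target_rates(t: float) -> List[float]:
    """Expand the README schedule for ResNet-50 into one target rate per block."""
    schedule = [1, 1, 0.8, 1, t, t, t, 1, t, t, t, t, t, 1, 0.7, 1]
    return [float(rate) for rate in schedule]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that accepts a list of floats instead of a single float."""
    parser = argparse.ArgumentParser(description="per-layer target rates (sketch)")
    parser.add_argument(
        "--target-rates",
        type=float,
        nargs="+",
        default=resnet50_target_rates(0.7),
        help="one target execution rate per gated block",
    )
    return parser


# Example: pass an explicit schedule for t = 0.4 as command-line strings.
argv = ["--target-rates"] + [str(r) for r in resnet50_target_rates(0.4)]
args = build_parser().parse_args(argv)
print(len(args.target_rates), "rates:", args.target_rates)
```

The schedule has 16 entries, which matches the 16 residual blocks of ResNet-50 (3 + 4 + 6 + 3). The training loss would then need to compare each gate's execution rate against its own entry rather than a single global value.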
