Error in load_importance_loss #167
Hi, I think this happens because the default `gate_noise` is `0.0`, so the code ends up constructing `normal = Normal(0, 0.0)`. That is odd: why would we want a normal distribution with zero variance? It raises:

```
ValueError: Expected parameter scale (Tensor of shape ()) of distribution Normal(loc: 0.0, scale: 0.0) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values: 0.0
```

If I preset `gate_noise` to `1.0`, the code runs without problems, but I am not sure whether it is numerically correct:

```python
gate_type={'type': 'top', 'k': 2, 'fp32_gate': False, 'gate_noise': 1.0},
```
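A minimal sketch of the failure mode in plain Python (the helper name is hypothetical; the real check is performed by PyTorch's `torch.distributions.Normal` argument validation): constructing the noise distribution with a zero scale violates the `scale > 0` constraint, which is why `gate_noise` must be strictly positive when the importance loss is used.

```python
# Hypothetical sketch (not tutel's actual code) of why gate_noise=0.0 fails:
# PyTorch's Normal distribution validates that its scale parameter is > 0.
def make_noise_dist(gate_noise: float):
    """Mimic the scale constraint of torch.distributions.Normal."""
    if not gate_noise > 0.0:
        raise ValueError(
            "Expected parameter scale to satisfy the constraint "
            f"GreaterThan(lower_bound=0.0), but found invalid values: {gate_noise}"
        )
    # Stand-in for Normal(0, gate_noise); just return the parameters here.
    return {"loc": 0.0, "scale": gate_noise}

# Workaround from this thread: pass a positive gate_noise in gate_type.
gate_type = {'type': 'top', 'k': 2, 'fp32_gate': False, 'gate_noise': 1.0}
dist = make_noise_dist(gate_type['gate_noise'])
```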
Hi @Luodian, yes, you need to set `gate_noise` to a positive value.
Yep, and I also found an issue when using the cosine projector. It seems to involve this line:

```python
logit_scale = torch.clamp(self.temperature, max=torch.log(torch.tensor(1. / 0.01)).cuda()).exp()
```
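For reference, that line caps the learned log-temperature at `log(1/0.01)` before exponentiating, so the resulting scale never exceeds 100. A scalar sketch of the same arithmetic in plain Python (no tensors or CUDA; the function name is an assumption for illustration):

```python
import math

def clamped_logit_scale(log_temperature: float, max_scale: float = 1.0 / 0.01) -> float:
    # Scalar equivalent of:
    #   torch.clamp(self.temperature, max=torch.log(torch.tensor(1. / 0.01))).exp()
    # Clamping the log-temperature before exp() bounds the scale by max_scale.
    return math.exp(min(log_temperature, math.log(max_scale)))
```

For example, a large learned value such as `10.0` is clamped down so the output stays at the `1/0.01 = 100` ceiling, while values below the cap pass through unchanged.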
We have added
Hi, I had these errors when using `load_importance_loss` (the code works fine when using `gshard_loss`). Does anyone have an idea about it? The error log (from one rank/node) is below: