QPyTorch/qtorch/optim/optim_low.py, line 81 at commit ed0d8b1:
In `OptimLP`, the gradient scaling factor is multiplied in before quantization. However, gradient scaling is meant to prevent possible underflow of the low-precision quantized gradient values, so I think the current implementation cannot prevent underflow.

Maybe the correct implementation is to multiply the scaling factor after quantization.
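To make the underflow concrete, here is a minimal, self-contained sketch (not the library's code): `toy_fixed_point_quantize` is a hypothetical quantizer standing in for `grad_quant`, and the example assumes `grad_scaling` plays the role of undoing loss scaling (i.e. `grad_scaling = 1 / loss_scale`). Descaling before quantization rounds the small gradients to zero, while quantizing first and descaling afterwards preserves their magnitudes:

```python
import torch

def toy_fixed_point_quantize(x, word_length=8, frac_length=4):
    """Hypothetical round-to-nearest fixed-point quantizer, for illustration only."""
    step = 2.0 ** (-frac_length)
    max_val = (2.0 ** (word_length - 1) - 1) * step
    return torch.clamp(torch.round(x / step) * step, -max_val, max_val)

grad = torch.tensor([3e-3, -2e-3, 5e-3])  # "true" gradients (small magnitudes)
loss_scale = 1024.0                       # loss scaling applied during backward
scaled_grad = grad * loss_scale           # gradients as the optimizer sees them
grad_scaling = 1.0 / loss_scale           # descaling factor (assumed role)

# Order in question: descale first, then quantize -> small values underflow to 0.
descale_then_quantize = toy_fixed_point_quantize(scaled_grad * grad_scaling)

# Proposed order: quantize the scaled gradients, then descale -> magnitudes survive.
quantize_then_descale = toy_fixed_point_quantize(scaled_grad) * grad_scaling

print(descale_then_quantize)  # tensor([0., 0., 0.])
print(quantize_then_descale)  # roughly [0.0030, -0.0020, 0.0050]
```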