Dear Mr. Weidman,
I am currently trying to understand the code in [45], in particular the function "loss_gradients".
I just want to ask whether the line
loss_gradients['B1'] = dLdB1.sum(axis=0)
should instead be written as:
loss_gradients['B1'] = dLdB1
Reason:
In my test project, the expression dLdB1 has dimension (hidden_size, 1).
The dimension of weights['B1'] is also (hidden_size, 1).
If the expression additionally sums over all hidden_size entries, then every entry of weights['B1'] is updated with the same value. That does not seem correct to me.
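To make the shape question concrete, here is a minimal, self-contained sketch of the kind of check I mean. It is not the code from [45]; the layer layout, the row-per-observation convention, and all variable names other than dLdB1 and B1 are my own assumptions, so the shapes below may differ from the book's code and from my test project.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# assumed layout: each row of X is one observation
batch_size, n_features, hidden_size = 4, 3, 5
rng = np.random.default_rng(0)

X = rng.normal(size=(batch_size, n_features))
y = rng.normal(size=(batch_size, 1))
W1 = rng.normal(size=(n_features, hidden_size))
B1 = rng.normal(size=(1, hidden_size))
W2 = rng.normal(size=(hidden_size, 1))
B2 = rng.normal(size=(1, 1))

# forward pass
M1 = X @ W1                         # (batch_size, hidden_size)
N1 = M1 + B1                        # B1 broadcasts over the batch dimension
O1 = sigmoid(N1)
P = O1 @ W2 + B2                    # predictions, (batch_size, 1)

# backward pass for L = mean((y - P)**2)
dLdP = -2.0 * (y - P) / batch_size  # (batch_size, 1)
dLdO1 = dLdP @ W2.T                 # (batch_size, hidden_size)
dLdN1 = dLdO1 * O1 * (1.0 - O1)     # (batch_size, hidden_size)
dLdB1 = dLdN1                       # per-observation bias gradient

print(dLdB1.shape)                  # (batch_size, hidden_size)
print(dLdB1.sum(axis=0).shape)      # (hidden_size,)
print(B1.shape)                     # (1, hidden_size)

Under this assumed layout, dLdB1 has one row per observation, so .sum(axis=0) collapses the batch dimension and leaves one value per bias entry. In my test project the shapes come out differently, which is why I am asking.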
Best Regards