I have the same question: why is element-wise multiplication applied to calculate dLdN = dLdS * dSdN, rather than matrix multiplication via either np.dot() or np.matmul()?
I assume this is done to keep the dimensionality of the remaining derivatives correct, as shown in the comment following each derivative. But I'm still confused...
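For what it's worth, here is a minimal sketch of how I understand it (the shapes m, n, p and the choice of sigmoid for sigma are my own assumptions, not from the book): because sigma is applied element-wise, the full Jacobian dS/dN is a diagonal (m*p x m*p) matrix, and multiplying by a diagonal Jacobian collapses to an element-wise product with dSdN reshaped to (m x p).

import numpy as np

# hypothetical shapes and sigma, just to illustrate
m, n, p = 2, 3, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(m, n))
W = rng.normal(size=(n, p))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

N = X @ W                       # (m x p)
S = sigmoid(N)                  # (m x p)

dLdS = np.ones_like(S)          # (m x p), since L = S.sum()
dSdN = S * (1 - S)              # (m x p), element-wise derivative of sigmoid

# full Jacobian of an element-wise function is diagonal...
full_jacobian = np.diag(dSdN.ravel())                         # (m*p x m*p)
dLdN_via_jacobian = (dLdS.ravel() @ full_jacobian).reshape(m, p)
# ...so multiplying by it is the same as the element-wise product
dLdN_elementwise = dLdS * dSdN

print(np.allclose(dLdN_via_jacobian, dLdN_elementwise))       # True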
def matrix_function_backward_sum_1(X: ndarray,
                                   W: ndarray,
                                   sigma: Array_Function) -> ndarray:
    '''
    Compute derivative of matrix function with a sum with respect to the
    first matrix input
    '''
    assert X.shape[1] == W.shape[0]  # X: (m x n), W: (n x p)
    # matrix multiplication
    N = np.dot(X, W)  # N: (m x p)
    # feeding the output of the matrix multiplication through sigma
    S = sigma(N)  # S: (m x p)
    # sum all the elements
    L = np.sum(S)  # L: a scalar
    # note: I'll refer to the derivatives by their quantities here,
    # unlike the math where we referred to their function names
    # dLdS - just 1s
    dLdS = np.ones_like(S)  # (m x p)
    # dSdN
    dSdN = deriv(sigma, N)  # (m x p)
    # dLdN (element-wise multiplication)
    dLdN = dLdS * dSdN  # (m x p)
    # dNdX
    dNdX = np.transpose(W, (1, 0))  # (p x n)
    # dLdX
    dLdX = np.dot(dSdN, dNdX)  # (m x p) x (p x n) = (m x n)
    return dLdX
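To convince myself the shapes and the chain rule line up, here is a small finite-difference check I put together (the sigmoid and the random sizes are my own choices, and I inline an explicit sigmoid derivative in place of the book's deriv helper):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_sum(X, W):
    return sigmoid(X @ W).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3))
W = rng.normal(size=(3, 4))

# analytical gradient following the function above
N = X @ W
dSdN = sigmoid(N) * (1 - sigmoid(N))
dLdX_analytic = (np.ones_like(N) * dSdN) @ W.T

# finite-difference gradient, element by element
eps = 1e-6
dLdX_numeric = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        X_plus = X.copy();  X_plus[i, j] += eps
        X_minus = X.copy(); X_minus[i, j] -= eps
        dLdX_numeric[i, j] = (forward_sum(X_plus, W) - forward_sum(X_minus, W)) / (2 * eps)

print(np.allclose(dLdX_analytic, dLdX_numeric, atol=1e-6))  # True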
In the code below, could you clarify why you are calculating dLdN when you are not using it in any subsequent calculation?
dLdS = np.ones_like(S)
dSdN = deriv(sigma, N)
dLdN = dLdS * dSdN
dNdX = np.transpose(W, (1, 0))
dLdX = np.dot(dSdN, dNdX)
return dLdX
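As far as I can tell, the final line still gives the correct result only because dLdS is np.ones_like(S), so dLdN = dLdS * dSdN is numerically identical to dSdN; presumably the intended line was dLdX = np.dot(dLdN, dNdX), which would make the chain rule explicit and would generalize to losses where dLdS is not all ones. A tiny self-contained check of that point (the values here are made up purely for illustration):

import numpy as np

# hypothetical small arrays just to illustrate the point
dSdN = np.array([[0.1, 0.2], [0.3, 0.4]])
dNdX = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# because dLdS is all ones, the element-wise product leaves dSdN unchanged
dLdS = np.ones_like(dSdN)
dLdN = dLdS * dSdN

print(np.allclose(np.dot(dSdN, dNdX), np.dot(dLdN, dNdX)))  # True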