
clarification in chap1 #4

Open
uballa opened this issue Dec 6, 2019 · 1 comment
Comments


uballa commented Dec 6, 2019

In the code below, could you clarify why you are calculating dLdN when it is not used in any subsequent calculation?

    dLdS = np.ones_like(S)
    dSdN = deriv(sigma, N)
    dLdN = dLdS * dSdN
    dNdX = np.transpose(W, (1, 0))
    dLdX = np.dot(dSdN, dNdX)
    return dLdX
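
A plausible reading, not an authoritative answer: because dLdS is an array of ones, dLdN is numerically identical to dSdN, so the final np.dot returns the same result whichever of the two arrays is passed; computing dLdN just spells out the chain rule step by step, and the last line was presumably meant to read np.dot(dLdN, dNdX). A minimal sketch checking this equivalence, with sigma and deriv written as stand-ins (a logistic function and a central-difference derivative; the book's own helpers may differ in detail):

    import numpy as np

    def sigma(x):
        # logistic function, applied element-wise
        return 1.0 / (1.0 + np.exp(-x))

    def deriv(func, x, delta=1e-3):
        # central-difference approximation of the element-wise derivative
        return (func(x + delta) - func(x - delta)) / (2.0 * delta)

    np.random.seed(0)
    X = np.random.randn(2, 3)
    W = np.random.randn(3, 4)

    N = np.dot(X, W)
    S = sigma(N)
    dLdS = np.ones_like(S)
    dSdN = deriv(sigma, N)
    dLdN = dLdS * dSdN

    # dLdS is all ones, so the element-wise product leaves dSdN unchanged
    assert np.allclose(dLdN, dSdN)

    # hence the final gradient is identical whichever array is used
    dNdX = np.transpose(W, (1, 0))
    assert np.allclose(np.dot(dSdN, dNdX), np.dot(dLdN, dNdX))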


hopezh commented Mar 28, 2021

I have the same question: why is element-wise multiplication applied to calculate dLdN = dLdS * dSdN, rather than matrix multiplication via np.dot() or np.matmul()?

I assume this is to keep the dimensionality of the remaining derivatives correct, as shown in the comment following each derivative. But I'm still confused...

import numpy as np
from numpy import ndarray
# Array_Function and deriv are helpers defined earlier in chapter 1

def matrix_function_backward_sum_1(X: ndarray,
                                   W: ndarray,
                                   sigma: Array_Function) -> ndarray:
    '''
    Compute the derivative of a matrix function with a sum with respect
    to the first matrix input
    '''
    assert X.shape[1] == W.shape[0] # X: (m x n), W: (n x p)

    # matrix multiplication
    N = np.dot(X, W) # N: (m x p)

    # feeding the output of the matrix multiplication through sigma
    S = sigma(N) # S: (m x p)

    # sum all the elements
    L = np.sum(S) # L: a scalar 

    # note: I'll refer to the derivatives by their quantities here,
    # unlike the math where we referred to their function names

    # dLdS - just 1s
    dLdS = np.ones_like(S) # (m x p)

    # dSdN
    dSdN = deriv(sigma, N) # (m x p)
    
    # dLdN (element-wise multiplication)
    dLdN = dLdS * dSdN # (m x p) 

    # dNdX
    dNdX = np.transpose(W, (1, 0)) # (p x n)

    # dLdX
    dLdX = np.dot(dSdN, dNdX) # (m x p) x (p x n) = (m x n)

    return dLdX
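
One way to see why that step is element-wise, offered as a sketch rather than the book's own explanation: sigma is applied independently to every entry of N, so S[i, j] depends only on N[i, j]. The full Jacobian dS/dN is therefore diagonal, and multiplying dLdS by a diagonal Jacobian collapses to an element-wise product; np.dot() or np.matmul() would sum over an axis that should not be summed. By contrast, N = np.dot(X, W) mixes entries across a dimension, which is why that step backpropagates through a genuine matrix product with the transpose of W. The finite-difference check below (my own sketch, assuming a logistic sigma) confirms the resulting gradient:

    import numpy as np

    def sigma(x):
        # logistic function, applied element-wise
        return 1.0 / (1.0 + np.exp(-x))

    np.random.seed(42)
    X = np.random.randn(2, 3)
    W = np.random.randn(3, 4)

    def L(X_in):
        return np.sum(sigma(np.dot(X_in, W)))

    # analytic gradient: sigma'(N) = sigma(N) * (1 - sigma(N)) is an
    # element-wise factor; the matrix-multiplication step contributes W^T
    S = sigma(np.dot(X, W))
    dLdX = np.dot(S * (1.0 - S), W.T)

    # perturb one entry of X and compare the change in L against dLdX
    eps = 1e-6
    X_plus = X.copy()
    X_plus[0, 0] += eps
    numeric = (L(X_plus) - L(X)) / eps
    print(numeric, dLdX[0, 0])  # the two values should agree closely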
