In the paper, the weights are the solution to Equation (8), which minimizes the squared Frobenius norms of the weighted RFF covariance matrices for each pair of features, subject to the constraint that the weights form a probability distribution.
In the code, the weight_learner function appears to solve this problem by running gradient descent on a modified objective that combines the squared Frobenius norms of the weighted RFF covariance matrices with an Lp norm of the weight vector. What is the purpose of the Lp norm on the weight vector, given that the weights are already produced by a softmax over logits and are therefore a probability vector?
Is this intended to keep the logits from diverging to infinity? If that is the aim, why not regularize the magnitude of the logits directly?
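For reference, here is a minimal sketch of the objective as I understand it from the description above: softmax-parameterized weights, pairwise weighted-RFF covariance Frobenius norms, and the extra Lp term. All names (`rff`, `weighted_cov_frobenius`, `toy_weight_learner`) and hyperparameters are hypothetical and are not taken from the repo's `weight_learner`; this is only meant to pin down which term I am asking about.

```python
import torch

def rff(x, num_features=5, scale=1.0):
    """Random Fourier features for an (n, 1) feature column.
    Hypothetical helper; the repo's RFF construction may differ."""
    w = torch.randn(1, num_features) * scale
    b = torch.rand(1, num_features) * 2 * torch.pi
    return torch.sqrt(torch.tensor(2.0 / num_features)) * torch.cos(x @ w + b)

def weighted_cov_frobenius(u, v, w):
    """Squared Frobenius norm of the weighted cross-covariance of two RFF maps,
    where w is a probability vector over the n samples."""
    mu_u = (w[:, None] * u).sum(0, keepdim=True)
    mu_v = (w[:, None] * v).sum(0, keepdim=True)
    cov = (w[:, None] * (u - mu_u)).t() @ (v - mu_v)
    return (cov ** 2).sum()

def toy_weight_learner(features, steps=100, lr=0.1, reg=1.0, p=2):
    """Gradient-descent sketch of the modified objective described above:
    sum of pairwise weighted-RFF covariance Frobenius norms plus an Lp norm
    of the softmax weights (the term this issue is asking about)."""
    n, d = features.shape
    logits = torch.zeros(n, requires_grad=True)
    opt = torch.optim.SGD([logits], lr=lr)
    rffs = [rff(features[:, j:j + 1]) for j in range(d)]
    for _ in range(steps):
        w = torch.softmax(logits, dim=0)          # already a probability vector
        loss = sum(weighted_cov_frobenius(rffs[i], rffs[j], w)
                   for i in range(d) for j in range(i + 1, d))
        loss = loss + reg * torch.norm(w, p=p)    # the extra Lp regularizer
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.softmax(logits, dim=0).detach()
```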