-
I'm trying to compute the NTK kernel for labels that are k-dimensional vectors. But no matter how I change the width of the output layer, the dimension of the NTK kernel is always |D| * |D|, where |D| is the number of inputs, instead of k|D| * k|D| as suggested in the line before equation (4) of the paper. Does anyone know the reason for the difference?
-
Note that `kernel_fn` computes the (infinite-width limit of the) covariance of outputs (`nngp`) or of Jacobians (`ntk`). But both outputs and Jacobians are i.i.d. along the output `channel_axis` (of size 2 in your example), hence the k|D| * k|D| covariance is constant-block-diagonal along the pair of k dimensions, and the full covariance is the Kronecker product of the kernel and the identity matrix: kernel_{|D| * |D|} \otimes I_{k * k}. For this reason we only compute the non-trivial and replicated |D| * |D| kernel block.

Non-i.i.d. dimensions are preserved, so if e.g. your NN produces CNN outputs of size |D|, H, k, the output kernel will have shape |D|, |D|, H, H (note that pairs of dimensions ar…

Finally, single-sample, finite-width empirical NTK/NNGP kernels (https://neural-tangents.readthedocs.io/en/latest/empirical.html) are not block-diagonal, so from them you can get the full k|D| * k|D| covariance.
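To make the block structure concrete, here is a minimal numpy sketch (sizes and the placeholder kernel are hypothetical, standing in for the |D| * |D| block that `kernel_fn` returns): the full k|D| * k|D| covariance is just the Kronecker product of that block with the k * k identity.

```python
import numpy as np

D, k = 3, 2  # hypothetical sizes: |D| = 3 inputs, k = 2 output dimensions

# Stand-in for the |D| x |D| block returned by kernel_fn
# (any symmetric PSD matrix works for the illustration).
rng = np.random.default_rng(0)
A = rng.normal(size=(D, D))
kernel = A @ A.T  # |D| x |D|

# Full covariance over all k|D| output entries: kernel ⊗ I_k.
full = np.kron(kernel, np.eye(k))  # shape (k|D|, k|D|)
assert full.shape == (k * D, k * D)

# The (i, j) block of size k x k is kernel[i, j] * I_k,
# i.e. outputs are i.i.d. across the k channels.
assert np.allclose(full[0:k, 0:k], kernel[0, 0] * np.eye(k))
assert np.allclose(full[k:2 * k, 0:k], kernel[1, 0] * np.eye(k))
```

So nothing is lost by returning only the |D| * |D| block: the remaining structure is fully determined.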