-
I'm trying to compute the NTK kernel for labels that are k-dimensional vectors. But no matter how I change the width of the output layer, the dimension of the NTK kernel is always |D| * |D|, where |D| is the number of inputs, instead of k|D| * k|D| as suggested in the line before equation (4) of the paper. Does anyone know the reason for the difference?
-
Note that `kernel_fn` computes the (infinite-width limit of the) covariance of outputs (`nngp`) or of Jacobians (`ntk`). But both outputs and Jacobians are i.i.d. along the output `channel_axis` (of size 2 in your example), hence the k|D| * k|D| covariance is constant-block-diagonal along the pair of k dimensions, and the full covariance is the Kronecker product of the kernel and the identity matrix: kernel_{|D| * |D|} \otimes I_{k * k}. For this reason we only compute the non-trivial and replicated |D| * |D| kernel block.

Non-i.i.d. dimensions are preserved, so if e.g. your NN produces CNN outputs of size |D|, H, k, the output kernel will have shape |D|, |D|, H, H (note that pairs of dimensions ar…

Finally, single-sample, finite-width empirical NTK/NNGP kernels (https://neural-tangents.readthedocs.io/en/latest/empirical.html) are not block-diagonal, so from them you can get the full k|D| * k|D| covariance.
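To make the block structure concrete, here is a minimal numpy sketch (sizes and the placeholder kernel are hypothetical, standing in for the |D| * |D| block that `kernel_fn` returns): the full k|D| * k|D| covariance is just the Kronecker product of that block with the k * k identity.

```python
import numpy as np

D, k = 3, 2  # hypothetical sizes: |D| = 3 inputs, k = 2 output dimensions

# Stand-in for the |D| x |D| block returned by kernel_fn
# (any symmetric PSD matrix works for the illustration).
rng = np.random.default_rng(0)
A = rng.normal(size=(D, D))
kernel = A @ A.T  # |D| x |D|

# Full covariance over all k|D| output entries: kernel ⊗ I_k.
full = np.kron(kernel, np.eye(k))  # shape (k|D|, k|D|)
assert full.shape == (k * D, k * D)

# The (i, j) block of size k x k is kernel[i, j] * I_k,
# i.e. outputs are i.i.d. across the k channels.
assert np.allclose(full[0:k, 0:k], kernel[0, 0] * np.eye(k))
assert np.allclose(full[k:2 * k, 0:k], kernel[1, 0] * np.eye(k))
```

So nothing is lost by returning only the |D| * |D| block: the remaining structure is fully determined.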