Hello, I've seen that using PCA is one of the biggest advantages. I'm just wondering where exactly PCA is applied. I initially thought you were applying PCA over the output dimension of a decently large dataset, but that seems not to be the case (since you can apply this to arbitrary models). The only other place I can think of is the token embeddings themselves, but then the model dimensions would mismatch. TIA
Replies: 1 comment
Hi @sachinruk, we apply PCA directly to the static token embeddings you get by forward-passing the vocabulary. The relevant line can be found here. So essentially, we first forward pass every token, which gives you a (vocab_size, dim_size) embedding matrix (where dim_size is the dimensionality of the model you are distilling), and then we apply PCA to those embeddings, which gives you a (vocab_size, pca_dims) output embedding matrix. Hope that answers your question!
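For reference, here's a minimal sketch of that pipeline in plain PyTorch + scikit-learn. It's illustrative only, not the library's actual implementation: the model name, batching, length-1 token sequences, and the pca_dims value are placeholder assumptions.

```python
# Sketch of the step described above, using a generic Hugging Face encoder.
# Illustrative only; model name, batch size, and pca_dims are placeholder choices.
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # any encoder you want to distill
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

# 1. Forward pass every token in the vocabulary to get static embeddings.
#    Each token is fed as a length-1 sequence here for simplicity; the result
#    is a (vocab_size, dim_size) matrix.
token_ids = torch.arange(tokenizer.vocab_size)
chunks = []
with torch.no_grad():
    for batch in torch.split(token_ids, 256):
        out = model(input_ids=batch.unsqueeze(1))         # (batch, 1, dim_size)
        chunks.append(out.last_hidden_state.squeeze(1))   # (batch, dim_size)
embeddings = torch.cat(chunks).numpy()                    # (vocab_size, dim_size)

# 2. Apply PCA to reduce dim_size down to pca_dims.
pca = PCA(n_components=256)                               # pca_dims = 256, chosen arbitrarily
output_embeddings = pca.fit_transform(embeddings)         # (vocab_size, pca_dims)
print(output_embeddings.shape)
```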