NLP-monosemanticity

Replication of Anthropic's work on extracting monosemantic features from a one-layer transformer trained on Wikipedia text (Bricken et al., "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning", Transformer Circuits Thread, 2023).
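
As a rough illustration of the dictionary-learning step in the cited paper, the sketch below runs the forward pass and loss of a sparse autoencoder on a batch of activations: a ReLU encoder, a linear decoder, and a reconstruction error plus an L1 sparsity penalty. It is a minimal NumPy sketch, not the code in `autoencoder.py`; the function names, toy sizes, random inputs, and the `l1_coeff` value are all assumptions made for the example.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations into sparse features and reconstruct them."""
    # The decoder bias is subtracted before encoding, following the paper's setup.
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # (batch, n_features), non-negative
    x_hat = f @ W_dec + b_dec                         # (batch, d_mlp)
    return f, x_hat


def sae_loss(x, x_hat, f, l1_coeff):
    """Squared reconstruction error plus an L1 penalty on feature activations."""
    reconstruction = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return reconstruction + l1_coeff * sparsity


# Hypothetical toy sizes; real runs use the MLP width of the trained transformer.
d_mlp, n_features, batch = 8, 32, 4
rng = np.random.default_rng(0)

x = rng.normal(size=(batch, d_mlp))  # stand-in for saved MLP activations
W_enc = rng.normal(scale=0.1, size=(d_mlp, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(scale=0.1, size=(n_features, d_mlp))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm dictionary directions
b_dec = np.zeros(d_mlp)

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, x_hat, f, l1_coeff=1e-3)  # l1_coeff chosen arbitrarily
```

Each row of `W_dec` plays the role of one dictionary direction, and `f` holds the per-example feature activations that are later inspected for interpretability. Training (gradient updates, resampling of dead features) is deliberately left out of this sketch.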

Coursework project for the Natural Language Processing (Procesamiento de Lenguaje Natural) course taught by Luciano Del Corro at the Facultad de Ciencias Exactas y Naturales, UBA.

results
.DS_Store
.gitignore
README.md
activations.py
activations_histogram.png
activations_histogram_zoomed.png
activations_histogram_zoomed_2.png
activations_params.py
activations_save_activations.py
autoencoder.py
autoencoder_gpt2_params.py
autoencoder_gpt2_train.py
autoencoder_params.py
autoencoder_play.ipynb
autoencoder_train.py
autoencoder_utils.py
constants.py
extract_features_meaning.py
feature_activations.ipynb
get_activations.ipynb
gpt.py
gpt_params.py
gpt_play.ipynb
gpt_train.py
gpt_utils.py
mlflow_env.py
requirements.txt
text_loader.py
tokens_election_by_LLM.ipynb
tokens_generating.ipynb
tokens_mlp_election_by_LLM.ipynb

manoloFer10/NLP-monosemanticity

Folders and files

Latest commit

History

Repository files navigation

NLP-monosemanticity

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages