Successfully replicated Anthropic's work on extracting monosemantic features from a one-layer transformer trained on Wikipedia's text. (Bricken, et al., "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning", Transformer Circuits Thread, 2023).

manoloFer10/NLP-monosemanticity

NLP-monosemanticity

This project replicates Anthropic's work on extracting monosemantic features from a one-layer transformer trained on Wikipedia text (Bricken et al., "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning", Transformer Circuits Thread, 2023).

Coursework project for the Natural Language Processing course taught by Luciano Del Corro at the Facultad de Ciencias Exactas y Naturales of the UBA.
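
For readers unfamiliar with the dictionary-learning approach named in the cited paper, the core idea can be sketched as a sparse autoencoder applied to the transformer's MLP activations. The snippet below is only an illustrative NumPy sketch under stated assumptions and is not taken from this repository's code: the function names (`sae_forward`, `sae_loss`), the sizes, the random stand-in activations, and the `l1_coeff` value are all hypothetical.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations into non-negative features, then reconstruct them."""
    # Center the input with the decoder bias, project, and apply ReLU.
    features = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    # Reconstruct the activations as a weighted sum of dictionary directions.
    x_hat = features @ W_dec + b_dec
    return features, x_hat


def sae_loss(x, features, x_hat, l1_coeff):
    """Reconstruction error plus an L1 penalty that encourages sparse features."""
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    l1 = np.mean(np.sum(np.abs(features), axis=1))
    return mse + l1_coeff * l1


# Hypothetical sizes: 8-dimensional activations, a 32-feature dictionary.
rng = np.random.default_rng(0)
d_act, n_features, batch = 8, 32, 4

# Random data standing in for real MLP activations (assumption).
x = rng.normal(size=(batch, d_act))

W_enc = rng.normal(scale=0.1, size=(d_act, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(scale=0.1, size=(n_features, d_act))
# Keep each dictionary direction (row of W_dec) at unit norm.
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)
b_dec = np.zeros(d_act)

features, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, features, x_hat, l1_coeff=1e-3)
print(features.shape, x_hat.shape, float(loss))
```

In an actual replication the weights would be trained by gradient descent on this loss over a large set of recorded activations, and the learned features would then be inspected for interpretability; the sketch shows only the forward pass and the objective.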
