Mechanistic Interpretability Projects

This repository houses research projects in mechanistic interpretability: reverse engineering neural networks.


The goal of this notebook is to explore the phenomenon of bracket closing in the GPT-Neo 125M model, whereby it correctly matches opening brackets ( [ { < with their corresponding closing brackets ) ] } >.

This is Problem 2.13 in Neel Nanda's 200 Concrete Open Problems in Mechanistic Interpretability. The first goal is to figure out how the model determines whether an opening or closing bracket is more appropriate, and the second is to figure out how it knows the correct kind: (, [, { or <.
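As a starting point, here is a minimal sketch of how one might probe this behaviour with the Hugging Face transformers library. It is not the notebook's actual code: the prompt and the candidate bracket tokens below are illustrative assumptions.

```python
# Minimal sketch: compare GPT-Neo 125M's next-token logits for different
# bracket tokens after a prompt with unclosed brackets. The prompt and the
# candidate tokens are illustrative assumptions, not the notebook's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "def f(x): return (x + [1, 2"  # unclosed ( and [; should prefer ]
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

# Compare the model's preference among closing brackets (plus one opener).
for tok in ["]", ")", "}", ">", "("]:
    tok_id = tokenizer.encode(tok)[0]
    print(f"{tok!r}: logit = {logits[tok_id].item():.2f}")
```

Inspecting these logits for a range of prompts gives a quick behavioural baseline before moving on to mechanistic analysis of how the model tracks which brackets are open.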

The repository also contains assorted exploratory notebooks for experimenting with models.