Code for paper "Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent"

KarolisRam/vpt-mi

Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

This repository contains initial code for the paper Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent, which appeared at the Mechanistic Interpretability Workshop at ICML 2024. The main codebase is still in a "research code" state; we will do our best to share it if there is enough interest.

Meanwhile, feel free to play around with these:
Attention outputs dashboard Colab
Attention weights interactive notebook
The data for the Attention weights notebook can be found here.

Citation

Please cite the paper using the BibTeX entry below:

@article{jucys2024vptmi,
  title={Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent},
  author={Jucys, Karolis and Adamopoulos, George and Hamidi, Mehrab and Milani, Stephanie and Samsami, Mohammad Reza and Zholus, Artem and Joseph, Sonia and Richards, Blake and Rish, Irina and {\c{S}}im{\c{s}}ek, {\"O}zg{\"u}r},
  journal={arXiv preprint arXiv:2407.12161},
  url={https://arxiv.org/abs/2407.12161},
  year={2024}
}
