Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

website | arXiv | twitter/X | LessWrong

This repository contains initial code for the paper Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent, which appeared at the Mechanistic Interpretability Workshop at ICML 2024. The main codebase is still in a "research code" state; we will do our best to share it if there is enough interest.

In the meantime, feel free to explore these:
Attention outputs dashboard Colab
Attention weights interactive notebook
The data for the Attention weights notebook can be found here.

Citation

Please cite the paper using the BibTeX entry below:

@article{jucys2024vptmi,
  title={Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent},
  author={Jucys, Karolis and Adamopoulos, George and Hamidi, Mehrab and Milani, Stephanie and Samsami, Mohammad Reza and Zholus, Artem and Joseph, Sonia and Richards, Blake and Rish, Irina and {\c{S}}im{\c{s}}ek, {\"O}zg{\"u}r},
  journal={arXiv preprint arXiv:2407.12161},
  url={https://arxiv.org/abs/2407.12161},
  year={2024}
}
