# Experiments with seeing the secret thoughts of LLMs

I had a play with Quiet-STaR.

It has "private thoughts" that have never been tuned to human preferences, and I was curious what its private thoughts look like:

- occasionally there is a glimpse of duplicity
- mostly they are garbled or unrelated, much like regular CoT
- I'm curious about a larger model

The thoughts share these properties with normal Chain of Thought:

- the conclusion is sometimes not faithful to the reasoning
- they are sometimes garbled (normal for a small 7B-parameter model)

I think all these differences could become more distinct in a larger model. It's fascinating to see thoughts that have been trained to be effective, not to please humans. However, some of the thoughts contradict each other, so it would be even nicer if we could somehow make sure that they are used, but this is an open research question.

Links:


## Quiet-STaR

Code for Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking.

This project is implemented by simply patching the base Mistral implementation in Huggingface transformers with a new modeling_mistral.py and a new configuration_mistral.py, and otherwise applying standard transformers features (e.g. the default Trainer). Our patches were applied to Huggingface's transformers version 4.37.0.dev0 under src/transformers/models/mistral/. We cannot guarantee that other changes to their implementation will not affect our implementation, so for reproducibility, we encourage using the same version.
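
Since the patch works by replacing files inside the installed library, it can help to confirm which transformers version you have and where its Mistral module lives. This is a minimal sketch of my own (not from the Quiet-STaR repo), assuming you want to drop the patched files into an installed copy rather than work from a source checkout:

```python
# Locate the installed transformers Mistral package so the patched
# modeling_mistral.py and configuration_mistral.py can be dropped in.
# Assumes transformers is pinned to the version the authors used (4.37.0.dev0).
import importlib.util
from pathlib import Path

import transformers

print("transformers version:", transformers.__version__)  # should match the pinned version

spec = importlib.util.find_spec("transformers.models.mistral")
patch_dir = Path(spec.origin).parent  # .../site-packages/transformers/models/mistral
print("patch target directory:", patch_dir)
# Copy the repo's patched modeling_mistral.py and configuration_mistral.py here,
# or work from a source checkout under src/transformers/models/mistral/.
```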

One pitfall to be wary of: the model was never trained to avoid generating the start and end thought tokens itself, so when performing actual inference it is necessary to mask these out.
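
As a sketch of what that masking could look like (my own illustration, not the authors' code; the checkpoint id and the special-token strings below are assumptions, so check the repo for the exact names), one option is a `LogitsProcessor` that bans the thought delimiters during sampling:

```python
from transformers import AutoTokenizer, LogitsProcessor, LogitsProcessorList

class SuppressThoughtTokens(LogitsProcessor):
    """Set the scores of the given token ids to -inf so they are never sampled."""

    def __init__(self, token_ids):
        self.token_ids = list(token_ids)

    def __call__(self, input_ids, scores):
        scores[:, self.token_ids] = -float("inf")
        return scores

tokenizer = AutoTokenizer.from_pretrained("ezelikman/quietstar-8-ahead")  # assumed repo id
thought_tokens = ["<|startthought|>", "<|endthought|>"]  # assumed token strings
thought_ids = tokenizer.convert_tokens_to_ids(thought_tokens)

processors = LogitsProcessorList([SuppressThoughtTokens(thought_ids)])
# Pass `logits_processor=processors` to model.generate(...); alternatively,
# the built-in `suppress_tokens=thought_ids` generation argument does the same job.
```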

We make an 8-thought-token-ahead model (including the start and end tokens) available via Huggingface.
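
A hedged loading example: the repository id below is my assumption of the published checkpoint name, so check the project page for the exact id before using it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ezelikman/quietstar-8-ahead"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    trust_remote_code=True,  # may be needed if the patched Mistral code ships with the checkpoint
)
model.eval()
```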