repeng

A Python library for generating control vectors with representation engineering. Train a vector in less than sixty seconds!

For a full example, see the notebooks folder or the blog post.

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from repeng import ControlVector, ControlModel, DatasetEntry

# load and wrap Mistral-7B
model_name = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model = ControlModel(model, list(range(-5, -18, -1)))

def make_dataset(template: str, pos_personas: list[str], neg_personas: list[str], suffixes: list[str]):
    # see notebooks/experiments.ipynb for a definition of `make_dataset`
    ...

# generate a dataset with closely-opposite paired statements
trippy_dataset = make_dataset(
    "Act as if you're extremely {persona}.",
    ["high on psychedelic drugs"],
    ["sober from psychedelic drugs"],
    truncated_output_suffixes,
)

# train the vector—takes less than a minute!
trippy_vector = ControlVector.train(model, tokenizer, trippy_dataset)

# set the control strength and let inference rip!
for strength in (-2.2, 1, 2.2):
    print(f"strength={strength}")
    model.set_control(trippy_vector, strength)
    out = model.generate(
        **tokenizer(
            f"[INST] Give me a one-sentence pitch for a TV show. [/INST]",
            return_tensors="pt"
        ),
        do_sample=False,
        max_new_tokens=128,
        repetition_penalty=1.1,
    )
    print(tokenizer.decode(out.squeeze()).strip())
    print()

strength=-2.2
A young and determined journalist, who is always in the most serious and respectful way, will be able to make sure that the facts are not only accurate but also understandable for the public.

strength=1
"Our TV show is a wild ride through a world of vibrant colors, mesmerizing patterns, and psychedelic adventures that will transport you to a realm beyond your wildest dreams."

strength=2.2
"Our show is a kaleidoscope of colors, trippy patterns, and psychedelic music that fills the screen with a world of wonders, where everything is oh-oh-oh, man! ��psy��oodle��psy��oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

For a more detailed explanation of how the library works and what it can do, see the blog post.

Notes

For a list of changes by version, see the CHANGELOG.
For quantized use, you may be interested in llama.cpp#5970—after training a vector with repeng, export it by calling vector.export_gguf(filename) and then use it in llama.cpp with any quant!
Vector training currently does not work with MoE models (such as Mixtral). (This is theoretically fixable with some work, let me know if you're interested.)

Notice

Some of the code in this repository derives from andyzoujm/representation-engineering (MIT license).

Citation

If this repository is useful for academic work, please remember to cite the representation-engineering paper that it's based on, along with this repository:

@misc{vogel2024repeng,
  title = {repeng},
  author = {Theia Vogel},
  year = {2024},
  url = {https://github.com/vgel/repeng/}
}

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.github/workflows		.github/workflows
notebooks		notebooks
repeng		repeng
.gitignore		.gitignore
CHANGELOG		CHANGELOG
LICENSE		LICENSE
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

repeng

Notes

Notice

Citation

About

Releases

Packages

Contributors 5

Languages

License

vgel/repeng

Folders and files

Latest commit

History

Repository files navigation

repeng

Notes

Notice

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages