Merge pull request #9 from lauritowal/joss-paper
Joss paper
AlexTMallen authored Sep 2, 2024
2 parents b350d08 + 129902a commit 8e72628
Showing 367 changed files with 828 additions and 410 deletions.
4 changes: 0 additions & 4 deletions .gitignore
@@ -1,10 +1,6 @@
*.csv
*.npy
elk/models/*
elk/trained/*
nohup.out
.idea
*.pkl

# scripts for experiments in progress
my_*.sh
2 changes: 1 addition & 1 deletion LICENSE.md
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2023 EleutherAI
Copyright (c) 2024 EleutherAI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
4 changes: 2 additions & 2 deletions MANIFEST.in
@@ -1,2 +1,2 @@
recursive-include elk/promptsource/templates *
recursive-include elk/resources *
recursive-include ccs/promptsource/templates *
recursive-include ccs/resources *
91 changes: 59 additions & 32 deletions README.md
@@ -5,8 +5,8 @@
Because language models are trained to predict the next token in naturally occurring text, they often reproduce common
human errors and misconceptions, even when they "know better" in some sense. More worryingly, when models are trained to
generate text that's rated highly by humans, they may learn to output false statements that human evaluators can't
detect. We aim to circumvent this issue by directly [**eliciting latent knowledge
**](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit) (ELK) inside the activations
detect. We aim to circumvent this issue by directly [eliciting latent knowledge
](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit) (ELK) inside the activations
of a language model.

Specifically, we're building on the **Contrastive Representation Clustering** (CRC) method described in the
@@ -19,79 +19,104 @@ classification tasks, even though the features are trained without labels.

Our code is based on [PyTorch](http://pytorch.org)
and [Huggingface Transformers](https://huggingface.co/docs/transformers/index). We test the code on Python 3.10 and
3.11.
3.11. An example can be found [here](https://colab.research.google.com/drive/1pzcH55aHVXvfF0967hNixReG--gNT473?usp=sharing).

First install the package with `pip install -e .` in the root directory, or `pip install -e .[dev]` if you'd like to
contribute to the project (see **Development** section below). This should install all the necessary dependencies.
First, create a virtual environment, for example with conda:

```bash
conda create -n ccs python==3.10
conda activate ccs
```

Clone the repository:
```bash
git clone https://github.com/EleutherAI/ccs.git
```

Next, install the package with `pip install -e .` in the root directory. Use `pip install -e .[dev]` if you'd like to contribute to the project (see the **Contribution Guidelines** section below). This should install all the necessary dependencies.
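
To double-check that the install worked, you can ask the CLI for its help text (a quick sanity check; the exact options printed may vary between versions):

```bash
# Should list the available subcommands, e.g. elicit, eval, sweep, and plot
ccs --help
```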

To fit reporters for the HuggingFace model `model` and dataset `dataset`, just run:

```bash
elk elicit microsoft/deberta-v2-xxlarge-mnli imdb
ccs elicit microsoft/deberta-v2-xxlarge-mnli imdb
```

This will automatically download the model and dataset, run the model and extract the relevant representations if they
aren't cached on disk, fit reporters on them, and save the reporter checkpoints to the `elk-reporters` folder in your
aren't cached on disk, fit reporters on them, and save the reporter checkpoints to the `ccs-reporters` folder in your
home directory. It will also evaluate the reporter classification performance on a held out test set and save it to a
CSV file in the same folder.
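
If you want to poke around the results afterwards, here is a minimal sketch (the run name `naughty-northcutt` is just the example used later in this README; your run will get its own auto-generated name, and the exact file layout may differ slightly):

```bash
# List the artifacts produced by an elicit run (reporter checkpoints, eval CSV, config)
ls ~/ccs-reporters/naughty-northcutt/
# Peek at the held-out evaluation results
column -s, -t < ~/ccs-reporters/naughty-northcutt/eval.csv | head
```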

The following will generate a CCS (Contrast Consistent Search) reporter instead of the CRC-based reporter, which is the
default.

```bash
elk elicit microsoft/deberta-v2-xxlarge-mnli imdb --net ccs
ccs elicit microsoft/deberta-v2-xxlarge-mnli imdb --net ccs
```

The following command will evaluate the probe from the run `naughty-northcutt` on the hidden states extracted from the
model `deberta-v2-xxlarge-mnli` for the `imdb` dataset. It will result in an `eval.csv` and `cfg.yaml` file, which are
stored under a subfolder in `elk-reporters/naughty-northcutt/transfer_eval`.
stored under a subfolder in `ccs-reporters/naughty-northcutt/transfer_eval`.

```bash
elk eval naughty-northcutt microsoft/deberta-v2-xxlarge-mnli imdb
ccs eval naughty-northcutt microsoft/deberta-v2-xxlarge-mnli imdb
```
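
Because the exact subfolder name is generated at run time, here is a small sketch for locating the resulting files (assuming the reporters live under `~/ccs-reporters` as described above):

```bash
# Locate the transfer-eval outputs for the run
find ~/ccs-reporters/naughty-northcutt -name "eval.csv" -o -name "cfg.yaml"
```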

The following runs `elicit` on the Cartesian product of the listed models and datasets, storing it in a special folder
ELK_DIR/sweeps/<memorable_name>. Moreover, `--add_pooled` adds an additional dataset that pools all of the datasets
CCS_DIR/sweeps/<memorable_name>. Moreover, `--add_pooled` adds an additional dataset that pools all of the datasets
together. You can also add a `--visualize` flag to visualize the results of the sweep.

```bash
elk sweep --models gpt2-{medium,large,xl} --datasets imdb amazon_polarity --add_pooled
ccs sweep --models gpt2-{medium,large,xl} --datasets imdb amazon_polarity --add_pooled
```
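
The `--visualize` flag mentioned above can be appended to the same command; afterwards the sweep can be found by its memorable name (a sketch, assuming `CCS_DIR` resolves to `~/ccs-reporters`):

```bash
# Same sweep, but also render plots when it finishes
ccs sweep --models gpt2-{medium,large,xl} --datasets imdb amazon_polarity --add_pooled --visualize
# Sweeps are grouped under CCS_DIR/sweeps/<memorable_name>
ls ~/ccs-reporters/sweeps/
```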

If you just do `elk plot`, it will plot the results from the most recent sweep.
If you just do `ccs plot`, it will plot the results from the most recent sweep.
If you want to plot a specific sweep, you can do so with:

```bash
elk plot {sweep_name}
ccs plot {sweep_name}
```

## Caching

The hidden states resulting from `elk elicit` are cached as a HuggingFace dataset to avoid having to recompute them
The hidden states resulting from `ccs elicit` are cached as a HuggingFace dataset to avoid having to recompute them
every time we want to train a probe. The cache is stored in the same place as all other HuggingFace datasets, which is
usually `~/.cache/huggingface/datasets`.
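
If you need to inspect that cache or move it somewhere with more disk space, here is a sketch using the standard HuggingFace environment variable:

```bash
# See how much space the cached extractions are using
du -sh ~/.cache/huggingface/datasets/* | sort -h | tail
# Point the datasets cache elsewhere before running `ccs elicit` (path is just an example)
export HF_DATASETS_CACHE=/scratch/hf-datasets
```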

## Development
## Contribution Guidelines

Use `pip install pre-commit && pre-commit install` in the root folder before your first commit.
If you work on a new feature, fix, or some other code task, make sure to create an issue and assign it to yourself.
You might even share it in the elk channel of Eleuther's Discord with a short note. That way, others know you are
working on the issue, people won't do the same thing twice 👍, and others can contact you easily.

### Devcontainer
### Submitting a Pull Request
We welcome PRs to our libraries. They're an efficient way to include your fixes or improvements in our next release. Please follow these guidelines:

[
![Open in Remote - Containers](
https://img.shields.io/static/v1?label=Remote%20-%20Containers&message=Open&color=blue&logo=visualstudiocode
)
](
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/EleutherAI/elk
)
- Focus on either functionality changes OR widespread style issues, not both.
- Add tests for new or modified functionality if it makes sense.
- Address a single issue or feature with minimal code changes.
- Include relevant documentation in the repo or on our docs site.

### Run tests
#### "fork-and-pull" Git workflow:

- Fork the repository to your GitHub account.
- Clone the project to your local machine.
- Create a new branch with a concise, descriptive name.
- Make and commit your changes to your new branch.
- Follow any repo-specific formatting and testing guidelines (see the next section).
- Push the changes to your fork.
- Open a PR in our repository, using the PR template for efficient review (see the command-line sketch below).
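
A minimal command-line sketch of that workflow (the username, branch name, and commit message are placeholders):

```bash
# Clone your fork, not the upstream repository
git clone https://github.com/<your-username>/ccs.git
cd ccs
# Work on a descriptively named branch
git checkout -b fix-readme-typos
# ...edit files, then commit...
git commit -am "Fix typos in README"
# Push the branch to your fork, then open a PR against EleutherAI/ccs
git push -u origin fix-readme-typos
```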


#### Before committing
1. Use `pip install pre-commit && pre-commit install` in the root folder before your first commit.

2. Run tests

```bash
pytest
```
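
If you want to run the hooks or a subset of the tests by hand (the keyword below is only an example):

```bash
# Run all pre-commit hooks against the whole tree, not just staged files
pre-commit run --all-files
# Run only tests whose names match a keyword
pytest -k "eval"
```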

### Run type checking
3. Run type checking

We use [pyright](https://github.com/microsoft/pyright), which is built into the VSCode editor. If you'd like to run it
as a standalone tool, it requires a [nodejs installation.](https://nodejs.org/en/download/)
@@ -100,7 +125,7 @@ as a standalone tool, it requires a [nodejs installation.](https://nodejs.org/en
pyright
```

### Run the linter
4. Run the linter

We use [ruff](https://beta.ruff.rs/docs/). It is installed as a pre-commit hook, so you don't have to run it manually.
If you want to run it manually, you can do so with:
@@ -109,8 +134,10 @@ If you want to run it manually, you can do so with:
ruff . --fix
```

### Contributing to this repository
### Issues

If you work on a new feature / fix or some other code task, make sure to create an issue and assign it to yourself (
Maybe, even share it in the elk channel of Eleuther's Discord with a small note). In this way, others know you are
working on the issue and people won't do the same thing twice 👍 Also others can contact you easily.
Issues serve three main purposes: reporting library problems, requesting new features, and discussing potential changes before creating a Pull Request (PR). If you encounter a problem, first check if an existing Issue addresses it. If so, add your own reproduction information to that Issue instead of creating a new one. This approach prevents duplicate reports and helps maintainers understand the problem's scope. Additionally, adding a reaction (like a thumbs-up) to an existing Issue signals to maintainers that the problem affects multiple users, which can influence prioritization.

### Discussion and Contact

If you have additional questions, you can ask them in the elk channel of Eleuther's Discord: https://discord.gg/zBGx3azzUn
Empty file added ccs.lock
4 changes: 4 additions & 0 deletions elk/__init__.py → ccs/__init__.py
@@ -1,11 +1,15 @@
from .extraction import Extract, extract_hiddens
from .training import EigenFitter, EigenFitterConfig
from .training.train import Elicit
from .evaluation import Eval
from .truncated_eigh import truncated_eigh

__all__ = [
"EigenFitter",
"EigenFitterConfig",
"extract_hiddens",
"Extract",
"Elicit",
"Eval",
"truncated_eigh",
]
10 changes: 5 additions & 5 deletions elk/__main__.py → ccs/__main__.py
@@ -1,13 +1,13 @@
"""Main entry point for `elk`."""
"""Main entry point for `ccs`."""

from dataclasses import dataclass

from simple_parsing import ArgumentParser

from elk.evaluation.evaluate import Eval
from elk.plotting.command import Plot
from elk.training.sweep import Sweep
from elk.training.train import Elicit
from ccs.evaluation.evaluate import Eval
from ccs.plotting.command import Plot
from ccs.training.sweep import Sweep
from ccs.training.train import Elicit


@dataclass
6 changes: 5 additions & 1 deletion elk/debug_logging.py → ccs/debug_logging.py
@@ -31,7 +31,11 @@ def save_debug_log(datasets: list[DatasetDictWithName], out_dir: Path) -> None:
else:
train_split, val_split = select_train_val_splits(ds)

text_questions = ds[val_split][0]["text_questions"]
if len(ds[val_split]) == 0:
logging.warning(f"Val split '{val_split}' is empty!")
continue

text_questions = ds[val_split][0]["texts"]
template_ids = ds[val_split][0]["variant_ids"]
label = ds[val_split][0]["label"]

File renamed without changes.
57 changes: 45 additions & 12 deletions elk/evaluation/evaluate.py → ccs/evaluation/evaluate.py
@@ -6,8 +6,8 @@
import torch
from simple_parsing.helpers import field

from ..files import elk_reporter_dir
from ..metrics import evaluate_preds
from ..files import ccs_reporter_dir
from ..metrics import evaluate_preds, get_logprobs
from ..run import Run
from ..utils import Color

@@ -22,7 +22,7 @@ class Eval(Run):
def __post_init__(self):
# Set our output directory before super().execute() does
if not self.out_dir:
root = elk_reporter_dir() / self.source
root = ccs_reporter_dir() / self.source
self.out_dir = root / "transfer" / "+".join(self.data.datasets)

def execute(self, highlight_color: Color = "cyan"):
@@ -31,38 +31,61 @@ def execute(self, highlight_color: Color = "cyan"):
@torch.inference_mode()
def apply_to_layer(
self, layer: int, devices: list[str], world_size: int
) -> dict[str, pd.DataFrame]:
) -> tuple[dict[str, pd.DataFrame], dict]:
"""Evaluate a single reporter on a single layer."""
device = self.get_device(devices, world_size)
val_output = self.prepare_data(device, layer, "val")

experiment_dir = elk_reporter_dir() / self.source
experiment_dir = ccs_reporter_dir() / self.source

reporter_path = experiment_dir / "reporters" / f"layer_{layer}.pt"
reporter = torch.load(reporter_path, map_location=device)

out_logprobs = defaultdict(dict)
row_bufs = defaultdict(list)
for ds_name, (val_h, val_gt, val_lm_preds) in val_output.items():
for ds_name, val_data in val_output.items():
meta = {"dataset": ds_name, "layer": layer}
if self.save_logprobs:
out_logprobs[ds_name] = dict(
row_ids=val_data.row_ids.cpu(),
variant_ids=val_data.variant_ids,
texts=val_data.texts,
labels=val_data.labels.cpu(),
lm=dict(),
lr=dict(),
reporter=dict(),
)

val_credences = reporter(val_h)
val_credences = reporter(val_data.hiddens)
for mode in ("none", "partial", "full"):
row_bufs["eval"].append(
{
**meta,
"ensembling": mode,
**evaluate_preds(val_gt, val_credences, mode).to_dict(),
**evaluate_preds(
val_data.labels, val_credences, mode
).to_dict(),
}
)
if self.save_logprobs:
out_logprobs[ds_name]["reporter"][mode] = (
get_logprobs(val_credences, mode).detach().cpu()
)

if val_lm_preds is not None:
if val_data.lm_preds is not None:
row_bufs["lm_eval"].append(
{
**meta,
"ensembling": mode,
**evaluate_preds(val_gt, val_lm_preds, mode).to_dict(),
**evaluate_preds(
val_data.labels, val_data.lm_preds, mode
).to_dict(),
}
)
if self.save_logprobs:
out_logprobs[ds_name]["lm"][mode] = get_logprobs(
val_data.lm_preds, mode
).cpu()

lr_dir = experiment_dir / "lr_models"
if not self.skip_supervised and lr_dir.exists():
@@ -71,15 +94,25 @@ def apply_to_layer(
if not isinstance(lr_models, list): # backward compatibility
lr_models = [lr_models]

if self.save_logprobs:
out_logprobs[ds_name]["lr"][mode] = dict()

for i, model in enumerate(lr_models):
model.eval()
val_credences = model(val_data.hiddens)
if self.save_logprobs:
out_logprobs[ds_name]["lr"][mode][i] = get_logprobs(
val_credences, mode
).cpu()
row_bufs["lr_eval"].append(
{
"ensembling": mode,
"inlp_iter": i,
**meta,
**evaluate_preds(val_gt, model(val_h), mode).to_dict(),
**evaluate_preds(
val_data.labels, val_credences, mode
).to_dict(),
}
)

return {k: pd.DataFrame(v) for k, v in row_bufs.items()}
return {k: pd.DataFrame(v) for k, v in row_bufs.items()}, out_logprobs
File renamed without changes.
File renamed without changes.
File renamed without changes.