Under the `baselines` folder we provide all the utilities needed to reproduce the baseline results reported in our paper.
Both Self-Talk and GKP provide a repository with the code to reproduce their experiments.
For Self-Talk, you can find the necessary code to adapt their framework to support LLMs at the following Google Drive link.
In particular, you should follow these steps (an end-to-end sketch follows the list):
- add the desired datasets from `zebra/data/datasets` to the `self_talk/data` folder (e.g. `cp -r zebra/data/datasets/obqa self_talk/data`)
- rename every dataset file with the format `{split}.jsonl` (e.g. `mv self_talk/data/obqa/obqa-test.jsonl self_talk/data/obqa/test.jsonl`)
- create a folder under `self_talk/experiments` with the same name as the folder containing the dataset (e.g. `mkdir self_talk/experiments/obqa`)
- create a JSON file called `prefixes.json` where the question clarification prefixes will be stored (e.g. `touch self_talk/experiments/obqa/prefixes.json`)
- fill in the `prefixes.json` file with the desired question prefixes
- replace `generate_clarifications_from_lm.py` with the one contained in the drive link
- replace `lm_text_generator.py` with the one contained in the drive link
- add `generate_llms_clarifications.sh` to the folder to run the pipeline with the newly added scripts
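Putting the steps above together for the `obqa` example, the sequence looks roughly like this. This is only a sketch: the location of `generate_llms_clarifications.sh` and the way it is invoked in the last line are assumptions.

```bash
# End-to-end sketch of the Self-Talk adaptation for the obqa dataset.

# 1. Copy the dataset and rename its splits to the {split}.jsonl format
cp -r zebra/data/datasets/obqa self_talk/data
mv self_talk/data/obqa/obqa-test.jsonl self_talk/data/obqa/test.jsonl

# 2. Create the experiment folder and an empty prefixes.json,
#    then edit it to contain your question clarification prefixes
mkdir -p self_talk/experiments/obqa
touch self_talk/experiments/obqa/prefixes.json

# 3. Replace generate_clarifications_from_lm.py and lm_text_generator.py
#    with the versions from the drive link, add generate_llms_clarifications.sh,
#    and run the pipeline (assumed invocation and location):
bash generate_llms_clarifications.sh
```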
For GKP, you can find the necessary code to adapt their framework to support LLMs at the following Google Drive link.
In particular, you should follow these steps (an end-to-end sketch follows the list):
- add the desired datasets from `zebra/data/datasets` to the `GKP/data` folder (e.g. `cp -r zebra/data/datasets/obqa GKP/data`)
- add the `{dataset_name}_standardize.py` script contained in the drive link to the `GKP/standardize` folder
- standardize the desired dataset by running the `GKP/standardize/{dataset_name}_standardize.py` script
- add the prompts contained in the drive link to the `GKP/knowledge/prompts` folder
- add the `llms_generate_knowledge.py` script from the drive link to the `GKP/knowledge` folder
- run the `llms_generate_knowledge.py` script with the required parameters.
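As a rough sketch of the GKP steps for the `obqa` dataset: the standardize script name follows the `{dataset_name}_standardize.py` pattern above, while the flags of `llms_generate_knowledge.py` are illustrative assumptions, so check the script's `--help` for its actual parameters.

```bash
# Sketch of the GKP adaptation for the obqa dataset.

# 1. Copy the dataset
cp -r zebra/data/datasets/obqa GKP/data

# 2. Standardize it with the script taken from the drive link
python GKP/standardize/obqa_standardize.py

# 3. Generate knowledge with an LLM (flags are placeholders, check --help)
python GKP/knowledge/llms_generate_knowledge.py \
  --dataset obqa \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --output_dir baselines/outputs/obqa
```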
These procedures will produce the baseline outputs needed by ZEBRA to run its evaluation.
For convenience, in our repository we provide a folder to store the results of the baselines: `baselines/outputs`.
Both Rainier and Crystal provide the corresponding HuggingFace code to load their models and to generate knowledge.
For convenience, we provide the scripts to generate the knowledge with both Rainier and Crystal under the `baselines` folder (illustrative invocations are sketched below):
- Rainier: `baselines/generate_raininer_knowledge.py`
- Crystal (both 3b and 11b): `baselines/generate_crystal_knowledge.py`
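For example, knowledge generation for a dataset might be launched as follows. The flags are illustrative assumptions, not the scripts' actual CLI; run each script with `--help` to see its real parameters.

```bash
# Illustrative invocations; the flags below are assumptions.
python baselines/generate_raininer_knowledge.py \
  --dataset_tag csqa \
  --output_dir baselines/outputs/csqa

python baselines/generate_crystal_knowledge.py \
  --model_size 11b \
  --dataset_tag csqa \
  --output_dir baselines/outputs/csqa
```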
For convenience, in our repository we provide a folder to store the results of the baselines: `baselines/outputs`.
In the Retrieval Augmentation for Commonsense Reasoning: A Unified Approach (RACo) paper, the authors provide guidelines to train a retriever that fetches relevant commonsense statements from a large collection.
Both the data to train the retriever and the knowledge base serving as document index can be found in the official RACo repository.
Steps (an end-to-end sketch follows the list):
- Gather the data needed to train the RACo retriever and the documents that will serve as the document index from the official RACo repository; then you can train your own RACo retriever by running `baselines/rbr_retriever/train.py`. To ease reproducibility, we provide a checkpoint of a pre-trained RACo retriever at the following HuggingFace link. Alternatively, you can just follow the steps explained in the official RACo repository to train the retriever.
- Once you have a pre-trained RACo retriever, you can create your own document index by running `baselines/rbr_retriever/create_index.py`.
- Once the document index is created, you can generate the outputs for a specific dataset by running `baselines/rbr_retriever/retriever_inference.py`.
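Putting the three RACo steps together, the pipeline looks roughly like this. The flags below are illustrative assumptions; check each script's `--help` for its actual parameters.

```bash
# Illustrative RACo pipeline; the flags below are assumptions.

# 1. Train the retriever (or download our pre-trained checkpoint instead)
python baselines/rbr_retriever/train.py \
  --train_data path/to/raco_training_data

# 2. Build the document index over the RACo knowledge base
python baselines/rbr_retriever/create_index.py \
  --documents path/to/raco_documents \
  --index_dir baselines/rbr_retriever/index

# 3. Retrieve commonsense statements for a specific dataset
python baselines/rbr_retriever/retriever_inference.py \
  --dataset_tag csqa \
  --index_dir baselines/rbr_retriever/index \
  --output_dir baselines/outputs/csqa
```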
For convenience, in our repository we provide a folder to store the results of the baselines: `baselines/outputs`.
Once you have generated the outputs of a given baseline, you can evaluate them with `baselines/evaluate_baseline.py`:
```
python baselines/evaluate_baseline.py --help

Usage: evaluate_baseline.py [ARGUMENTS] [OPTIONS]

╭─ Arguments ────────────────────────────────────────────────────────────╮
│ * model_name TEXT [default: None] [required] │
│ * data_path TEXT [default: None] [required] │
│ * dataset_tag TEXT [default: None] [required] │
│ * output_dir TEXT [default: None] [required] │
│ * baseline TEXT [default: None] [required] │
╰────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────╮
│ --plain BOOL [default: False] │
│ --max_knowledge INTEGER [default: None] │
│ --limit_samples INTEGER [default: None] │
│ --device TEXT [default: 'cuda'] │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────╯
```
For example:
```
python baselines/evaluate_baseline.py \
  --model_name meta-llama/Meta-Llama-3-8B-Instruct \
  --data_path "baselines/outputs/csqa/csqa-dev|crystal-11b|explanations=crystal|num_return_sequences=10.jsonl" \
  --dataset_tag csqa \
  --output_dir results/crystal/csqa \
  --baseline crystal
```
To ease reproducibility, we provide all the baseline outputs at the following Google Drive link.
You can download the baseline outputs from the link and put them into the `baselines/outputs` folder.