Code, metrics, and models for the paper *Outcome-supervised Verifiers for Planning in Mathematical Reasoning*.
The key technical implementations (in `utils/sampling.py`):

- Value-guided beam search: step-level beam search guided by a value model (see the sketch below)
- Batch generation with a calculator, using caching (2-3 times faster than a naive implementation)
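For intuition, here is a minimal Python sketch of value-guided beam search. The names `generate_step` and `value` are hypothetical stand-ins for the generator's step sampler and the value model; the real implementation in `utils/sampling.py` is batched, calls the calculator, and caches computation.

```python
def value_guided_beam_search(question, generate_step, value,
                             n_beam=3, n_per_beam=2, max_steps=10):
    """Keep the n_beam partial solutions with the highest estimated value.

    generate_step(question, prefix) -> prefix extended by one sampled step
    value(question, prefix)         -> scalar estimate that the prefix will
                                       lead to a correct final answer
    """
    END = "[EOS]"  # placeholder end-of-solution marker
    beams = [""]
    for _ in range(max_steps):
        candidates = []
        for prefix in beams:
            if prefix.endswith(END):       # completed solutions pass through
                candidates.append(prefix)
                continue
            for _ in range(n_per_beam):    # sample several next steps per beam
                candidates.append(generate_step(question, prefix))
        # Rank all candidates with the value model and keep the top n_beam.
        candidates.sort(key=lambda c: value(question, c), reverse=True)
        beams = candidates[:n_beam]
        if all(b.endswith(END) for b in beams):
            break
    return beams[0]  # highest-value completed solution
```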
| Model | Dataset | Link |
|---|---|---|
| OVM-Llama2-7B | GSM8K | parameters |
| OVM-Mistral-7B | GSM8K | parameters |
See the training data of our value models (generated by the generators) in the dataset.

See the training data for Process Reward Models on GSM8K in the dataset.
- Directories
  - `configs`: configurations for model training with `accelerate`
  - `data`: benchmarks, and generator-created data for training the value model
  - `eval_results`: metrics and responses
    - `generator`: generator-only (greedy, self-consistency, or pass@k)
    - `verifier`: ORM accuracy
    - `generator_with_verifier`: guided beam search, i.e. OVM and PRM
  - `scripts`: scripts for training and inference
  - `utils`: functions and classes
- `target_set`
  - GSM8K: `train` and `test`, corresponding to the training set and test set respectively
  - Game of 24: `train` and `mid`
    - `train`: the first 900 problems
    - `mid`: problems indexed 901-1000
- The scripts for GSM8K and Game of 24 are similar. For simplicity, we take GSM8K as the example below; you can run the same pipeline on Game of 24 by replacing `gsm8k` with `game24`.
Training data for the generator:

- GSM8K: `data/gsm8k/train.jsonl`, from OpenAI GSM8K
- Game of 24: `data/game24/train.jsonl`, the first 900 problems in `data/game24/24.csv` (from ToT) with enumerated solutions
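To sanity-check the data, you can peek at the first record; a minimal sketch, assuming the GSM8K file keeps OpenAI's `question`/`answer` fields (the Game of 24 file may be structured differently):

```python
import json

# Print the first generator-training record from the GSM8K file.
with open("data/gsm8k/train.jsonl") as f:
    record = json.loads(next(f))

print(record["question"])  # the math word problem
print(record["answer"])    # the reference step-by-step solution
```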
To run the script `train_generator.sh` (under `scripts/gsm8k` or `scripts/game24`), you should first set `WANDB_API_KEY`, `WANDB_ENTITY`, `model_name_or_path`, and `save_dir`. The generator is named by `save_generator_id`.

```bash
cd OVM
bash scripts/gsm8k/train_generator.sh
```
First use the generator `generator_id` to generate `n_solutions` for each question in the training set:

```bash
cd OVM
bash scripts/gsm8k/generate.sh
```

You should first configure the path of your generator checkpoint `model_name_or_path` and set `--target_set train`. The output will be saved to `data/gsm8k/model_generation/`.
Train the OVM using `train_verifier.sh`. First set `WANDB_API_KEY`, `WANDB_ENTITY`, `save_dir`, and `checkpoint_dir` (the path of the generator checkpoint). The verifier is named by `save_verifier_id`.

```bash
cd OVM
bash scripts/gsm8k/train_verifier.sh
```
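Conceptually, the verifier is trained with outcome supervision: every token of a sampled solution inherits a binary label saying whether that solution's final answer is correct, so the model learns to predict eventual correctness from any intermediate prefix. A minimal sketch of such a loss in the style of ORM training (hypothetical names and shapes; the repo's implementation may differ):

```python
import torch.nn.functional as F

def outcome_supervised_value_loss(value_logits, is_correct, mask):
    """value_logits: (batch, seq_len) per-token outputs of a scalar value head
    is_correct:   (batch,) 1.0 iff the solution's final answer is correct
    mask:         (batch, seq_len) 1.0 on solution tokens, 0.0 on padding
    """
    # Broadcast the outcome label to every token of the solution.
    labels = is_correct[:, None].expand_as(value_logits)
    loss = F.binary_cross_entropy_with_logits(value_logits, labels,
                                              reduction="none")
    return (loss * mask).sum() / mask.sum()
```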
Configure your generator checkpoint path `model_name_or_path` and verifier checkpoint path `verifier_model_name_or_path` in `eval_step_beam.sh`:

```bash
cd OVM
bash scripts/gsm8k/eval_step_beam.sh
```

When `dedup_mode=1`, the search prioritizes linguistically different candidates: if the sorted candidates are `['a', 'a', 'b', 'b', 'c']` and `n_beam=3`, it selects `['a', 'b', 'c']` rather than `['a', 'a', 'b']`, as in the sketch below.
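A small Python sketch of this selection rule (the `select_beams` helper is hypothetical, not the repo's API):

```python
def select_beams(candidates, scores, n_beam, dedup_mode=1):
    """Pick n_beam candidates by score; dedup_mode=1 prefers distinct texts."""
    order = sorted(range(len(candidates)),
                   key=lambda i: scores[i], reverse=True)
    if dedup_mode == 0:
        return [candidates[i] for i in order[:n_beam]]
    seen, distinct, duplicates = set(), [], []
    for i in order:
        (duplicates if candidates[i] in seen else distinct).append(i)
        seen.add(candidates[i])
    # Fall back to high-scoring duplicates if distinct texts run out.
    return [candidates[i] for i in (distinct + duplicates)[:n_beam]]

# Candidates already sorted by descending score:
print(select_beams(['a', 'a', 'b', 'b', 'c'], [5, 4, 3, 2, 1], n_beam=3))
# -> ['a', 'b', 'c'] rather than ['a', 'a', 'b']
```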
The output will be saved to `eval_results/gsm8k/generator_with_verifier/test` (or `eval_results/game24/generator_with_verifier/mid`).
- First sample the data: configure the generator checkpoint `model_name_or_path` and set `--target_set test`

  ```bash
  cd OVM
  bash scripts/gsm8k/generate.sh
  ```

- Then call the ORM to score and rerank the samples: configure the verifier checkpoint `verifier_model_name_or_path`

  ```bash
  cd OVM
  bash scripts/gsm8k/eval_with_verifier.sh
  ```
The output will be saved to `eval_results/gsm8k/verifier/test`.
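Conceptually, reranking scores each sampled solution with the verifier and keeps the highest-scoring one; a sketch under that assumption (`verifier_score` is a hypothetical stand-in for the verifier's scoring call, not the repo's API):

```python
def rerank_with_orm(question, solutions, verifier_score):
    """Sort sampled solutions by the verifier's estimate of correctness."""
    return sorted(solutions,
                  key=lambda s: verifier_score(question, s),
                  reverse=True)

# The top-ranked sample is taken as the final answer:
# best = rerank_with_orm(question, samples, verifier_score)[0]
```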
Configure your generator checkpoint path `model_name_or_path`:

```bash
cd OVM
bash scripts/gsm8k/greedy_eval.sh
```

The output will be saved to `eval_results/gsm8k/generator/test`.
```bibtex
@misc{yu2023outcomesupervised,
      title={Outcome-supervised Verifiers for Planning in Mathematical Reasoning},
      author={Fei Yu and Anningzhe Gao and Benyou Wang},
      year={2023},
      eprint={2311.09724},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```