Read our paper here: Conformal Prediction with Large Language Models for Multi-Choice Question Answering
`conformal_llm_scores.py` is the Python script for classification using 1-shot question prompts. It outputs three files (a sketch of the assumed layout follows this list):
- The softmax scores for each subject for each of the 10 prompts
- The accuracy for each subject for the MMLU-based 1-shot prompts, as a dictionary where the key is the subject name and the value is a list containing the accuracy for each of the 10 prompts
- The accuracy for each subject for the GPT-4-based 1-shot prompts, as a dictionary in the same format
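The file names and serialization below are hypothetical; the sketch only illustrates the dictionary layout described above (subject name mapped to a list of 10 per-prompt accuracies):

```python
# Hypothetical illustration of the accuracy-dictionary layout described
# above; the actual file names and serialization may differ.
import pickle

with open("mmlu_accuracy.pkl", "rb") as f:  # hypothetical file name
    mmlu_acc = pickle.load(f)               # {"anatomy": [a1, ..., a10], ...}

for subject, accs in mmlu_acc.items():
    print(subject, sum(accs) / len(accs))   # mean accuracy over the 10 prompts
```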
`conformal.ipynb` contains the results for all conformal prediction experiments and the GPT-4- vs. MMLU-based prompt comparison. It requires the three files output by `conformal_llm_scores.py`. To run the experiments, download `llm_probs_gpt.zip`, unzip it, save it in your working directory, and then run `conformal.ipynb`.
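For orientation, here is a minimal sketch of standard split conformal prediction over the saved softmax scores. The array names, the coverage level `alpha=0.1`, and the `1 - softmax` nonconformity score are illustrative assumptions, not necessarily the notebook's exact procedure:

```python
# Minimal split conformal prediction sketch over saved softmax scores.
# Assumptions: cal_scores/test_scores are (n, 4) softmax arrays over the
# four answer choices, and cal_labels holds the correct-choice indices.
import numpy as np

def conformal_sets(cal_scores, cal_labels, test_scores, alpha=0.1):
    """Return boolean prediction sets with ~(1 - alpha) marginal coverage."""
    n = len(cal_labels)
    # Nonconformity: one minus the softmax mass on the correct choice.
    nonconf = 1.0 - cal_scores[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(nonconf, q_level, method="higher")
    # A choice enters the prediction set when its nonconformity <= qhat.
    return (1.0 - test_scores) <= qhat
```

For example, `conformal_sets(cal, y_cal, test).sum(axis=1)` gives the prediction-set size per question, the usual quantity compared against empirical coverage.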
If you would like to run the experiments from scratch, apply for LLaMA access here, convert the original LLaMA weights to the Hugging Face format (refer here for instructions), and then run the `conformal_llm_scores.py` script.
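As a rough guide, the script might extract per-choice softmax scores along the following lines. The checkpoint path, prompt format, and scoring the next-token logits for " A"–" D" are assumptions, not the confirmed implementation:

```python
# Hedged sketch: score the four answer options by comparing the model's
# next-token logits for " A"-" D" after the 1-shot prompt.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "path/to/llama-hf"  # assumed location of converted weights
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
)

# Token id of each answer letter, tokenized with a leading space.
choice_ids = [
    tokenizer(f" {c}", add_special_tokens=False).input_ids[-1] for c in "ABCD"
]

@torch.no_grad()
def choice_softmax(prompt: str) -> torch.Tensor:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    logits = model(**inputs).logits[0, -1]            # next-token logits
    return torch.softmax(logits[choice_ids], dim=-1)  # softmax over A-D
```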