glgh/awesome-llm-human-preference-datasets
Awesome Human Preference Datasets for LLM 🧑❤️🤖

A curated list of open source Human Preference datasets for LLM instruction-tuning, RLHF and evaluation.

For general NLP datasets and text corpora, check out this awesome list.

Datasets

OpenAI WebGPT Comparisons

  • 20k comparisons where each example comprises a question, a pair of model answers, and human-rated preference scores for each answer.
  • RLHF dataset used to train the OpenAI WebGPT reward model.
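Records in this style carry a question, two candidate answers, and a human score per answer. A minimal sketch of turning such a record into a (chosen, rejected) training pair; the field names (`question`, `answer_0`, `answer_1`, `score_0`, `score_1`) follow the dataset card but should be treated as assumptions here:

```python
# Illustrative sketch: converting a WebGPT-style comparison record into a
# (chosen, rejected) preference pair. Field names are assumptions.

def to_preference_pair(record):
    """Return (chosen, rejected), or None when the raters tied."""
    if record["score_0"] == record["score_1"]:
        return None  # ties carry no preference signal
    if record["score_0"] > record["score_1"]:
        return record["answer_0"], record["answer_1"]
    return record["answer_1"], record["answer_0"]

example = {
    "question": "Why is the sky blue?",
    "answer_0": "Rayleigh scattering of sunlight.",
    "answer_1": "Because of the ocean.",
    "score_0": 1,
    "score_1": -1,
}
print(to_preference_pair(example))
# -> ('Rayleigh scattering of sunlight.', 'Because of the ocean.')
```

Dropping ties keeps only examples that actually express a preference, which is what a pairwise reward-model loss consumes.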

OpenAI Summarization

  • Human comparisons of model-written summaries (largely of Reddit TL;DR posts), collected for Learning to Summarize from Human Feedback and used to train its reward model.

Anthropic Helpfulness and Harmlessness Dataset (HH-RLHF)

  • In total 170k human preference comparisons, including human preference data collected for Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback and human-generated red teaming data from Red Teaming Language Models to Reduce Harms, divided into 3 sub-datasets:
    • A base dataset using a context-distilled 52B model, with 44k helpfulness comparisons and 42k red-teaming (harmlessness) comparisons.
    • An RS (rejection sampling) dataset of 52k helpfulness comparisons and 2k red-teaming comparisons, where responses were produced by rejection sampling against a preference model trained on the base dataset.
    • An iterated online dataset including data from RLHF models, updated weekly over five weeks, with 22k helpfulness comparisons.
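HH-RLHF stores each comparison as two full dialogue transcripts (a "chosen" and a "rejected" string) with "\n\nHuman:" and "\n\nAssistant:" turn markers. A small sketch of splitting one transcript back into turns:

```python
import re

# Minimal sketch: parse an HH-RLHF-style transcript into (speaker, text)
# turns. The "\n\nHuman: " / "\n\nAssistant: " markers follow the dataset's
# published format.

def split_turns(transcript):
    """Split an HH-style transcript into (speaker, text) tuples."""
    parts = re.split(r"\n\n(Human|Assistant): ", "\n\n" + transcript.lstrip())
    it = iter(parts[1:])  # drop the empty chunk before the first marker
    # re.split keeps the captured speaker names; pair each with its text
    return [(speaker, text.strip()) for speaker, text in zip(it, it)]

chosen = ("\n\nHuman: What is RLHF?"
          "\n\nAssistant: Reinforcement learning from human feedback.")
print(split_turns(chosen))
```

The same function applies to both the "chosen" and "rejected" sides of a comparison, since they share the format.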

OpenAssistant Conversations Dataset (OASST1)

  • A human-generated, human-annotated assistant-style conversation corpus consisting of 161k messages in 35 languages, annotated with 461k quality ratings, resulting in 10k+ fully annotated conversation trees.
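OASST1 ships conversations as flat message lists in which each message points at its parent, so the trees are rebuilt by grouping children under parents. A hedged sketch, with field names (`message_id`, `parent_id`, `role`, `text`) assumed for illustration:

```python
from collections import defaultdict

# Illustrative sketch: rebuild and walk an OASST-style conversation tree
# from flat messages with parent pointers. Field names are assumptions.

def build_tree(messages):
    """Map each parent_id to its child messages; roots have parent_id None."""
    children = defaultdict(list)
    for msg in messages:
        children[msg["parent_id"]].append(msg)
    return children

def linearize(children, node_id=None, depth=0, out=None):
    """Depth-first walk yielding (depth, role, text) tuples."""
    if out is None:
        out = []
    for msg in children.get(node_id, []):
        out.append((depth, msg["role"], msg["text"]))
        linearize(children, msg["message_id"], depth + 1, out)
    return out

messages = [
    {"message_id": "a", "parent_id": None, "role": "prompter", "text": "Hi!"},
    {"message_id": "b", "parent_id": "a", "role": "assistant", "text": "Hello."},
    {"message_id": "c", "parent_id": "a", "role": "assistant", "text": "Hey there."},
]
print(linearize(build_tree(messages)))
```

Sibling assistant replies under one prompt (as with "b" and "c" above) are what the per-message quality ratings rank against each other.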

Stanford Human Preferences Dataset (SHP)

  • 385K collective human preferences over responses to questions/instructions in 18 domains, for training RLHF reward models and NLG evaluation models. Data collected from Reddit.
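Reward models trained on preference data like SHP commonly use the pairwise Bradley-Terry objective: the model's score for the preferred response should exceed its score for the dispreferred one. A minimal numeric sketch:

```python
import math

# Sketch of the pairwise (Bradley-Terry) reward-model loss:
# -log sigmoid(r_chosen - r_rejected). Small when the chosen response
# outscores the rejected one, large when the ordering is wrong.

def preference_loss(r_chosen, r_rejected):
    """Negative log-likelihood that 'chosen' beats 'rejected'."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

print(round(preference_loss(2.0, 0.0), 4))  # correct ordering -> small loss
print(round(preference_loss(0.0, 2.0), 4))  # wrong ordering -> large loss
```

This is the loss the (chosen, rejected) pairs throughout this list are typically fed into.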

Reddit ELI5

  • 270k examples of questions, answers and scores collected from 3 Q&A subreddits.

Human ChatGPT Comparison Corpus (HC3)

  • 60k human answers and 27k ChatGPT answers to around 24k questions.
  • A sibling dataset is available in Chinese.

HuggingFace H4 StackExchange Preference Dataset

  • 10 million questions (each with at least two answers) and their answers, scored by vote count, from Stack Exchange.
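When answers carry vote-based scores, every pair of differently-scored answers to the same question can be turned into a (preferred, dispreferred) training pair. A generic sketch of that pairing step (not the dataset's exact scoring formula, which its card describes):

```python
from itertools import combinations

# Illustrative sketch: derive preference pairs from vote-scored answers.
# Equal scores carry no preference signal and are skipped.

def pairs_from_scores(answers):
    """answers: list of (text, score). Return (higher, lower) pairs."""
    out = []
    for (text_a, score_a), (text_b, score_b) in combinations(answers, 2):
        if score_a > score_b:
            out.append((text_a, text_b))
        elif score_b > score_a:
            out.append((text_b, text_a))
    return out

answers = [("use a dict", 10), ("use a list", 3), ("use globals", 3)]
print(pairs_from_scores(answers))
# -> [('use a dict', 'use a list'), ('use a dict', 'use globals')]
```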

ShareGPT.com

  • 90k (as of April 2023) user-uploaded ChatGPT conversations.
  • To access the data via ShareGPT's API, see the documentation here; note that the API is currently disabled ("due to excess traffic").
  • Precompiled datasets are available on HuggingFace.

Alpaca

  • 52k instructions and demonstrations generated by OpenAI's text-davinci-003 engine for self-instruct training.
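Each Alpaca record holds an "instruction", an optional "input", and an "output". A sketch of the prompt template commonly used with this data (the exact wording below follows the widely circulated template, but treat it as illustrative):

```python
# Sketch: render an Alpaca-style record into a single training prompt.
# Records with a non-empty "input" use the longer template variant.

TEMPLATE_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
TEMPLATE_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)

def render(record):
    if record.get("input"):
        return TEMPLATE_WITH_INPUT.format(**record)
    return TEMPLATE_NO_INPUT.format(**record)

record = {"instruction": "Summarize the text.", "input": "LLMs are large.",
          "output": "LLMs are big."}
print(render(record))
```

The "output" field becomes the target completion appended after "### Response:" during fine-tuning.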

GPT4All

  • 1M prompt-response pairs collected using the GPT-3.5-Turbo API in March 2023. GitHub repo.

Databricks Dolly Dataset

  • 15k instruction-following records generated by Databricks employees in categories including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.

HH_golden

  • 42k harmlessness comparisons with the same prompts and "rejected" responses as the Harmless split of the Anthropic HH dataset, but with the "chosen" responses rewritten by GPT-4 to be more harmless. A before-and-after comparison of the rewritten responses can be found here. Empirically, compared with the original Harmless dataset, training on this dataset improves harmlessness metrics for alignment methods such as RLHF and DPO.
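DPO, mentioned above, trains directly on such (chosen, rejected) pairs without a separate reward model: it applies a logistic loss to the scaled difference of policy-vs-reference log-ratios. A numeric sketch with scalar log-probabilities standing in for the models:

```python
import math

# Sketch of the DPO objective. Arguments are summed log-probabilities of
# each full response under the trained policy and the frozen reference
# model; beta scales the implicit reward.

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * (log-ratio(chosen) - log-ratio(rejected)))."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy lifts the chosen response and demotes the rejected one
# relative to the reference -> positive margin, low loss.
print(dpo_loss(-5.0, -9.0, -6.0, -6.0))
```

Because the loss depends only on log-probabilities of the paired responses, swapping in the GPT-4-rewritten "chosen" responses changes the training signal without any reward-model retraining.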
