Clarification on Dataset Files for Each Task #1

Open
Shinning-Zhou opened this issue Sep 29, 2024 · 1 comment
Comments

@Shinning-Zhou

Hi,
In the "Merging for Generative Models" step, I see 20 files uploaded in the finetune dataset link (https://huggingface.co/datasets/lu-vae/natural-dataset/tree/main), but I'm not sure which task each file corresponds to. Can you upload the test dataset configuration file?

Thanks!

@LZY-the-boys
Owner

Thanks for your interest.

The finetuning datasets are detailed in the paper's Appendix D.2. For MMLU and TruthfulQA, which lack official training sets, we used the Dolly-15k dataset for MMLU and the BigBench-sampled dataset for TruthfulQA. For GSM8k and CNN-DailyMail, we use the original training datasets, such as here. I forgot to upload the BigBench dataset; I will do that shortly.
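
As a minimal sketch of how the public training sets mentioned above could be pulled from the Hugging Face Hub (the repository IDs and configs below are my assumptions and may not be the exact versions used for the paper):

```python
from datasets import load_dataset

# MMLU proxy: Dolly-15k instruction data
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

# GSM8k and CNN-DailyMail: original training splits
gsm8k = load_dataset("gsm8k", "main", split="train")
cnn_dm = load_dataset("cnn_dailymail", "3.0.0", split="train")

print(len(dolly), len(gsm8k), len(cnn_dm))
```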

The test dataset is contained in the HELM evaluation framework; we have actually uploaded a subset here, and its source is configured by this file.
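
In the meantime, to map the uploaded finetuning files to tasks, you can list the repository contents and match the file names against the tasks described in Appendix D.2. A minimal sketch (the `huggingface_hub` call is standard; the name-to-task matching is still manual):

```python
from huggingface_hub import list_repo_files

# List every file in the finetuning dataset repo so the file names
# can be compared against the tasks in Appendix D.2.
files = list_repo_files("lu-vae/natural-dataset", repo_type="dataset")
for f in sorted(files):
    print(f)
```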

Hope this helps.
