Restrict datasets to <2.20
yulinggu-cs committed Jun 13, 2024
1 parent a1d833b commit 520c720
Showing 2 changed files with 11 additions and 5 deletions.
15 changes: 10 additions & 5 deletions olmo_eval/tasks/olmes_v0_1/README.md
@@ -2,10 +2,9 @@

## Introduction

This directory contains the data for OLMES (v0.1).

OLMES (Open Language Model Evaluation Standard) is a set of principles and associated tasks,
for evaluating large language models (LLMs).
for evaluating large language models (LLMs). See our paper [OLMES: A Standard for Language Model Evaluations (Gu et al, 2024)](https://www.semanticscholar.org/paper/OLMES%3A-A-Standard-for-Language-Model-Evaluations-Gu-Tafjord/c689c37c5367abe4790bff402c1d54944ae73b2a) for more details.

The current version includes:

* Standardized formatting of dataset instances
@@ -90,8 +89,14 @@ winogrande : 52.7 (CF)
average : 49.0
```

## Citation
## [Citation](https://arxiv.org/abs/2406.08446)

```
Coming soon
@misc{gu2024olmes,
title={OLMES: A Standard for Language Model Evaluations},
author={Yuling Gu and Oyvind Tafjord and Bailey Kuehl and Dany Haddad and Jesse Dodge and Hannaneh Hajishirzi},
year={2024},
eprint={2406.08446},
archivePrefix={arXiv}
}
```
1 change: 1 addition & 0 deletions pyproject.toml
@@ -21,6 +21,7 @@ requires-python = ">=3.10"

dependencies = [
# Add your own dependencies here
"datasets<2.20", # Workaround for trust_remote_code=True needed in catwalk
"ai2-catwalk>=1.0.0rc0",
"ai2-tango[torch,transformers,fairscale,beaker,wandb,gs]>=1.3.2",
"pygsheets"

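The `datasets<2.20` pin above only helps if the environment actually resolves to a version below 2.20. As a rough illustration, the sketch below checks an installed version against that upper bound. The helper names (`release_tuple`, `is_below`, `check_datasets_pin`) are hypothetical and not part of this repository, and the comparison looks only at the leading numeric release components, so it is a simplification rather than a full implementation of Python version-specifier semantics.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release components, e.g. '2.19.2' -> (2, 19, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_below(version: str, upper_bound: str) -> bool:
    """True if `version` sorts strictly below `upper_bound` on release numbers."""
    return release_tuple(version) < release_tuple(upper_bound)


def check_datasets_pin(upper_bound: str = "2.20") -> Optional[bool]:
    """Check the installed `datasets` package; return None if it is not installed."""
    try:
        installed = metadata.version("datasets")
    except metadata.PackageNotFoundError:
        return None
    return is_below(installed, upper_bound)


if __name__ == "__main__":
    # Prints True, False, or None depending on the local environment.
    print(check_datasets_pin())
```

Comparing integer tuples rather than raw strings matters here: as strings, `"2.3"` would sort above `"2.20"`, whereas `(2, 3) < (2, 20)` orders them correctly.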