Restrict datasets to <2.20

allenai · Jun 13, 2024 · 520c720
1 parent a1d833b
commit 520c720

Showing 2 changed files with 11 additions and 5 deletions.
diff --git a/olmo_eval/tasks/olmes_v0_1/README.md b/olmo_eval/tasks/olmes_v0_1/README.md
@@ -2,10 +2,9 @@
 
 ## Introduction
 
-This directory contains the data for OLMES (v0.1).
-
 OLMES (Open Language Model Evaluation Standard) is a set of principles and associated tasks, 
-for evaluating large language models (LLMs). 
+for evaluating large language models (LLMs). See our paper [OLMES: A Standard for Language Model Evaluations (Gu et al, 2024)](https://www.semanticscholar.org/paper/OLMES%3A-A-Standard-for-Language-Model-Evaluations-Gu-Tafjord/c689c37c5367abe4790bff402c1d54944ae73b2a) for more details.
+
 The current version includes:
 
    * Standardized formatting of dataset instances
@@ -90,8 +89,14 @@ winogrande   : 52.7  (CF)
 average      : 49.0
 ```
 
-## Citation
+## [Citation](https://arxiv.org/abs/2406.08446)
 
 ```
-Coming soon
+@misc{gu2024olmes,
+      title={OLMES: A Standard for Language Model Evaluations}, 
+      author={Yuling Gu and Oyvind Tafjord and Bailey Kuehl and Dany Haddad and Jesse Dodge and Hannaneh Hajishirzi},
+      year={2024},
+      eprint={2406.08446},
+      archivePrefix={arXiv}
+}
 ```
diff --git a/pyproject.toml b/pyproject.toml
@@ -21,6 +21,7 @@ requires-python = ">=3.10"
 
 dependencies = [
   # Add your own dependencies here
+  "datasets<2.20", #  Workaround for trust_remote_code=True needed in catwalk
   "ai2-catwalk>=1.0.0rc0",
   "ai2-tango[torch,transformers,fairscale,beaker,wandb,gs]>=1.3.2",
   "pygsheets"
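
A minimal sketch of what the `datasets<2.20` pin means at runtime, assuming a simplified numeric comparison rather than full PEP 440 handling. The helper names `release_tuple` and `is_below` are hypothetical and are not part of this commit, catwalk, or `datasets`; they only illustrate that the bound is compared segment by segment as integers, so `2.9` sorts below `2.20`.

```python
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segments, e.g. '2.19.2' -> (2, 19, 2).

    Simplification (assumption): any suffix such as 'rc1' or '.dev0' is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_below(version: str, upper_bound: str) -> bool:
    """Check whether `version` is strictly below `upper_bound`.

    Both tuples are zero-padded to the same length so that '2.20' and
    '2.20.0' compare as equal.
    """
    v = release_tuple(version)
    b = release_tuple(upper_bound)
    width = max(len(v), len(b))
    v += (0,) * (width - len(v))
    b += (0,) * (width - len(b))
    return v < b


# Hypothetical installed versions checked against the pinned bound.
for candidate in ("2.9", "2.19.2", "2.20.0", "2.21.0"):
    print(candidate, "allowed" if is_below(candidate, "2.20") else "excluded")
```

In an environment where `datasets` is installed, the same check could be applied to the string returned by `importlib.metadata.version("datasets")`. For real dependency resolution, pip evaluates the specifier in `pyproject.toml` itself, so a helper like this is only useful as a sanity check.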