
Sync eval changes in OLMo/ladder-1xC to here #122

Merged
merged 23 commits into main from moreeval on Dec 19, 2024

Conversation

liujch1998
Contributor

@liujch1998 liujch1998 commented Dec 15, 2024

This adds the scaling-law eval sets as in-loop evals.

Testing of metric: https://legacy.beaker.org/ex/01JF4NNA49YJGC55P3Q5FPEAPA/tasks/01JF4NNA4HM9Q90BQNQ99XSJ9Y/job/01JF4P6XRZVTDXWC3J2559R0K5

2024-12-15T08:21:11.301073649Z 2024-12-15 08:21:11.300	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:68	INFO	Running downstream evals...
2024-12-15T08:21:14.829675802Z 2024-12-15 08:21:14.829	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=5/75]
2024-12-15T08:21:14.940428448Z 2024-12-15 08:21:14.940	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=10/75]
2024-12-15T08:21:15.049435484Z 2024-12-15 08:21:15.049	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=15/75]
2024-12-15T08:21:15.157967512Z 2024-12-15 08:21:15.157	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=20/75]
2024-12-15T08:21:15.267427337Z 2024-12-15 08:21:15.267	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=25/75]
2024-12-15T08:21:15.375047960Z 2024-12-15 08:21:15.374	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=30/75]
2024-12-15T08:21:15.483513780Z 2024-12-15 08:21:15.483	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=35/75]
2024-12-15T08:21:15.594538312Z 2024-12-15 08:21:15.594	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=40/75]
2024-12-15T08:21:15.702422918Z 2024-12-15 08:21:15.702	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=45/75]
2024-12-15T08:21:15.811504739Z 2024-12-15 08:21:15.811	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=50/75]
2024-12-15T08:21:15.919817749Z 2024-12-15 08:21:15.919	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=55/75]
2024-12-15T08:21:16.026753004Z 2024-12-15 08:21:16.026	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=60/75]
2024-12-15T08:21:16.133501599Z 2024-12-15 08:21:16.133	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=65/75]
2024-12-15T08:21:16.240990822Z 2024-12-15 08:21:16.240	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=70/75]
2024-12-15T08:21:16.348730485Z 2024-12-15 08:21:16.348	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:111	INFO	[eval=downstream,step=75/75]
2024-12-15T08:21:17.056109188Z 2024-12-15 08:21:17.055	d22e6d646321:0	olmo_core.train.callbacks.evaluator_callback:104	INFO	Eval metrics:
2024-12-15T08:21:17.056129669Z     arc_challenge_val_rc_5shot (len_norm)=0.2441
2024-12-15T08:21:17.056131828Z     arc_challenge_val_rc_5shot (ce_loss)=2.472
2024-12-15T08:21:17.056133529Z     arc_challenge_val_rc_5shot (bpb)=3.565
2024-12-15T08:21:17.056134965Z     arc_challenge_val_rc_5shot (soft)=0.2539
2024-12-15T08:21:17.056136416Z     arc_challenge_val_rc_5shot (soft_log)=-1.46E+00

To see things in Comet: https://www.comet.com/ai2/olmo-core-1b/7a3614872861484dbc7ad651ad5c9e35
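As a sanity check on the metrics above, the reported bpb is consistent with converting the nat-based ce_loss to bits; in this particular run the token and byte counts evidently coincide (in general, bpb is also scaled by the tokens-to-bytes ratio). A quick check:

```python
import math

ce_loss = 2.472       # mean cross-entropy in nats, from the log above
reported_bpb = 3.565  # bits-per-byte, from the log above

# bits-per-byte = (nats per token) / ln 2 * (tokens / bytes);
# here the ratio appears to be ~1, so bpb ~= ce_loss / ln 2.
bpb = ce_loss / math.log(2)
print(round(bpb, 3))  # → 3.566, matching the logged value up to rounding
assert abs(bpb - reported_bpb) < 0.005
```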

Comment on lines 104 to 106
# install ai2-olmo-eval from source git repo
"pip uninstall -y ai2-olmo-eval",
"pip install git+https://github.com/allenai/OLMo-in-loop-evals.git@moreeval",
Contributor Author

will clean this up when the PR in olmo-eval lands

shuffle=False,
num_replicas=get_world_size(dp_process_group),
rank=get_rank(dp_process_group),
)

rank_batch_size_instances = max(0, rank_batch_size // self.task.max_sequence_length)
Contributor Author

This causes a bug. I don't see why we should divide batch size by seq len. Batch size was already number of examples.

Member

Batch size was already number of examples.

Where was it set to the number of examples instead of tokens? It should always be set in tokens. This change will cause bugs elsewhere.

Contributor Author

It's derived from Line 230 of this file, eval_batch_size. Do you mean this variable should be in number of tokens? It seems to be a semantic change from the old repo, and each eval's max seq length is dependent on the task so we can't set a fixed batch size ...

Member

It is a change from the old repo.

each eval's max seq length is dependent on the task so we can't set a fixed batch size

Well batch size is roughly fixed by number of tokens, not instances. This is more efficient because we can pack more instances from shorter tasks together.
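The token-based sizing described above can be sketched as follows; the function and parameter names here are hypothetical illustrations, not the actual olmo-core API:

```python
def instances_per_rank(global_batch_size_tokens: int,
                       world_size: int,
                       task_max_sequence_length: int) -> int:
    """Derive the per-rank instance count from a token budget.

    All names are illustrative; olmo-core's real logic lives in its
    evaluator callback and data loaders.
    """
    rank_batch_size_tokens = global_batch_size_tokens // world_size
    # Shorter tasks pack more instances into the same token budget,
    # which is the efficiency argument made above.
    return max(0, rank_batch_size_tokens // task_max_sequence_length)

# A 256Ki-token global budget on 8 ranks gives 32Ki tokens per rank:
print(instances_per_rank(256 * 1024, 8, 512))   # → 64 instances of a 512-token task
print(instances_per_rank(256 * 1024, 8, 2048))  # → 16 instances of a 2048-token task
```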

Contributor Author

Got it. I've reverted it so now eval_batch_size takes number of tokens.

@liujch1998 liujch1998 marked this pull request as ready for review December 15, 2024 08:45
@liujch1998 liujch1998 requested a review from epwalsh December 19, 2024 17:58
Member

@epwalsh epwalsh left a comment

One minor comment, otherwise LGTM!

@@ -73,6 +76,48 @@ def build_trainer_config(common: CommonComponents) -> TrainerConfig:
cancel_check_interval=10,
),
)
.with_callback(
"downstream",
Member

Let's call this "downstream_evaluator" to be consistent with the naming convention for other callbacks.

Suggested change
"downstream",
"downstream_evaluator",

Contributor Author

thanks, fixed

@liujch1998 liujch1998 requested a review from epwalsh December 19, 2024 19:42
@liujch1998 liujch1998 merged commit ee27348 into main Dec 19, 2024
14 checks passed
@liujch1998 liujch1998 deleted the moreeval branch December 19, 2024 21:16