Async eval callback #702

Merged
merged 66 commits into from
Dec 19, 2023

Contributor

@aspfohl aspfohl commented Oct 27, 2023

Here's an example of a finetuning run where I've split the ICL light eval out to run asynchronously:
[Image: Screenshot 2023-11-09 at 4 12 25 PM]

Check out the run YAML by describing the run: meow-dVKYjP
Then check the subsequent eval run YAMLs: eval0-meow-W8dG0M

Wandb: https://wandb.ai/mosaic-ml/llm-foundry-scripts_train/runs/lgrecstd/overview?workspace=user-aspfohl

Collaborator

@mvpatel2000 mvpatel2000 left a comment

This requires bumping / requiring the mcli pin.

@aspfohl aspfohl marked this pull request as ready for review November 10, 2023 01:37
Collaborator

@mvpatel2000 mvpatel2000 left a comment

a couple minor comments

Contributor Author

aspfohl commented Dec 7, 2023

Example of the latest successful runs (trivial 1B example: async eval every batch, trains for 3 batches):

mpt-1b-demo-d2tV6B
eval-2ba-mpt-1b-demo-tYzJlk (Actually batch 1)
eval-3ba-mpt-1b-demo-N83r5n (Actually batch 2)
eval-final-mpt-1b-demo-m2Pf1K (Batch 3 - latest/final)

I messed up the experiment tracker config, so I'm rerunning as soon as compute is available.
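
The run names above suggest one eval run per interval plus a final one. A minimal sketch of that scheduling idea follows, assuming a fixed interval measured in batches. The helper `eval_launch_batches` and its parameters are hypothetical names used only for illustration; this is not the callback's actual API or implementation.

```python
from typing import List


def eval_launch_batches(interval: int, max_batches: int) -> List[int]:
    """Return the batch counts after which an async eval run would be launched.

    Hypothetical helper: launches once every `interval` batches and always
    includes the final batch so the last checkpoint is also evaluated.
    """
    if interval <= 0 or max_batches <= 0:
        raise ValueError("interval and max_batches must be positive")
    batches = list(range(interval, max_batches + 1, interval))
    # Make sure the end of training is covered even if it is off-interval.
    if not batches or batches[-1] != max_batches:
        batches.append(max_batches)
    return batches


# Eval every batch over a 3-batch training run, as in the demo above.
print(eval_launch_batches(interval=1, max_batches=3))
```

With an interval that does not divide the run length evenly, the final batch is appended so the last checkpoint still gets an eval run.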

Collaborator

@dakinggg dakinggg left a comment

LGTM, a few remaining comments

@aspfohl aspfohl enabled auto-merge (squash) December 13, 2023 18:20
@aspfohl aspfohl merged commit a7e916b into main Dec 19, 2023
10 checks passed
@aspfohl aspfohl deleted the anna/asynceval branch December 19, 2023 21:48
4 participants