Llama v2 70 b lora update #358

Merged
merged 8 commits into mlcommons:master on Apr 2, 2024

Conversation

itayhubara (Contributor)

No description provided.

@itayhubara itayhubara requested review from a team as code owners February 29, 2024 14:31

github-actions bot commented Feb 29, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@@ -15,6 +15,7 @@
'ncf': 10,
'rnnt': 10,
'unet3d': 40,
'llama2_70b_lora': 10,

Contributor:

I'm not convinced this is a good benchmark name; it seems a bit overcomplicated. Have we agreed on this? How about just lora?

itayhubara (Contributor Author):

This is what was decided by the training WG.

@@ -0,0 +1,11 @@

Contributor:

The whole file seems unnecessary here?

itayhubara (Contributor Author):

correct, I removed it

@@ -263,13 +263,13 @@ def check_loglines(self, loglines, config):
self.put_message('No log lines detected')

enqueue_config(config)

Contributor:

what happened here?

    REQ: EXACTLY_ONE

- KEY:
    NAME: gradient_accumulation_steps

Contributor:

duplicated below

itayhubara (Contributor Author):

correct, I removed the duplication

    REQ: EXACTLY_ONE

- KEY:
    NAME: lora_alpha

Contributor:

duplicated in L12

    REQ: EXACTLY_ONE

- KEY:
    NAME: lora_rank

Contributor:

How about we constrain this value to 16?
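
For reference, a constraint like the one suggested would typically be expressed through the compliance checker's CHECK expression, following the pattern used for other keys in these rule files. The sketch below is illustrative only (it is not part of this PR), and the exact expression syntax is an assumption based on that pattern:

- KEY:
    NAME: lora_rank
    REQ: EXACTLY_ONE
    CHECK: " v['value'] == 16 "   # hypothetical constraint pinning the rank to 16; not adopted in this PR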

@@ -167,7 +167,7 @@
EVAL_MAX_PREDICTION_SYMBOLS = "eval_max_prediction_symbols"
START_WARMUP_STEP = "start_warmup_step"
INIT_CHECKPOINT_STEP = "init_checkpoint_step"

LORA_ALPHA = "lora_alpha"

Contributor:

I can't really comment in the right place, but in L40 there are benchmark-specific name constants. LoRA should go there as well.

itayhubara (Contributor Author):

In L40 we have benchmark names; alpha is a hyperparameter. Anyway, I added it in L40:
LLAMA2_70B_LORA = "llama2_70b_lora"

nv-rborkar previously approved these changes Mar 20, 2024

nv-rborkar (Contributor):

@itayhubara can you please resolve the conflicts so that we can merge. Thanks!

itayhubara (Contributor Author):

Done. The root cause of all the issues was the GNN PR, which was created after the llama PR...

nv-rborkar merged commit 5158299 into mlcommons:master on Apr 2, 2024
1 check passed
github-actions bot locked and limited conversation to collaborators Apr 2, 2024