Llama v2 70 b lora update #358
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
mlperf_logging/benchmark_meta.py (Outdated)

```diff
@@ -15,6 +15,7 @@
     'ncf': 10,
     'rnnt': 10,
     'unet3d': 40,
+    'llama2_70b_lora': 10,
```
I'm not convinced this is a good benchmark name; it seems a bit overcomplicated. Have we agreed on this? How about just `lora`?
This is what was decided by the training WG.
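For context, the dictionary being modified above maps each benchmark name to the number of results a submission must include. Below is a minimal sketch of how such a mapping might be consumed; the helper function and error handling are illustrative, not the actual package code:

```python
# Illustrative sketch, not the actual mlperf_logging code: map each
# benchmark name to the number of result files a submission must provide.
BENCHMARK_RESULT_COUNTS = {
    'ncf': 10,
    'rnnt': 10,
    'unet3d': 40,
    'llama2_70b_lora': 10,  # the entry added in this PR
}

def required_result_count(benchmark: str) -> int:
    """Look up how many result files a benchmark submission needs."""
    try:
        return BENCHMARK_RESULT_COUNTS[benchmark]
    except KeyError:
        raise ValueError(f"unknown benchmark: {benchmark}")

print(required_result_count('llama2_70b_lora'))  # -> 10
```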
```diff
@@ -0,0 +1,11 @@
```
The whole file seems unnecessary here?
Correct, I removed it.
```diff
@@ -263,13 +263,13 @@ def check_loglines(self, loglines, config):
     self.put_message('No log lines detected')

     enqueue_config(config)
```
What happened here?
```yaml
    REQ: EXACTLY_ONE

- KEY:
    NAME: gradient_accumulation_steps
```
duplicated below
Correct, I removed the duplication.
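For reference, `REQ: EXACTLY_ONE` means the key must appear exactly once in the log, which is why a duplicated rule entry matters. Here is a minimal sketch of that requirement; the helper name and the log-line structure are assumptions, not the checker's actual code:

```python
# Sketch only: enforce an EXACTLY_ONE requirement by counting how many
# times a key occurs in the parsed log lines.
from collections import Counter

def check_exactly_one(key: str, loglines: list) -> bool:
    """True iff `key` appears exactly once (assumed structure:
    each parsed log line is a dict with a 'key' field)."""
    counts = Counter(line["key"] for line in loglines)
    return counts[key] == 1

log = [{"key": "gradient_accumulation_steps", "value": 1}]
print(check_exactly_one("gradient_accumulation_steps", log))  # True
```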
```yaml
    REQ: EXACTLY_ONE

- KEY:
    NAME: lora_alpha
```
duplicated in L12
```yaml
    REQ: EXACTLY_ONE

- KEY:
    NAME: lora_rank
```
How about we constrain this value to 16?
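If agreed, one way to express this is a CHECK expression on the key. A minimal sketch follows, assuming the checker's convention of evaluating a CHECK string with `v` bound to the parsed log value; the `passes_check` helper is illustrative:

```python
# Sketch of pinning lora_rank to 16 via a CHECK expression, assuming the
# rules format where CHECK holds a Python expression and `v` is bound to
# the logged key's value dict at evaluation time.
rule = {
    "NAME": "lora_rank",
    "REQ": "EXACTLY_ONE",
    "CHECK": "v['value'] == 16",
}

def passes_check(rule: dict, logged: dict) -> bool:
    """Evaluate the rule's CHECK expression against a parsed log line."""
    return eval(rule["CHECK"], {}, {"v": logged})  # checker-style eval

print(passes_check(rule, {"value": 16}))  # True
print(passes_check(rule, {"value": 8}))   # False
```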
```diff
@@ -167,7 +167,7 @@
 EVAL_MAX_PREDICTION_SYMBOLS = "eval_max_prediction_symbols"
 START_WARMUP_STEP = "start_warmup_step"
 INIT_CHECKPOINT_STEP = "init_checkpoint_step"

 LORA_ALPHA = "lora_alpha"
```
I can't really comment in the right place, but in L40 there are benchmark-specific name constants. LoRA should go there as well.
In L40 we have benchmark names; alpha is a hyperparameter. Anyway, I added it in L40:

```python
LLAMA2_70B_LORA = "llama2_70b_lora"
```
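To illustrate the distinction, here is a short sketch of how a submission might emit both kinds of constants through the mllog API; the value 32 is illustrative, and this assumes the constants from this PR are present in the installed package:

```python
# Sketch: emit the benchmark name and a LoRA hyperparameter via mllog,
# assuming this PR's constants (LLAMA2_70B_LORA, LORA_ALPHA) are available.
from mlperf_logging import mllog

mllogger = mllog.get_mllogger()

# Benchmark name constant (lives with the other benchmark names):
mllogger.event(key=mllog.constants.SUBMISSION_BENCHMARK,
               value=mllog.constants.LLAMA2_70B_LORA)

# Hyperparameter constant (value 32 is illustrative only):
mllogger.event(key=mllog.constants.LORA_ALPHA, value=32)
```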
@itayhubara, can you please resolve the conflicts so that we can merge? Thanks!
Done. The root cause of all the issues was the GNN PR, which was created after the llama PR...