Llama v2 70 b lora update #358
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
mlperf_logging/benchmark_meta.py (Outdated)

```diff
@@ -15,6 +15,7 @@
     'ncf': 10,
     'rnnt': 10,
     'unet3d': 40,
+    'llama2_70b_lora': 10,
```
I'm not convinced this is a good benchmark name; it seems a bit overcomplicated. Have we agreed on this? How about just `lora`?
This is what was decided by the training WG.
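For context, the dictionary being modified above maps each benchmark name to the number of results a submission must include. Below is a minimal sketch of how such a mapping might be consumed; the helper function and error handling are illustrative, not the actual package code:

```python
# Illustrative sketch, not the actual mlperf_logging code: map each
# benchmark name to the number of result files a submission must provide.
BENCHMARK_RESULT_COUNTS = {
    'ncf': 10,
    'rnnt': 10,
    'unet3d': 40,
    'llama2_70b_lora': 10,  # the entry added in this PR
}

def required_result_count(benchmark: str) -> int:
    """Look up how many result files a benchmark submission needs."""
    try:
        return BENCHMARK_RESULT_COUNTS[benchmark]
    except KeyError:
        raise ValueError(f"unknown benchmark: {benchmark}")

print(required_result_count('llama2_70b_lora'))  # -> 10
```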
```diff
@@ -0,0 +1,11 @@
```
The whole file seems unnecessary here?
Correct, I removed it.
```diff
@@ -263,13 +263,13 @@ def check_loglines(self, loglines, config):
     self.put_message('No log lines detected')

     enqueue_config(config)
```
What happened here?
```yaml
    REQ: EXACTLY_ONE

- KEY:
    NAME: gradient_accumulation_steps
```
duplicated below
Correct, I removed the duplication.
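For reference, `REQ: EXACTLY_ONE` means the key must appear exactly once in the log, which is why a duplicated rule entry matters. Here is a minimal sketch of that requirement; the helper name and the log-line structure are assumptions, not the checker's actual code:

```python
# Sketch only: enforce an EXACTLY_ONE requirement by counting how many
# times a key occurs in the parsed log lines.
from collections import Counter

def check_exactly_one(key: str, loglines: list) -> bool:
    """True iff `key` appears exactly once (assumed structure:
    each parsed log line is a dict with a 'key' field)."""
    counts = Counter(line["key"] for line in loglines)
    return counts[key] == 1

log = [{"key": "gradient_accumulation_steps", "value": 1}]
print(check_exactly_one("gradient_accumulation_steps", log))  # True
```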
```yaml
    REQ: EXACTLY_ONE

- KEY:
    NAME: lora_alpha
```
duplicated in L12
```yaml
    REQ: EXACTLY_ONE

- KEY:
    NAME: lora_rank
```
How about we constrain this value to 16?
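If agreed, one way to express this is a CHECK expression on the key. A minimal sketch follows, assuming the checker's convention of evaluating a CHECK string with `v` bound to the parsed log value; the `passes_check` helper is illustrative:

```python
# Sketch of pinning lora_rank to 16 via a CHECK expression, assuming the
# rules format where CHECK holds a Python expression and `v` is bound to
# the logged key's value dict at evaluation time.
rule = {
    "NAME": "lora_rank",
    "REQ": "EXACTLY_ONE",
    "CHECK": "v['value'] == 16",
}

def passes_check(rule: dict, logged: dict) -> bool:
    """Evaluate the rule's CHECK expression against a parsed log line."""
    return eval(rule["CHECK"], {}, {"v": logged})  # checker-style eval

print(passes_check(rule, {"value": 16}))  # True
print(passes_check(rule, {"value": 8}))   # False
```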
```diff
@@ -167,7 +167,7 @@
 EVAL_MAX_PREDICTION_SYMBOLS = "eval_max_prediction_symbols"
 START_WARMUP_STEP = "start_warmup_step"
 INIT_CHECKPOINT_STEP = "init_checkpoint_step"

 LORA_ALPHA = "lora_alpha"
```
I can't really comment in the right place, but in L40 there are benchmark-specific name constants. LoRA should go there as well.
In L40 we have benchmark names; alpha is a hyperparameter. Anyway, I added it in L40:

```python
LLAMA2_70B_LORA = "llama2_70b_lora"
```
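To illustrate the distinction, here is a short sketch of how a submission might emit both kinds of constants through the mllog API; the value 32 is illustrative, and this assumes the constants from this PR are present in the installed package:

```python
# Sketch: emit the benchmark name and a LoRA hyperparameter via mllog,
# assuming this PR's constants (LLAMA2_70B_LORA, LORA_ALPHA) are available.
from mlperf_logging import mllog

mllogger = mllog.get_mllogger()

# Benchmark name constant (lives with the other benchmark names):
mllogger.event(key=mllog.constants.SUBMISSION_BENCHMARK,
               value=mllog.constants.LLAMA2_70B_LORA)

# Hyperparameter constant (value 32 is illustrative only):
mllogger.event(key=mllog.constants.LORA_ALPHA, value=32)
```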
@itayhubara, can you please resolve the conflicts so that we can merge? Thanks!
Done. The root cause of all the issues was the GNN PR, which was created after the llama PR...