Libraries update for Stable Diffusion (#335) #336

Merged
merged 3 commits on Sep 29, 2023
1 change: 1 addition & 0 deletions mlperf_logging/benchmark_meta.py
@@ -10,6 +10,7 @@
'minigo': 10,
'resnet': 5,
'ssd': 5,
'stable_diffusion': 10,
'transformer': 10,
'ncf': 10,
'rnnt': 10,
21 changes: 12 additions & 9 deletions mlperf_logging/compliance_checker/README.md
@@ -23,7 +23,7 @@ As log examples use [NVIDIA's training logs](https://github.com/mlperf/training_
### Existing config files for training submissions

3.1.0/common.yaml - currently the default config file, checks common fields compliance and enqueues the benchmark-specific config file
3.1.0/closed_common.yaml - the common rules file for closed submissions. These rules apply to all benchmarks
3.1.0/open_common.yaml - the common rules file for open submissions. These rules apply to all benchmarks
3.1.0/closed_resnet.yaml - Per-benchmark rules, closed submissions.
3.1.0/closed_ssd.yaml
@@ -33,6 +33,7 @@ As log examples use [NVIDIA's training logs](https://github.com/mlperf/training_
3.1.0/closed_bert.yaml
3.1.0/closed_dlrm_dcnv2.yaml
3.1.0/closed_gpt3.yaml
3.1.0/closed_stable_diffusion.yaml
3.1.0/open_resnet.yaml - Per-benchmark rules, open submissions.
3.1.0/open_ssd.yaml
3.1.0/open_maskrcnn.yaml
@@ -41,6 +42,7 @@ As log examples use [NVIDIA's training logs](https://github.com/mlperf/training_
3.1.0/open_bert.yaml
3.1.0/open_dlrm_dcnv2.yaml
3.1.0/open_gpt3.yaml
3.1.0/open_stable_diffusion.yaml

### Existing config files for HPC submissions

@@ -64,15 +66,15 @@ Compliance checking is done following the algorithm below.
2. If present, evaluate `CHECK` section, and raise an exception if the result is false
7. Print all warning messages

Possible side effects of executing yaml sections include [printing output](#other-operations) or [enqueuing
additional yaml files to be verified](#enqueuing-additional-config-files).

### Config file syntax
Rules to be checked are provided in a yaml (config) file. A config file contains the following records:

#### `BEGIN` record
Defines `CODE` to be executed before any other rules defined in the current file. This record is optional,
and there can be at most one `BEGIN` record per config file.

Example:
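    # a sketch: seed the global state dict `s` that later rules read
    - BEGIN:
        CODE: " s.update({'run_start': None}) "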

@@ -87,6 +89,7 @@ The following fields are optional:
- `REQ` - specifies the requirement regarding occurrence. Possible values:
  - `EXACTLY_ONE` - the current key has to appear exactly once
  - `AT_LEAST_ONE` - the current key has to appear at least once
  - `AT_LEAST(n)` - the current key has to appear at least n times (see the example below)
  - `AT_LEAST_ONE_OR(alternatives)` - the current key or one of the alternatives has to appear at least once;
    alternatives is a comma-separated list of keys
- `PRE` - code to be executed before performing checks
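
For example, the stable diffusion configs added in this PR require at least two aggregated eval results:

    - KEY:
        NAME: aggregated_eval_accuracy
        REQ: AT_LEAST(2)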
@@ -112,14 +115,14 @@ The following fields are optional:

#### Global and local state access

During processing of the records there is a global state `s` maintained, accessible from
code provided in yaml. In addition, rules can access the information fields (values) `v`
of the record, as well as the timestamp and the original line string as part of the record `ll`.

Global state `s` can be used to enforce any cross-key rules, by updating the global state
in `POST` (or `PRE`) of one `KEY` and using that information in the `CHECK` of another `KEY`.
For each config file, `s` starts as an empty dictionary, so tracking global state
requires adding an entry to `s`.

Example:

@@ -152,7 +155,7 @@ Config files in the queue are processed independently, meaning that they do not

Each config file may define its own `BEGIN` and `END` records, as well as any other `KEY` rules.

Example:

- KEY:
NAME: submission_benchmark
@@ -164,7 +167,7 @@ Example:
#### Other operations

`CODE`, `REQ`, and `POST` fields are executed using python's `exec` function. `CHECK` is performed
using an `eval` call. As such, any legal python code is suitable for use.

For instance, one can define rules that print out information, as shown in the [example above](#global-and-local-state-access) and in the sketch below.
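
A minimal sketch of such a rule, assuming the `global_batch_size` key checked by the stable diffusion configs in this PR; its `POST` prints the logged value:

    - KEY:
        NAME: global_batch_size
        REQ: AT_LEAST_ONE
        POST: " print('global_batch_size =', v['value']) "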

13 changes: 12 additions & 1 deletion mlperf_logging/compliance_checker/mlp_compliance.py
@@ -147,6 +147,11 @@ def parse_alternatives(self, string):
alternatives = in_pharentises.split(',')
return [s.strip() for s in alternatives]

def parse_at_least(self, string):
# Extract n from a requirement of the form 'AT_LEAST(n)', e.g. 'AT_LEAST(2)' -> 2
n_string = string[len('AT_LEAST(') : -1]
n = int(n_string)
return n

def configured_checks(self, loglines, config_file):
with open(config_file) as f:
checks = yaml.load(f, Loader=yaml.BaseLoader)
@@ -164,7 +169,7 @@ def configured_checks(self, loglines, config_file):
begin_blocks = [x for x in checks if list(x)[0]=='BEGIN']
assert(len(begin_blocks)<=1) # up to one begin block
if len(begin_blocks)==1:
exec(begin_blocks[0]['BEGIN']['CODE'].strip(), state)
exec(begin_blocks[0]['BEGIN']['CODE'].strip(), state, locals())

key_records = {}
for k in checks:
@@ -231,6 +236,12 @@ def configured_checks(self, loglines, config_file):
self.put_message(f"Required AT_LEAST_ONE occurrence of '{k}' but found {len(reported_values[k])}",
key=k)

if v['REQ'].startswith('AT_LEAST('): # match 'AT_LEAST(n)' only, not AT_LEAST_ONE*
n = self.parse_at_least(v['REQ'])
if len(reported_values[k])<n:
self.put_message(f"Required AT_LEAST({n}) occurrences of '{k}' but found {len(reported_values[k])}",
key=k)

if v['REQ'].startswith('AT_LEAST_ONE_OR'):
alternatives.add(tuple({k, *self.parse_alternatives(v['REQ'])}))

17 changes: 9 additions & 8 deletions mlperf_logging/compliance_checker/mlp_parser/ruleset_310.py
@@ -8,17 +8,18 @@
import json
import re
import sys
from dataclasses import dataclass

from io import open

LogLine = collections.namedtuple('LogLine', [
'full_string', # the complete line as a string
'timestamp', # seconds as a float, e.g. 1234.567
'key', # the string key
'value', # the parsed value associated with the tag, or None if no value
'lineno', # the line number in the file
])

@dataclass
class LogLine:
"""A single parsed MLPerf log line."""
full_string: str # the complete line as a string
timestamp: float # seconds as a float, e.g. 1234.567
key: str # the string key
value: str # the parsed value associated with the tag, or None if no value
lineno: int # the line number in the file
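# Note: unlike the namedtuple it replaces, this dataclass is mutable; the
# stable diffusion configs in this PR copy a LogLine with dataclasses.replace()
# and then overwrite its fields (key, full_string, lineno, value, timestamp).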

TOKEN = ':::MLLOG '

mlperf_logging/compliance_checker/training_3.1.0/closed_common.yaml
@@ -2,7 +2,7 @@
- KEY:
NAME: submission_benchmark
REQ: EXACTLY_ONE
CHECK: " v['value'] in ['resnet', 'ssd', 'maskrcnn', 'gpt3', 'dlrm_dcnv2', 'bert', 'rnnt', 'unet3d'] "
CHECK: " v['value'] in ['resnet', 'ssd', 'stable_diffusion', 'maskrcnn', 'gpt3', 'dlrm_dcnv2', 'bert', 'rnnt', 'unet3d'] "
POST: " enqueue_config('training_3.1.0/closed_{}.yaml'.format(v['value'])) "

- KEY:
mlperf_logging/compliance_checker/training_3.1.0/closed_stable_diffusion.yaml
@@ -0,0 +1,75 @@
# Stable diffusion uses two metrics, FID and CLIP.
# These metrics can be calculated offline, using different scripts,
# and logged separately. Therefore, we create a virtual key
# called aggregated_eval_accuracy, which aggregates
# both metrics into a single log line

- BEGIN:
CODE: |
from dataclasses import replace
agg_eval_lines = {}
for line in loglines:
if line.key == "eval_accuracy":
step_num = line.value['metadata']['step_num']
if step_num not in agg_eval_lines:
new_line = replace(line) # Make a copy
new_line.key = "aggregated_eval_accuracy"
new_line.full_string = "" # Not needed
new_line.lineno = -1 # Not needed
new_line.value = {'value': {'step_num': step_num}, 'metadata':{}}
agg_eval_lines[step_num] = new_line

agg_eval_lines[step_num].timestamp = max(line.timestamp, agg_eval_lines[step_num].timestamp)
agg_eval_lines[step_num].value['value'][line.value['metadata']['metric']] = line.value['value']
loglines.extend(agg_eval_lines.values())
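# After this block runs, each eval step contributes one aggregated line whose
# value holds step_num plus one entry per logged metric, e.g.
# {'value': {'step_num': ..., 'FID': ..., 'CLIP': ...}, 'metadata': {}}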


- KEY:
NAME: global_batch_size
REQ: AT_LEAST_ONE
CHECK: " v['value'] >= 0 "

- KEY:
NAME: opt_name
REQ: EXACTLY_ONE
CHECK: " v['value'] == 'adamw' "

- KEY:
NAME: opt_adamw_beta_1
REQ: EXACTLY_ONE
CHECK: " v['value'] == 0.9 "

- KEY:
NAME: opt_adamw_beta_2
REQ: EXACTLY_ONE
CHECK: " v['value'] == 0.999 "

- KEY:
NAME: opt_adamw_epsilon
REQ: EXACTLY_ONE
CHECK: " v['value'] == 1e-08 "

- KEY:
NAME: opt_adamw_weight_decay
REQ: EXACTLY_ONE
CHECK: " v['value'] == 0.01 "

- KEY:
NAME: opt_base_learning_rate
REQ: EXACTLY_ONE
CHECK: " v['value'] >= 0.0 "

- KEY:
NAME: opt_learning_rate_warmup_steps
REQ: EXACTLY_ONE
CHECK: " v['value'] >= 0 "

- KEY:
NAME: aggregated_eval_accuracy
REQ: AT_LEAST(2)
CHECK:
- "'FID' in v['value']"
- "'CLIP' in v['value']"
- "'step_num' in v['value']"
ATLEAST_ONE_CHECK: "(0.0 <= v['value']['FID'] <= 90.0) and (0.15 <= v['value']['CLIP'] <= 1.0)"
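# ATLEAST_ONE_CHECK requires that at least one of the logged occurrences
# satisfies the expression, i.e. at least one eval must reach the
# convergence targets above.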

mlperf_logging/compliance_checker/training_3.1.0/open_common.yaml
@@ -2,6 +2,6 @@
- KEY:
NAME: submission_benchmark
REQ: EXACTLY_ONE
CHECK: " v['value'] in ['resnet', 'ssd', 'maskrcnn', 'gpt3', 'dlrm_dcnv2', 'bert', 'rnnt', 'unet3d'] "
CHECK: " v['value'] in ['resnet', 'ssd', 'stable_diffusion', 'maskrcnn', 'gpt3', 'dlrm_dcnv2', 'bert', 'rnnt', 'unet3d'] "
POST: " enqueue_config('training_3.1.0/open_{}.yaml'.format(v['value'])) "

mlperf_logging/compliance_checker/training_3.1.0/open_stable_diffusion.yaml
@@ -0,0 +1,34 @@
# Stable diffusion uses two metrics, FID and CLIP.
# These metrics can be calculated offline, using different scripts,
# and logged separately. Therefore, we create a virtual key
# called aggregated_eval_accuracy, which aggregates
# both metrics into a single log line

- BEGIN:
CODE: |
from dataclasses import replace
agg_eval_lines = {}
for line in loglines:
if line.key == "eval_accuracy":
step_num = line.value['metadata']['step_num']
if step_num not in agg_eval_lines:
new_line = replace(line) # Make a copy
new_line.key = "aggregated_eval_accuracy"
new_line.full_string = "" # Not needed
new_line.lineno = -1 # Not needed
new_line.value = {'value': {'step_num': step_num}, 'metadata':{}}
agg_eval_lines[step_num] = new_line

agg_eval_lines[step_num].timestamp = max(line.timestamp, agg_eval_lines[step_num].timestamp)
agg_eval_lines[step_num].value['value'][line.value['metadata']['metric']] = line.value['value']
loglines.extend(agg_eval_lines.values())

- KEY:
NAME: aggregated_eval_accuracy
REQ: AT_LEAST(2)
CHECK:
- "'FID' in v['value']"
- "'CLIP' in v['value']"
- "'step_num' in v['value']"
ATLEAST_ONE_CHECK: "v['value']['FID'] >= 0.0 and v['value']['CLIP'] <= 1.0"

34 changes: 28 additions & 6 deletions mlperf_logging/rcp_checker/rcp_checker.py
@@ -3,6 +3,7 @@
'''

import argparse
from collections import defaultdict
import glob
import json
import logging
@@ -26,6 +27,7 @@
'ssd' : 5,
'unet3d' : 40,
'rnnt': 10,
'stable_diffusion': 10,
},
"hpc": {
'cosmoflow': 10,
@@ -48,6 +50,10 @@ def read_submission_file(result_file, use_train_samples):
bs = -1
benchmark = None

# FID and CLIP metrics for stable diffusion are logged asynchronously
# and independently from each other. We track the eval results
# so we can get the first eval step that passes the convergence criteria.
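# For example (hypothetical values), one eval step may produce two separate
# eval_accuracy lines:
#   {"key": "eval_accuracy", "value": 21.4, "metadata": {"step_num": 512000, "metric": "FID"}}
#   {"key": "eval_accuracy", "value": 0.31, "metadata": {"step_num": 512000, "metric": "CLIP"}}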
stable_diffusion_eval_results = defaultdict(dict)
with open(result_file, 'r', encoding='latin-1') as f:
# TODO: use mlperf_logging.compliance_checker.mlp_parser instead
file_contents = f.readlines()
@@ -63,23 +69,39 @@
benchmark = json.loads(str)["value"]
if benchmark != "bert" and use_train_samples:
use_train_samples = False
if not use_train_samples and ("eval_error" in str or "eval_accuracy" in str):

if benchmark == "stable_diffusion" and ("eval_error" in str or "eval_accuracy" in str):
eval_accuracy_str = str
eval_step = json.loads(eval_accuracy_str)["metadata"]["step_num"]
eval_metric = json.loads(eval_accuracy_str)["metadata"]["metric"]
eval_score = json.loads(eval_accuracy_str)["value"]
stable_diffusion_eval_results[eval_step][eval_metric] = eval_score
elif not use_train_samples and ("eval_error" in str or "eval_accuracy" in str):
eval_accuracy_str = str
conv_epoch = json.loads(eval_accuracy_str)["metadata"]["epoch_num"]
conv_epoch = round(conv_epoch, 3)
if use_train_samples and "train_samples" in str:
elif use_train_samples and "train_samples" in str:
eval_accuracy_str = str
conv_epoch = json.loads(eval_accuracy_str)["value"]

if "run_stop" in str:
# Epochs to converge is the last epochs value on
# eval_accuracy line before run_stop
conv_result = json.loads(str)["metadata"]["status"]
if conv_result == "success":
subm_epochs = conv_epoch
not_converged = 0
# Epochs to converge is the last epochs value on the
# eval_accuracy line before run_stop, except for stable diffusion,
# where we use the first eval step that passes the convergence criteria
if benchmark == "stable_diffusion":
passing_epochs = []
for eval_step, eval_result in stable_diffusion_eval_results.items():
# TODO: we shouldn't hardcode the convergence criteria here!
if eval_result["FID"] <= 90.0 and eval_result["CLIP"] >= 0.15:
passing_epochs.append(eval_step)
conv_epoch = min(passing_epochs)
subm_epochs = conv_epoch
else:
subm_epochs = 1e9
not_converged = 1
subm_epochs = 1e9

if not_converged:
logging.warning(' Run incomplete or did not converge. Marking as infinite.')