Replace ImageNet ensemble baseline with Robustness Metrics.
There are two approaches to implementing ensembles:

1. Load all SavedModels into a single model.
  + Pro: Simple to compute results.
  + Con: All models must fit in memory and compute can't parallelize across models.
2. Evaluate each model in parallel, saving its predictions; then load the predictions and compute metrics (the approach used in Uncertainty Baselines).
  + Pro: Scales with compute and memory.
  + Con: Requires two stages (the first uses accelerators, the second is CPU-only). We already run the first stage to report non-ensemble results, so two stages are not much of an inconvenience.
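The two-stage flow of approach 2 can be sketched in plain NumPy. This is a minimal sketch under stated assumptions: the function names `save_member_predictions` and `ensemble_metrics`, the `.npy` file format, and averaging member probabilities are illustrative choices, not the Robustness Metrics API (which, per this commit, stores predictions as serialized `tf.train.Example`s via `serialization.py`).

```python
import os
import tempfile

import numpy as np


def save_member_predictions(model_fn, inputs, path):
    """Stage 1 (one job per member, on accelerators): save class probabilities."""
    probs = np.asarray(model_fn(inputs), dtype=np.float32)
    np.save(path, probs)


def ensemble_metrics(prediction_paths, labels):
    """Stage 2 (CPU-only): load saved predictions and score the ensemble."""
    # Only the [num_examples, num_classes] prediction arrays are held in
    # memory; the models themselves are never loaded in this stage.
    member_probs = np.stack([np.load(path) for path in prediction_paths])
    ensemble_probs = member_probs.mean(axis=0)  # Average member probabilities.
    accuracy = np.mean(ensemble_probs.argmax(axis=-1) == labels)
    true_class_probs = ensemble_probs[np.arange(len(labels)), labels]
    nll = -np.mean(np.log(true_class_probs))
    return {"accuracy": float(accuracy), "nll": float(nll)}


# Usage with two hypothetical "models" that return fixed probabilities.
labels = np.array([0, 1])
members = [
    lambda x: [[0.9, 0.1], [0.4, 0.6]],
    lambda x: [[0.7, 0.3], [0.2, 0.8]],
]
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for i, model_fn in enumerate(members):
        path = os.path.join(tmp_dir, f"member_{i}.npy")
        # Each call is independent, so stage 1 can run one job per member.
        save_member_predictions(model_fn, inputs=None, path=path)
        paths.append(path)
    metrics = ensemble_metrics(paths, labels)
print(metrics)
```

The design point is that stage 2 touches only saved arrays, so memory grows with the number of predictions rather than with model size, and stage 1 parallelizes trivially across members.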

This CL implements approach 2.

Fixes google/uncertainty-baselines#63, google/uncertainty-baselines#71.

Note: I added 'ece' back to the imagenet_variants report.

TODOs for later PRs:
+ Loading predictions is slow. Each file is at most 200 MB (50K predictions of 1000 float32 values), so read_predictions shouldn't take this long: np.load reaches read speeds of roughly 200 MB/s (https://stackoverflow.com/a/30332316). Perhaps it's because we're loading with batch_size=1?
+ Replace het_ensemble.py and sngp_ensemble.py.
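The size estimate in the first TODO works out as follows. This assumes dense float32 storage (4 bytes per value), decimal megabytes (10^6 bytes), no serialization overhead, and the roughly 200 MB/s figure quoted above; the helper names are hypothetical.

```python
def prediction_file_bytes(num_examples, num_classes, bytes_per_value=4):
    """Bytes in a dense float32 [num_examples, num_classes] prediction matrix."""
    return num_examples * num_classes * bytes_per_value


def expected_read_seconds(num_bytes, read_mb_per_s=200.0):
    """Read time at a given throughput, using decimal megabytes (10**6 bytes)."""
    return num_bytes / (read_mb_per_s * 10**6)


size = prediction_file_bytes(num_examples=50_000, num_classes=1000)
seconds = expected_read_seconds(size)
print(size)     # 200000000
print(seconds)  # 1.0
```

If reading were I/O-bound, a full file should load in about a second; a read_predictions that is much slower points to per-record overhead, which is consistent with (though does not prove) the batch_size=1 hypothesis.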

PiperOrigin-RevId: 370938990
dustinvtran authored and copybara-github committed May 18, 2021
1 parent 4929fd6 commit fa6cafc
Showing 2 changed files with 18 additions and 5 deletions.
12 changes: 12 additions & 0 deletions robustness_metrics/metrics/serialization.py
@@ -46,6 +46,8 @@ def add_predictions(self,
tf.convert_to_tensor(model_predictions.predictions, dtype=tf.float32))
serialized_metadata = {}
for key, value in metadata.items():
if isinstance(value, tf.Tensor):
value = value.numpy()
if hasattr(value, "dtype") and value.dtype == np.int:
if isinstance(value, np.ndarray):
value = [int(x) for x in value.tolist()]
@@ -56,6 +58,9 @@ def add_predictions(self,
value = [float(x) for x in value.tolist()]
else:
value = float(value)
# Convert bytes (e.g., ImageNetVidRobust's video_frame_id).
if isinstance(value, bytes):
value = value.decode("utf-8")
serialized_metadata[key] = value
serialized_metadata = json.dumps(serialized_metadata).encode()
tf_example = tf.train.Example(features=tf.train.Features(feature={
@@ -98,4 +103,11 @@ def parse(features_serialized):
prediction = types.ModelPredictions(
predictions=example["predictions"].numpy())
metadata = json.loads(example["metadata"].numpy())
# Apply a special case to lists of size 1. We need to adjust for the fact
# that int-casting a Tensor with shape [1] works (this may be the original
# element), but int-casting a list of size 1 (this may be the saved
# element) doesn't work.
for key, value in metadata.items():
if isinstance(value, list) and len(value) == 1:
metadata[key] = value[0]
yield prediction, metadata
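The metadata round trip that these serialization.py changes support can be sketched without TensorFlow. This is a simplified, hypothetical sketch: `to_json_metadata` and `from_json_metadata` are illustrative names, and the `tf.Tensor` to `.numpy()` step is omitted. Bytes are decoded to UTF-8 strings before `json.dumps`, and size-1 lists are unwrapped after `json.loads` so that, e.g., `int(metadata["label"])` still works. It uses `np.issubdtype` rather than comparing against `np.int`, an alias that NumPy deprecated in 1.20 and removed in 1.24.

```python
import json

import numpy as np


def to_json_metadata(metadata):
    """Converts NumPy values and bytes into JSON-serializable Python values."""
    serialized = {}
    for key, value in metadata.items():
        if hasattr(value, "dtype") and np.issubdtype(value.dtype, np.integer):
            if isinstance(value, np.ndarray):
                value = [int(x) for x in value.tolist()]
            else:
                value = int(value)
        elif hasattr(value, "dtype") and np.issubdtype(value.dtype, np.floating):
            if isinstance(value, np.ndarray):
                value = [float(x) for x in value.tolist()]
            else:
                value = float(value)
        # Decode bytes (e.g. a video_frame_id) so that json.dumps accepts them.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        serialized[key] = value
    return json.dumps(serialized).encode()


def from_json_metadata(blob):
    """Parses metadata and unwraps size-1 lists back into scalars."""
    metadata = json.loads(blob)
    for key, value in metadata.items():
        if isinstance(value, list) and len(value) == 1:
            metadata[key] = value[0]
    return metadata


# Hypothetical example: a shape-[1] label array, a bytes ID, a float32 score.
original = {
    "label": np.array([3]),
    "video_frame_id": b"frame_0001",
    "score": np.float32(0.5),
}
blob = to_json_metadata(original)
restored = from_json_metadata(blob)
print(blob)      # b'{"label": [3], "video_frame_id": "frame_0001", "score": 0.5}'
print(restored)  # {'label': 3, 'video_frame_id': 'frame_0001', 'score': 0.5}
```

One consequence of unwrapping on read is that a genuine one-element list in the original metadata also comes back as a scalar; the sketch accepts that trade-off, as the size-1 special case in `parse` appears to.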
11 changes: 6 additions & 5 deletions robustness_metrics/reports/imagenet_variants.py
@@ -62,11 +62,12 @@ class ImagenetVariantsReport(base.Report):
This report contains the following ImageNet variants:
* imagenet
* imagenet_a,
* imagenet_v2 (all variants)
* imagenet_a
* imagenet_v2/matched_frequency
* imagenet_c (all variants)
For each dataset, we compute accuracy, expected calibration
error (ece), log-likelihood, Brier, timing, and adaptive ECE.
For each dataset, we compute accuracy, expected calibration error,
log-likelihood, Brier.
"""

def __init__(self):
@@ -77,7 +78,7 @@ def __init__(self):

def _yield_metrics_to_evaluate(self, use_dataset_labelset=None):
"""Yields metrics to be evaluated."""
metrics = ["accuracy", "nll", "brier"]
metrics = ["accuracy", "ece", "nll", "brier"]
if use_dataset_labelset is not None:
metrics = [f"{metric}(use_dataset_labelset={use_dataset_labelset})"
for metric in metrics]
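Since this commit adds `ece` back to the report, the following sketch shows what a standard equal-width-binned expected calibration error computes: the bin-size-weighted gap between accuracy and mean top-1 confidence. It is for illustration only and is not Robustness Metrics' `ece` implementation; the default of 15 bins and the `(lower, upper]` bin convention are assumptions.

```python
import numpy as np


def expected_calibration_error(probs, labels, num_bins=15):
    """Equal-width-binned ECE over the top-1 (max-probability) confidence."""
    confidences = probs.max(axis=-1)
    predictions = probs.argmax(axis=-1)
    correct = (predictions == labels).astype(np.float64)
    bin_edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lower, upper in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lower) & (confidences <= upper)
        if in_bin.any():
            # Weight each bin's |accuracy - confidence| gap by its share of examples.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)


probs = np.array([[0.85, 0.15], [0.65, 0.35]])
labels = np.array([0, 1])
ece = expected_calibration_error(probs, labels, num_bins=10)
print(ece)
```

Here the first example is correct with confidence 0.85 and the second is wrong with confidence 0.65, each alone in its bin, so ECE = 0.5 · 0.15 + 0.5 · 0.65 = 0.4 (up to floating-point rounding).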