Merged commit includes the following changes:

532682786 by Waymo Research: Internal change.
--
532582773 by Waymo Research: Fix the MeanErrorMatcher and document Pose Estimation Metric.
--
531623382 by Waymo Research: Fix a bug of LET matching where prediction and ground truth is at different side of the sensor.
--
530728277 by Waymo Research: Update 2d_pvps_tutorial description.
--
530687829 by Waymo Research: Cleanup comments for keypoint metrics.
--
530686850 by Waymo Research: Internal
--
529494211 by Waymo Research: Fix v2 lidar utils.
--
529475406 by Waymo Research: Make sure all metrics for keypoints support padded predictions.
--
529475148 by Waymo Research: Fix a bug where processing submissions without predicted boxes failed.
--
529470846 by Waymo Research: Fix PEM matcher to properly assign overlapping labeled and unlabeled ground truth boxes to predictions.
--
526117149 by Waymo Research: Ignore predicted keypoints for objects without ground truth keypoints.
--
526116856 by Waymo Research: Fix box transform computation. This bug impacted MEAN_ERROR matcher resulted in mismatching a set of points with itself.
--
526106501 by Waymo Research: Update the motion tutorial with the correct number of map samples.
--
525224093 by Waymo Research: Improves error messages in Sim Agent evaluation code.
--
525194618 by Waymo Research: 2D Semseg Tutorial bugfix.
--
PiperOrigin-RevId: 532682786
waymo-research · May 17, 2023 · 7275bc1
1 parent 1eabdda
commit 7275bc1
Showing 19 changed files with 2,816 additions and 413 deletions.
diff --git a/README.md b/README.md
@@ -66,15 +66,15 @@ We released v1.4.0 of the Perception dataset.
 We released v1.3.2 of the Perception dataset to improve the quality and accuracy of the labels.
  - Updated 3D semantic segmentation labels, for better temporal consistency and to fix mislabeled points.
  - Updated 2D key point labels to fix image cropping issues.
- - Added `num_top_lidar_points_in_box` in [dataset.proto](waymo_open_dataset/dataset.proto) for the 3D Camera-Only Detection Challenge.
+ - Added `num_top_lidar_points_in_box` in [dataset.proto](src/waymo_open_dataset/dataset.proto) for the 3D Camera-Only Detection Challenge.
 
 ## April 2022 Update
 We released v1.3.1 of the Perception dataset to support the 2022 Challenges and have updated this repository accordingly.
  - Added metrics (LET-3D-APL and LET-3D-AP) for the 3D Camera-Only Detection Challenge.
  - Added 80 segments of 20-second camera imagery, as a test set for the 3D Camera-Only Detection Challenge.
- - Added z-axis speed and acceleration in [lidar label metadata](waymo_open_dataset/label.proto#L53-L60).
- - Fixed some inconsistencies in `projected_lidar_labels` in [dataset.proto](waymo_open_dataset/dataset.proto).
- - Updated the default configuration for the Occupancy and Flow Challenge, switching from aggregate waypoints to [subsampled waypoints](waymo_open_dataset/protos/occupancy_flow_metrics.proto#L38-L55).
+ - Added z-axis speed and acceleration in [lidar label metadata](src/waymo_open_dataset/label.proto#L53-L60).
+ - Fixed some inconsistencies in `projected_lidar_labels` in [dataset.proto](src/waymo_open_dataset/dataset.proto).
+ - Updated the default configuration for the Occupancy and Flow Challenge, switching from aggregate waypoints to [subsampled waypoints](src/waymo_open_dataset/protos/occupancy_flow_metrics.proto#L38-L55).
  - Updated the [tutorial](tutorial/tutorial_3d_semseg.ipynb) for 3D Semantic Segmentation Challenge with more detailed instructions.
 
 ## March 2022 Update
@@ -99,7 +99,7 @@ We released v1.1 of the Motion dataset to include lane connectivity information.
 
 We expanded the Waymo Open Dataset to also include a Motion dataset comprising object trajectories and corresponding 3D maps for over 100,000 segments. We have updated this repository to add support for this new dataset.
 
-Additionally, we added instructions and examples for the real-time detection challenges. Please follow these [Instructions](waymo_open_dataset/latency/README.md).
+Additionally, we added instructions and examples for the real-time detection challenges. Please follow these [Instructions](src/waymo_open_dataset/latency/README.md).
 
 ## Website
 

diff --git a/docs/images/pem_matching_fig.png b/docs/images/pem_matching_fig.png
diff --git a/docs/pose_estimation_metric.md b/docs/pose_estimation_metric.md
@@ -0,0 +1,168 @@
+# Metrics for the Pose Estimation Challenge
+
+## Supported metrics
+
+We provide a Python library to compute a number of different metrics, which are
+reported on the results page. Challenge participants may want to inspect
+different subsets of them by selecting a group of keypoints and thresholds.
+
+- [Pose Estimation Metric (**PEM**)](#PEM), a new metric created specifically
+  for the Pose Estimation challenge, in meters (lower is better). It is used to
+  rank the leaderboard.
+- <a name="mpjpe">Mean Per Joint Position Error (**MPJPE**)</a>, in meters
+  (lower is better). Useful to measure the quality of matched keypoints in
+  easy-to-interpret units.
+- <a name="pck">Percentage Of Correct Keypoints (**PCK**)</a>. The ratio of the
+  number of keypoints close to ground truth to the total number of keypoints. We
+  use thresholds `(0.05, 0.1, 0.2, 0.3, 0.4, 0.5)` relative to bounding
+  box scales. For example, for a human-like 1x1x2m box, the box's scale is
+  $(1 \cdot 1 \cdot 2)^\frac{1}{3} \approx 1.26$m, so the $0.20$ threshold
+  corresponds to about 25cm, and keypoints with errors below 25cm are considered
+  correct. The metric takes values in the `[0, 1]` range (higher is better).
+  Useful to understand the distribution of errors.
+- Precision of keypoints visibility (**Precision**). Values are
+  in the `[0, 1]` range (higher is better). Useful to gauge the precision of the
+  keypoint visibility classification and number of false positive detections.
+- Recall of keypoints visibility (**Recall**). Values are in the `[0, 1]` range
+  (higher is better). Useful to gauge recall of the keypoint visibility
+  classification and number of false negative detections.
+- <a name="oks">Precision at Object Keypoint Similarity (**OKS**)</a>
+  for different thresholds. Values are in the `[0, 1]` range (higher is better).
+  The OKS measures the distance between predicted and ground-truth keypoints
+  relative to a scale, specific for each keypoint type. For example,
+  the scale for hips is larger than the scale of wrists. Thus a 5mm error for
+  hips will result in larger Precision OKS values for hips compared with the
+  same 5mm error for wrists. The OKS metric can be used to evaluate the accuracy
+  of 2D keypoint detectors in a consistent and standardized way. By using OKS as
+  a point of comparison, participants can gain insights into the quality of
+  their 3D keypoint detectors relative to state-of-the-art 2D keypoint
+  detectors.
+- Average Precision at OKS (**OKS_AP**), averaged over
+  `[.5, .55, .60, .65, .70, .75, .80, 0.95]` thresholds. Values are in the
+  `[0, 1]` range (higher is better).
+
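The PCK threshold arithmetic from the list above can be checked with a short sketch. This is an illustration only, not the library's implementation: the function names `box_scale` and `pck` are hypothetical, and returning `0.0` for an empty error list is an assumption.

```python
def box_scale(length, width, height):
    """Geometric mean of the box dimensions, in meters."""
    return (length * width * height) ** (1.0 / 3.0)


def pck(errors_m, scale_m, threshold):
    """Fraction of keypoint errors (meters) below `threshold * scale_m`."""
    if not errors_m:
        return 0.0  # Assumption: no keypoints means nothing is correct.
    limit = threshold * scale_m
    return sum(e < limit for e in errors_m) / len(errors_m)


scale = box_scale(1.0, 1.0, 2.0)   # about 1.26 m for a 1x1x2m box
limit = 0.2 * scale                # about 0.25 m
errors = [0.05, 0.10, 0.30, 0.20]  # made-up per-keypoint errors, in meters
print(round(scale, 2), round(limit, 3), pck(errors, scale, 0.2))
```

With these made-up errors, three of the four keypoints fall below the 25cm limit, so the PCK at the `0.2` threshold is `0.75`.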
+
+NOTE: All auxiliary metrics for locations of keypoints (MPJPE, PCK, OKS)
+take into account only matched keypoints and are provided for informational
+purposes only.
+
+
+## PEM
+
+Well-established metrics such as [MPJPE](#mpjpe), [PCK](#pck) or
+[OKS](#oks) provide valuable insight into the quality of a keypoint localization
+method, but they do not take into account the specifics of partially labeled
+data and ignore the quality of the object detection. To rank submissions
+for the challenge, we introduce a new single metric called the Pose Estimation
+Metric (**PEM**), which is
+
+- easily interpretable
+- sensitive to
+  - keypoint localization error and visibility classification accuracy
+  - number of false positive and false negative object detections
+- and not sensitive to
+  - Intersection over Union (**IoU**) of object detection to avoid a strong
+  dependency on 3D box accuracy.
+
+The PEM is a weighted sum of the [MPJPE](#mpjpe) over visible
+[matched](#object-matching-algorithm) keypoints and a penalty for
+unmatched keypoints (aka `mismatch_penalty`), expressed in meters.
+
+We compute the PEM on a set of candidate pairs of predicted and ground truth
+objects, for which at least one predicted keypoint is within a distance
+threshold constant $C$ from the ground truth box. The final object assignment
+is selected using the Hungarian method to minimize:
+
+$$\textbf{PEM}(Y,\hat{Y}) = \frac{\sum_{i\in M}\left\|y_{i} -
+\hat{y}_{i}\right\|_2 + C|U|}{|M| + |U|}$$
+
+where $M$ is the set of indices of matched keypoints and $U$ is the set of
+indices of unmatched keypoints (ground truth keypoints without matching
+predicted keypoints or predicted keypoints for unmatched objects). The sets
+$Y = \left\{y_i\right\}_{i \in M}$ and
+$\hat{Y} = \left\{\hat{y}_i\right\}_{i \in M}$ are the ground truth and
+predicted 3D coordinates of keypoints, and $C=0.25$ is a constant penalty for
+an unmatched keypoint.
+
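As a minimal sketch of the PEM formula, assuming the matched keypoints are already paired up and the number of unmatched keypoints $|U|$ is known, the value can be computed as follows. The function name `pem` and its signature are hypothetical and do not mirror the library API.

```python
import math

MISMATCH_PENALTY_M = 0.25  # The constant C from the formula, in meters.


def pem(gt_points, pr_points, num_unmatched, penalty=MISMATCH_PENALTY_M):
    """PEM for paired 3D keypoints plus `num_unmatched` penalized keypoints."""
    if len(gt_points) != len(pr_points):
        raise ValueError("matched keypoints must come in pairs")
    total = len(gt_points) + num_unmatched  # |M| + |U|
    if total == 0:
        raise ValueError("PEM is undefined without any keypoints")
    # Sum of Euclidean distances over matched keypoints.
    error_sum = sum(math.dist(y, y_hat) for y, y_hat in zip(gt_points, pr_points))
    return (error_sum + penalty * num_unmatched) / total


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pr = [(0.0, 0.0, 0.1), (1.0, 0.2, 0.0)]
value = pem(gt, pr, num_unmatched=2)
print(value)
```

In this made-up example the matched errors are 0.1m and 0.2m, and two unmatched keypoints add 0.25m each, so the result is approximately (0.3 + 0.5) / 4 = 0.2 meters.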
+
+## Object Matching Algorithm
+
+
+The [Pose Estimation challenge](https://waymo.com/open/challenges/2023/pose-estimation/)
+requires participants to provide keypoints for all human objects in a scene. To
+evaluate the performance of the predictions, the evaluation service uses one of
+the provided matching algorithms to automatically find correspondence between
+predicted (**PR**) and ground truth (**GT**) objects. The matching algorithm
+outputs three sets of objects:
+
+
+ - true positives (**TP**), which are pairs of a GT object and its corresponding PR object
+ - false positives (**FP**), which are PR objects without a corresponding GT object
+ - false negatives (**FN**), which are GT objects without a corresponding PR object
+
+However, matching is complicated by the fact that not all GT objects in WOD have
+visible keypoints. To address this, two kinds of GT objects are distinguished:
+
+  - $GT_i$ - GT objects without any visible keypoints, which includes unlabeled
+  or heavily occluded human objects.
+  - $GT_v$ - GT objects with at least one visible keypoint.
+
+| ![a toy example to illustrate $GT_v$ and $GT_i$](images/pem_matching_fig.png) |
+| :-: |
+| Fig 1. A toy scene |
+
+Fig. 1 shows:
+
+- Ground truth objects:
+  - $GT_i$: $G_0$, $G_1$, $G_3$, $G_5$, $G_7$
+  - $GT_v$: $G_2$, $G_4$, $G_6$, $G_8$, $G_9$
+- Predicted objects:
+  $P_0$, $P_1$, $P_2$, $P_3$, $P_4$, $P_5$, $P_6$, $P_7$
+
+If a PR object corresponds to a $GT_i$ object, no penalty is assigned since the
+MPJPE cannot be computed for such matches. Only matches between $GT_v$ objects and
+PR objects are considered for the computation of the PEM metric.
+
+Since computing the PEM metric for all possible matches between GT and PR is not
+feasible for scenes with many objects, several heuristics are used to narrow
+down the set of candidate matches. The official matching algorithm for the
+challenge is the
+[`MeanErrorMatcher`](src/waymo_open_dataset/metrics/python/keypoint_metrics.py),
+which computes keypoint errors for each pair of candidate matches. It has two stages:
+
+  1. When keypoints clearly fall in $GT_i$ objects (see criterion in
+    [keypoint_metrics.py](src/waymo_open_dataset/metrics/python/keypoint_metrics.py)),
+    remove them from consideration, without any penalties.
+  2. For all remaining candidate pairs of $GT_v$ ground truth boxes and
+     detections, perform Hungarian matching that minimizes the PEM metric.
+
+For the example in Fig. 1, the stages of the matching algorithm should work like
+this:
+
+- stage #1:
+    - select pairs of GT and PR objects for which at least one PR keypoint is
+      inside the GT box enlarged by 25cm.
+    - assume $PEM(G_4, P_5) > C$ and $PEM(G_6, P_6) < C$
+    - should exclude the pairs $(G_0, P_0)$, $(G_1, P_1)$, $(G_3, P_3)$,
+      $(G_5, P_5)$.
+- stage #2:
+    - consider only $GT_v$ objects
+    - compute errors for candidate pairs and populate the assignment error
+     matrix $A$ (aka cost matrix): $A_{k,j}=PEM(G_k, P_j)$ for
+     $(G_2, P_2)$, $(G_4, P_5)$, $(G_6, P_6)$, $(G_8, P_7)$,
+     $(G_9, P_7)$ and set the rest of the 8x7 matrix $A=\infty$.
+    - assuming $PEM(G_9, P_7) < PEM(G_8, P_7)$, the matching assignment should
+     output the following pairs:
+      $(G_2, P_2)$, $(G_6, P_6)$, $(G_9, P_7)$
+- the final output of the matcher should be:
+      $(G_2, P_2)$, $(G_6, P_6)$, $(G_9, P_7)$,
+      $(G_4, \emptyset)$, $(G_8, \emptyset)$,
+      $(\emptyset, P_4)$
+
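The candidate selection rule used in stage #1 (at least one PR keypoint inside the GT box enlarged by 25cm) can be sketched as below. This is a simplified illustration under stated assumptions: boxes are treated as axis-aligned (heading is ignored), the margin is added on every side, and the names `keypoint_near_box` and `candidate_pairs` are hypothetical. The actual criterion is defined in `keypoint_metrics.py`.

```python
def keypoint_near_box(point, center, size, margin=0.25):
    """True if `point` is inside an axis-aligned box grown by `margin` per side."""
    return all(
        abs(p - c) <= s / 2.0 + margin for p, c, s in zip(point, center, size)
    )


def candidate_pairs(gt_boxes, pr_keypoints, margin=0.25):
    """Returns (gt_index, pr_index) pairs with at least one PR keypoint near GT."""
    pairs = []
    for k, (center, size) in enumerate(gt_boxes):
        for j, points in enumerate(pr_keypoints):
            if any(keypoint_near_box(p, center, size, margin) for p in points):
                pairs.append((k, j))
    return pairs


# Two made-up GT boxes given as (center, size), and three made-up predictions.
gt_boxes = [((0.0, 0.0, 1.0), (1.0, 1.0, 2.0)), ((5.0, 0.0, 1.0), (1.0, 1.0, 2.0))]
pr_keypoints = [
    [(0.6, 0.0, 1.0)],                     # within 25cm of the first box
    [(5.0, 0.0, 2.2), (9.0, 9.0, 9.0)],    # one keypoint near the second box
    [(3.0, 3.0, 3.0)],                     # far from both boxes
]
pairs = candidate_pairs(gt_boxes, pr_keypoints)
print(pairs)
```

Only the candidate pairs returned by such a filter would receive finite entries in the cost matrix $A$; the third prediction has no candidate GT object and would end up as a false positive.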
+For the PEM metric, each ground truth box, whether $GT_v$ or $GT_i$, can be
+associated with at most one detection. To achieve the best (lowest) PEM score,
+you are responsible for removing duplicate detections.
+
+NOTE: The WOD library also implements the [`CppMatcher`](src/waymo_open_dataset/metrics/python/keypoint_metrics.py)
+which maximizes total Intersection over Union (IoU) between predicted and ground
+truth boxes. However, this matcher requires all predictions to have bounding
+boxes and is provided only as a reference.
diff --git a/src/waymo_open_dataset/metrics/detection_metrics.cc b/src/waymo_open_dataset/metrics/detection_metrics.cc
@@ -443,6 +443,18 @@ std::vector<DetectionMetrics> ComputeDetectionMetrics(
   return metrics;
 }
 
+std::vector<DetectionMeasurements> MergeDetectionMeasurements(
+    const Config& config,
+    const std::vector<std::vector<DetectionMeasurements>>& measurements) {
+  const int num_frames = measurements.size();
+  if (measurements.empty()) return {};
+  std::vector<DetectionMeasurements> measurements_merged = measurements[0];
+  for (int i = 1; i < num_frames; ++i) {
+    MergeDetectionMeasurementsVector(measurements[i], &measurements_merged);
+  }
+  return measurements_merged;
+}
+
 Config EstimateScoreCutoffs(const Config& config,
                             const std::vector<std::vector<Object>>& pds,
                             const std::vector<std::vector<Object>>& gts) {
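
The merge added in `MergeDetectionMeasurements` is a left fold over per-frame vectors: the first frame seeds the result and every later frame is merged into it element by element. The Python sketch below mirrors only that control flow. The body of `MergeDetectionMeasurementsVector` is not part of this diff, so summing per-element counters is an assumption made for illustration, and the `tp`/`fp` keys are hypothetical.

```python
from collections import Counter


def merge_measurements_vector(new, merged):
    """Adds each element of `new` into the matching element of `merged`."""
    if len(new) != len(merged):
        raise ValueError("all frames must use the same configuration")
    for counts, total in zip(new, merged):
        total.update(counts)  # Assumption: merging means summing counters.


def merge_detection_measurements(measurements):
    """Folds per-frame measurement vectors into a single vector."""
    if not measurements:
        return []
    merged = [Counter(m) for m in measurements[0]]  # Copy of the first frame.
    for frame in measurements[1:]:
        merge_measurements_vector(frame, merged)
    return merged


frames = [
    [{"tp": 1, "fp": 0}, {"tp": 2, "fp": 1}],  # frame 0, two breakdown entries
    [{"tp": 3, "fp": 1}, {"tp": 0, "fp": 2}],  # frame 1, same ordering
]
merged = merge_detection_measurements(frames)
print([dict(m) for m in merged])
```

The fold only works if every frame lists its entries in the same order, which is why the header requires all measurements to be computed with the same configuration.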

diff --git a/src/waymo_open_dataset/metrics/detection_metrics.h b/src/waymo_open_dataset/metrics/detection_metrics.h
@@ -72,6 +72,21 @@ std::vector<DetectionMetrics> ComputeDetectionMetrics(
     const std::vector<std::vector<Object>>& gts,
     ComputeIoUFunc custom_iou_func = nullptr);
 
+// Merges detection measurements for multiple frames.
+// Each element of `measurements` is an output of ComputeDetectionMeasurements.
+// The output vector is ordered as:
+// [{generator_i_shard_j_difficulty_level_k}].
+// i \in [0, num_breakdown_generators).
+// j \in [0, num_shards for the i-th breakdown generator).
+// k \in [0, num_difficulty_levels for each shard in the  i-th breakdown
+//   generator).
+//
+// Requires: Every element of `measurements` is computed with the same
+// configuration.
+std::vector<DetectionMeasurements> MergeDetectionMeasurements(
+    const Config& config,
+    const std::vector<std::vector<DetectionMeasurements>>& measurements);
+
 // Estimates the score cutoffs that evenly sample the P/R curve.
 // pds: the predicted objects.
 // gts: the ground truths.

diff --git a/src/waymo_open_dataset/metrics/iou.cc b/src/waymo_open_dataset/metrics/iou.cc
@@ -256,9 +256,9 @@ double ComputeLongitudinalAffinity(
       std::max(CenterVectorLength(calibrated_prediction_box), kEpsilon);
 
   // Compute the cos(theta), where theta is the angle between the center vectors
-  // of prediction and ground truth.
-  const double cos_of_gt_pd_angle =
-      Clamp(gt_dot_pd / gt_range / pd_range, 0.0, 1.0);
+  // of prediction and ground truth. Note, this value can be negative, meaning
+  // the angle between the prediction and ground truth exceeds 90 degrees.
+  const double cos_of_gt_pd_angle = gt_dot_pd / gt_range / pd_range;
 
   // Compute the error terms as a percentage of the max tolerance.
   const float max_range_tolerance_meter =
@@ -302,6 +302,9 @@ Label::Box AlignedPredictionBox(
       //   P' = |G|* cos(theta) * P/|P| = dot(G, P)/|P|^2 * P,
       // where G = [gt_x, gt_y, gt_z] and P = [pd_x, pd_y, pd_z] are the vectors
       // that describe the centers of a ground truth box and a prediction box.
+      // Note this still applies in the case where dot(G, P) < 0, when the
+      // multiplier is negative, i.e. P and G are on different sides of the
+      // sensor.
       const double gt_dot_pd =
           CenterDotProduct(prediction_box, ground_truth_box);
       const double pd_range_sq =