Commit

Merged commit includes the following changes:
532682786  by Waymo Research:

    Internal change.

--
532582773  by Waymo Research:

    Fix the MeanErrorMatcher and document Pose Estimation Metric.

--
531623382  by Waymo Research:

    Fix a bug in LET matching where the prediction and ground truth are on different sides of the sensor.

--
530728277  by Waymo Research:

    Update 2d_pvps_tutorial description.

--
530687829  by Waymo Research:

    Cleanup comments for keypoint metrics.

--
530686850  by Waymo Research:

    Internal

--
529494211  by Waymo Research:

    Fix v2 lidar utils.

--
529475406  by Waymo Research:

    Make sure all metrics for keypoints support padded predictions.

--
529475148  by Waymo Research:

    Fix a bug where processing submissions without predicted boxes failed.

--
529470846  by Waymo Research:

    Fix PEM matcher to properly assign overlapping labeled and unlabeled ground truth boxes to predictions.

--
526117149  by Waymo Research:

    Ignore predicted keypoints for objects without ground truth keypoints.

--
526116856  by Waymo Research:

    Fix box transform computation.
    This bug impacted the MEAN_ERROR matcher and resulted in matching a set of points with itself.

--
526106501  by Waymo Research:

    Update the motion tutorial with the correct number of map samples.

--
525224093  by Waymo Research:

    Improves error messages in Sim Agent evaluation code.

--
525194618  by Waymo Research:

    2D Semseg Tutorial bugfix.

--

PiperOrigin-RevId: 532682786
Alexander Gorban committed May 17, 2023
1 parent 1eabdda commit 7275bc1
Showing 19 changed files with 2,816 additions and 413 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -66,15 +66,15 @@ We released v1.4.0 of the Perception dataset.
We released v1.3.2 of the Perception dataset to improve the quality and accuracy of the labels.
- Updated 3D semantic segmentation labels, for better temporal consistency and to fix mislabeled points.
- Updated 2D key point labels to fix image cropping issues.
- Added `num_top_lidar_points_in_box` in [dataset.proto](waymo_open_dataset/dataset.proto) for the 3D Camera-Only Detection Challenge.
- Added `num_top_lidar_points_in_box` in [dataset.proto](src/waymo_open_dataset/dataset.proto) for the 3D Camera-Only Detection Challenge.

## April 2022 Update
We released v1.3.1 of the Perception dataset to support the 2022 Challenges and have updated this repository accordingly.
- Added metrics (LET-3D-APL and LET-3D-AP) for the 3D Camera-Only Detection Challenge.
- Added 80 segments of 20-second camera imagery, as a test set for the 3D Camera-Only Detection Challenge.
- Added z-axis speed and acceleration in [lidar label metadata](waymo_open_dataset/label.proto#L53-L60).
- Fixed some inconsistencies in `projected_lidar_labels` in [dataset.proto](waymo_open_dataset/dataset.proto).
- Updated the default configuration for the Occupancy and Flow Challenge, switching from aggregate waypoints to [subsampled waypoints](waymo_open_dataset/protos/occupancy_flow_metrics.proto#L38-L55).
- Added z-axis speed and acceleration in [lidar label metadata](src/waymo_open_dataset/label.proto#L53-L60).
- Fixed some inconsistencies in `projected_lidar_labels` in [dataset.proto](src/waymo_open_dataset/dataset.proto).
- Updated the default configuration for the Occupancy and Flow Challenge, switching from aggregate waypoints to [subsampled waypoints](src/waymo_open_dataset/protos/occupancy_flow_metrics.proto#L38-L55).
- Updated the [tutorial](tutorial/tutorial_3d_semseg.ipynb) for 3D Semantic Segmentation Challenge with more detailed instructions.

## March 2022 Update
@@ -99,7 +99,7 @@ We released v1.1 of the Motion dataset to include lane connectivity information.

We expanded the Waymo Open Dataset to also include a Motion dataset comprising object trajectories and corresponding 3D maps for over 100,000 segments. We have updated this repository to add support for this new dataset.

Additionally, we added instructions and examples for the real-time detection challenges. Please follow these [Instructions](waymo_open_dataset/latency/README.md).
Additionally, we added instructions and examples for the real-time detection challenges. Please follow these [Instructions](src/waymo_open_dataset/latency/README.md).

## Website

Binary file added docs/images/pem_matching_fig.png
168 changes: 168 additions & 0 deletions docs/pose_estimation_metric.md
@@ -0,0 +1,168 @@
# Metrics for the Pose Estimation Challenge

## Supported metrics

We provide a Python library to compute a number of different metrics, which are
also reported on the results page. Challenge participants may want to inspect
different subsets of them by selecting a group of keypoints and thresholds.

- [Pose Estimation Metric (**PEM**)](#PEM), a new metric created specifically
for the Pose Estimation challenge, in meters (lower is better). It is used to
rank the leaderboard.
- <a name="mpjpe">Mean Per Joint Position Error (**MPJPE**)</a>, in meters
(lower is better). Useful to measure the quality of matched keypoints in
easy-to-interpret units.
- <a name="pck">Percentage Of Correct Keypoints (**PCK**)</a>. The ratio of the
number of keypoints close to ground truth to the total number of keypoints. We
use thresholds `(0.05, 0.1, 0.2, 0.3, 0.4, 0.5)` relative to bounding
box scales, for example for a human-like 1x1x2m box, the box's scale will be
$(1 \cdot 1 \cdot 2)^\frac{1}{3} = 1.26$, so the $0.20$ threshold of this
scale will be 25cm and keypoints with errors less than 25cm will be considered
correct. The metric takes values in the `[0, 1]` range (higher is better).
Useful to understand the distribution of errors.
- Precision of keypoint visibility (**Precision**). Values are
in the `[0, 1]` range (higher is better). Useful to gauge the precision of the
keypoint visibility classification and the number of false positive detections.
- Recall of keypoint visibility (**Recall**). Values are in the `[0, 1]` range
(higher is better). Useful to gauge the recall of the keypoint visibility
classification and the number of false negative detections.
- <a name="oks">Precision at Object Keypoint Similarity (**OKS**)</a>
for different thresholds. Values are in the `[0, 1]` range (higher is better).
The OKS measures the distance between predicted and ground-truth keypoints
relative to a scale that is specific to each keypoint type. For example,
the scale for hips is larger than the scale for wrists, so a 5 mm error for
hips will result in a larger Precision at OKS value than the same 5 mm error
for wrists. The OKS metric can be used to evaluate the accuracy
of 2D keypoint detectors in a consistent and standardized way. By using OKS as
a point of comparison, participants can gain insights into the quality of
their 3D keypoint detectors relative to state-of-the-art 2D keypoint
detectors.
- Average Precision at OKS (**OKS_AP**), averaged over
`[.5, .55, .60, .65, .70, .75, .80, 0.95]` thresholds. Values are in the
`[0, 1]` range (higher is better).
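
For reference, the relative PCK thresholds translate into absolute distances as
in the following sketch (plain NumPy; `pck_thresholds` is an illustrative
helper, not part of the WOD library):

```python
import numpy as np


def pck_thresholds(box_dims_m, relative_thresholds=(0.05, 0.1, 0.2, 0.3, 0.4, 0.5)):
  """Converts relative PCK thresholds into absolute distances in meters.

  Args:
    box_dims_m: (length, width, height) of the object's 3D box in meters.
    relative_thresholds: thresholds relative to the box scale.

  Returns:
    A dict mapping each relative threshold to an absolute distance in meters.
  """
  # The box scale is the cube root of the box volume.
  scale = float(np.prod(box_dims_m)) ** (1.0 / 3.0)
  return {t: t * scale for t in relative_thresholds}


# A human-like 1x1x2 m box has a scale of ~1.26 m, so the 0.2 threshold is ~0.25 m.
print(pck_thresholds((1.0, 1.0, 2.0))[0.2])
```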


NOTE: All auxiliary metrics for keypoint locations (MPJPE, PCK, OKS)
take into account only matched keypoints and are provided for informational
purposes only.


## PEM

The set of well-established metrics such as [MPJPE](#mpjpe), [PCK](#pck) or
[OKS](#oks) provides valuable insights into the quality of a keypoint
localization method, but these metrics do not take into account the specifics
of partially labeled data and they ignore the quality of object detection. In
order to rank submissions for the challenge we introduce a new single metric
called the Pose Estimation Metric (**PEM**), which is

- easily interpretable
- sensitive to
- keypoint localization error and visibility classification accuracy
- number of false positive and false negative object detections
- and not sensitive to
- Intersection over Union (**IoU**) of object detection to avoid a strong
dependency on 3D box accuracy.

The PEM is a weighted sum of the [MPJPE](#mpjpe) over visible
[matched](#object-matching-algorithm) keypoints and a penalty for
unmatched keypoints (aka `mismatch_penalty`), expressed in meters.

We compute the PEM on a set of candidate pairs of predicted and ground truth
objects, for which at least one predicted keypoint is within a distance
threshold constant $C$ from the ground truth box. The final object assignment
is selected using the Hungarian method to minimize:

$$\textbf{PEM}(Y,\hat{Y}) = \frac{\sum_{i\in M}\left\|y_{i} -
\hat{y}_{i}\right\|_2 + C|U|}{|M| + |U|}$$

where $M$ is the set of indices of matched keypoints; $U$ is the set of indices
of unmatched keypoints (ground truth keypoints without matching predicted
keypoints, or predicted keypoints for unmatched objects); the sets
$Y = \left\{y_i\right\}_{i \in M}$ and $\hat{Y} = \left\{\hat{y}_i\right\}_{i \in M}$
are the ground truth and predicted 3D coordinates of the matched keypoints; and
$C = 0.25$ is a constant penalty for an unmatched keypoint.
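
Expressed as code, the formula above reduces to a minimal sketch like the
following (plain NumPy, illustrative only; the official implementation lives in
[keypoint_metrics.py](src/waymo_open_dataset/metrics/python/keypoint_metrics.py)):

```python
import numpy as np

MISMATCH_PENALTY_M = 0.25  # The constant C from the formula above.


def pem(gt_keypoints, pr_keypoints, num_unmatched):
  """Pose Estimation Metric for one set of matched keypoints.

  Args:
    gt_keypoints: [N, 3] ground truth 3D coordinates of matched keypoints.
    pr_keypoints: [N, 3] predicted 3D coordinates of matched keypoints.
    num_unmatched: |U|, the number of unmatched ground truth or predicted
      keypoints.

  Returns:
    PEM in meters (lower is better).
  """
  # Per-keypoint L2 errors over the matched set M.
  errors = np.linalg.norm(
      np.asarray(gt_keypoints) - np.asarray(pr_keypoints), axis=-1)
  num_matched = errors.shape[0]
  denominator = num_matched + num_unmatched
  if denominator == 0:
    return 0.0
  return (errors.sum() + MISMATCH_PENALTY_M * num_unmatched) / denominator
```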


## Object Matching Algorithm


The [Pose Estimation challenge](https://waymo.com/open/challenges/2023/pose-estimation/)
requires participants to provide keypoints for all human objects in a scene. To
evaluate the performance of the predictions, the evaluation service uses one of
the provided matching algorithms to automatically find correspondence between
predicted (**PR**) and ground truth (**GT**) objects. The matching algorithm
outputs three sets of objects:


- true positives (**TP**), which are pairs of a GT object and its corresponding PR object
- false positives (**FP**), which are PR objects without a corresponding GT object
- false negatives (**FN**), which are GT objects without a corresponding PR object

However, matching is complicated by the fact that not all GT objects in WOD have
visible keypoints. To address this, two kinds of GT objects are distinguished:

- $GT_i$ - GT objects without any visible keypoints, which include unlabeled
  or heavily occluded human objects.
- $GT_v$ - GT boxes with at least one
visible keypoint.

| ![a toy example to illustrate $GT_v$ and $GT_i$](images/pem_matching_fig.png) |
| :-: |
| Fig 1. A toy scene |

In Fig. 1 you can see:

- Ground truth objects:
- $GT_i$: $G_0$, $G_1$, $G_3$, $G_5$, $G_7$
- $GT_v$: $G_2$, $G_4$, $G_6$, $G_8$, $G_9$
- Predicted objects:
$P_0$, $P_1$, $P_2$, $P_3$, $P_4$, $P_5$, $P_6$, $P_7$

If a PR object corresponds to a $GT_i$ object, no penalty is assigned since the
MPJPE cannot be computed for such matches. Only matches between $GT_v$ objects and
PR objects are considered for the computation of the PEM metric.

Since computing the PEM metric for all possible matches between GT and PR is not
feasible for scenes with many objects, several heuristics are used to narrow
down the set of candidate matches. The official matching algorithm for the
challenge is the
[`MeanErrorMatcher`](src/waymo_open_dataset/metrics/python/keypoint_metrics.py),
which computes keypoint errors for each pair of candidate matches. It has two stages:

1. When keypoints clearly fall in $GT_i$ objects (see the criterion in
[keypoint_metrics.py](src/waymo_open_dataset/metrics/python/keypoint_metrics.py)),
remove them from consideration, without any penalty.
2. For all remaining candidate pairs of $GT_v$ ground truth boxes and detections,
perform Hungarian matching that minimizes the PEM metric.
For the example in Fig. 1, the stages of the matching algorithm work as
follows:

- stage #1:
  - Select pairs of GT and PR objects for which at least one PR keypoint is
    inside the GT box enlarged by 25 cm.
  - assume $PEM(G_4, P_5) > C$ and $PEM(G_6, P_6) < C$
  - should exclude the pairs $(G_0, P_0)$, $(G_1, P_1)$, $(G_3, P_3)$,
    $(G_5, P_5)$.
- stage #2:
  - consider only $GT_v$ objects
  - compute errors for candidate pairs and populate the assignment error matrix
    $A$ (aka the cost matrix): $A_{k,j}=PEM(G_k, P_j)$ for
    $(G_2, P_2)$, $(G_4, P_5)$, $(G_6, P_6)$, $(G_8, P_7)$,
    $(G_9, P_7)$ and set the rest of the 8x7 matrix to $\infty$.
- assuming $PEM(G_9, P_7) < PEM(G_8, P_7)$, the matching assignment should
output the following pairs:
$(G_1, P_1)$, $(G_2, P_2)$, $(G_6, P_6)$, $(G_9, P_7)$
- the final output of the matcher should be:
$(G_2, P_2)$, $(G_6, P_6)$, $(G_9, P_7)$,
$(G_4, \emptyset)$, $(G_8, \emptyset)$,
$(\emptyset, P_4)$

For the PEM metric, each ground truth box ($GT_v$ or $GT_i$) can be associated
with at most one detection. To maximize your PEM score, you are responsible
for removing duplicate detections.
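
As an illustration of stage #2, the assignment itself is a standard linear sum
assignment over the PEM cost matrix $A$. The sketch below uses
`scipy.optimize.linear_sum_assignment` and assumes the cost matrix has already
been built as described above (with stage #1 filtering applied); it is
illustrative only and is not the `MeanErrorMatcher` implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_objects(cost):
  """Matches GT_v objects to predictions by minimizing the total PEM cost.

  Args:
    cost: [num_gt, num_pr] matrix where cost[k, j] = PEM(G_k, P_j) for
      candidate pairs and np.inf for pairs that are not candidates.

  Returns:
    A list of (gt_index, pr_index) pairs for matched objects.
  """
  # Replace infinities with a large finite penalty so the solver always
  # returns a full assignment, then drop those pseudo-matches afterwards.
  finite_cost = np.where(np.isfinite(cost), cost, 1e9)
  gt_indices, pr_indices = linear_sum_assignment(finite_cost)
  return [(k, j) for k, j in zip(gt_indices, pr_indices)
          if np.isfinite(cost[k, j])]
```

Unmatched ground truth objects and unmatched predictions (the $(G, \emptyset)$
and $(\emptyset, P)$ entries in the example above) are simply the rows and
columns that do not appear in the returned pairs.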

NOTE: The WOD library also implements the [`CppMatcher`](src/waymo_open_dataset/metrics/python/keypoint_metrics.py),
which maximizes the total Intersection over Union (IoU) between predicted and
ground truth boxes. However, this matcher requires all predictions to have
bounding boxes and is provided only as a reference.
12 changes: 12 additions & 0 deletions src/waymo_open_dataset/metrics/detection_metrics.cc
@@ -443,6 +443,18 @@ std::vector<DetectionMetrics> ComputeDetectionMetrics(
return metrics;
}

std::vector<DetectionMeasurements> MergeDetectionMeasurements(
    const Config& config,
    const std::vector<std::vector<DetectionMeasurements>>& measurements) {
  const int num_frames = measurements.size();
  if (measurements.empty()) return {};
  std::vector<DetectionMeasurements> measurements_merged = measurements[0];
  for (int i = 1; i < num_frames; ++i) {
    MergeDetectionMeasurementsVector(measurements[i], &measurements_merged);
  }
  return measurements_merged;
}

Config EstimateScoreCutoffs(const Config& config,
                            const std::vector<std::vector<Object>>& pds,
                            const std::vector<std::vector<Object>>& gts) {
15 changes: 15 additions & 0 deletions src/waymo_open_dataset/metrics/detection_metrics.h
@@ -72,6 +72,21 @@ std::vector<DetectionMetrics> ComputeDetectionMetrics(
    const std::vector<std::vector<Object>>& gts,
    ComputeIoUFunc custom_iou_func = nullptr);

// Merges detection measurements for multiple frames.
// Each element of `measurements` is an output of ComputeDetectionMeasurements.
// The output vector is ordered as:
// [{generator_i_shard_j_difficulty_level_k}].
// i \in [0, num_breakdown_generators).
// j \in [0, num_shards for the i-th breakdown generator).
// k \in [0, num_difficulty_levels for each shard in the i-th breakdown
// generator).
//
// Requires: Every element of `measurements` is computed with the same
// configuration.
std::vector<DetectionMeasurements> MergeDetectionMeasurements(
    const Config& config,
    const std::vector<std::vector<DetectionMeasurements>>& measurements);

// Estimates the score cutoffs that evenly sample the P/R curve.
// pds: the predicted objects.
// gts: the ground truths.
9 changes: 6 additions & 3 deletions src/waymo_open_dataset/metrics/iou.cc
@@ -256,9 +256,9 @@ double ComputeLongitudinalAffinity(
std::max(CenterVectorLength(calibrated_prediction_box), kEpsilon);

// Compute the cos(theta), where theta is the angle between the center vectors
// of prediction and ground truth.
const double cos_of_gt_pd_angle =
Clamp(gt_dot_pd / gt_range / pd_range, 0.0, 1.0);
// of prediction and ground truth. Note, this value can be negative, meaning
// the angle between the prediction and ground truth is larger than 90 degrees.
const double cos_of_gt_pd_angle = gt_dot_pd / gt_range / pd_range;

// Compute the error terms as a percentage of the max tolerance.
const float max_range_tolerance_meter =
@@ -302,6 +302,9 @@ Label::Box AlignedPredictionBox(
// P' = |G|* cos(theta) * P/|P| = dot(G, P)/|P|^2 * P,
// where G = [gt_x, gt_y, gt_z] and P = [pd_x, pd_y, pd_z] are the vectors
// that describe the centers of a ground truth box and a prediction box.
// Note this still applies in the case where dot(G, P) < 0, when the
// multiplier is negative, i.e. P and G are on different sides of the
// sensor.
const double gt_dot_pd =
CenterDotProduct(prediction_box, ground_truth_box);
const double pd_range_sq =
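
For illustration, the center alignment described in the comment above can be
sketched outside the library as follows (plain NumPy; `aligned_prediction_center`
is a hypothetical helper, not the C++ implementation in iou.cc):

```python
import numpy as np


def aligned_prediction_center(pd_center, gt_center):
  """Longitudinally aligns a predicted box center with the ground truth.

  Implements P' = dot(G, P) / |P|^2 * P from the comment above: the predicted
  center is rescaled along its line of sight from the sensor so that its
  longitudinal component matches the ground truth. When dot(G, P) < 0 the
  multiplier is negative and the aligned center flips to the ground truth's
  side of the sensor.
  """
  pd_center = np.asarray(pd_center, dtype=float)
  gt_center = np.asarray(gt_center, dtype=float)
  scale = np.dot(gt_center, pd_center) / max(np.dot(pd_center, pd_center), 1e-12)
  return scale * pd_center


# Ground truth behind the sensor, prediction in front: the aligned center
# lands on the ground truth's side, as the updated comment notes.
print(aligned_prediction_center([10.0, 0.0, 0.0], [-8.0, 1.0, 0.0]))  # [-8.  0.  0.]
```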