[Doc] Add SUN RGB-D doc (#770)
* Add doc

* Creation of SUN RGB-D dataset doc and some mods on ScanNet dataset doc

* Revert mistakenly modified file

* Fix typos

* Add multi-modality related info

* Update according to comments

* Add Chinese doc and revise the docs

* Add some script

* Fix typos and formats

* Fix typos

* Fix typos
yezhen17 authored Jul 30, 2021
1 parent 111f33b commit 43b4632
Showing 8 changed files with 716 additions and 23 deletions.
8 changes: 4 additions & 4 deletions data/sunrgbd/README.md
@@ -2,7 +2,7 @@

We follow the procedure in [votenet](https://github.com/facebookresearch/votenet/).

-1. Download SUNRGBD v2 data [HERE](http://rgbd.cs.princeton.edu/data/). Then, move SUNRGBD.zip, SUNRGBDMeta2DBB_v2.mat, SUNRGBDMeta3DBB_v2.mat and SUNRGBDtoolbox.zip to the OFFICIAL_SUNRGBD folder, unzip the zip files.
+1. Download SUNRGBD data [HERE](http://rgbd.cs.princeton.edu/data/). Then, move SUNRGBD.zip, SUNRGBDMeta2DBB_v2.mat, SUNRGBDMeta3DBB_v2.mat and SUNRGBDtoolbox.zip to the OFFICIAL_SUNRGBD folder, and unzip the zip files.

2. Enter the `matlab` folder and extract point clouds and annotations by running `extract_split.m`, `extract_rgbd_data_v2.m` and `extract_rgbd_data_v1.m`.

@@ -47,12 +47,12 @@ sunrgbd
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
-│ ├── image
-│ ├── label_v1
-│ ├── train_data_idx.txt
+│ ├── depth
+│ ├── image
+│ ├── label
+│ ├── label_v1
+│ ├── seg_label
+│ ├── train_data_idx.txt
+│ ├── val_data_idx.txt
├── points
├── sunrgbd_infos_train.pkl
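
After extraction and conversion, the generated files can be sanity-checked from Python. The snippet below is an illustrative sketch only: it assumes SUN RGB-D points are stored as flat `float32` binaries with 6 values per point (x, y, z, r, g, b) and that the info files are plain pickles; the sample index `000001` is a hypothetical example.

```python
import pickle

import numpy as np

# Load one converted point cloud. Assumption: points are saved as a flat
# float32 binary with 6 values per point (x, y, z, r, g, b).
points = np.fromfile('data/sunrgbd/points/000001.bin', dtype=np.float32)
points = points.reshape(-1, 6)
print('points:', points.shape)

# Load the train info file. Assumption: it is a plain pickle containing a
# list of dicts, one per sample, with paths and annotations.
with open('data/sunrgbd/sunrgbd_infos_train.pkl', 'rb') as f:
    infos = pickle.load(f)
print('num samples:', len(infos))
print('keys of first info:', sorted(infos[0].keys()))
```
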
1 change: 1 addition & 0 deletions docs/datasets/index.rst
@@ -2,6 +2,7 @@
:maxdepth: 2

waymo_det.md
+sunrgbd_det.md
scannet_det.md
scannet_sem_seg.md
s3dis_sem_seg.md
32 changes: 16 additions & 16 deletions docs/datasets/scannet_det.md
@@ -113,7 +113,7 @@ def export(mesh_file,
# bbox format is [x, y, z, dx, dy, dz, label_id]
# [x, y, z] is gravity center of bbox, [dx, dy, dz] is axis-aligned
# [label_id] is semantic label id in 'nyu40id' standard
-# Note: since 3d bbox is axis-aligned, the yaw is 0.
+# Note: since 3D bbox is axis-aligned, the yaw is 0.
unaligned_bboxes = extract_bbox(mesh_vertices, object_id_to_segs,
object_id_to_label_id, instance_ids)
aligned_bboxes = extract_bbox(aligned_mesh_vertices, object_id_to_segs,
@@ -221,7 +221,7 @@ scannet
├── scannet_infos_test.pkl
```

-- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsample. Since ScanNet 3d detection task takes axis-aligned point clouds as input, while ScanNet 3d semantic segmentation task takes unaligned points, we choose to store unaligned points and their axis-align transform matrix. Note: the points would be axis-aligned in pre-processing pipeline `GlobalAlignment` of 3d detection task.
+- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsampling (see the loading sketch after this list). Since the ScanNet 3D detection task takes axis-aligned point clouds as input, while the ScanNet 3D semantic segmentation task takes unaligned points, we store the unaligned points together with their axis-alignment transform matrix. Note: the points are axis-aligned by the `GlobalAlignment` pre-processing transform of the 3D detection pipeline.
- `instance_mask/xxxxx.bin`: The instance label for each point, value range: [0, NUM_INSTANCES], 0: unannotated.
- `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: [1, 40], i.e. the `nyu40id` standard. Note: the `nyu40id` ids will be mapped to train ids by the `PointSegClassMapping` transform in the training pipeline.
- `posed_images/scenexxxx_xx`: The set of `.jpg` images with their `.txt` 4x4 pose matrices, plus a single `.txt` file with the camera intrinsic matrix.
@@ -231,21 +231,21 @@
- info['pts_instance_mask_path']: The path of `instance_mask/xxxxx.bin`.
- info['pts_semantic_mask_path']: The path of `semantic_mask/xxxxx.bin`.
- info['annos']: The annotations of each scan.
-- annotations['gt_num']: The number of ground truth.
+- annotations['gt_num']: The number of ground truths.
- annotations['name']: The semantic name of all ground truths, e.g. `chair`.
-- annotations['location']: The gravity center of axis-aligned 3d bounding box. Shape: [K, 3], K is the number of ground truth.
-- annotations['dimensions']: The dimensions of axis-aligned 3d bounding box, i.e. x_size, y_size, z_size, shape: [K, 3].
-- annotations['gt_boxes_upright_depth']: Axis-aligned 3d bounding box, each bounding box is x, y, z, x_size, y_size, z_size, shape: [K, 6].
-- annotations['unaligned_location']: The gravity center of axis-unaligned 3d bounding box.
-- annotations['unaligned_dimensions']: The dimensions of axis-unaligned 3d bounding box.
-- annotations['unaligned_gt_boxes_upright_depth']: Axis-unaligned 3d bounding box.
+- annotations['location']: The gravity center of the axis-aligned 3D bounding boxes. Shape: [K, 3], K is the number of ground truths.
+- annotations['dimensions']: The dimensions of the axis-aligned 3D bounding boxes, i.e. (x_size, y_size, z_size), shape: [K, 3].
+- annotations['gt_boxes_upright_depth']: The axis-aligned 3D bounding boxes, each bounding box is (x, y, z, x_size, y_size, z_size), shape: [K, 6].
+- annotations['unaligned_location']: The gravity center of the axis-unaligned 3D bounding boxes.
+- annotations['unaligned_dimensions']: The dimensions of the axis-unaligned 3D bounding boxes.
+- annotations['unaligned_gt_boxes_upright_depth']: The axis-unaligned 3D bounding boxes.
- annotations['index']: The index of all ground truths, i.e. [0, K).
-- annotations['class']: The train class id of each bounding box, value range: [0, 18), shape: [K, ].
+- annotations['class']: The train class id of the bounding boxes, value range: [0, 18), shape: [K, ].
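
As referenced above, the per-point files are easy to inspect directly. Below is a minimal loading sketch, assuming points are flat `float32` binaries with 6 values per point and both masks are `int64` binaries; the sample name is a hypothetical placeholder.

```python
import numpy as np

sample = 'scene0000_00'  # hypothetical sample name

# Assumption: points are stored as float32 with 6 values per point
# (x, y, z, r, g, b).
points = np.fromfile(f'points/{sample}.bin', dtype=np.float32).reshape(-1, 6)

# Assumption: both masks are flat int64 binaries, one value per point.
instance_mask = np.fromfile(f'instance_mask/{sample}.bin', dtype=np.int64)
semantic_mask = np.fromfile(f'semantic_mask/{sample}.bin', dtype=np.int64)

assert len(points) == len(instance_mask) == len(semantic_mask)
print('unannotated points:', (instance_mask == 0).sum())
print('semantic ids (nyu40id standard):', np.unique(semantic_mask))
```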


## Training pipeline

-A typical training pipeline of ScanNet for 3d detection is as below.
+A typical training pipeline of ScanNet for 3D detection is as follows.

```python
train_pipeline = [
    # ... (the remaining pipeline steps are collapsed in this diff view)
]
```

@@ -291,12 +291,12 @@ train_pipeline = [
- `GlobalAlignment`: The loaded point cloud is axis-aligned using the axis-alignment matrix.
- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like [0, 18) during training.
- Data augmentation:
-- `IndoorPointSample`: downsample input point cloud.
-- `RandomFlip3D`: randomly flip input point cloud horizontally or vertically.
-- `GlobalRotScaleTrans`: rotate input point cloud, usually [-5, 5] degree.
+- `IndoorPointSample`: downsample the input point cloud.
+- `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
+- `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of [-5, 5] degrees for ScanNet; then scale the input point cloud, usually by 1.0 for ScanNet; finally translate the input point cloud, usually by 0 for ScanNet (see the config sketch after this list).
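
For reference, here is a sketch of how these three augmentations are typically configured for ScanNet. The exact values are assumptions modeled on common VoteNet-style settings (0.087266 rad ≈ 5 degrees) and vary per model:

```python
train_augmentations = [
    # downsample to a fixed number of points
    dict(type='IndoorPointSample', num_points=40000),
    # flip along either horizontal axis with 50% probability each
    dict(
        type='RandomFlip3D',
        sync_2d=False,
        flip_ratio_bev_horizontal=0.5,
        flip_ratio_bev_vertical=0.5),
    dict(
        type='GlobalRotScaleTrans',
        # roughly [-5, 5] degrees, expressed in radians
        rot_range=[-0.087266, 0.087266],
        # no scaling for ScanNet
        scale_ratio_range=[1.0, 1.0],
        # no translation for ScanNet
        translation_std=[0, 0, 0]),
]
```
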

## Metrics

-Typically mean average precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic functions to compute precision and recall for 3d object detection for multiple classes is called, please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
+Typically mean Average Precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for multi-class 3D object detection is called; please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
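
To make the metric concrete, below is a small self-contained sketch of the idea behind `mAP@IoU` for a single class, using axis-aligned 3D IoU on boxes in the (x, y, z, dx, dy, dz) format described above. It is an illustrative toy, not the `indoor_eval` implementation:

```python
import numpy as np

def iou_3d(box_a, box_b):
    """Axis-aligned 3D IoU for boxes in (x, y, z, dx, dy, dz) format,
    where (x, y, z) is the gravity center."""
    min_a, max_a = box_a[:3] - box_a[3:] / 2, box_a[:3] + box_a[3:] / 2
    min_b, max_b = box_b[:3] - box_b[3:] / 2, box_b[:3] + box_b[3:] / 2
    inter = np.prod(np.clip(np.minimum(max_a, max_b) - np.maximum(min_a, min_b),
                            0, None))
    return inter / (np.prod(box_a[3:]) + np.prod(box_b[3:]) - inter)

def average_precision(pred_boxes, scores, gt_boxes, iou_thr=0.25):
    """AP for one class: greedily match predictions to ground truths in
    descending score order, then integrate precision over recall."""
    order = np.argsort(-scores)
    matched = np.zeros(len(gt_boxes), dtype=bool)
    tp = np.zeros(len(order))
    for rank, i in enumerate(order):
        ious = np.array([iou_3d(pred_boxes[i], gt) for gt in gt_boxes])
        if ious.size and ious.max() >= iou_thr and not matched[ious.argmax()]:
            matched[ious.argmax()] = True
            tp[rank] = 1.0
    precision = np.cumsum(tp) / (np.arange(len(tp)) + 1)
    recall = np.cumsum(tp) / max(len(gt_boxes), 1)
    # rectangular-rule integration of the precision-recall curve
    return float(np.sum(np.diff(np.concatenate([[0.0], recall])) * precision))
```

`mAP` is then the mean of the per-class APs, and `mAP@0.25` / `mAP@0.5` correspond to `iou_thr=0.25` / `iou_thr=0.5`.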

-As introduced in section `Export ScanNet data`, all ground truth 3d bounding box are axis-aligned, i.e. the yaw is zero. So the yaw target of network predicted 3d bounding box is also zero and axis-aligned 3d non-maximum suppression (NMS) is adopted during post-processing without reagrd to rotation.
+As introduced in the section `Export ScanNet data`, all ground truth 3D bounding boxes are axis-aligned, i.e. the yaw is zero. So the yaw target of the network-predicted 3D bounding boxes is also zero, and axis-aligned 3D non-maximum suppression (NMS) is adopted during post-processing without regard to rotation.
