Commit message:

* Add doc
* Creation of SUN RGB-D dataset doc and some mods on ScanNet dataset doc
* Revert mistakenly modified file
* Fix typos
* Add multi-modality related info
* Update according to comments
* Add Chinese doc and revised the docs
* Add some script
* Fix typos and formats
Showing 8 changed files with 716 additions and 23 deletions.
@@ -2,6 +2,7 @@
    :maxdepth: 2

    waymo_det.md
+   sunrgbd_det.md
    scannet_det.md
    scannet_sem_seg.md
    s3dis_sem_seg.md
@@ -113,7 +113,7 @@ def export(mesh_file,
     # bbox format is [x, y, z, dx, dy, dz, label_id]
     # [x, y, z] is gravity center of bbox, [dx, dy, dz] is axis-aligned
     # [label_id] is semantic label id in 'nyu40id' standard
-    # Note: since 3d bbox is axis-aligned, the yaw is 0.
+    # Note: since 3D bbox is axis-aligned, the yaw is 0.
     unaligned_bboxes = extract_bbox(mesh_vertices, object_id_to_segs,
                                     object_id_to_label_id, instance_ids)
     aligned_bboxes = extract_bbox(aligned_mesh_vertices, object_id_to_segs,
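For illustration, here is a minimal numpy sketch (not part of the commit) of splitting a bbox array in the `[x, y, z, dx, dy, dz, label_id]` layout described by the comments above; the box values and label ids are hypothetical examples.

```python
import numpy as np

# Hypothetical example: two axis-aligned boxes in the
# [x, y, z, dx, dy, dz, label_id] format described above.
bboxes = np.array([
    [1.0, 2.0, 0.5, 0.8, 0.6, 1.0, 5],
    [0.0, 1.0, 0.4, 1.2, 0.7, 0.8, 7],
])

centers = bboxes[:, :3]            # gravity centers [x, y, z]
sizes = bboxes[:, 3:6]             # axis-aligned extents [dx, dy, dz]
labels = bboxes[:, 6].astype(int)  # semantic label ids ('nyu40id')

# Since the boxes are axis-aligned (yaw is 0), the min/max corners are simply:
corners_min = centers - sizes / 2
corners_max = centers + sizes / 2
```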
@@ -221,7 +221,7 @@ scannet
 ├── scannet_infos_test.pkl
 ```

-- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsample. Since ScanNet 3d detection task takes axis-aligned point clouds as input, while ScanNet 3d semantic segmentation task takes unaligned points, we choose to store unaligned points and their axis-align transform matrix. Note: the points would be axis-aligned in pre-processing pipeline `GlobalAlignment` of 3d detection task.
+- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsampling. Since the ScanNet 3D detection task takes axis-aligned point clouds as input while the 3D semantic segmentation task takes unaligned points, we store the unaligned points together with their axis-alignment transform matrix. Note: the points are axis-aligned by the `GlobalAlignment` pre-processing step of the 3D detection pipeline.
 - `instance_mask/xxxxx.bin`: The instance label for each point, value range: [0, NUM_INSTANCES], 0: unannotated.
 - `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: [1, 40], i.e. `nyu40id` standard. Note: the `nyu40id` id will be mapped to train id in train pipeline `PointSegClassMapping`.
 - `posed_images/scenexxxx_xx`: The set of `.jpg` images with `.txt` 4x4 poses and the single `.txt` file with camera intrinsic matrix.
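As a rough sketch of consuming the exported files described above (not mmdet3d's loader): the point `.bin` is assumed here to hold float32 values with 6 entries per point (x, y, z, r, g, b) and the masks to hold one int64 value per point; check the export scripts if in doubt, as these dtypes are assumptions.

```python
import numpy as np

def load_scan(points_path, instance_path, semantic_path):
    # Assumed layouts: float32 (N, 6) points, int64 per-point masks.
    points = np.fromfile(points_path, dtype=np.float32).reshape(-1, 6)
    instance_mask = np.fromfile(instance_path, dtype=np.int64)
    semantic_mask = np.fromfile(semantic_path, dtype=np.int64)
    # All three files describe the same N points.
    assert len(points) == len(instance_mask) == len(semantic_mask)
    return points, instance_mask, semantic_mask
```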
@@ -231,21 +231,21 @@ scannet
 - info['pts_instance_mask_path']: The path of `instance_mask/xxxxx.bin`.
 - info['pts_semantic_mask_path']: The path of `semantic_mask/xxxxx.bin`.
 - info['annos']: The annotations of each scan.
-    - annotations['gt_num']: The number of ground truth.
+    - annotations['gt_num']: The number of ground truths.
     - annotations['name']: The semantic name of all ground truths, e.g. `chair`.
-    - annotations['location']: The gravity center of axis-aligned 3d bounding box. Shape: [K, 3], K is the number of ground truth.
-    - annotations['dimensions']: The dimensions of axis-aligned 3d bounding box, i.e. x_size, y_size, z_size, shape: [K, 3].
-    - annotations['gt_boxes_upright_depth']: Axis-aligned 3d bounding box, each bounding box is x, y, z, x_size, y_size, z_size, shape: [K, 6].
-    - annotations['unaligned_location']: The gravity center of axis-unaligned 3d bounding box.
-    - annotations['unaligned_dimensions']: The dimensions of axis-unaligned 3d bounding box.
-    - annotations['unaligned_gt_boxes_upright_depth']: Axis-unaligned 3d bounding box.
+    - annotations['location']: The gravity centers of the axis-aligned 3D bounding boxes. Shape: [K, 3], where K is the number of ground truths.
+    - annotations['dimensions']: The dimensions of the axis-aligned 3D bounding boxes, i.e. (x_size, y_size, z_size), shape: [K, 3].
+    - annotations['gt_boxes_upright_depth']: The axis-aligned 3D bounding boxes; each bounding box is (x, y, z, x_size, y_size, z_size), shape: [K, 6].
+    - annotations['unaligned_location']: The gravity centers of the axis-unaligned 3D bounding boxes.
+    - annotations['unaligned_dimensions']: The dimensions of the axis-unaligned 3D bounding boxes.
+    - annotations['unaligned_gt_boxes_upright_depth']: The axis-unaligned 3D bounding boxes.
     - annotations['index']: The index of all ground truths, i.e. [0, K).
-    - annotations['class']: The train class id of each bounding box, value range: [0, 18), shape: [K, ].
+    - annotations['class']: The train class ids of the bounding boxes, value range: [0, 18), shape: [K, ].
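A hedged sketch (not an mmdet3d API) of reading such an infos `.pkl` file and summarizing the annotations of each scan; it assumes the file holds a list of the info dicts described above, with field names taken from this document.

```python
import pickle

def summarize_infos(info_path):
    # Load the list of per-scan info dicts.
    with open(info_path, 'rb') as f:
        infos = pickle.load(f)
    # Return (gt_num, first few ground-truth names) per scan.
    return [(info['annos']['gt_num'], list(info['annos']['name'][:3]))
            for info in infos]
```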
## Training pipeline

-A typical training pipeline of ScanNet for 3d detection is as below.
+A typical training pipeline of ScanNet for 3D detection is as follows.

```python
train_pipeline = [
@@ -291,12 +291,12 @@ train_pipeline = [
 - `GlobalAlignment`: The previous point cloud would be axis-aligned using the axis-aligned matrix.
 - `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like [0, 18) during training.
 - Data augmentation:
-    - `IndoorPointSample`: downsample input point cloud.
-    - `RandomFlip3D`: randomly flip input point cloud horizontally or vertically.
-    - `GlobalRotScaleTrans`: rotate input point cloud, usually [-5, 5] degree.
+    - `IndoorPointSample`: downsample the input point cloud.
+    - `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
+    - `GlobalRotScaleTrans`: rotate the input point cloud, usually within [-5, 5] degrees for ScanNet; then scale it, usually by a factor of 1.0 for ScanNet; finally translate it, usually by 0 for ScanNet.
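The augmentation steps above can be sketched in plain numpy (an illustration, not the actual mmdet3d transforms): a random horizontal flip, a rotation around the z (up) axis within [-5, 5] degrees, a scale of 1.0, and a translation of 0, as the ScanNet defaults above suggest.

```python
import numpy as np

def augment(points, rng):
    xyz = points[:, :3].copy()
    # RandomFlip3D-style horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        xyz[:, 0] = -xyz[:, 0]
    # GlobalRotScaleTrans-style rotation about the z (up) axis.
    angle = np.deg2rad(rng.uniform(-5, 5))
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    xyz = xyz @ rot.T
    # Scale factor 1.0 and translation 0, the usual ScanNet settings.
    return xyz * 1.0 + 0.0
```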
## Metrics

-Typically mean average precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic functions to compute precision and recall for 3d object detection for multiple classes is called, please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
+Typically, mean Average Precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for multi-class 3D object detection is called; please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).

-As introduced in section `Export ScanNet data`, all ground truth 3d bounding box are axis-aligned, i.e. the yaw is zero. So the yaw target of network predicted 3d bounding box is also zero and axis-aligned 3d non-maximum suppression (NMS) is adopted during post-processing without reagrd to rotation.
+As introduced in the section `Export ScanNet data`, all ground truth 3D bounding boxes are axis-aligned, i.e. the yaw is zero. So the yaw target of the network-predicted 3D bounding boxes is also zero, and axis-aligned 3D non-maximum suppression (NMS), which disregards rotation, is adopted during post-processing.
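A minimal sketch of axis-aligned 3D NMS as described above (an illustration, not mmdet3d's implementation): boxes are (x, y, z, dx, dy, dz) with yaw fixed to zero, so box overlap reduces to an intersection of axis-aligned volumes.

```python
import numpy as np

def aligned_3d_nms(boxes, scores, iou_thr=0.25):
    # Corners and volumes of axis-aligned boxes (x, y, z, dx, dy, dz).
    mins = boxes[:, :3] - boxes[:, 3:6] / 2
    maxs = boxes[:, :3] + boxes[:, 3:6] / 2
    vols = np.prod(boxes[:, 3:6], axis=1)
    order = scores.argsort()[::-1]  # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Axis-aligned intersection volume with the remaining boxes.
        lo = np.maximum(mins[i], mins[rest])
        hi = np.minimum(maxs[i], maxs[rest])
        inter = np.prod(np.clip(hi - lo, 0, None), axis=1)
        iou = inter / (vols[i] + vols[rest] - inter)
        # Suppress boxes overlapping the kept box above the threshold.
        order = rest[iou <= iou_thr]
    return keep
```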