Commit message:

* Add doc
* Creation of SUN RGB-D dataset doc and some mods on ScanNet dataset doc
* Revert mistakenly modified file
* Fix typos
* Add multi-modality related info
* Update according to comments
* Add Chinese doc and revised the docs
* Add some script
* Fix typos and formats
Showing 8 changed files with 716 additions and 23 deletions.
@@ -2,6 +2,7 @@
    :maxdepth: 2

    waymo_det.md
+   sunrgbd_det.md
    scannet_det.md
    scannet_sem_seg.md
    s3dis_sem_seg.md
@@ -113,7 +113,7 @@ def export(mesh_file,
     # bbox format is [x, y, z, dx, dy, dz, label_id]
     # [x, y, z] is gravity center of bbox, [dx, dy, dz] is axis-aligned
     # [label_id] is semantic label id in 'nyu40id' standard
-    # Note: since 3d bbox is axis-aligned, the yaw is 0.
+    # Note: since 3D bbox is axis-aligned, the yaw is 0.
     unaligned_bboxes = extract_bbox(mesh_vertices, object_id_to_segs,
                                     object_id_to_label_id, instance_ids)
     aligned_bboxes = extract_bbox(aligned_mesh_vertices, object_id_to_segs,
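For illustration, here is a minimal numpy sketch (not part of the commit) of splitting a bbox array in the `[x, y, z, dx, dy, dz, label_id]` layout described by the comments above; the box values and label ids are hypothetical examples.

```python
import numpy as np

# Hypothetical example: two axis-aligned boxes in the
# [x, y, z, dx, dy, dz, label_id] format described above.
bboxes = np.array([
    [1.0, 2.0, 0.5, 0.8, 0.6, 1.0, 5],
    [0.0, 1.0, 0.4, 1.2, 0.7, 0.8, 7],
])

centers = bboxes[:, :3]            # gravity centers [x, y, z]
sizes = bboxes[:, 3:6]             # axis-aligned extents [dx, dy, dz]
labels = bboxes[:, 6].astype(int)  # semantic label ids ('nyu40id')

# Since the boxes are axis-aligned (yaw is 0), the min/max corners are simply:
corners_min = centers - sizes / 2
corners_max = centers + sizes / 2
```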
@@ -221,7 +221,7 @@ scannet
 ├── scannet_infos_test.pkl
 ```

-- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsample. Since ScanNet 3d detection task takes axis-aligned point clouds as input, while ScanNet 3d semantic segmentation task takes unaligned points, we choose to store unaligned points and their axis-align transform matrix. Note: the points would be axis-aligned in pre-processing pipeline `GlobalAlignment` of 3d detection task.
+- `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsampling. Since the ScanNet 3D detection task takes axis-aligned point clouds as input while the 3D semantic segmentation task takes unaligned points, we store the unaligned points together with their axis-alignment transform matrix. Note: the points are axis-aligned by the `GlobalAlignment` pre-processing step of the 3D detection pipeline.
 - `instance_mask/xxxxx.bin`: The instance label for each point, value range: [0, NUM_INSTANCES], 0: unannotated.
 - `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: [1, 40], i.e. `nyu40id` standard. Note: the `nyu40id` id will be mapped to train id in train pipeline `PointSegClassMapping`.
 - `posed_images/scenexxxx_xx`: The set of `.jpg` images with `.txt` 4x4 poses and the single `.txt` file with camera intrinsic matrix.
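As a rough sketch of consuming the exported files described above (not mmdet3d's loader): the point `.bin` is assumed here to hold float32 values with 6 entries per point (x, y, z, r, g, b) and the masks to hold one int64 value per point; check the export scripts if in doubt, as these dtypes are assumptions.

```python
import numpy as np

def load_scan(points_path, instance_path, semantic_path):
    # Assumed layouts: float32 (N, 6) points, int64 per-point masks.
    points = np.fromfile(points_path, dtype=np.float32).reshape(-1, 6)
    instance_mask = np.fromfile(instance_path, dtype=np.int64)
    semantic_mask = np.fromfile(semantic_path, dtype=np.int64)
    # All three files describe the same N points.
    assert len(points) == len(instance_mask) == len(semantic_mask)
    return points, instance_mask, semantic_mask
```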
@@ -231,21 +231,21 @@ scannet
 - info['pts_instance_mask_path']: The path of `instance_mask/xxxxx.bin`.
 - info['pts_semantic_mask_path']: The path of `semantic_mask/xxxxx.bin`.
 - info['annos']: The annotations of each scan.
-    - annotations['gt_num']: The number of ground truth.
+    - annotations['gt_num']: The number of ground truths.
     - annotations['name']: The semantic name of all ground truths, e.g. `chair`.
-    - annotations['location']: The gravity center of axis-aligned 3d bounding box. Shape: [K, 3], K is the number of ground truth.
-    - annotations['dimensions']: The dimensions of axis-aligned 3d bounding box, i.e. x_size, y_size, z_size, shape: [K, 3].
-    - annotations['gt_boxes_upright_depth']: Axis-aligned 3d bounding box, each bounding box is x, y, z, x_size, y_size, z_size, shape: [K, 6].
-    - annotations['unaligned_location']: The gravity center of axis-unaligned 3d bounding box.
-    - annotations['unaligned_dimensions']: The dimensions of axis-unaligned 3d bounding box.
-    - annotations['unaligned_gt_boxes_upright_depth']: Axis-unaligned 3d bounding box.
+    - annotations['location']: The gravity centers of the axis-aligned 3D bounding boxes. Shape: [K, 3], where K is the number of ground truths.
+    - annotations['dimensions']: The dimensions of the axis-aligned 3D bounding boxes, i.e. (x_size, y_size, z_size), shape: [K, 3].
+    - annotations['gt_boxes_upright_depth']: The axis-aligned 3D bounding boxes; each bounding box is (x, y, z, x_size, y_size, z_size), shape: [K, 6].
+    - annotations['unaligned_location']: The gravity centers of the axis-unaligned 3D bounding boxes.
+    - annotations['unaligned_dimensions']: The dimensions of the axis-unaligned 3D bounding boxes.
+    - annotations['unaligned_gt_boxes_upright_depth']: The axis-unaligned 3D bounding boxes.
     - annotations['index']: The index of all ground truths, i.e. [0, K).
-    - annotations['class']: The train class id of each bounding box, value range: [0, 18), shape: [K, ].
+    - annotations['class']: The train class ids of the bounding boxes, value range: [0, 18), shape: [K, ].
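A hedged sketch (not an mmdet3d API) of reading such an infos `.pkl` file and summarizing the annotations of each scan; it assumes the file holds a list of the info dicts described above, with field names taken from this document.

```python
import pickle

def summarize_infos(info_path):
    # Load the list of per-scan info dicts.
    with open(info_path, 'rb') as f:
        infos = pickle.load(f)
    # Return (gt_num, first few ground-truth names) per scan.
    return [(info['annos']['gt_num'], list(info['annos']['name'][:3]))
            for info in infos]
```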
## Training pipeline

-A typical training pipeline of ScanNet for 3d detection is as below.
+A typical training pipeline of ScanNet for 3D detection is as follows.

```python
train_pipeline = [
@@ -291,12 +291,12 @@ train_pipeline = [
 - `GlobalAlignment`: The previous point cloud would be axis-aligned using the axis-aligned matrix.
 - `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids like [0, 18) during training.
 - Data augmentation:
-    - `IndoorPointSample`: downsample input point cloud.
-    - `RandomFlip3D`: randomly flip input point cloud horizontally or vertically.
-    - `GlobalRotScaleTrans`: rotate input point cloud, usually [-5, 5] degree.
+    - `IndoorPointSample`: downsample the input point cloud.
+    - `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically.
+    - `GlobalRotScaleTrans`: rotate the input point cloud, usually within [-5, 5] degrees for ScanNet; then scale it, usually by a factor of 1.0 for ScanNet; finally translate it, usually by 0 for ScanNet.
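The augmentation steps above can be sketched in plain numpy (an illustration, not the actual mmdet3d transforms): a random horizontal flip, a rotation around the z (up) axis within [-5, 5] degrees, a scale of 1.0, and a translation of 0, as the ScanNet defaults above suggest.

```python
import numpy as np

def augment(points, rng):
    xyz = points[:, :3].copy()
    # RandomFlip3D-style horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        xyz[:, 0] = -xyz[:, 0]
    # GlobalRotScaleTrans-style rotation about the z (up) axis.
    angle = np.deg2rad(rng.uniform(-5, 5))
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    xyz = xyz @ rot.T
    # Scale factor 1.0 and translation 0, the usual ScanNet settings.
    return xyz * 1.0 + 0.0
```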
## Metrics

-Typically mean average precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic functions to compute precision and recall for 3d object detection for multiple classes is called, please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).
+Typically, mean Average Precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for multi-class 3D object detection is called; please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/master/mmdet3d/core/evaluation/indoor_eval.py).

-As introduced in section `Export ScanNet data`, all ground truth 3d bounding box are axis-aligned, i.e. the yaw is zero. So the yaw target of network predicted 3d bounding box is also zero and axis-aligned 3d non-maximum suppression (NMS) is adopted during post-processing without reagrd to rotation.
+As introduced in the section `Export ScanNet data`, all ground truth 3D bounding boxes are axis-aligned, i.e. the yaw is zero. So the yaw target of the network-predicted 3D bounding boxes is also zero, and axis-aligned 3D non-maximum suppression (NMS), which disregards rotation, is adopted during post-processing.
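A minimal sketch of axis-aligned 3D NMS as described above (an illustration, not mmdet3d's implementation): boxes are (x, y, z, dx, dy, dz) with yaw fixed to zero, so box overlap reduces to an intersection of axis-aligned volumes.

```python
import numpy as np

def aligned_3d_nms(boxes, scores, iou_thr=0.25):
    # Corners and volumes of axis-aligned boxes (x, y, z, dx, dy, dz).
    mins = boxes[:, :3] - boxes[:, 3:6] / 2
    maxs = boxes[:, :3] + boxes[:, 3:6] / 2
    vols = np.prod(boxes[:, 3:6], axis=1)
    order = scores.argsort()[::-1]  # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Axis-aligned intersection volume with the remaining boxes.
        lo = np.maximum(mins[i], mins[rest])
        hi = np.minimum(maxs[i], maxs[rest])
        inter = np.prod(np.clip(hi - lo, 0, None), axis=1)
        iou = inter / (vols[i] + vols[rest] - inter)
        # Suppress boxes overlapping the kept box above the threshold.
        order = rest[iou <= iou_thr]
    return keep
```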