[Docs] add details (#2558)
Tau-J authored Jul 20, 2023
1 parent b225a77 commit 947f013
Showing 7 changed files with 149 additions and 39 deletions.
4 changes: 2 additions & 2 deletions docs/en/advanced_guides/customize_datasets.md
Original file line number Diff line number Diff line change
Expand Up @@ -77,8 +77,8 @@ An example of the dataset config is as follows.
1. `name`: the keypoint name. The keypoint name must be unique.
2. `id`: the keypoint id.
3. `color`: (\[B, G, R\]) is used for keypoint visualization.
4. `type`: 'upper' or 'lower', will be used in data augmentation.
5. `swap`: indicates the 'swap pair' (also known as 'flip pair'). When applying image horizontal flip, the left part will become the right part. We need to flip the keypoints accordingly.
4. `type`: 'upper' or 'lower'; used by the [RandomHalfBody](https://github.com/open-mmlab/mmpose/blob/b225a773d168fc2afd48cde5f76c0202d1ba2f52/mmpose/datasets/transforms/common_transforms.py#L263) data augmentation.
5. `swap`: indicates the 'swap pair' (also known as 'flip pair'). When an image is flipped horizontally, the left part becomes the right part, so the keypoints must be swapped accordingly. This is used by the [RandomFlip](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L94) data augmentation.
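
The effect of a swap pair can be sketched in plain Python. This is only an illustration and not the code of `RandomFlip`: the helper `flip_keypoints`, the sample keypoints, and the `image_width - 1 - x` mirroring convention are assumptions made for this example.

```python
def flip_keypoints(keypoints, swap_pairs, image_width):
    """Mirror (x, y) keypoints horizontally, then exchange left/right partners.

    Hypothetical helper for illustration; `swap_pairs` holds pairs of keypoint ids.
    """
    # Mirror every x coordinate around the vertical image axis (assumed convention).
    flipped = [(image_width - 1 - x, y) for x, y in keypoints]
    # After mirroring, the point stored as "left_eye" lies on the right side,
    # so each pair of entries is exchanged to keep the names meaningful.
    for left, right in swap_pairs:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return flipped


# ids: 0 = nose, 1 = left_eye, 2 = right_eye
keypoints = [(50.0, 20.0), (60.0, 15.0), (40.0, 15.0)]
flipped = flip_keypoints(keypoints, swap_pairs=[(1, 2)], image_width=100)
print(flipped)
```

Keypoints without a partner, such as `nose`, are only mirrored and keep their index.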

`skeleton_info` contains information about the keypoint connectivity, which is used for visualization.

Expand Down
62 changes: 53 additions & 9 deletions docs/en/guide_to_framework.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ This tutorial covers what developers will concern when using MMPose 1.0:
The content of this tutorial is organized as follows:

- [A 20 Minute Guide to MMPose Framework](#a-20-minute-guide-to-mmpose-framework)
- [Structure](#structure)
- [Overview](#overview)
- [Step1: Configs](#step1-configs)
- [Step2: Data](#step2-data)
Expand All @@ -33,6 +34,47 @@ The content of this tutorial is organized as follows:
- [Neck](#neck)
- [Head](#head)

## Structure

The file structure of MMPose 1.0 is as follows:

```shell
mmpose
|----apis
|----structures
|----datasets
     |----transforms
|----codecs
|----models
     |----pose_estimators
     |----data_preprocessors
     |----backbones
     |----necks
     |----heads
     |----losses
|----engine
     |----hooks
|----evaluation
|----visualization
```

- **apis** provides high-level APIs for model inference
- **structures** provides data structures like bbox, keypoint and PoseDataSample
- **datasets** supports various datasets for pose estimation
  - **transforms** contains a lot of useful data augmentation transforms
- **codecs** provides pose encoders and decoders: an encoder encodes poses (mostly keypoints) into learning targets (e.g. heatmaps), and a decoder decodes model outputs into pose predictions
- **models** provides all components of pose estimation models in a modular structure
  - **pose_estimators** defines all pose estimation model classes
  - **data_preprocessors** is for preprocessing the input data of the model
  - **backbones** provides a collection of backbone networks
  - **necks** contains various neck modules
  - **heads** contains various prediction heads that perform pose estimation
  - **losses** contains various loss functions
- **engine** provides runtime components related to pose estimation
  - **hooks** provides various hooks of the runner
- **evaluation** provides metrics for evaluating model performance
- **visualization** is for visualizing skeletons, heatmaps and other information

## Overview

![overall-en](https://user-images.githubusercontent.com/13503330/187372008-2a94bad5-5252-4155-9ae3-3da1c426f569.png)
Expand Down Expand Up @@ -62,9 +104,7 @@ Note that all new modules need to be registered using `Registry` and imported in
The organization of data in MMPose contains:

- Dataset Meta Information

- Dataset

- Pipeline

### Dataset Meta Information
Expand Down Expand Up @@ -264,6 +304,10 @@ When supporting MPII dataset, since we need to use `head_size` to calculate `PCK

To support a dataset that is beyond the scope of [BaseCocoStyleDataset](https://github.com/open-mmlab/mmpose/blob/main/mmpose/datasets/datasets/base/base_coco_style_dataset.py), you may need to subclass from the `BaseDataset` provided by [MMEngine](https://github.com/open-mmlab/mmengine). Please refer to the [documents](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/basedataset.html) for details.

```{note}
If you wish to customize a new dataset, you can refer to [Customize Datasets](./advanced_guides/customize_datasets.md) for more details.
```

### Pipeline

Data augmentations and transformations during pre-processing are organized as a pipeline. Here is an example of typical pipelines:
Expand Down Expand Up @@ -306,21 +350,21 @@ In MMPose, the modules used for data transformation are under `[$MMPOSE/mmpose/d

#### i. Augmentation

Commonly used transforms are defined in [$MMPOSE/mmpose/datasets/transforms/common_transforms.py](https://github.com/open-mmlab/mmpose/blob/main/mmpose/datasets/transforms/common_transforms.py), such as `RandomFlip`, `RandomHalfBody`, etc.
Commonly used transforms are defined in [$MMPOSE/mmpose/datasets/transforms/common_transforms.py](https://github.com/open-mmlab/mmpose/blob/main/mmpose/datasets/transforms/common_transforms.py), such as [RandomFlip](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L94), [RandomHalfBody](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L263), etc.

For top-down methods, `Shift`, `Rotate`and `Resize` are implemented by `RandomBBoxTransform`**.** For bottom-up methods, `BottomupRandomAffine` is used.
For top-down methods, `Shift`, `Rotate` and `Resize` are implemented by [RandomBBoxTransform](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L433). For bottom-up methods, [BottomupRandomAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/bottomup_transforms.py#L134) is used.

```{note}
Most data transforms depend on `bbox_center` and `bbox_scale`, which can be obtained by `GetBBoxCenterScale`.
Most data transforms depend on `bbox_center` and `bbox_scale`, which can be obtained by [GetBBoxCenterScale](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L31).
```
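
As a rough illustration of what `bbox_center` and `bbox_scale` represent, the sketch below derives both from an `(x1, y1, x2, y2)` box. The function name `bbox_xyxy_to_center_scale` and the default `padding=1.25` are assumptions for this example; check the linked `GetBBoxCenterScale` source for the actual behavior.

```python
def bbox_xyxy_to_center_scale(bbox, padding=1.25):
    """Convert an (x1, y1, x2, y2) box into a center point and a padded size.

    Hypothetical helper: `padding` enlarges the box so the whole instance
    stays inside the crop (the default value is an assumption).
    """
    x1, y1, x2, y2 = bbox
    center = ((x1 + x2) / 2, (y1 + y2) / 2)
    scale = ((x2 - x1) * padding, (y2 - y1) * padding)
    return center, scale


center, scale = bbox_xyxy_to_center_scale((10, 20, 110, 220))
print(center, scale)
```

Later transforms can then shift, rotate or resize the crop by modifying only these two values instead of touching the image itself.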

#### ii. Transformation

Affine transformation is used to convert images and annotations from the original image space to the input space. This is done by `TopdownAffine` for top-down methods and `BottomupRandomAffine` for bottom-up methods.
Affine transformation is used to convert images and annotations from the original image space to the input space. This is done by [TopdownAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/topdown_transforms.py#L14) for top-down methods and [BottomupRandomAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/bottomup_transforms.py#L134) for bottom-up methods.
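
A simplified view of this mapping, assuming no rotation and a crop whose aspect ratio already matches the input size, is sketched below. The function `to_input_space` is hypothetical and only illustrates the coordinate change; the real transforms build a full affine matrix and warp the image as well.

```python
def to_input_space(point, center, scale, input_size):
    """Map a point from original image space to model input space.

    Assumptions for this sketch: no rotation, and `scale` is the (width, height)
    of the crop around `center` in the original image.
    """
    x, y = point
    cx, cy = center
    crop_w, crop_h = scale
    in_w, in_h = input_size
    # Top-left corner of the crop in the original image.
    left, top = cx - crop_w / 2, cy - crop_h / 2
    # Translate into crop coordinates, then rescale to the input resolution.
    return ((x - left) * in_w / crop_w, (y - top) * in_h / crop_h)


mapped = to_input_space(point=(60, 120), center=(60, 120),
                        scale=(120, 160), input_size=(192, 256))
print(mapped)
```

The crop center lands at the center of the input, which is the property the affine transformation is meant to guarantee for each instance.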

#### iii. Encoding

In training phase, after the data is transformed from the original image space into the input space, it is necessary to use `GenerateTarget` to obtain the training target(e.g. Gaussian Heatmaps). We name this process **Encoding**. Conversely, the process of getting the corresponding coordinates from Gaussian Heatmaps is called **Decoding**.
In the training phase, after the data is transformed from the original image space into the input space, [GenerateTarget](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L873) is used to obtain the training target (e.g. Gaussian Heatmaps). We call this process **Encoding**. Conversely, the process of getting the corresponding coordinates from Gaussian Heatmaps is called **Decoding**.

In MMPose, we collect Encoding and Decoding processes into a **Codec**, in which `encode()` and `decode()` are implemented.
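
To make the encode/decode pairing concrete, here is a minimal single-keypoint sketch in plain Python. `TinyHeatmapCodec` is a hypothetical class written for this example and is not one of the codecs shipped with MMPose; it assumes one keypoint, an unnormalized Gaussian, and argmax decoding.

```python
import math


class TinyHeatmapCodec:
    """Toy codec: one keypoint <-> one Gaussian heatmap (illustration only)."""

    def __init__(self, heatmap_size, sigma=2.0):
        self.width, self.height = heatmap_size
        self.sigma = sigma

    def encode(self, keypoint):
        """Render a Gaussian bump centered on the keypoint."""
        kx, ky = keypoint
        return [
            [math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2 * self.sigma ** 2))
             for x in range(self.width)]
            for y in range(self.height)
        ]

    def decode(self, heatmap):
        """Return the (x, y) location of the heatmap maximum."""
        _, x, y = max(
            (value, x, y)
            for y, row in enumerate(heatmap)
            for x, value in enumerate(row)
        )
        return (x, y)


codec = TinyHeatmapCodec(heatmap_size=(8, 6))
heatmap = codec.encode((5, 3))
print(codec.decode(heatmap))
```

Keeping both directions in one object means the training target and the inference-time decoding share the same parameters, such as the heatmap size and `sigma`.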

Expand Down Expand Up @@ -360,15 +404,15 @@ If you wish to customize a new codec, you can refer to [Codec](./user_guides/cod

After the data is transformed, you need to pack it using [PackPoseInputs](https://github.com/open-mmlab/mmpose/blob/main/mmpose/datasets/transforms/formatting.py).

This method converts the data stored in the dictionary `results` into standard data structures in MMPose, such as `InstanceData`, `PixelData`, `PoseDataSample`, etc.
This method converts the data stored in the dictionary `results` into standard data structures in MMPose, such as `InstanceData`, `PixelData`, [PoseDataSample](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/structures/pose_data_sample.py), etc.

Specifically, we divide the data into `gt` (ground-truth) and `pred` (prediction), each of which has the following types:

- **instances**(numpy.array): instance-level raw annotations or predictions in the original scale space
- **instance_labels**(torch.tensor): instance-level training labels (e.g. normalized coordinates, keypoint visibility) in the output scale space
- **fields**(torch.tensor): pixel-level training labels or predictions (e.g. Gaussian Heatmaps) in the output scale space

The following is an example of the implementation of `PoseDataSample` under the hood:
The following is an example of the implementation of [PoseDataSample](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/structures/pose_data_sample.py) under the hood:

```Python
def get_pose_data_sample(self):
Expand Down
19 changes: 19 additions & 0 deletions docs/en/user_guides/configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,25 @@

We use Python files as configs and incorporate modular and inheritance design into our config system, which makes it convenient to conduct various experiments.

## Structure

The file structure of configs is as follows:

```shell
configs
|----_base_
     |----datasets
     |----default_runtime.py
|----animal_2d_keypoint
|----body_2d_keypoint
|----body_3d_keypoint
|----face_2d_keypoint
|----fashion_2d_keypoint
|----hand_2d_keypoint
|----hand_3d_keypoint
|----wholebody_2d_keypoint
```

## Introduction

MMPose is equipped with a powerful config system. Cooperating with Registry, a config file can organize all the configurations in the form of python dictionaries and create instances of the corresponding modules.
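
The idea of building modules from dictionaries can be sketched without any framework. The `Registry` class and `ToyHead` below are simplified stand-ins written for this example, not the MMEngine implementation; they only assume that a config dict names its class under the `type` key and passes the remaining keys as constructor arguments.

```python
class Registry:
    """Minimal name-to-class registry (hypothetical, for illustration only)."""

    def __init__(self):
        self._modules = {}

    def register_module(self, cls):
        # Used as a decorator: store the class under its own name.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        # Copy so the caller's config dict is left untouched.
        args = dict(cfg)
        cls = self._modules[args.pop('type')]
        return cls(**args)


MODELS = Registry()


@MODELS.register_module
class ToyHead:
    def __init__(self, in_channels, out_channels):
        self.in_channels = in_channels
        self.out_channels = out_channels


# A config is a plain dictionary; 'type' selects the registered class.
head = MODELS.build(dict(type='ToyHead', in_channels=32, out_channels=17))
print(type(head).__name__, head.out_channels)
```

Because the config is plain data, swapping a module only requires changing the `type` string and its arguments, with no changes to the code that builds the model.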
Expand Down
4 changes: 2 additions & 2 deletions docs/zh_cn/advanced_guides/customize_datasets.md
Original file line number Diff line number Diff line change
Expand Up @@ -88,8 +88,8 @@ config/_base_/datasets/custom.py
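1. `name`: the keypoint name, which must be unique, e.g. `nose`, `left_eye`.
2. `id`: the keypoint ID, which must be unique and starts from 0.
3. `color`: the keypoint color, organized as (\[B, G, R\]), used for visualization.
4. `type`: the keypoint type, one of `upper`, `lower` or \`\`, used for data augmentation.
5. `swap`: the keypoint swap relationship, used for horizontal-flip data augmentation.
4. `type`: the keypoint type, one of `upper`, `lower` or `''`, used by the [RandomHalfBody](https://github.com/open-mmlab/mmpose/blob/b225a773d168fc2afd48cde5f76c0202d1ba2f52/mmpose/datasets/transforms/common_transforms.py#L263) data augmentation.
5. `swap`: the keypoint swap relationship, used by the horizontal-flip data augmentation [RandomFlip](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L94).
- `skeleton_info`: the skeleton connectivity, used for visualization.
- `joint_weights`: the weight of each keypoint, used for loss computation.
- `sigma`: the standard deviation, used to compute the OKS score. Please refer to [keypoints-eval](https://cocodataset.org/#keypoints-eval) for details.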
Expand Down
70 changes: 54 additions & 16 deletions docs/zh_cn/guide_to_framework.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ MMPose 1.0 采用了全新的模块结构设计以精简代码,提升运行效
The table of contents of this tutorial is as follows:

- [A 20-Minute Guide to the MMPose Framework Design](#20-分钟了解-mmpose-架构设计)
- [File Structure](#文件结构)
- [Overview](#总览)
- [Step1: Configs](#step1配置文件)
- [Step2: Data](#step2数据)
Expand All @@ -35,6 +36,47 @@ MMPose 1.0 采用了全新的模块结构设计以精简代码,提升运行效
- [Neck](#颈部模块neck)
- [Head](#预测头head)

## File Structure

The file structure of MMPose 1.0 is as follows:

```shell
mmpose
|----apis
|----structures
|----datasets
     |----transforms
|----codecs
|----models
     |----pose_estimators
     |----data_preprocessors
     |----backbones
     |----necks
     |----heads
     |----losses
|----engine
     |----hooks
|----evaluation
|----visualization
```

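- **apis** provides high-level APIs for model inference
- **structures** provides data structures such as bbox, keypoint and PoseDataSample
- **datasets** supports various datasets for pose estimation
  - **transforms** contains various data augmentation transforms
- **codecs** provides pose codecs: an encoder encodes pose information (usually keypoint coordinates) into learning targets (e.g. heatmaps), and a decoder decodes model outputs into pose estimation results
- **models** provides the components of pose estimation models in a modular structure
  - **pose_estimators** defines all pose estimation model classes
  - **data_preprocessors** preprocesses the input data of the model
  - **backbones** contains various backbone networks
  - **necks** contains various neck modules
  - **heads** contains various model heads
  - **losses** contains various loss functions
- **engine** contains runtime components related to pose estimation tasks
  - **hooks** provides various runtime hooks
- **evaluation** provides various metrics for evaluating model performance
- **visualization** visualizes information such as keypoint skeletons and heatmaps

## Overview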

![overall-cn](https://user-images.githubusercontent.com/13503330/187830967-f2d7bf40-6261-42f3-91a5-ae045fa0dc0c.png)
Expand Down Expand Up @@ -262,6 +304,10 @@ class MpiiDataset(BaseCocoStyleDataset):

If your custom dataset cannot be supported by [BaseCocoStyleDataset](https://github.com/open-mmlab/mmpose/blob/main/mmpose/datasets/datasets/base/base_coco_style_dataset.py), you need to subclass the `BaseDataset` base class provided by [MMEngine](https://github.com/open-mmlab/mmengine) directly. Please refer to the related [documents](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/basedataset.html) for details.

```{note}
If you want to customize a dataset, please refer to [Customize Datasets](./advanced_guides/customize_datasets.md).
```

### Pipeline

A typical data pipeline configuration is as follows:
Expand Down Expand Up @@ -304,46 +350,38 @@ test_pipeline = [

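#### i. Augmentation

Commonly used augmentation transforms are defined in [$MMPOSE/mmpose/datasets/transforms/common_transforms.py](https://github.com/open-mmlab/mmpose/blob/main/mmpose/datasets/transforms/common_transforms.py), such as `RandomFlip`, `RandomHalfBody`, etc.
Commonly used augmentation transforms are defined in [$MMPOSE/mmpose/datasets/transforms/common_transforms.py](https://github.com/open-mmlab/mmpose/blob/main/mmpose/datasets/transforms/common_transforms.py), such as [RandomFlip](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L94), [RandomHalfBody](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L263), etc.

For top-down methods, `Shift`, `Rotate` and `Resize` are implemented by `RandomBBoxTransform`; for bottom-up methods, they are implemented by `BottomupRandomAffine`.
For top-down methods, `Shift`, `Rotate` and `Resize` are implemented by [RandomBBoxTransform](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L433); for bottom-up methods, they are implemented by [BottomupRandomAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/bottomup_transforms.py#L134).

```{note}
Note that most data transforms depend on `bbox_center` and `bbox_scale`, which can be obtained by `GetBBoxCenterScale`.
Note that most data transforms depend on `bbox_center` and `bbox_scale`, which can be obtained by [GetBBoxCenterScale](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L31).
```

#### ii. Transformation

We use affine transformation to convert images and coordinate annotations from the original image space to the input image space. This is done by `TopdownAffine` for top-down methods and by `BottomupRandomAffine` for bottom-up methods.
We use affine transformation to convert images and coordinate annotations from the original image space to the input image space. This is done by [TopdownAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/topdown_transforms.py#L14) for top-down methods and by [BottomupRandomAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/bottomup_transforms.py#L134) for bottom-up methods.

#### iii. Encoding

During model training, after the data is transformed from the original space into the input image space, `GenerateTarget` is used to generate the supervision targets required for training (e.g. Gaussian heatmaps generated from coordinates). We call this process encoding (Encode). Conversely, the process of obtaining the corresponding coordinates from Gaussian heatmaps is called decoding (Decode).
During model training, after the data is transformed from the original space into the input image space, [GenerateTarget](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L873) is used to generate the supervision targets required for training (e.g. Gaussian heatmaps generated from coordinates). We call this process encoding (Encode). Conversely, the process of obtaining the corresponding coordinates from Gaussian heatmaps is called decoding (Decode).

In MMPose, we combine the encoding and decoding processes into a codec (Codec), in which `encode()` and `decode()` are implemented.

MMPose currently supports generating the following types of supervision targets:

- `heatmap`: Gaussian heatmaps

- `keypoint_label`: keypoint labels (e.g. normalized coordinates)

- `keypoint_xy_label`: keypoint labels for a single coordinate axis

- `heatmap+keypoint_label`: Gaussian heatmaps and keypoint labels generated together

- `multiscale_heatmap`: multi-scale Gaussian heatmaps

The generated supervision targets are packed under the following keys:

- `heatmaps`: Gaussian heatmaps

- `keypoint_labels`: keypoint labels (e.g. normalized coordinates)

- `keypoint_x_labels`: x-axis keypoint labels

- `keypoint_y_labels`: y-axis keypoint labels

- `keypoint_weights`: keypoint weights
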
```Python
Expand Down Expand Up @@ -374,9 +412,9 @@ class GenerateTarget(BaseTransform):

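#### iv. Packing

After the preprocessing transforms, the data finally needs to be packed into data samples by `PackPoseInputs`. This operation is defined in [$MMPOSE/mmpose/datasets/transforms/formatting.py](https://github.com/open-mmlab/mmpose/blob/main/mmpose/datasets/transforms/formatting.py).
After the preprocessing transforms, the data finally needs to be packed into data samples by [PackPoseInputs](https://github.com/open-mmlab/mmpose/blob/main/mmpose/datasets/transforms/formatting.py).

The packing process converts the data stored in the dictionary `results` in the data pipeline into the standard data structures required by MMPose, such as `InstanceData`, `PixelData`, `PoseDataSample`, etc.
The packing process converts the data stored in the dictionary `results` in the data pipeline into the standard data structures required by MMPose, such as `InstanceData`, `PixelData`, [PoseDataSample](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/structures/pose_data_sample.py), etc.

Specifically, we divide the content of a data sample into `gt` (ground truth) and `pred` (model prediction), both of which contain the following items: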

Expand All @@ -386,7 +424,7 @@ class GenerateTarget(BaseTransform):

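- **fields**(torch.tensor): pixel-level training labels (e.g. Gaussian heatmaps) or predictions, in the output scale space

The following is an example of the underlying implementation of `PoseDataSample`:
The following is an example of the underlying implementation of [PoseDataSample](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/structures/pose_data_sample.py):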

```Python
def get_pose_data_sample(self):
Expand Down