From 0b86c4d8378c0fb5b0a31c0b70a71e9836d27f00 Mon Sep 17 00:00:00 2001 From: Range King Date: Wed, 9 Nov 2022 21:59:36 +0800 Subject: [PATCH 01/14] Update yolov5_description.md --- .../yolov5_description.md | 26 ++++++++++++++----- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/docs/zh_cn/algorithm_descriptions/yolov5_description.md b/docs/zh_cn/algorithm_descriptions/yolov5_description.md index b4e04d18b..4e8d48c62 100644 --- a/docs/zh_cn/algorithm_descriptions/yolov5_description.md +++ b/docs/zh_cn/algorithm_descriptions/yolov5_description.md @@ -241,11 +241,15 @@ train_pipeline = [ ### 1.2 网络结构 +
+YOLOv5_P6_structure_v1.0 +
+ 本小结由 RangeKing@github 撰写,非常感谢!!! YOLOv5 网络结构是标准的 `CSPDarknet` + `PAFPN` + `非解耦 Head`。 -YOLOv5 网络结构大小由 `deepen_factor` 和 `widen_factor` 两个参数决定。其中 `deepen_factor` 控制网络结构深度,即 `CSPLayer` 中 `DarknetBottleneck` 模块堆叠的数量;`widen_factor` 控制网络结构宽度,即模块输出特征图的通道数。以 YOLOv5-l 为例,其 `deepen_factor = widen_factor = 1.0` ,整体结构图如上所示。 +YOLOv5 网络结构大小由 `deepen_factor` 和 `widen_factor` 两个参数决定。其中 `deepen_factor` 控制网络结构深度,即 `CSPLayer` 中 `DarknetBottleneck` 模块堆叠的数量;`widen_factor` 控制网络结构宽度,即模块输出特征图的通道数。以 YOLOv5-l 为例,其 `deepen_factor = widen_factor = 1.0` 。P5 模型整体结构如本页面第一张图所示,P6 模型整体结构如上图所示。 图的上半部分为模型总览;下半部分为具体网络结构,其中的模块均标有序号,方便用户与 YOLOv5 官方仓库的配置文件对应;中间部分为各子模块的具体构成。 @@ -253,13 +257,18 @@ YOLOv5 网络结构大小由 `deepen_factor` 和 `widen_factor` 两个参数决 #### 1.2.1 Backbone -在 MMYOLO 中 `CSPDarknet` 继承自 `BaseBackbone`,整体结构和 `ResNet` 类似,共 5 层结构,包含 1 个 `Stem Layer` 和 4 个 `Stage Layer`: +在 MMYOLO 中 `CSPDarknet` 继承自 `BaseBackbone`,整体结构和 `ResNet` 类似。P5 模型共 5 层结构,包含 1 个 `Stem Layer` 和 4 个 `Stage Layer`: - `Stem Layer` 是 1 个 6x6 kernel 的 `ConvModule`,相较于 v6.1 版本之前的 `Focus` 模块更加高效。 -- 前 3 个 `Stage Layer` 均由 1 个 `ConvModule` 和 1 个 `CSPLayer` 组成。如上图 Details 部分所示。 +- 除了最后一个 `Stage Layer`,其他均由 1 个 `ConvModule` 和 1 个 `CSPLayer` 组成。如上图 Details 部分所示。 其中 `ConvModule` 为 3x3的 `Conv2d` + `BatchNorm` + `SiLU 激活函数`。`CSPLayer` 即 YOLOv5 官方仓库中的 C3 模块,由 3 个 `ConvModule` + n 个 `DarknetBottleneck`(带残差连接) 组成。 -- 第 4 个 `Stage Layer` 在最后增加了 `SPPF` 模块。`SPPF` 模块是将输入串行通过多个 5x5 大小的 `MaxPool2d` 层,与 `SPP` 模块效果相同,但速度更快。 -- P5 模型结构会在 `Stage Layer` 2-4 之后分别输出一个特征图进入 `Neck` 结构。以 640x640 输入图片为例,其输出特征为 (B,256,80,80)、 (B,512,40,40) 和 (B,1024,20,20),对应的 stride 分别为 8/16/32。 +- 最后一个 `Stage Layer` 在最后增加了 `SPPF` 模块。`SPPF` 模块是将输入串行通过多个 5x5 大小的 `MaxPool2d` 层,与 `SPP` 模块效果相同,但速度更快。 +- P5 模型会在 `Stage Layer` 2-4 之后分别输出一个特征图进入 `Neck` 结构。以 640x640 输入图片为例,其输出特征为 (B,256,80,80)、(B,512,40,40) 和 (B,1024,20,20),对应的 stride 分别为 8/16/32。 +- P6 模型会在 `Stage Layer` 2-5 之后分别输出一个特征图进入 `Neck` 结构。以 1280x1280 输入图片为例,其输出特征为 (B,256,160,160)、(B,512,80,80)、(B,768,40,40) 和 
(B,1024,20,20),对应的 stride 分别为 8/16/32/64。 + +```{note} +1.2 小结涉及的特征纬度(shape)都为 (B, C, H, W)。 +``` #### 1.2.2 Neck @@ -267,14 +276,17 @@ YOLOv5 官方仓库的配置文件中并没有 Neck 部分,为方便用户与 基于 `BaseYOLONeck` 结构,YOLOv5 `Neck` 也是遵循同一套构建流程,对于不存在的模块,我们采用 `nn.Identity` 代替。 -Neck 模块输出的特征图和 Backbone 完全一致即为 (B,256,80,80)、 (B,512,40,40) 和 (B,1024,20,20)。 +Neck 模块输出的特征图和 Backbone 完全一致。即 P5 模型为 (B,256,80,80)、 (B,512,40,40) 和 (B,1024,20,20);P6 模型为 (B,256,160,160)、(B,512,80,80)、(B,768,40,40) 和 (B,1024,20,20)。 #### 1.2.3 Head YOLOv5 Head 结构和 YOLOv3 完全一样,为 `非解耦 Head`。Head 模块只包括 3 个不共享权重的卷积,用于将输入特征图进行变换而已。 前面的 PAFPN 依然是输出 3 个不同尺度的特征图,shape 为 (B,256,80,80)、 (B,512,40,40) 和 (B,1024,20,20)。 -由于 YOLOv5 是非解耦输出,即分类和 bbox 检测等都是在同一个卷积的不同通道中完成。以 COCO 80 类为例,在输入为 640x640 分辨率情况下,其 Head 模块输出的 shape 分别为 (B, 3x(4+1+80),80,80), (B, 3x(4+1+80),40,40) 和 (B, 3x(4+1+80),20,20)。其中 3 表示 3 个 anchor,4 表示 bbox 预测分支,1 表示 obj 预测分支,80 表示 COCO 数据集类别预测分支。 +由于 YOLOv5 是非解耦输出,即分类和 bbox 检测等都是在同一个卷积的不同通道中完成。以 COCO 80 类为例: +- P5 模型在输入为 640x640 分辨率情况下,其 Head 模块输出的 shape 分别为 (B, 3x(4+1+80),80,80), (B, 3x(4+1+80),40,40) 和 (B, 3x(4+1+80),20,20)。 +- P6 模型在输入为 1280x1280 分辨率情况下,其 Head 模块输出的 shape 分别为 (B, 3x(4+1+80),160,160), (B, 3x(4+1+80),80,80), (B, 3x(4+1+80),40,40) 和 (B, 3x(4+1+80),20,20)。 +其中 3 表示 3 个 anchor,4 表示 bbox 预测分支,1 表示 obj 预测分支,80 表示 COCO 数据集类别预测分支。 ### 1.3 正负样本匹配策略 From b95ddf283e170c4443ce8db2b1bde681eb9e3287 Mon Sep 17 00:00:00 2001 From: Range King Date: Wed, 9 Nov 2022 22:07:12 +0800 Subject: [PATCH 02/14] Update model_design.md --- docs/zh_cn/algorithm_descriptions/model_design.md | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/docs/zh_cn/algorithm_descriptions/model_design.md b/docs/zh_cn/algorithm_descriptions/model_design.md index 1b0c1dd96..9a1d3f46f 100644 --- a/docs/zh_cn/algorithm_descriptions/model_design.md +++ b/docs/zh_cn/algorithm_descriptions/model_design.md @@ -5,7 +5,14 @@ 下图为 RangeKing@GitHub 提供,非常感谢!
-基类 +基类 P5 +图 1: P5 模型结构图 +
+ + +
+基类 P6 +图 2: P6 模型结构图
YOLO 系列算法大部分采用了统一的算法搭建结构,典型的如 Darknet + PAFPN。为了让用户快速理解 YOLO 系列算法架构,我们特意设计了如上图中的 BaseBackbone + BaseYOLONeck 结构。 @@ -20,7 +27,9 @@ YOLO 系列算法大部分采用了统一的算法搭建结构,典型的如 Da ### BaseBackbone -如上图所示,对于 P5 而言,BaseBackbone 包括 1 个 stem 层 + 4 个 stage 层的类似 ResNet 的基础结构,不同算法的主干网络继承 BaseBackbone,用户可以通过实现内部的 `build_xx` 方法,使用自定义的基础模块来构建每一层的内部结构。 +如图 1 所示,对于 P5 而言,BaseBackbone 为包含 1 个 stem 层 + 4 个 stage 层的类似 ResNet 的基础结构。 +如图 2 所示,对于 P6 而言,BaseBackbone 为包含 1 个 stem 层 + 5 个 stage 层结构。 +不同算法的主干网络继承 BaseBackbone,用户可以通过实现内部的 `build_xx` 方法,使用自定义的基础模块来构建每一层的内部结构。 ### BaseYOLONeck From 4d4929c73f4cc0fed3e1edc6488cd8b89f5ade6b Mon Sep 17 00:00:00 2001 From: RangeKing Date: Wed, 9 Nov 2022 22:09:27 +0800 Subject: [PATCH 03/14] Fix typo in yolov5_description.md --- docs/zh_cn/algorithm_descriptions/yolov5_description.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/zh_cn/algorithm_descriptions/yolov5_description.md b/docs/zh_cn/algorithm_descriptions/yolov5_description.md index 4e8d48c62..9ac620061 100644 --- a/docs/zh_cn/algorithm_descriptions/yolov5_description.md +++ b/docs/zh_cn/algorithm_descriptions/yolov5_description.md @@ -267,7 +267,7 @@ YOLOv5 网络结构大小由 `deepen_factor` 和 `widen_factor` 两个参数决 - P6 模型会在 `Stage Layer` 2-5 之后分别输出一个特征图进入 `Neck` 结构。以 1280x1280 输入图片为例,其输出特征为 (B,256,160,160)、(B,512,80,80)、(B,768,40,40) 和 (B,1024,20,20),对应的 stride 分别为 8/16/32/64。 ```{note} -1.2 小结涉及的特征纬度(shape)都为 (B, C, H, W)。 +1.2 小结涉及的特征维度(shape)都为 (B, C, H, W)。 ``` #### 1.2.2 Neck @@ -284,9 +284,10 @@ YOLOv5 Head 结构和 YOLOv3 完全一样,为 `非解耦 Head`。Head 模块 前面的 PAFPN 依然是输出 3 个不同尺度的特征图,shape 为 (B,256,80,80)、 (B,512,40,40) 和 (B,1024,20,20)。 由于 YOLOv5 是非解耦输出,即分类和 bbox 检测等都是在同一个卷积的不同通道中完成。以 COCO 80 类为例: + - P5 模型在输入为 640x640 分辨率情况下,其 Head 模块输出的 shape 分别为 (B, 3x(4+1+80),80,80), (B, 3x(4+1+80),40,40) 和 (B, 3x(4+1+80),20,20)。 - P6 模型在输入为 1280x1280 分辨率情况下,其 Head 模块输出的 shape 分别为 (B, 3x(4+1+80),160,160), (B, 3x(4+1+80),80,80), (B, 3x(4+1+80),40,40) 和 (B, 3x(4+1+80),20,20)。 -其中 3 表示 3 个 anchor,4 表示 
bbox 预测分支,1 表示 obj 预测分支,80 表示 COCO 数据集类别预测分支。 + 其中 3 表示 3 个 anchor,4 表示 bbox 预测分支,1 表示 obj 预测分支,80 表示 COCO 数据集类别预测分支。 ### 1.3 正负样本匹配策略 From de3b56d3956a400c13280b960f9f0b9b258b1aca Mon Sep 17 00:00:00 2001 From: RangeKing Date: Wed, 9 Nov 2022 22:20:17 +0800 Subject: [PATCH 04/14] Update model_design.md --- docs/zh_cn/algorithm_descriptions/model_design.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/zh_cn/algorithm_descriptions/model_design.md b/docs/zh_cn/algorithm_descriptions/model_design.md index 9a1d3f46f..dde61cb82 100644 --- a/docs/zh_cn/algorithm_descriptions/model_design.md +++ b/docs/zh_cn/algorithm_descriptions/model_design.md @@ -9,7 +9,6 @@ 图 1: P5 模型结构图 -
基类 P6 图 2: P6 模型结构图 @@ -27,8 +26,9 @@ YOLO 系列算法大部分采用了统一的算法搭建结构,典型的如 Da ### BaseBackbone -如图 1 所示,对于 P5 而言,BaseBackbone 为包含 1 个 stem 层 + 4 个 stage 层的类似 ResNet 的基础结构。 -如图 2 所示,对于 P6 而言,BaseBackbone 为包含 1 个 stem 层 + 5 个 stage 层结构。 +- 如图 1 所示,对于 P5 而言,BaseBackbone 为包含 1 个 stem 层 + 4 个 stage 层的类似 ResNet 的基础结构。 +- 如图 2 所示,对于 P6 而言,BaseBackbone 为包含 1 个 stem 层 + 5 个 stage 层的结构。 + 不同算法的主干网络继承 BaseBackbone,用户可以通过实现内部的 `build_xx` 方法,使用自定义的基础模块来构建每一层的内部结构。 ### BaseYOLONeck From 2500470940e81d1287975def3a8e5dec10541791 Mon Sep 17 00:00:00 2001 From: RangeKing Date: Thu, 10 Nov 2022 10:07:32 +0800 Subject: [PATCH 05/14] Update yolov5_description.md --- .../yolov5_description.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/docs/zh_cn/algorithm_descriptions/yolov5_description.md b/docs/zh_cn/algorithm_descriptions/yolov5_description.md index 9ac620061..e30fe6e9a 100644 --- a/docs/zh_cn/algorithm_descriptions/yolov5_description.md +++ b/docs/zh_cn/algorithm_descriptions/yolov5_description.md @@ -4,6 +4,12 @@
YOLOv5_structure_v3.4 +图 1:YOLOv5-P5 模型结构图 +
+ +
+YOLOv5_P6_structure_v1.0 +图 2:YOLOv5-P6 模型结构图
以上结构图由 RangeKing@github 绘制。 @@ -14,7 +20,11 @@ YOLOv5 是一个面向实时工业应用而开源的目标检测算法,受到 2. **算法训练速度极快**,在 300 epoch 情况下训练时长和大部分 one-stage 算法如 RetinaNet、ATSS 和 two-stage 算法如 Faster R-CNN 在 12 epoch 的训练时间接近 3. 框架进行了**非常多的 corner case 优化**,功能和文档也比较丰富 -本文将从 YOLOv5 算法本身原理讲起,然后重点分析 MMYOLO 中的实现。关于 YOLOv5 的使用指南和速度等对比请阅读本文的后续内容。 +如图 1 和 2 所示,YOLOv5 的 P5 和 P6 版本主要差异在于网络结构和图片输入分辨率。其他区别,如 anchors 个数和 loss 权重可详见[配置文件](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov5/)。本文将从 YOLOv5 算法本身原理讲起,然后重点分析 MMYOLO 中的实现。关于 YOLOv5 的使用指南和速度等对比请阅读本文的后续内容。 + +```{hint} +没有特殊说明情况下,本文默认描述的是 P5 模型。 +``` 希望本文能够成为你入门和掌握 YOLOv5 的核心文档。由于 YOLOv5 本身也在不断迭代更新,我们也会不断的更新本文档。请注意阅读最新版本。 @@ -241,15 +251,11 @@ train_pipeline = [ ### 1.2 网络结构 -
-YOLOv5_P6_structure_v1.0 -
- 本小结由 RangeKing@github 撰写,非常感谢!!! YOLOv5 网络结构是标准的 `CSPDarknet` + `PAFPN` + `非解耦 Head`。 -YOLOv5 网络结构大小由 `deepen_factor` 和 `widen_factor` 两个参数决定。其中 `deepen_factor` 控制网络结构深度,即 `CSPLayer` 中 `DarknetBottleneck` 模块堆叠的数量;`widen_factor` 控制网络结构宽度,即模块输出特征图的通道数。以 YOLOv5-l 为例,其 `deepen_factor = widen_factor = 1.0` 。P5 模型整体结构如本页面第一张图所示,P6 模型整体结构如上图所示。 +YOLOv5 网络结构大小由 `deepen_factor` 和 `widen_factor` 两个参数决定。其中 `deepen_factor` 控制网络结构深度,即 `CSPLayer` 中 `DarknetBottleneck` 模块堆叠的数量;`widen_factor` 控制网络结构宽度,即模块输出特征图的通道数。以 YOLOv5-l 为例,其 `deepen_factor = widen_factor = 1.0` 。P5 和 P6 的模型整体结构分别如图 1 和图 2 所示。 图的上半部分为模型总览;下半部分为具体网络结构,其中的模块均标有序号,方便用户与 YOLOv5 官方仓库的配置文件对应;中间部分为各子模块的具体构成。 From 9901fc68807f79b747b3d6d6f4cfa1bf93a18d42 Mon Sep 17 00:00:00 2001 From: RangeKing Date: Thu, 10 Nov 2022 10:13:37 +0800 Subject: [PATCH 06/14] Update model_design.md --- docs/zh_cn/algorithm_descriptions/model_design.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/zh_cn/algorithm_descriptions/model_design.md b/docs/zh_cn/algorithm_descriptions/model_design.md index dde61cb82..7d4634fac 100644 --- a/docs/zh_cn/algorithm_descriptions/model_design.md +++ b/docs/zh_cn/algorithm_descriptions/model_design.md @@ -6,12 +6,12 @@
基类 P5 -图 1: P5 模型结构图 +图 1:P5 模型结构图
基类 P6 -图 2: P6 模型结构图 +图 2:P6 模型结构图
YOLO 系列算法大部分采用了统一的算法搭建结构,典型的如 Darknet + PAFPN。为了让用户快速理解 YOLO 系列算法架构,我们特意设计了如上图中的 BaseBackbone + BaseYOLONeck 结构。 From f35486a52896068f83eddee37bcd20fac185f3bf Mon Sep 17 00:00:00 2001 From: RangeKing Date: Thu, 10 Nov 2022 10:13:52 +0800 Subject: [PATCH 07/14] Update README.md --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 49a858e9a..74dbace61 100644 --- a/README.md +++ b/README.md @@ -62,8 +62,9 @@ The master branch works with **PyTorch 1.6+**. MMYOLO decomposes the framework into different components where users can easily customize a model by combining different modules with various training and testing strategies. -BaseModule - The figure is contributed by RangeKing@GitHub, thank you very much! +BaseModule-P5 + The figure above is contributed by RangeKing@GitHub, thank you very much! + And the figure of P6 model is in [model_design.md](docs\en\algorithm_descriptions\model_design.md). From 8534178e03f6f992666e6d6a1e4728680b90d3f7 Mon Sep 17 00:00:00 2001 From: RangeKing Date: Thu, 10 Nov 2022 10:14:04 +0800 Subject: [PATCH 08/14] Update README_zh-CN.md --- README_zh-CN.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README_zh-CN.md b/README_zh-CN.md index effec6ba7..16043f701 100644 --- a/README_zh-CN.md +++ b/README_zh-CN.md @@ -62,8 +62,9 @@ MMYOLO 是一个基于 PyTorch 和 MMDetection 的 YOLO 系列算法开源工具 MMYOLO 将框架解耦成不同的模块组件,通过组合不同的模块和训练测试策略,用户可以便捷地构建自定义模型。 -基类 +基类-P5 图为 RangeKing@GitHub 提供,非常感谢! + P6 模型图详见 [model_design.md](docs\en\algorithm_descriptions\model_design.md)。 From 29036620073cc72232de7f001e21a5963e420159 Mon Sep 17 00:00:00 2001 From: RangeKing Date: Thu, 10 Nov 2022 10:16:59 +0800 Subject: [PATCH 09/14] Fix format --- README.md | 3 ++- README_zh-CN.md | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 74dbace61..45e4e4f5b 100644 --- a/README.md +++ b/README.md @@ -64,7 +64,8 @@ The master branch works with **PyTorch 1.6+**. 
BaseModule-P5 The figure above is contributed by RangeKing@GitHub, thank you very much! - And the figure of P6 model is in [model_design.md](docs\en\algorithm_descriptions\model_design.md). + +And the figure of P6 model is in [model_design.md](docs\en\algorithm_descriptions\model_design.md). diff --git a/README_zh-CN.md b/README_zh-CN.md index 16043f701..18ad53789 100644 --- a/README_zh-CN.md +++ b/README_zh-CN.md @@ -64,7 +64,8 @@ MMYOLO 是一个基于 PyTorch 和 MMDetection 的 YOLO 系列算法开源工具 基类-P5 图为 RangeKing@GitHub 提供,非常感谢! - P6 模型图详见 [model_design.md](docs\en\algorithm_descriptions\model_design.md)。 + +P6 模型图详见 [model_design.md](docs\en\algorithm_descriptions\model_design.md)。 From d3e87e86232e6fe4a539d5ffca8252e66023546b Mon Sep 17 00:00:00 2001 From: RangeKing Date: Thu, 10 Nov 2022 10:50:55 +0800 Subject: [PATCH 10/14] Update en docs --- .../en/algorithm_descriptions/model_design.md | 16 ++++++-- .../yolov5_description.md | 39 ++++++++++++++----- 2 files changed, 42 insertions(+), 13 deletions(-) diff --git a/docs/en/algorithm_descriptions/model_design.md b/docs/en/algorithm_descriptions/model_design.md index 96d022c5c..e1fc5b822 100644 --- a/docs/en/algorithm_descriptions/model_design.md +++ b/docs/en/algorithm_descriptions/model_design.md @@ -2,13 +2,19 @@ ## YOLO series model basic class -The structural graph is provided by RangeKing@GitHub. Thank you RangeKing! +The structural figure is provided by RangeKing@GitHub. Thank you RangeKing!
-BaseModule +BaseModule-P5 +Figure 1: P5 model structure
-Most YOLO series algorithms adopt a unified algorithm-building structure, typically as Darknet + PAFPN. In order to let users quickly understand the YOLO series algorithm architecture, we deliberately designed the `BaseBackbone` + `BaseYOLONeck` structure, as shown in the above graph. +
+BaseModule-P6 +Figure 2: P6 model structure +
+ +Most YOLO series algorithms adopt a unified algorithm-building structure, typically as Darknet + PAFPN. In order to let users quickly understand the YOLO series algorithm architecture, we deliberately designed the `BaseBackbone` + `BaseYOLONeck` structure, as shown in the above figure. The benefits of the abstract `BaseBackbone` include: @@ -20,7 +26,9 @@ The benefits of the abstract `BaseBackbone` include: ### BaseBackbone -We can see in the above graph, as for P5, `BaseBackbone` includes 1 stem layer and 4 stage layers which are similar to the basic structure of ResNet. Different backbone network algorithms inherit the `BaseBackbone`. Users can build each layer of the whole network by implementing customized basic modules through the internal `build_xx` method. +- As shown in Figure 1, for P5, `BaseBackbone` includes 1 stem layer and 4 stage layers which are similar to the basic structure of ResNet. +- As shown in Figure 2, for P6, `BaseBackbone` includes 1 stem layer and 5 stage layers. + Different backbone network algorithms inherit the `BaseBackbone`. Users can build each layer of the whole network by implementing customized basic modules through the internal `build_xx` method. ### BaseYOLONeck diff --git a/docs/en/algorithm_descriptions/yolov5_description.md b/docs/en/algorithm_descriptions/yolov5_description.md index 3e714556a..1f61ef0ae 100644 --- a/docs/en/algorithm_descriptions/yolov5_description.md +++ b/docs/en/algorithm_descriptions/yolov5_description.md @@ -3,7 +3,13 @@ ## 0 Introduction
-YOLOv5_structure_v3.4 +YOLOv5-P5_structure_v3.4 +Figure 1: YOLOv5-P5 model structure +
+ +
+YOLOv5-P6_structure_v1.0 +Figure 2: YOLOv5-P6 model structure
RangeKing@github provides the graph above. Thanks, RangeKing! @@ -15,7 +21,11 @@ In short, the main features of YOLOv5 are: 2. **Fast training speed**: the training time in the case of 300 epochs is similar to most of the one-stage and two-stage algorithms under 12 epochs, such as RetinaNet, ATSS, and Faster R-CNN. 3. **Abundant optimization for corner cases**: YOLOv5 has implemented many optimizations. The functions and documentation are richer as well. -This article will start with the principle of the YOLOv5 algorithm and then focus on analyzing the implementation in MMYOLO. The follow-up part includes the guide and speed benchmark of YOLOv5. +Figures 1 and 2 show that the main differences between the P5 and P6 versions of YOLOv5 are the network structure and the image input resolution. Other differences, such as the number of anchors and loss weights, can be found in [configuration files](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov5/). This article will start with the principle of the YOLOv5 algorithm and then focus on analyzing the implementation in MMYOLO. The follow-up part includes the guide and speed benchmark of YOLOv5. + +```{hint} +Unless specified, the P5 model is described by default in this documentation. +``` We hope this article becomes your core document to start and master YOLOv5. Since YOLOv5 is still constantly updated, we will also keep updating this document. So please always catch up with the latest version. @@ -245,20 +255,25 @@ This section was written by RangeKing@github. Thanks a lot! The YOLOv5 network structure is the standard `CSPDarknet` + `PAFPN` + `non-decoupled Head`. -The size of the YOLOv5 network structure is determined by the `deepen_factor` and `widen_factor` parameters. `deepen_factor` controls the depth of the network structure, that is, the number of stacks of `DarknetBottleneck` modules in `CSPLayer`. 
`widen_factor` controls the width of the network structure, that is, the number of channels of the module output feature map. Take YOLOv5-l as an example. Its `deepen_factor = widen_factor = 1.0` , the overall structure is shown in the graph above. +The size of the YOLOv5 network structure is determined by the `deepen_factor` and `widen_factor` parameters. `deepen_factor` controls the depth of the network structure, that is, the number of stacks of `DarknetBottleneck` modules in `CSPLayer`. `widen_factor` controls the width of the network structure, that is, the number of channels of the module output feature map. Take YOLOv5-l as an example. Its `deepen_factor = widen_factor = 1.0`. The overall structures of the P5 and P6 models are shown in Figures 1 and 2, respectively. The upper part of the figure is an overview of the model; the lower part is the specific network structure, in which the modules are marked with numbers in serial, which is convenient for users to correspond to the configuration files of the YOLOv5 official repository. The middle part is the detailed composition of each sub-module. -If you want to use **netron** to visualize the details of the network structure, just open the ONNX file format exported by MMDeploy in netron. +If you want to use **netron** to visualize the details of the network structure, open the ONNX file exported by MMDeploy in netron. + +```{hint} +The shapes of the feature maps in Section 1.2 are (B, C, H, W) by default. +``` #### 1.2.1 Backbone `CSPDarknet` in MMYOLO inherits from `BaseBackbone`. The overall structure is similar to `ResNet` with a total of 5 layers of design, including one `Stem Layer` and four `Stage Layer`: - `Stem Layer` is a `ConvModule` whose kernel size is 6x6. It is more efficient than the `Focus` module used before v6.1. -- Each of the first three `Stage Layer` consists of one `ConvModule` and one `CSPLayer`, as shown in the Details part in the graph above. 
`ConvModule` is a 3x3 `Conv2d` + `BatchNorm` + `SiLU activation function` module. `CSPLayer` is the C3 module in the official YOLOv5 repository, consisting of three `ConvModule` + n `DarknetBottleneck` with residual connections. -- The 4th `Stage Layer` adds an `SPPF` module at the end. The `SPPF` module is to serialize the input through multiple 5x5 `MaxPool2d` layers, which has the same effect as the `SPP` module but is faster. -- The P5 model passes the corresponding results from the second to the fourth `Stage Layer` to the `Neck` structure and extracts three output feature maps. Take a 640x640 input image as an example. The output features are (B, 256, 80, 80), (B,512,40,40), and (B,1024,20,20), the corresponding stride is 8/16/32. +- Except for the last `Stage Layer`, each `Stage Layer` consists of one `ConvModule` and one `CSPLayer`, as shown in the Details part in the graph above. `ConvModule` is a 3x3 `Conv2d` + `BatchNorm` + `SiLU activation function` module. `CSPLayer` is the C3 module in the official YOLOv5 repository, consisting of three `ConvModule` + n `DarknetBottleneck` with residual connections. +- The last `Stage Layer` adds an `SPPF` module at the end. The `SPPF` module passes the input serially through multiple 5x5 `MaxPool2d` layers, which has the same effect as the `SPP` module but is faster. +- The P5 model passes the corresponding results from the second to the fourth `Stage Layer` to the `Neck` structure and extracts three output feature maps. Take a 640x640 input image as an example. The output features are (B, 256, 80, 80), (B,512,40,40), and (B,1024,20,20). The corresponding strides are 8/16/32. +- The P6 model passes the corresponding results from the second to the fifth `Stage Layer` to the `Neck` structure and extracts four output feature maps. Take a 1280x1280 input image as an example. The output features are (B, 256, 160, 160), (B,512,80,80), (B,768,40,40), and (B,1024,20,20). The corresponding strides are 8/16/32/64. 
#### 1.2.2 Neck @@ -266,7 +281,7 @@ There is no **Neck** part in the official YOLOv5. However, to facilitate users t Based on the `BaseYOLONeck` structure, YOLOv5's `Neck` also follows the same build process. However, for non-existed modules, we use `nn.Identity` instead. -The feature maps output by the Neck module are the same as the Backbone, which is (B,256,80,80), (B,512,40,40), and (B,1024,20,20). +The feature maps output by the Neck module have the same shapes as those of the Backbone: (B,256,80,80), (B,512,40,40) and (B,1024,20,20) for the P5 model, and (B,256,160,160), (B,512,80,80), (B,768,40,40) and (B,1024,20,20) for the P6 model. #### 1.2.3 Head @@ -274,7 +289,13 @@ The `Head` structure of YOLOv5 is the same as YOLOv3, which is a `non-decoupled The `PAFPN` outputs three feature maps of different scales, whose shapes are (B,256,80,80), (B,512,40,40), and (B,1024,20,20) accordingly. -Since YOLOv5 has a non-decoupled output, that is, classification and bbox detection results are all in different channels of the same convolution module. Taking the COCO dataset as an example, when the input is 640x640 resolution, the output shapes of the Head module are `(B, 3x(4+1+80),80,80)`, `(B, 3x(4+1+80),40,40)` and `(B, 3x(4+1+80),20,20)`. `3` represents three anchors, `4` represents the bbox prediction branch, `1` represents the obj prediction branch, and `80` represents the class prediction branch of the COCO dataset. +YOLOv5 has a non-decoupled output; that is, classification and bbox detection results are produced in different channels of the same convolution module. Taking the COCO dataset as an example: + +- When the input to the P5 model has 640x640 resolution, the output shapes of the Head module are `(B, 3x(4+1+80),80,80)`, `(B, 3x(4+1+80),40,40)` and `(B, 3x(4+1+80),20,20)`. + +- When the input to the P6 model has 1280x1280 resolution, the output shapes of the Head module are `(B, 3x(4+1+80),160,160)`, `(B, 3x(4+1+80),80,80)`, `(B, 3x(4+1+80),40,40)` and `(B, 3x(4+1+80),20,20)`. 
+ + `3` represents three anchors, `4` represents the bbox prediction branch, `1` represents the obj prediction branch, and `80` represents the class prediction branch of the COCO dataset. ### 1.3 Positive and negative sample assignment strategy From 5d6039cf621f4a3dda032e0a1806d8ed4f4045bc Mon Sep 17 00:00:00 2001 From: RangeKing Date: Thu, 10 Nov 2022 10:51:32 +0800 Subject: [PATCH 11/14] Refine zh_cn docs --- .../algorithm_descriptions/model_design.md | 4 ++-- .../yolov5_description.md | 20 +++++++++---------- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/zh_cn/algorithm_descriptions/model_design.md b/docs/zh_cn/algorithm_descriptions/model_design.md index 7d4634fac..92b8f2c43 100644 --- a/docs/zh_cn/algorithm_descriptions/model_design.md +++ b/docs/zh_cn/algorithm_descriptions/model_design.md @@ -6,12 +6,12 @@
基类 P5 -图 1:P5 模型结构图 +图 1:P5 模型结构
基类 P6 -图 2:P6 模型结构图 +图 2:P6 模型结构
YOLO 系列算法大部分采用了统一的算法搭建结构,典型的如 Darknet + PAFPN。为了让用户快速理解 YOLO 系列算法架构,我们特意设计了如上图中的 BaseBackbone + BaseYOLONeck 结构。 diff --git a/docs/zh_cn/algorithm_descriptions/yolov5_description.md b/docs/zh_cn/algorithm_descriptions/yolov5_description.md index e30fe6e9a..30b9aef31 100644 --- a/docs/zh_cn/algorithm_descriptions/yolov5_description.md +++ b/docs/zh_cn/algorithm_descriptions/yolov5_description.md @@ -3,13 +3,13 @@ ## 0 简介
-YOLOv5_structure_v3.4 -图 1:YOLOv5-P5 模型结构图 +YOLOv5-P5_structure_v3.4 +图 1:YOLOv5-P5 模型结构
-YOLOv5_P6_structure_v1.0 -图 2:YOLOv5-P6 模型结构图 +YOLOv5-P6_structure_v1.0 +图 2:YOLOv5-P6 模型结构
以上结构图由 RangeKing@github 绘制。 @@ -261,6 +261,10 @@ YOLOv5 网络结构大小由 `deepen_factor` 和 `widen_factor` 两个参数决 如果想使用 netron 可视化网络结构图细节,可以直接在 netron 中将 MMDeploy 导出的 ONNX 文件格式文件打开。 +```{hint} +1.2 小节涉及的特征维度(shape)都为 (B, C, H, W)。 +``` + #### 1.2.1 Backbone 在 MMYOLO 中 `CSPDarknet` 继承自 `BaseBackbone`,整体结构和 `ResNet` 类似。P5 模型共 5 层结构,包含 1 个 `Stem Layer` 和 4 个 `Stage Layer`: @@ -272,10 +276,6 @@ YOLOv5 网络结构大小由 `deepen_factor` 和 `widen_factor` 两个参数决 - P5 模型会在 `Stage Layer` 2-4 之后分别输出一个特征图进入 `Neck` 结构。以 640x640 输入图片为例,其输出特征为 (B,256,80,80)、(B,512,40,40) 和 (B,1024,20,20),对应的 stride 分别为 8/16/32。 - P6 模型会在 `Stage Layer` 2-5 之后分别输出一个特征图进入 `Neck` 结构。以 1280x1280 输入图片为例,其输出特征为 (B,256,160,160)、(B,512,80,80)、(B,768,40,40) 和 (B,1024,20,20),对应的 stride 分别为 8/16/32/64。 -```{note} -1.2 小结涉及的特征维度(shape)都为 (B, C, H, W)。 -``` - #### 1.2.2 Neck YOLOv5 官方仓库的配置文件中并没有 Neck 部分,为方便用户与其他目标检测网络结构相对应,我们将官方仓库的 `Head` 拆分成 `PAFPN` 和 `Head` 两部分。 @@ -291,8 +291,8 @@ YOLOv5 Head 结构和 YOLOv3 完全一样,为 `非解耦 Head`。Head 模块 前面的 PAFPN 依然是输出 3 个不同尺度的特征图,shape 为 (B,256,80,80)、 (B,512,40,40) 和 (B,1024,20,20)。 由于 YOLOv5 是非解耦输出,即分类和 bbox 检测等都是在同一个卷积的不同通道中完成。以 COCO 80 类为例: -- P5 模型在输入为 640x640 分辨率情况下,其 Head 模块输出的 shape 分别为 (B, 3x(4+1+80),80,80), (B, 3x(4+1+80),40,40) 和 (B, 3x(4+1+80),20,20)。 -- P6 模型在输入为 1280x1280 分辨率情况下,其 Head 模块输出的 shape 分别为 (B, 3x(4+1+80),160,160), (B, 3x(4+1+80),80,80), (B, 3x(4+1+80),40,40) 和 (B, 3x(4+1+80),20,20)。 +- P5 模型在输入为 640x640 分辨率情况下,其 Head 模块输出的 shape 分别为 `(B, 3x(4+1+80),80,80)`, `(B, 3x(4+1+80),40,40)` 和 `(B, 3x(4+1+80),20,20)`。 +- P6 模型在输入为 1280x1280 分辨率情况下,其 Head 模块输出的 shape 分别为 `(B, 3x(4+1+80),160,160)`, `(B, 3x(4+1+80),80,80)`, `(B, 3x(4+1+80),40,40)` 和 `(B, 3x(4+1+80),20,20)`。 其中 3 表示 3 个 anchor,4 表示 bbox 预测分支,1 表示 obj 预测分支,80 表示 COCO 数据集类别预测分支。 ### 1.3 正负样本匹配策略 From ead47abb35f79547248395bfbdf6ff0b9f740560 Mon Sep 17 00:00:00 2001 From: RangeKing Date: Thu, 10 Nov 2022 10:56:13 +0800 Subject: [PATCH 12/14] Update README_zh-CN.md --- README_zh-CN.md | 2 +- 1 file changed, 1 insertion(+), 1 
deletion(-) diff --git a/README_zh-CN.md b/README_zh-CN.md index 18ad53789..82a150906 100644 --- a/README_zh-CN.md +++ b/README_zh-CN.md @@ -65,7 +65,7 @@ MMYOLO 是一个基于 PyTorch 和 MMDetection 的 YOLO 系列算法开源工具 基类-P5 图为 RangeKing@GitHub 提供,非常感谢! -P6 模型图详见 [model_design.md](docs\en\algorithm_descriptions\model_design.md)。 +P6 模型图详见 [model_design.md](docs\zh_CN\algorithm_descriptions\model_design.md)。 From 741d4da5bdc2180607c0e938e0bcca834390b6ee Mon Sep 17 00:00:00 2001 From: RangeKing Date: Thu, 10 Nov 2022 11:00:46 +0800 Subject: [PATCH 13/14] Fix links of model_design.md --- README.md | 2 +- README_zh-CN.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 45e4e4f5b..24112bc8d 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ The master branch works with **PyTorch 1.6+**. BaseModule-P5 The figure above is contributed by RangeKing@GitHub, thank you very much! -And the figure of P6 model is in [model_design.md](docs\en\algorithm_descriptions\model_design.md). +And the figure of P6 model is in [model_design.md](docs/en/algorithm_descriptions/model_design.md). diff --git a/README_zh-CN.md b/README_zh-CN.md index 82a150906..25f26fb22 100644 --- a/README_zh-CN.md +++ b/README_zh-CN.md @@ -65,7 +65,7 @@ MMYOLO 是一个基于 PyTorch 和 MMDetection 的 YOLO 系列算法开源工具 基类-P5 图为 RangeKing@GitHub 提供,非常感谢! 
-P6 模型图详见 [model_design.md](docs\zh_CN\algorithm_descriptions\model_design.md)。 +P6 模型图详见 [model_design.md](docs/zh_cn/algorithm_descriptions/model_design.md)。 From 108c2765803fdd51a6ec315a3dbd3cb2fd390667 Mon Sep 17 00:00:00 2001 From: RangeKing Date: Thu, 10 Nov 2022 11:03:33 +0800 Subject: [PATCH 14/14] specify the config file --- docs/en/algorithm_descriptions/yolov5_description.md | 2 +- docs/zh_cn/algorithm_descriptions/yolov5_description.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/en/algorithm_descriptions/yolov5_description.md b/docs/en/algorithm_descriptions/yolov5_description.md index 1f61ef0ae..e22eef64b 100644 --- a/docs/en/algorithm_descriptions/yolov5_description.md +++ b/docs/en/algorithm_descriptions/yolov5_description.md @@ -21,7 +21,7 @@ In short, the main features of YOLOv5 are: 2. **Fast training speed**: the training time in the case of 300 epochs is similar to most of the one-stage and two-stage algorithms under 12 epochs, such as RetinaNet, ATSS, and Faster R-CNN. 3. **Abundant optimization for corner cases**: YOLOv5 has implemented many optimizations. The functions and documentation are richer as well. -Figures 1 and 2 show that the main differences between the P5 and P6 versions of YOLOv5 are the network structure and the image input resolution. Other differences, such as the number of anchors and loss weights, can be found in [configuration files](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov5/). This article will start with the principle of the YOLOv5 algorithm and then focus on analyzing the implementation in MMYOLO. The follow-up part includes the guide and speed benchmark of YOLOv5. +Figures 1 and 2 show that the main differences between the P5 and P6 versions of YOLOv5 are the network structure and the image input resolution. 
Other differences, such as the number of anchors and loss weights, can be found in the [configuration file](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov5/yolov5_s-p6-v62_syncbn_fast_8xb16-300e_coco.py). This article will start with the principle of the YOLOv5 algorithm and then focus on analyzing the implementation in MMYOLO. The follow-up part includes the guide and speed benchmark of YOLOv5. ```{hint} Unless specified, the P5 model is described by default in this documentation. diff --git a/docs/zh_cn/algorithm_descriptions/yolov5_description.md b/docs/zh_cn/algorithm_descriptions/yolov5_description.md index 30b9aef31..61db0a0d5 100644 --- a/docs/zh_cn/algorithm_descriptions/yolov5_description.md +++ b/docs/zh_cn/algorithm_descriptions/yolov5_description.md @@ -20,7 +20,7 @@ YOLOv5 是一个面向实时工业应用而开源的目标检测算法,受到 2. **算法训练速度极快**,在 300 epoch 情况下训练时长和大部分 one-stage 算法如 RetinaNet、ATSS 和 two-stage 算法如 Faster R-CNN 在 12 epoch 的训练时间接近 3. 框架进行了**非常多的 corner case 优化**,功能和文档也比较丰富 -如图 1 和 2 所示,YOLOv5 的 P5 和 P6 版本主要差异在于网络结构和图片输入分辨率。其他区别,如 anchors 个数和 loss 权重可详见[配置文件](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov5/)。本文将从 YOLOv5 算法本身原理讲起,然后重点分析 MMYOLO 中的实现。关于 YOLOv5 的使用指南和速度等对比请阅读本文的后续内容。 +如图 1 和 2 所示,YOLOv5 的 P5 和 P6 版本主要差异在于网络结构和图片输入分辨率。其他区别,如 anchors 个数和 loss 权重可详见[配置文件](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov5/yolov5_s-p6-v62_syncbn_fast_8xb16-300e_coco.py)。本文将从 YOLOv5 算法本身原理讲起,然后重点分析 MMYOLO 中的实现。关于 YOLOv5 的使用指南和速度等对比请阅读本文的后续内容。 ```{hint} 没有特殊说明情况下,本文默认描述的是 P5 模型。
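Reviewer note: the P5 and P6 shapes quoted throughout this series can be cross-checked with a short, framework-free sketch. It assumes square inputs and the YOLOv5-l channel widths (`widen_factor = 1.0`) stated in the docs above. The names `backbone_output_shapes`, `head_output_channels`, `P5` and `P6` are hypothetical helpers made up for this check; they are not MMYOLO APIs.

```python
# Hypothetical helpers for checking the documented shapes; not part of MMYOLO.

def backbone_output_shapes(input_size, channels, strides, batch="B"):
    """Return one (B, C, H, W) tuple per output level for a square input."""
    shapes = []
    for num_channels, stride in zip(channels, strides):
        if input_size % stride != 0:
            raise ValueError(f"{input_size} is not divisible by stride {stride}")
        side = input_size // stride  # spatial size shrinks by the stride
        shapes.append((batch, num_channels, side, side))
    return shapes


def head_output_channels(num_classes, num_anchors=3):
    """Channels of the non-decoupled head: anchors x (bbox + obj + classes)."""
    return num_anchors * (4 + 1 + num_classes)


# Values taken from the documentation text in the patches above.
P5 = dict(input_size=640, channels=(256, 512, 1024), strides=(8, 16, 32))
P6 = dict(input_size=1280, channels=(256, 512, 768, 1024), strides=(8, 16, 32, 64))

if __name__ == "__main__":
    print("P5:", backbone_output_shapes(**P5))
    print("P6:", backbone_output_shapes(**P6))
    print("head channels (COCO, 80 classes):", head_output_channels(80))
```

Under these assumptions the P6 model yields four feature maps, one per stride, which is the count the English Backbone bullet should state. The Neck keeps these spatial sizes and the Head replaces the channel dimension with `3x(4+1+80)`.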