yolox_nano speed issue #9

1VeniVediVeci1 · 2021-07-29T16:31:30Z

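I used your code framework to test the inference speed of the YOLOX family. Every model other than yolox_nano runs at a normal speed, but with the nano model inference is even slower than with yolox_s. All of the ONNX files were converted from pth files trained on the official COCO dataset.
I noticed that YOLOX has an extra piece of code where the nano model is defined (in ./exps/default/nano.py), shown in the screenshot below.
[screenshot not preserved]
Could this have an effect?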

DefTruth · 2021-07-30T14:09:39Z

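In principle, depth-wise convolution reduces both the computation and the parameter count (going by the formulas in the MobileNetV1 paper). In practice, though, the latency you measure on an inference engine can depend on many other factors. You could try the official ncnn version, or adjust the thread count in my onnxruntime C++ version and see whether that helps. The default thread count here is 1, so you can try other values.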

class LITE_EXPORTS YoloX : public BasicOrtHandler
{
  public:
    explicit YoloX(const std::string &_onnx_path, unsigned int _num_threads = 1) :  // default thread count is 1
        BasicOrtHandler(_onnx_path, _num_threads)
    {}
};

You can change it to a different thread count:

auto *yolox = new lite::cv::detection::YoloX(onnx_path, 8);  // 8 threads.
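
The MobileNetV1 cost argument mentioned above can be checked with a few lines of arithmetic. The sketch below counts multiply-accumulate operations (MACs) for a standard convolution and for a depthwise separable one, using the paper's formulas; the function names and the example layer sizes are illustrative assumptions, not taken from YOLOX or this repository.

```cpp
#include <cstdint>

// MACs of a standard k x k convolution with m input channels,
// n output channels and an f x f output feature map:
//   k * k * m * n * f * f
constexpr std::uint64_t standard_conv_macs(std::uint64_t k, std::uint64_t m,
                                           std::uint64_t n, std::uint64_t f)
{
  return k * k * m * n * f * f;
}

// MACs of a depthwise separable convolution with the same shape:
//   depthwise part  k * k * m * f * f   (one k x k filter per input channel)
//   pointwise part  m * n * f * f       (1 x 1 convolution mixing channels)
constexpr std::uint64_t depthwise_separable_macs(std::uint64_t k, std::uint64_t m,
                                                 std::uint64_t n, std::uint64_t f)
{
  return k * k * m * f * f + m * n * f * f;
}

// Ratio separable / standard, which simplifies to 1/n + 1/(k*k).
constexpr double separable_cost_ratio(std::uint64_t k, std::uint64_t m,
                                      std::uint64_t n, std::uint64_t f)
{
  return static_cast<double>(depthwise_separable_macs(k, m, n, f)) /
         static_cast<double>(standard_conv_macs(k, m, n, f));
}
```

For a hypothetical 3x3 layer with 64 input channels, 128 output channels and an 80x80 output map, this gives 471,859,200 MACs for the standard convolution and 56,115,200 for the separable one, roughly 12% of the original cost. As the reply notes, a lower MAC count does not by itself guarantee lower latency on a given engine.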

1VeniVediVeci1 · 2021-07-31T05:39:07Z

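Thanks for the reply. After a series of tests, I think this comes down to how well the network parallelizes rather than to a problem in the code.
With CPU inference, the onnx and ncnn versions run at the same speed, and latency is ordered nano < tiny < s, which matches expectations. Increasing the number of inference threads helps the nano network very little, but helps tiny and s a great deal.
With GPU inference, latency is ordered tiny < s < nano. This is consistent with what happens on the CPU when the thread count goes up: it suggests the nano network is well suited to single-threaded execution, and that trading per-thread speed for more threads actually increases inference time.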

P.S. For CPU inference, changing session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED); to session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL); in ort_handler.cpp speeds up inference significantly.
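
Comparisons like the ones above (different models, thread counts, or optimization levels) are easier to reproduce with a small timing harness. The following is only a sketch: `average_latency_ms` is a hypothetical helper, not part of this repository, and it times whatever callable it is given, with a few untimed warm-up calls first.

```cpp
#include <chrono>
#include <functional>

// Returns the average wall-clock latency, in milliseconds, of `fn`
// over `iters` timed calls, after `warmup` untimed calls.
// Returns 0.0 if `iters` is not positive.
inline double average_latency_ms(const std::function<void()> &fn,
                                 int warmup = 3, int iters = 20)
{
  // Warm-up runs are excluded so one-off setup costs do not skew the mean.
  for (int i = 0; i < warmup; ++i)
  {
    fn();
  }

  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i)
  {
    fn();
  }
  const auto stop = std::chrono::steady_clock::now();

  const std::chrono::duration<double, std::milli> elapsed = stop - start;
  return iters > 0 ? elapsed.count() / iters : 0.0;
}
```

To use it, wrap a single inference call in a lambda and run it once per configuration, for example once per thread count, keeping the input image fixed across runs.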

1VeniVediVeci1 · 2021-07-31T05:46:51Z

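I also ran into a new problem: when the network input has a different height and width (that is, it is rectangular), the inference results are wrong. I don't know why.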

DefTruth · 2021-07-31T05:56:36Z

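That shouldn't happen. I tested the ONNX model files from YOLOX, and the results are correct whether or not the image is square. The nano model is indeed less accurate than the others, though. Could you share your post-processing code and the nano model file you trained?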

DefTruth · 2021-07-31T06:04:13Z

The rescaling in yolox.cpp that maps detections back to positions in the original input image follows the official post-processing flow. You may need to check whether this logic matches the post-processing used in your training setup.

    const float scale_height = img_height / input_height;
    const float scale_width = img_width / input_width;  
    //   .....
    types::Boxf box;
    box.x1 = (cx - w / 2.f) * scale_width;
    box.y1 = (cy - h / 2.f) * scale_height;
    box.x2 = (cx + w / 2.f) * scale_width;
    box.y2 = (cy + h / 2.f) * scale_height;
    box.score = conf;
    box.label = label;
    box.label_text = class_names[label];
    box.flag = true;
    bbox_collection.push_back(box);

DefTruth · 2021-07-31T06:07:01Z

Regarding your CPU/GPU findings and the ORT_ENABLE_ALL tip: nice 👍🏻 ~ I learned something new ~

DefTruth · 2021-07-31T06:15:15Z

But if you mean that the model input itself is a tensor such as 640x480, then what you describe can happen. In the C++ code released by the official YOLOX project, anchor (GridAndStride) generation takes a single const int target_size. Take a look:

static void generate_grids_and_stride(const int target_size, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
{
    for (auto stride : strides)
    {
        int num_grid = target_size / stride;
        for (int g1 = 0; g1 < num_grid; g1++)
        {
            for (int g0 = 0; g0 < num_grid; g0++)
            {
                grid_strides.push_back((GridAndStride){g0, g1, stride});
            }
        }
    }
}

In other words, it assumes the input tensor is square. My implementation does the same and uses the height as target_size. In yolox.cpp:

this->generate_anchors(input_height, strides, anchors);

So you should only need to modify this part so that it handles general rectangular inputs, and that should fix the problem:

void YoloX::generate_anchors(const int target_height, const int target_width,
         std::vector<int> &strides, std::vector<Anchor> &anchors)
{
  for (auto stride : strides)
  {
    int num_grid_w = target_width / stride;
    int num_grid_h = target_height / stride;
    for (int g1 = 0; g1 < num_grid_h; g1++)
    {
      for (int g0 = 0; g0 < num_grid_w; g0++)
      {
        anchors.push_back(Anchor{g0, g1, stride});
      }
    }
  }
}

Then change the call that generates the anchors to:

this->generate_anchors(input_height, input_width, strides, anchors);
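
To see why the square assumption matters, it helps to count the anchors each variant produces. The sketch below is a standalone version of the rectangular generator; the `Anchor` struct here is a stand-in with assumed field names, and the function returns the vector instead of filling an output parameter, purely for illustration.

```cpp
#include <vector>

// Stand-in for the anchor type used in the thread (field names assumed).
struct Anchor
{
  int grid0;  // column index within the grid
  int grid1;  // row index within the grid
  int stride; // downsampling factor of this detection level
};

// Generates one anchor per grid cell for every stride, using separate
// grid sizes for height and width so rectangular inputs are covered.
inline std::vector<Anchor> generate_anchors(int target_height, int target_width,
                                            const std::vector<int> &strides)
{
  std::vector<Anchor> anchors;
  for (int stride : strides)
  {
    const int num_grid_h = target_height / stride;
    const int num_grid_w = target_width / stride;
    for (int g1 = 0; g1 < num_grid_h; ++g1)
    {
      for (int g0 = 0; g0 < num_grid_w; ++g0)
      {
        anchors.push_back(Anchor{g0, g1, stride});
      }
    }
  }
  return anchors;
}
```

Assuming an input 480 high and 640 wide with strides 8, 16 and 32, this yields 80x60 + 40x30 + 20x15 = 6300 anchors. Passing 480 for both dimensions, as the square version effectively does, yields only 60x60 + 30x30 + 15x15 = 4725, so anchors would no longer line up one-to-one with the grid cells of a rectangular input. That mismatch is a plausible explanation for the wrong results reported above.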

1VeniVediVeci1 · 2021-07-31T07:46:54Z

Thank you very much, that was indeed the problem.

xinsuinizhuan · 2021-08-01T15:12:25Z

GPU

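How do I use the GPU, and what needs to be configured? I switched to the GPU build of the library, but it made no difference; it is even slower than the CPU.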

DefTruth added the question Further information is requested label Jul 30, 2021

DefTruth added a commit that referenced this issue Jul 31, 2021

#9 fixed YOLOX inference error for non-square shape

fd92a3c

DefTruth added the YOLOX:Inference label Jul 31, 2021

DefTruth added a commit that referenced this issue Jul 31, 2021

#9 fixed YOLOX inference error for non-square shape

4150622

DefTruth added a commit that referenced this issue Jul 31, 2021

#9 fixed YOLOX inference error for non-square shape

668aa5d

DefTruth added a commit that referenced this issue Jul 31, 2021

#9 fixed YOLOX inference error for non-square shape

c7eb55c

DefTruth added a commit that referenced this issue Aug 1, 2021

#9 fixed YOLOX inference error for non-square shape

d243458

DefTruth added a commit that referenced this issue Aug 1, 2021

fixed YOLOX inference error for non-square shape (#9)

ac23924

DefTruth mentioned this issue Aug 1, 2021

onnxruntime的gpu怎么支持？ #10

Closed

DefTruth closed this as completed Mar 26, 2022
