
Sorry to bother you, I have some questions I'd like to ask #68

Closed
Lemonononon opened this issue Dec 23, 2024 · 4 comments

@Lemonononon

I ran into a problem while building a dynamic-batch yolov5 engine with the TensorRT (10.6) C++ API.
The snippet below is the first layer of an early yolov5 release (an intermediate version between v1.0 and v2.0):

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1):
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, 1)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))

My implementation in C++ is as follows:

ILayer* addFocus( INetworkDefinition* network, std::map<std::string, Weights>& weightMap, ITensor& input, int input_h, int input_w, int output_c, int kernel_size, int stride, int g ){

    auto shape = input.getDimensions();

    ISliceLayer *s1 = network->addSlice(input, Dims4{0, 0, 0, 0}, Dims4{shape.d[0], 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    ISliceLayer *s2 = network->addSlice(input, Dims4{0, 0, 1, 0}, Dims4{shape.d[0], 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    ISliceLayer *s3 = network->addSlice(input, Dims4{0, 0, 0, 1}, Dims4{shape.d[0], 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    ISliceLayer *s4 = network->addSlice(input, Dims4{0, 0, 1, 1}, Dims4{shape.d[0], 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});

    // ISliceLayer *s1 = network->addSlice(input, Dims4{0, 0, 0, 0}, Dims4{-1, 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    // ISliceLayer *s2 = network->addSlice(input, Dims4{0, 0, 1, 0}, Dims4{-1, 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    // ISliceLayer *s3 = network->addSlice(input, Dims4{0, 0, 0, 1}, Dims4{-1, 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    // ISliceLayer *s4 = network->addSlice(input, Dims4{0, 0, 1, 1}, Dims4{-1, 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    ITensor* inputTensors[] = {s1->getOutput(0), s2->getOutput(0), s3->getOutput(0), s4->getOutput(0)};
    auto cat = network->addConcatenation(inputTensors, 4);

    return addConvBNLeaky( network, weightMap, *cat->getOutput(0), output_c, kernel_size, stride, g, "model.0.conv" );
}

The problem is that the slice layers fail when building the engine:

ITensor::getDimensions: Error Code 4: API Usage Error (Tensor (Unnamed Layer* 0) [Slice]_output has axis 0 with inherently negative length. Proven upper bound is -1. Network must have an instance where axis has non-negative length.)
[E] [TRT] ITensor::getDimensions: Error Code 4: API Usage Error (Output shape can not be computed for node (Unnamed Layer* 0) [Slice].)

This happens because the first dimension of the data is -1, yet to get dynamic batch the first dimension has to be set to -1. I later found that the official documentation describes this limitation of the TRT slice layer (link, see 9.7). The official TensorRT repository also has several related issues, but NVIDIA has not provided a solution.

One more observation: going pt -> onnx -> trt, dynamic batch does work, i.e.

trtexec --onnx=Petrichor-Rbc-detect-v3.0-20240918.onnx --minShapes=input:1x3x640x640 --optShapes=input:60x3x640x640 --maxShapes=input:100x3x640x640

Looking at the ONNX graph, the pt -> onnx export replaces the Python slicing with 6 Slice nodes, but I don't really understand how the onnx -> trt conversion handles it.

I'd like to ask whether you have any ideas for solving this. Thank you!

@lix19937
Owner

lix19937 commented Dec 28, 2024

4 slices + concat (EE + OE + EO + OO) equals reshape + permute


see the following:

from loguru import logger as LOG
import torch

# original
def img_slice(img_feature):
  B, C, H, W = img_feature.shape
  la = img_feature[:, :, 0::2, 0::2]  # E E   H W
  lb = img_feature[:, :, 0::2, 1::2]  # E O

  lc = img_feature[:, :, 1::2, 0::2]  # O E
  ld = img_feature[:, :, 1::2, 1::2]  # O O
  m = torch.cat((la, lc, lb, ld), dim=1)
  return m

# equivalent transformation
def img_slice_convert():
  img_feature = torch.arange(0, 16).view(4, 4)
  H, W = img_feature.shape
  a = img_feature.view(H // 2, 2, W // 2, 2)

  LOG.info("--0-->>\n{}".format(a))
  LOG.info("--1-->>\n{}".format(a.permute(2, 3, 0, 1)))
  LOG.info("--2-->>\n{}".format(a.permute(2, 3, 0, 1).permute(3, 1, 2, 0)))
  LOG.info("--3-->>\n{}".format(a.permute(2, 3, 0, 1).permute(3, 1, 2, 0).permute(1, 0, 2, 3)))

  v1 = a.permute(2, 3, 0, 1).permute(3, 1, 2, 0).permute(1, 0, 2, 3)

  # the permutes obey the merge rule
  v2 = a.permute(1, 3, 0, 2).permute(1, 0, 2, 3)

  # further merged
  v3 = a.permute(3, 1, 0, 2)

  if not (torch.equal(v1, v2) and torch.equal(v1, v3)):
    LOG.info("fatal, not reach here !"); exit(1)

  # reshape back to (B, 4*C, H/2, W/2)
  B = 1
  C = 1
  e = v1.reshape(B, C * 4, H // 2, W // 2)
  return e
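
For reference, here is my own untested sketch (not code from this thread) of how this reshape + permute could be expressed with the TensorRT C++ API using two IShuffleLayers instead of four ISliceLayers. It assumes an explicit-batch input with dims {-1, 3, input_h, input_w}, and it relies on the 0 placeholder in the reshape dims ("copy this dimension from the input"), so the dynamic batch axis never has to be written out as -1:

// Hedged sketch: Focus as reshape -> permute -> reshape, with no slice layers.
// The flattened channel order matches the original cat([EE, OE, EO, OO], dim=1).
ILayer* addFocusShuffle(INetworkDefinition* network, ITensor& input, int input_h, int input_w)
{
    // (B, 3, H, W) -> (B, 3, H/2, 2, W/2, 2); 0 copies the dim from the input, keeping the batch dynamic
    IShuffleLayer* split = network->addShuffle(input);
    split->setReshapeDimensions(Dims{6, {0, 3, input_h / 2, 2, input_w / 2, 2}});
    // (B, C, H/2, hp, W/2, wp) -> (B, wp, hp, C, H/2, W/2)
    split->setSecondTranspose(Permutation{{0, 5, 3, 1, 2, 4}});

    // (B, 2, 2, 3, H/2, W/2) -> (B, 12, H/2, W/2)
    IShuffleLayer* merge = network->addShuffle(*split->getOutput(0));
    merge->setReshapeDimensions(Dims{4, {0, 12, input_h / 2, input_w / 2}});
    return merge;
}

The output of addFocusShuffle could then feed the existing addConvBNLeaky call in place of the concatenation output.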

@Lemonononon
Author

Thank you very much!
What you said is correct: using permute(0,3,2,1) and reshape achieves the same effect as Focus. However, building the permute (shuffle layer) and reshape layers through TensorRT's API runs into the same issue as the slice layer (they do not accept dimensions given as negative values). So my real problem is similar to the issue described here: NVIDIA/TensorRT#3480. At build time, addSlice throws an error because the value of shape.d[0] is -1. I tried to create the simplest possible case, as follows:

    IBuilder* builder = createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();

    INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    // dynamic batch: the first dimension is -1
    ITensor* input = network->addInput("input", DataType::kFLOAT, Dims4{-1, 3, 960, 960});

    auto shape = input->getDimensions();

    // fails at build time: shape.d[0] is -1, so the slice output size is "inherently negative"
    ISliceLayer *s1 = network->addSlice(*input, Dims4{0, 0, 0, 0}, Dims4{shape.d[0], 3, 960 / 2, 960 / 2}, Dims4{1, 1, 2, 2});

    s1->getOutput(0)->setName("output");
    network->markOutput(*s1->getOutput(0));

    auto profile = builder->createOptimizationProfile();

    profile->setDimensions("input", OptProfileSelector::kMIN, Dims4{1, 3, 960, 960});
    profile->setDimensions("input", OptProfileSelector::kOPT, Dims4{4, 3, 960, 960});
    profile->setDimensions("input", OptProfileSelector::kMAX, Dims4{16, 3, 960, 960});
    config->addOptimizationProfile(profile);

    config->setFlag(BuilderFlag::kFP16);
    auto serialized = builder->buildSerializedNetwork(*network, *config);

@Lemonononon
Author

I solved this problem using the following code:

INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

ITensor* data = network->addInput(kInputTensorName, dt, Dims4{-1, 3, kInputH, kInputW});

// the static size passed here is only a placeholder; it is overridden below via setInput(2, ...)
auto sliceLayer = network->addSlice(*data, Dims4{0, 0, 0, 0}, Dims4{-1, 3, kInputH / 2, kInputW / 2}, Dims4{1, 1, 2, 2});

// build the slice size as a runtime shape tensor: inputShape - {0, 0, kInputH/2, kInputW/2}
auto shape = network->addShape(*network->getInput(0))->getOutput(0);

auto shapeInt32Layer = network->addIdentity(*shape);
shapeInt32Layer->setOutputType(0, DataType::kINT32);
auto shapeInt32 = shapeInt32Layer->getOutput(0);

int32_t subSliceValue[4] = {0, 0, kInputH / 2, kInputW / 2};
Weights subSliceWeight{DataType::kINT32, subSliceValue, 4};

auto constLayer = network->addConstant(Dims{1, {4}}, subSliceWeight);

auto elementLayer = network->addElementWise(*shapeInt32, *constLayer->getOutput(0), ElementWiseOperation::kSUB);

auto newShape = elementLayer->getOutput(0);

// feed the computed size tensor to the slice layer (input index 2 = size)
sliceLayer->setInput(2, *newShape);
sliceLayer->getOutput(0)->setName(kOutputTensorName);

network->markOutput(*sliceLayer->getOutput(0));

auto profile = builder->createOptimizationProfile();

profile->setDimensions(kInputTensorName, OptProfileSelector::kMIN, Dims4{minBatchSize, 3, kInputH, kInputW});
profile->setDimensions(kInputTensorName, OptProfileSelector::kOPT, Dims4{optBatchSize, 3, kInputH, kInputW});
profile->setDimensions(kInputTensorName, OptProfileSelector::kMAX, Dims4{maxBatchSize, 3, kInputH, kInputW});

config->addOptimizationProfile(profile);
config->setFlag(BuilderFlag::kFP16);
auto engine = builder->buildSerializedNetwork(*network, *config);

return engine;

The key to the problem is that, when working with dynamic shapes, ISliceLayer needs its size (and, where necessary, start and stride) supplied as a shape tensor via setInput, computed from the actual input shape, rather than as a static Dims containing -1. Anyway, thank you very much!
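
For completeness, a minimal usage sketch (my own, not from this thread) of how the dynamic batch is then bound at inference time. It assumes the serialized plan returned above has already been deserialized into an ICudaEngine* named engine, that dInput/dOutput are device buffers sized for the chosen batch, and that stream is an existing cudaStream_t:

IExecutionContext* context = engine->createExecutionContext();

// pick any batch size within the [minBatchSize, maxBatchSize] range declared in the optimization profile
int runtimeBatch = optBatchSize;
context->setInputShape(kInputTensorName, Dims4{runtimeBatch, 3, kInputH, kInputW});

// bind device buffers by tensor name and launch
context->setTensorAddress(kInputTensorName, dInput);
context->setTensorAddress(kOutputTensorName, dOutput);
context->enqueueV3(stream);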

@lix19937
Owner

lix19937 commented Jan 1, 2025

Good, but the slice op is usually not the best choice; its performance is poor.
