
Sorry to bother you, I have some questions I'd like to ask #68

Closed
Lemonononon opened this issue Dec 23, 2024 · 4 comments

@Lemonononon

I ran into a problem while building a dynamic-batch yolov5 engine with the TensorRT (10.6) C++ API.
The snippet below is the first layer of an early yolov5 release (an intermediate version between v1.0 and v2.0):

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1):
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, 1)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))

My implementation in C++ is as follows:

ILayer* addFocus( INetworkDefinition* network, std::map<std::string, Weights>& weightMap, ITensor& input, int input_h, int input_w, int output_c, int kernel_size, int stride, int g ){

    auto shape = input.getDimensions();

    ISliceLayer *s1 = network->addSlice(input, Dims4{0, 0, 0, 0}, Dims4{shape.d[0], 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    ISliceLayer *s2 = network->addSlice(input, Dims4{0, 0, 1, 0}, Dims4{shape.d[0], 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    ISliceLayer *s3 = network->addSlice(input, Dims4{0, 0, 0, 1}, Dims4{shape.d[0], 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    ISliceLayer *s4 = network->addSlice(input, Dims4{0, 0, 1, 1}, Dims4{shape.d[0], 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});

    // ISliceLayer *s1 = network->addSlice(input, Dims4{0, 0, 0, 0}, Dims4{-1, 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    // ISliceLayer *s2 = network->addSlice(input, Dims4{0, 0, 1, 0}, Dims4{-1, 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    // ISliceLayer *s3 = network->addSlice(input, Dims4{0, 0, 0, 1}, Dims4{-1, 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    // ISliceLayer *s4 = network->addSlice(input, Dims4{0, 0, 1, 1}, Dims4{-1, 3, input_h / 2, input_w / 2}, Dims4{1, 1, 2, 2});
    ITensor* inputTensors[] = {s1->getOutput(0), s2->getOutput(0), s3->getOutput(0), s4->getOutput(0)};
    auto cat = network->addConcatenation(inputTensors, 4);

    return addConvBNLeaky( network, weightMap, *cat->getOutput(0), output_c, kernel_size, stride, g, "model.0.conv" );
}

The problem is that the slice layers fail when building the engine:

ITensor::getDimensions: Error Code 4: API Usage Error (Tensor (Unnamed Layer* 0) [Slice]_output has axis 0 with inherently negative length. Proven upper bound is -1. Network must have an instance where axis has non-negative length.)
[E] [TRT] ITensor::getDimensions: Error Code 4: API Usage Error (Output shape can not be computed for node (Unnamed Layer* 0) [Slice].)

This happens because the first dimension of the data is -1, yet to get dynamic batch the first dimension has to be set to -1. I later found that the official documentation describes this limitation of the TRT slice layer (link, see 9.7). The official TensorRT repository also has several related issues, but NVIDIA has not provided a solution.

One more observation: going pt -> onnx -> trt, dynamic batch does work, i.e.

trtexec --onnx=Petrichor-Rbc-detect-v3.0-20240918.onnx --minShapes=input:1x3x640x640 --optShapes=input:60x3x640x640 --maxShapes=input:100x3x640x640

Looking at the ONNX graph, the pt -> onnx export replaces the Python slicing with 6 Slice nodes, but I don't really understand how the onnx -> trt conversion handles it.

I'd like to ask whether you have any ideas for solving this. Thank you!

@lix19937
Owner

lix19937 commented Dec 28, 2024

4 slices + concat (EE + OE + EO + OO) equals reshape + permute


see the following:

from loguru import logger as LOG
import torch

# original
def img_slice(img_feature):
  B, C, H, W = img_feature.shape
  la = img_feature[:, :, 0::2, 0::2]  # E E   H W
  lb = img_feature[:, :, 0::2, 1::2]  # E O

  lc = img_feature[:, :, 1::2, 0::2]  # O E
  ld = img_feature[:, :, 1::2, 1::2]  # O O
  m = torch.cat((la, lc, lb, ld), dim=1)
  return m

# equivalent transformation
def img_slice_convert():
  img_feature = torch.arange(0, 16).view(4, 4)
  H, W = img_feature.shape
  a = img_feature.view(H // 2, 2, W // 2, 2)

  LOG.info("--0-->>\n{}".format(a))
  LOG.info("--1-->>\n{}".format(a.permute(2, 3, 0, 1)))
  LOG.info("--2-->>\n{}".format(a.permute(2, 3, 0, 1).permute(3, 1, 2, 0)))
  LOG.info("--3-->>\n{}".format(a.permute(2, 3, 0, 1).permute(3, 1, 2, 0).permute(1, 0, 2, 3)))

  v1 = a.permute(2, 3, 0, 1).permute(3, 1, 2, 0).permute(1, 0, 2, 3)

  # the permutes obey the merge rule
  v2 = a.permute(1, 3, 0, 2).permute(1, 0, 2, 3)

  # further merged
  v3 = a.permute(3, 1, 0, 2)

  if not (torch.equal(v1, v2) and torch.equal(v1, v3)):
    LOG.info("fatal, not reach here !"); exit(1)

  # reshape back to (B, 4*C, H/2, W/2)
  B = 1
  C = 1
  e = v1.reshape(B, C * 4, H // 2, W // 2)
  return e
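
For reference, here is my own untested sketch (not code from this thread) of how this reshape + permute could be expressed with the TensorRT C++ API using two IShuffleLayers instead of four ISliceLayers. It assumes an explicit-batch input with dims {-1, 3, input_h, input_w}, and it relies on the 0 placeholder in the reshape dims ("copy this dimension from the input"), so the dynamic batch axis never has to be written out as -1:

// Hedged sketch: Focus as reshape -> permute -> reshape, with no slice layers.
// The flattened channel order matches the original cat([EE, OE, EO, OO], dim=1).
ILayer* addFocusShuffle(INetworkDefinition* network, ITensor& input, int input_h, int input_w)
{
    // (B, 3, H, W) -> (B, 3, H/2, 2, W/2, 2); 0 copies the dim from the input, keeping the batch dynamic
    IShuffleLayer* split = network->addShuffle(input);
    split->setReshapeDimensions(Dims{6, {0, 3, input_h / 2, 2, input_w / 2, 2}});
    // (B, C, H/2, hp, W/2, wp) -> (B, wp, hp, C, H/2, W/2)
    split->setSecondTranspose(Permutation{{0, 5, 3, 1, 2, 4}});

    // (B, 2, 2, 3, H/2, W/2) -> (B, 12, H/2, W/2)
    IShuffleLayer* merge = network->addShuffle(*split->getOutput(0));
    merge->setReshapeDimensions(Dims{4, {0, 12, input_h / 2, input_w / 2}});
    return merge;
}

The output of addFocusShuffle could then feed the existing addConvBNLeaky call in place of the concatenation output.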

@Lemonononon
Author

Thank you very much!
What you said is correct: using permute(0,3,2,1) and reshape achieves the same effect as Focus. However, building the permute (shuffle layer) and reshape layers through TensorRT's API runs into the same issue as the slice layer (they do not accept dimensions given as negative values). So my real problem is similar to the issue described here: NVIDIA/TensorRT#3480. At build time, addSlice throws an error because the value of shape.d[0] is -1. I tried to create the simplest possible case, as follows:

    IBuilder* builder = createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();

    INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    // dynamic batch: the first dimension is -1
    ITensor* input = network->addInput("input", DataType::kFLOAT, Dims4{-1, 3, 960, 960});

    auto shape = input->getDimensions();

    // fails at build time: shape.d[0] is -1, so the slice output size is "inherently negative"
    ISliceLayer *s1 = network->addSlice(*input, Dims4{0, 0, 0, 0}, Dims4{shape.d[0], 3, 960 / 2, 960 / 2}, Dims4{1, 1, 2, 2});

    s1->getOutput(0)->setName("output");
    network->markOutput(*s1->getOutput(0));

    auto profile = builder->createOptimizationProfile();

    profile->setDimensions("input", OptProfileSelector::kMIN, Dims4{1, 3, 960, 960});
    profile->setDimensions("input", OptProfileSelector::kOPT, Dims4{4, 3, 960, 960});
    profile->setDimensions("input", OptProfileSelector::kMAX, Dims4{16, 3, 960, 960});
    config->addOptimizationProfile(profile);

    config->setFlag(BuilderFlag::kFP16);
    auto serialized = builder->buildSerializedNetwork(*network, *config);

@Lemonononon
Author

I solved this problem using the following code:

INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

ITensor* data = network->addInput(kInputTensorName, dt, Dims4{-1, 3, kInputH, kInputW});

// the static size passed here is only a placeholder; it is overridden below via setInput(2, ...)
auto sliceLayer = network->addSlice(*data, Dims4{0, 0, 0, 0}, Dims4{-1, 3, kInputH / 2, kInputW / 2}, Dims4{1, 1, 2, 2});

// build the slice size as a runtime shape tensor: inputShape - {0, 0, kInputH/2, kInputW/2}
auto shape = network->addShape(*network->getInput(0))->getOutput(0);

auto shapeInt32Layer = network->addIdentity(*shape);
shapeInt32Layer->setOutputType(0, DataType::kINT32);
auto shapeInt32 = shapeInt32Layer->getOutput(0);

int32_t subSliceValue[4] = {0, 0, kInputH / 2, kInputW / 2};
Weights subSliceWeight{DataType::kINT32, subSliceValue, 4};

auto constLayer = network->addConstant(Dims{1, {4}}, subSliceWeight);

auto elementLayer = network->addElementWise(*shapeInt32, *constLayer->getOutput(0), ElementWiseOperation::kSUB);

auto newShape = elementLayer->getOutput(0);

// feed the computed size tensor to the slice layer (input index 2 = size)
sliceLayer->setInput(2, *newShape);
sliceLayer->getOutput(0)->setName(kOutputTensorName);

network->markOutput(*sliceLayer->getOutput(0));

auto profile = builder->createOptimizationProfile();

profile->setDimensions(kInputTensorName, OptProfileSelector::kMIN, Dims4{minBatchSize, 3, kInputH, kInputW});
profile->setDimensions(kInputTensorName, OptProfileSelector::kOPT, Dims4{optBatchSize, 3, kInputH, kInputW});
profile->setDimensions(kInputTensorName, OptProfileSelector::kMAX, Dims4{maxBatchSize, 3, kInputH, kInputW});

config->addOptimizationProfile(profile);
config->setFlag(BuilderFlag::kFP16);
auto engine = builder->buildSerializedNetwork(*network, *config);

return engine;

The key to the problem is that, when working with dynamic shapes, ISliceLayer needs its size (and, where necessary, start and stride) supplied as a shape tensor via setInput, computed from the actual input shape, rather than as a static Dims containing -1. Anyway, thank you very much!
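
For completeness, a minimal usage sketch (my own, not from this thread) of how the dynamic batch is then bound at inference time. It assumes the serialized plan returned above has already been deserialized into an ICudaEngine* named engine, that dInput/dOutput are device buffers sized for the chosen batch, and that stream is an existing cudaStream_t:

IExecutionContext* context = engine->createExecutionContext();

// pick any batch size within the [minBatchSize, maxBatchSize] range declared in the optimization profile
int runtimeBatch = optBatchSize;
context->setInputShape(kInputTensorName, Dims4{runtimeBatch, 3, kInputH, kInputW});

// bind device buffers by tensor name and launch
context->setTensorAddress(kInputTensorName, dInput);
context->setTensorAddress(kOutputTensorName, dOutput);
context->enqueueV3(stream);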

@lix19937
Owner

lix19937 commented Jan 1, 2025

Good, but the slice op is usually not the best choice; its performance is poor.
