Sorry to bother you, I have some questions I'd like to ask you #68
Comments
See the following:

from loguru import logger as LOG
import torch

# original
def img_slice(img_feature):
    B, C, H, W = img_feature.shape
    la = img_feature[:, :, 0::2, 0::2]  # E E (even rows, even cols)
    lb = img_feature[:, :, 0::2, 1::2]  # E O
    lc = img_feature[:, :, 1::2, 0::2]  # O E
    ld = img_feature[:, :, 1::2, 1::2]  # O O
    m = torch.cat((la, lc, lb, ld), dim=1)
    return m

# equivalent_transformation
def img_slice_convert():
    img_feature = torch.arange(0, 16).view(4, 4)
    H, W = img_feature.shape
    a = img_feature.view(H // 2, 2, W // 2, 2)
    LOG.info("--0-->>\n{}".format(a))
    LOG.info("--1-->>\n{}".format(a.permute(2, 3, 0, 1)))
    LOG.info("--2-->>\n{}".format(a.permute(2, 3, 0, 1).permute(3, 1, 2, 0)))
    LOG.info("--3-->>\n{}".format(a.permute(2, 3, 0, 1).permute(3, 1, 2, 0).permute(1, 0, 2, 3)))
    v1 = a.permute(2, 3, 0, 1).permute(3, 1, 2, 0).permute(1, 0, 2, 3)
    # successive permutes obey a merge rule and can be fused
    v2 = a.permute(1, 3, 0, 2).permute(1, 0, 2, 3)
    # fused further into a single permute
    v3 = a.permute(3, 1, 0, 2)
    if not (torch.equal(v1, v2) and torch.equal(v1, v3)):
        LOG.info("fatal, should not reach here!")
        exit(1)
    B = 1
    C = 1
    e = v1.reshape(B, C * 4, H // 2, W // 2)
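To make the merge rule explicit: composing permute p followed by permute q yields the single permutation r with r[i] = p[q[i]], which is how (2, 3, 0, 1) then (3, 1, 2, 0) then (1, 0, 2, 3) collapses to (3, 1, 0, 2). Below is a minimal sketch (my addition, not from the thread; img_slice_fast is a hypothetical name) applying the same trick to a full B,C,H,W tensor, so the four strided slices of img_slice become one view plus one permute:

def img_slice_fast(img_feature):
    # sketch: equivalent to img_slice above, without strided slicing
    B, C, H, W = img_feature.shape
    # a[b, c, h, i, w, j] == img_feature[b, c, 2*h + i, 2*w + j]
    a = img_feature.view(B, C, H // 2, 2, W // 2, 2)
    # placing (j, i) ahead of C reproduces the cat order (la, lc, lb, ld)
    return a.permute(0, 5, 3, 1, 2, 4).reshape(B, C * 4, H // 2, W // 2)

Under these assumptions a quick check such as torch.equal(img_slice(x), img_slice_fast(x)) on a random x should hold.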
Thank you very much!

IRuntime* runtime = nullptr;
ICudaEngine* engine = nullptr;
IExecutionContext* context = nullptr;
IBuilder* builder = createInferBuilder(gLogger);
IBuilderConfig* config = builder->createBuilderConfig();
INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
// name the input with kInputTensorName so the optimization profile below actually binds to it
ITensor* input = network->addInput(kInputTensorName, DataType::kFLOAT, Dims4{-1, 3, 960, 960});
auto shape = input->getDimensions();
// shape.d[0] is -1 at build time, which is what triggers the error quoted below
ISliceLayer* s1 = network->addSlice(*input, Dims4{0, 0, 0, 0}, Dims4{shape.d[0], 3, 960 / 2, 960 / 2}, Dims4{1, 1, 2, 2});
s1->getOutput(0)->setName("output");
network->markOutput(*s1->getOutput(0));
auto profile = builder->createOptimizationProfile();
profile->setDimensions(kInputTensorName, OptProfileSelector::kMIN, Dims4{1, 3, 960, 960});
profile->setDimensions(kInputTensorName, OptProfileSelector::kOPT, Dims4{4, 3, 960, 960});
profile->setDimensions(kInputTensorName, OptProfileSelector::kMAX, Dims4{16, 3, 960, 960});
config->addOptimizationProfile(profile);
config->setFlag(BuilderFlag::kFP16);
// buildSerializedNetwork returns an IHostMemory*; renamed so it does not shadow the engine declared above
auto serialized = builder->buildSerializedNetwork(*network, *config);
I solved this problem using the following code:

INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
ITensor* data = network->addInput(kInputTensorName, dt, Dims4{-1, 3, kInputH, kInputW});
auto sliceLayer = network->addSlice(*data, Dims4{0, 0, 0, 0}, Dims4{-1, 3, kInputH / 2, kInputW / 2}, Dims4{1, 1, 2, 2});
// auto sliceSize = network->getInput(0)->getDimensions();
// fetch the runtime input shape as a shape tensor ...
auto shape = network->addShape(*network->getInput(0))->getOutput(0);
// ... and cast it to Int32 so it can be combined with the constant below
auto shapeInt32Layer = network->addIdentity(*shape);
shapeInt32Layer->setOutputType(0, DataType::kINT32);
auto shapeInt32 = shapeInt32Layer->getOutput(0);
// compute the slice size {B, 3, kInputH/2, kInputW/2} as input shape minus {0, 0, kInputH/2, kInputW/2}
int32_t subSliceValue[4] = {0, 0, kInputH / 2, kInputW / 2};
Weights subSliceWeight{DataType::kINT32, subSliceValue, 4};
auto constLayer = network->addConstant(Dims{1, {4}}, subSliceWeight);
auto elementLayer = network->addElementWise(*shapeInt32, *constLayer->getOutput(0), ElementWiseOperation::kSUB);
auto newShape = elementLayer->getOutput(0);
// feed the computed size tensor into slice input 2 (the size parameter)
sliceLayer->setInput(2, *newShape);
sliceLayer->getOutput(0)->setName(kOutputTensorName);
network->markOutput(*sliceLayer->getOutput(0));
auto profile = builder->createOptimizationProfile();
profile->setDimensions(kInputTensorName, OptProfileSelector::kMIN, Dims4{minBatchSize, 3, kInputH, kInputW});
profile->setDimensions(kInputTensorName, OptProfileSelector::kOPT, Dims4{optBatchSize, 3, kInputH, kInputW});
profile->setDimensions(kInputTensorName, OptProfileSelector::kMAX, Dims4{maxBatchSize, 3, kInputH, kInputW});
// builder->setMaxBatchSize(maxBatchSize);
config->addOptimizationProfile(profile);
config->setFlag(BuilderFlag::kFP16);
auto engine = builder->buildSerializedNetwork(*network, *config);
return engine;

The key to the problem is that, with dynamic shapes, ISliceLayer requires calling setInput so that the start, size, and stride parameters are supplied as tensors computed from the actual input shape at runtime. Anyway, thank you very much!
Good, but the slice op is usually not the best choice; its performance is poor.
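For reference, here is a minimal sketch (my addition, not confirmed by the thread) of what the reshape/permute equivalence above looks like built from IShuffleLayer instead of ISliceLayer, reusing data, network, kInputH, kInputW, and kOutputTensorName from the solution code and assuming a 3-channel input. A reshape dimension of 0 copies the corresponding input dimension, so the batch dimension stays dynamic:

// (B, 3, H, W) -> (B, 3, H/2, 2, W/2, 2): expose the row parity i and column parity j
auto* shuffle1 = network->addShuffle(*data);
shuffle1->setReshapeDimensions(Dims{6, {0, 3, kInputH / 2, 2, kInputW / 2, 2}});
// permute (B, C, h, i, w, j) -> (B, j, i, C, h, w); (j, i) ahead of C matches cat((la, lc, lb, ld), dim=1)
shuffle1->setSecondTranspose(Permutation{{0, 5, 3, 1, 2, 4}});
// (B, 2, 2, 3, H/2, W/2) -> (B, 12, H/2, W/2)
auto* shuffle2 = network->addShuffle(*shuffle1->getOutput(0));
shuffle2->setReshapeDimensions(Dims{4, {0, 12, kInputH / 2, kInputW / 2}});
shuffle2->getOutput(0)->setName(kOutputTensorName);
network->markOutput(*shuffle2->getOutput(0));

This sidesteps the dynamic-shape restriction on ISliceLayer entirely, since both shuffles only need the static C, H, and W.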
I ran into a problem while building a dynamic-batch yolov5 engine with the TensorRT (10.6) C++ API.
The Python snippet quoted above is the first layer of an early yolov5 release (an intermediate version between v1.0 and v2.0), and my C++ implementation is the one quoted above as well.
The problem is that the SliceLayer fails when building the engine: ITensor::getDimensions: Error Code 4: API Usage Error (Tensor (Unnamed Layer* 0) [Slice]_output has axis 0 with inherently negative length. Proven upper bound is -1. Network must have an instance where axis has non-negative length.) [E] [TRT] ITensor::getDimensions: Error Code 4: API Usage Error (Output shape can not be computed for node (Unnamed Layer* 0) [Slice].) The cause is that the first dimension of the input is -1, yet to get dynamic batch the first dimension must be set to -1. Looking into it further, the official documentation describes this limitation of the TRT slice layer (link, see section 9.7). The official TensorRT repository also has several related issues, but NVIDIA has not provided a solution.
One more fact: going through pt -> onnx -> trt, dynamic batch does work, i.e.
trtexec --onnx=Petrichor-Rbc-detect-v3.0-20240918.onnx --minShapes=input:1x3x640x640 --optShapes=input:60x3x640x640 --maxShapes=input:100x3x640x640
Looking at the onnx graph, the pt-to-onnx export replaces the Python operation with 6 slices; I am not familiar with how the onnx-to-trt step implements it, though. I'd like to ask whether you have any ideas for solving this problem. Thank you!