padding the length of input for vit_attention #45506
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
@@ -310,6 +349,11 @@ int QkvToContextPluginDynamic::enqueue(
  // input[0], (B, S, 3 * N * H, 1, 1)
  int batch = input_dims.d[0];
  int seq_len = input_dims.d[1];
  int real_seq_len = seq_len;
  if (input_desc[0].type == nvinfer1::DataType::kHALF) {
Reviewer: Please add a comment noting that fp16 requires padding here.
Author: OK.
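For context, a minimal sketch of what the requested comment could document; the round-up helper below is illustrative, not the PR's exact code:

// fp16 needs padding: the fused fp16 attention kernels are fastest when the
// sequence length is a multiple of 8, so round seq_len up before running them.
if (input_desc[0].type == nvinfer1::DataType::kHALF) {
  seq_len = (seq_len + 7) / 8 * 8;  // illustrative round-up to a multiple of 8
}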
__global__ void reset_qk_bias(T *input, int real_seq_len, int seq_len) {
  if (threadIdx.x < seq_len) {
    int id = threadIdx.x + blockIdx.x * seq_len;
    input[id] = threadIdx.x >= real_seq_len ? (T)-1e20f : (T)0.0f;
Reviewer: Be careful with -1e20f; mind what low-precision types can represent.
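The concern: half's largest finite magnitude is 65504, so casting -1e20f to half overflows to -inf. A hedged alternative (the constant name is illustrative, not from the PR):

// IEEE half tops out at a finite magnitude of 65504, so use a mask value
// that stays finite instead of relying on overflow to -inf.
constexpr float kHalfLowest = -65504.0f;
input[id] = threadIdx.x >= real_seq_len ? (T)kHalfLowest : (T)0.0f;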
if (ProductDim(input_desc[1].dims) == ProductDim(input_desc[0].dims)) {
  qk_bias = reinterpret_cast<float *>(workspace);
  auto size = batch * head_number_ * seq_len * seq_len;
  cudaMemset(qk_bias, 0, sizeof(float) * size);
Reviewer: Use cudaMemsetAsync; the same applies to the calls below.
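A minimal sketch of the suggestion, assuming the enqueue stream is in scope:

// Stream-ordered zeroing avoids the implicit synchronization of cudaMemset.
cudaMemsetAsync(qk_bias, 0, sizeof(float) * size, stream);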
@@ -373,6 +423,35 @@ int QkvToContextPluginDynamic::enqueue(
  } else if (input_type == nvinfer1::DataType::kHALF) {
#ifdef TRT_PLUGIN_FP16_AVALIABLE
    VLOG(1) << "TRT Plugin DataType selected. QkvToContext-->fp16";
    int *padding_offset = nullptr;
    half *padding_input = nullptr;
    framework::Tensor padding_offset_tensor;
Reviewer: Use the workspace or a member variable instead, to avoid allocating device memory on every enqueue.
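A sketch of that suggestion under stated assumptions; the buffer layout and sizes below are illustrative, and getWorkspaceSize() would have to report the extra bytes:

// Carve scratch buffers out of the TensorRT-provided workspace instead of
// allocating a framework::Tensor inside enqueue (sizes are assumptions).
char *ws = reinterpret_cast<char *>(workspace);
int *padding_offset = reinterpret_cast<int *>(ws);
ws += sizeof(int) * (batch + 1);  // assumed length of the offset array
half *padding_input = reinterpret_cast<half *>(ws);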
        0,
        sizeof(half) * batch * seq_len * 3 * head_number_ * head_size_);

    set_padding_offset<<<1, 1, 0, stream>>>(
Reviewer: Concurrency here could be improved further.
Author: OK.
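For illustration, one way to raise the parallelism of a <<<1, 1>>> launch; the kernel body and signature here are assumptions, not the PR's actual set_padding_offset:

// Assumed: one offset per sequence in the batch, computed independently,
// so each thread can handle one batch entry instead of a single thread
// doing all the work.
__global__ void set_padding_offset(int *padding_offset, int real_seq_len,
                                   int seq_len, int batch) {
  int b = blockIdx.x * blockDim.x + threadIdx.x;
  if (b < batch) padding_offset[b] = b * (seq_len - real_seq_len);
}

// Launch with one thread per batch entry rather than <<<1, 1>>>.
int threads = 256;
int blocks = (batch + threads - 1) / threads;
set_padding_offset<<<blocks, threads, 0, stream>>>(padding_offset,
                                                   real_seq_len, seq_len,
                                                   batch);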
@@ -1105,6 +1113,9 @@ def generate_trt_nodes_num():
        self.trt_param.precision = paddle_infer.PrecisionType.Half
        yield self.create_inference_config(), generate_trt_nodes_num(), (1e-3,
                                                                         1e-3)
        self.trt_param.precision = paddle_infer.PrecisionType.Float32
        yield self.create_inference_config(), generate_trt_nodes_num(), (1e-3,
                                                                         1e-3)
Reviewer: The fp32 tolerance should be tighter than 1e-3.
@@ -342,6 +427,12 @@ int QkvToContextPluginDynamic::enqueue(
        head_number_);
    qk_bias = temp_qk_bias;
  }
  // fake qk_bias
  if (ProductDim(input_desc[1].dims) == ProductDim(input_desc[0].dims)) {
Reviewer: This can be determined at configure time; there is no need to check it on every enqueue. The same applies to the checks below.
Author: OK, I'll move this, together with the memset below, into the configure step.
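A sketch of the agreed change; the cached member name fake_qk_bias_ is an assumption for illustration:

// Decide once in configurePlugin whether input[1] is a fake qk_bias; enqueue
// then reads the cached flag instead of comparing dimension products.
void QkvToContextPluginDynamic::configurePlugin(
    const nvinfer1::DynamicPluginTensorDesc *in, int nb_inputs,
    const nvinfer1::DynamicPluginTensorDesc *out, int nb_outputs) noexcept {
  fake_qk_bias_ = ProductDim(in[1].desc.dims) == ProductDim(in[0].desc.dims);
}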
if (ProductDim(input_desc[1].dims) == ProductDim(input_desc[0].dims)) {
  qk_bias = reinterpret_cast<float *>(workspace);
  auto size = batch * head_number_ * seq_len * seq_len;
  cudaMemset(qk_bias, 0, sizeof(float) * size);
Reviewer: Use the async interface (cudaMemsetAsync).
PR types
Others
PR changes
Others
Describe
When the attention input length is not a multiple of 8, fp16 performance is poor. This PR pads the input of the multihead plugin. For the vit_384 model with batch=1, latency drops from 13.5 ms to 10.5 ms.
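To make the length claim concrete, a worked example under an assumed ViT-384 configuration with 16x16 patches (the patch size is an assumption, not stated in the PR):

// Assumed ViT-384 config: 16x16 patches plus one class token.
int tokens = (384 / 16) * (384 / 16) + 1;  // = 577, not a multiple of 8
int padded = (tokens + 7) / 8 * 8;         // = 584, the padded length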