Support output cross qk in masked decoder multihead attention kernel #18614

Triggered via push September 19, 2023 23:50
Status Success
Total duration 57s
Validation
48s

Annotations

137 warnings
[cpplint] onnxruntime/contrib_ops/cpu/transformers/beam_search_impl_whisper.h#L141: onnxruntime/contrib_ops/cpu/transformers/beam_search_impl_whisper.h#L141
Using C-style cast. Use reinterpret_cast<void*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cpu/transformers/logits_processor.h#L263: onnxruntime/contrib_ops/cpu/transformers/logits_processor.h#L263
Add #include <limits> for numeric_limits<> [build/include_what_you_use] [4]
[cpplint] onnxruntime/contrib_ops/cuda/bert/attention.cc#L225: onnxruntime/contrib_ops/cuda/bert/attention.cc#L225
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/contrib_ops/cuda/bert/attention.cc#L252: onnxruntime/contrib_ops/cuda/bert/attention.cc#L252
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/contrib_ops/cuda/bert/attention.h#L7: onnxruntime/contrib_ops/cuda/bert/attention.h#L7
<mutex> is an unapproved C++11 header. [build/c++11] [5]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping.cc#L8: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping.cc#L8
Found C++ system header after other header. Should be: dynamic_time_warping.h, c system, c++ system, other. [build/include_order] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping.cc#L9: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping.cc#L9
Found C++ system header after other header. Should be: dynamic_time_warping.h, c system, c++ system, other. [build/include_order] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping.cc#L11: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping.cc#L11
Do not use namespace using-directives. Use using-declarations instead. [build/namespaces] [5]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping.h#L6: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping.h#L6
Found C system header after other header. Should be: dynamic_time_warping.h, c system, c++ system, other. [build/include_order] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L7: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L7
Found C system header after other header. Should be: dynamic_time_warping_impl.h, c system, c++ system, other. [build/include_order] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L8: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L8
Found C++ system header after other header. Should be: dynamic_time_warping_impl.h, c system, c++ system, other. [build/include_order] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L10: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L10
Do not use namespace using-directives. Use using-declarations instead. [build/namespaces] [5]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L46: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L46
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L48: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L48
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L49: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L49
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L75: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L75
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L95: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L95
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L96: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L96
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L97: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L97
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L98: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L98
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L113: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L113
Using C-style cast. Use reinterpret_cast<int32_t*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L114: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L114
Using C-style cast. Use reinterpret_cast<size_t*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L115: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L115
Using C-style cast. Use reinterpret_cast<float*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L116: onnxruntime/contrib_ops/cuda/tensor/dynamic_time_warping_impl.cu#L116
Using C-style cast. Use reinterpret_cast<int8_t*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L8: onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L8
Found C++ system header after other header. Should be: unfold.h, c system, c++ system, other. [build/include_order] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L9: onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L9
Found C++ system header after other header. Should be: unfold.h, c system, c++ system, other. [build/include_order] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L11: onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L11
Do not use namespace using-directives. Use using-declarations instead. [build/namespaces] [5]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L36: onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L36
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L36: onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L36
Add #include <functional> for multiplies<> [build/include_what_you_use] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L39: onnxruntime/contrib_ops/cuda/tensor/unfold.cc#L39
Add #include <algorithm> for copy [build/include_what_you_use] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold.h#L6: onnxruntime/contrib_ops/cuda/tensor/unfold.h#L6
Found C system header after other header. Should be: unfold.h, c system, c++ system, other. [build/include_order] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L7: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L7
Found C system header after other header. Should be: unfold_impl.h, c system, c++ system, other. [build/include_order] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L9: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L9
Do not use namespace using-directives. Use using-declarations instead. [build/namespaces] [5]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L20: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L20
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L21: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L21
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L21: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L21
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L38: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L38
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L75: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L75
Using C-style cast. Use reinterpret_cast<int8_t*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L80: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L80
Using C-style cast. Use reinterpret_cast<int16_t*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L85: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L85
Using C-style cast. Use reinterpret_cast<int32_t*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L90: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L90
Using C-style cast. Use reinterpret_cast<int64_t*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L95: onnxruntime/contrib_ops/cuda/tensor/unfold_impl.cu#L95
Using C-style cast. Use reinterpret_cast<float4*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/transformers/generation_cuda_impl.cu#L1362: onnxruntime/contrib_ops/cuda/transformers/generation_cuda_impl.cu#L1362
Using C-style cast. Use reinterpret_cast<CudaT*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/transformers/generation_cuda_impl.cu#L1363: onnxruntime/contrib_ops/cuda/transformers/generation_cuda_impl.cu#L1363
Using C-style cast. Use reinterpret_cast<CudaT**>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/transformers/generation_cuda_impl.cu#L1440: onnxruntime/contrib_ops/cuda/transformers/generation_cuda_impl.cu#L1440
Using C-style cast. Use reinterpret_cast<CudaT*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/transformers/generation_cuda_impl.cu#L1524: onnxruntime/contrib_ops/cuda/transformers/generation_cuda_impl.cu#L1524
Using C-style cast. Use reinterpret_cast<CudaT*>(...) instead [readability/casting] [4]
[cpplint] onnxruntime/contrib_ops/cuda/transformers/generation_device_helper.cc#L463: onnxruntime/contrib_ops/cuda/transformers/generation_device_helper.cc#L463
Missing username in TODO; it should look like "// TODO(my_username): Stuff." [readability/todo] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L502: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L502
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L503: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L503
At least two spaces is best between code and comments [whitespace/comments] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1139: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1139
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1140: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1140
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1163: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1163
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1174: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1174
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1176: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1176
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1195: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1195
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1196: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1196
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1199: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1199
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1200: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1200
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1211: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1211
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1212: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1212
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1213: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1213
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1214: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1214
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1217: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1217
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1218: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1218
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1219: onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L1219
Lines should be <= 120 characters long [whitespace/line_length] [2]
[cpplint] onnxruntime/test/contrib_ops/dynamic_time_warping_op_test.cc#L9: onnxruntime/test/contrib_ops/dynamic_time_warping_op_test.cc#L9
Do not use namespace using-directives. Use using-declarations instead. [build/namespaces] [5]
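
Most of the warnings above fall into four categories: readability/casting, build/include_order, build/namespaces, and build/include_what_you_use. The following is a minimal sketch of what the cpplint-preferred form might look like for each. The helper names (AsInt32, AsBytes, LowestScore) are hypothetical and do not come from the onnxruntime sources; the flagged lines themselves are not shown in this log, so the right named cast for each of them depends on the actual types involved.

```cpp
// build/include_order: related header first (omitted in this sketch),
// then C system headers, then C++ system headers, then other headers.
#include <stdint.h>

#include <cassert>
#include <limits>
#include <vector>

// build/namespaces: a using-declaration names one symbol instead of
// pulling in a whole namespace with `using namespace std;`.
using std::vector;

// readability/casting: cpplint flags `(int32_t*)buffer`.
// Hypothetical helper: void* to T* only needs static_cast.
inline int32_t* AsInt32(void* buffer) {
  assert(buffer != nullptr);
  return static_cast<int32_t*>(buffer);
}

// Hypothetical helper: converting between unrelated pointer types
// needs reinterpret_cast, which makes the reinterpretation explicit.
inline const int8_t* AsBytes(const float* data) {
  return reinterpret_cast<const int8_t*>(data);
}

// build/include_what_you_use: std::numeric_limits requires <limits>
// to be included directly rather than relied on transitively.
inline float LowestScore() {
  return std::numeric_limits<float>::lowest();
}
```

cpplint suggests reinterpret_cast in its message, but where the source is a void* (as with a raw allocation), static_cast is usually sufficient and narrower in intent.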