diff --git a/docs/ContribOperators.md b/docs/ContribOperators.md
index f01a7ab14a61e..f0543f2649205 100644
--- a/docs/ContribOperators.md
+++ b/docs/ContribOperators.md
@@ -6,6 +6,7 @@ Do not modify directly.*
* com.microsoft.Attention
* com.microsoft.AttnLSTM
* com.microsoft.BeamSearch
+ * com.microsoft.BiasAdd
* com.microsoft.BiasDropout
* com.microsoft.BiasGelu
* com.microsoft.BiasSoftmax
@@ -468,6 +469,40 @@ This version of the operator has been available since version 1 of the 'com.micr
+### **com.microsoft.BiasAdd**
+
+ Add bias to the input, then add the residual (skip) input.
+
+#### Version
+
+This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
+
+#### Inputs
+
+
+- X : T
+- Input tensor. Dimensions are (N, S, C), where N is the batch size, S is the image size (H*W), and C is the number of channels
+- bias : T
+- Bias tensor. Dimensions are (C)
+- skip : T
+- Residual tensor. Dimensions are (N, S, C)
+
+
+#### Outputs
+
+
+- Y : T
+- The output tensor with dimensions (N, S, C)
+
+
+#### Type Constraints
+
+
+- T : tensor(float16), tensor(float)
+- Constrain input and output types to float tensors.
+
+
+
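+The bias of shape (C) is broadcast over the N and S dimensions and added to X, then skip is added element-wise.
+A minimal NumPy sketch of this computation (illustrative only; the helper name `bias_add_reference` is not part of the schema):
+
+```python
+import numpy as np
+
+def bias_add_reference(x, bias, skip):
+    # x:    (N, S, C) input; N = batch size, S = H*W, C = channels
+    # bias: (C,) bias, broadcast over the N and S dimensions
+    # skip: (N, S, C) residual tensor
+    return x + bias + skip
+
+# Example: batch of 2, a 4x4 image flattened to S = 16, 8 channels
+x = np.random.randn(2, 16, 8).astype(np.float32)
+bias = np.random.randn(8).astype(np.float32)
+skip = np.random.randn(2, 16, 8).astype(np.float32)
+y = bias_add_reference(x, bias, skip)  # shape (2, 16, 8)
+```
+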
### **com.microsoft.BiasDropout**
output, dropout_mask = Dropout(data + bias, ratio) + residual, Intended to specialize the dropout pattern commonly found in transformer models.
diff --git a/docs/OperatorKernels.md b/docs/OperatorKernels.md
index 00b71d2946215..08178f206568e 100644
--- a/docs/OperatorKernels.md
+++ b/docs/OperatorKernels.md
@@ -787,6 +787,7 @@ Do not modify directly.*
|**Operator Domain:** *com.microsoft*||||
|Attention|*in* input:**T**<br> *in* weights:**T**<br> *in* bias:**T**<br> *in* mask_index:**M**<br> *in* past:**T**<br> *in* relative_position_bias:**T**<br> *in* past_sequence_length:**M**<br> *out* output:**T**<br> *out* present:**T**|1+|**T** = tensor(float), tensor(float16)|
|BeamSearch|*in* input_ids:**I**<br> *in* max_length:**I**<br> *in* min_length:**I**<br> *in* num_beams:**I**<br> *in* num_return_sequences:**I**<br> *in* length_penalty:**T**<br> *in* repetition_penalty:**T**<br> *in* vocab_mask:**M**<br> *in* prefix_vocab_mask:**M**<br> *in* attention_mask:**I**<br> *out* sequences:**I**<br> *out* sequences_scores:**T**<br> *out* scores:**T**|1+|**T** = tensor(float), tensor(float16)|
+|BiasAdd|*in* X:**T**<br> *in* bias:**T**<br> *in* skip:**T**<br> *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|BiasDropout|*in* data:**T**<br> *in* bias:**T**<br> *in* residual:**T**<br> *in* ratio:**T1**<br> *in* training_mode:**T2**<br> *out* output:**T**<br> *out* mask:**T2**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)<br/> **T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)<br/> **T2** = tensor(bool)|
|BiasGelu|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|BiasSoftmax|*in* data:**T**<br> *in* bias:**T**<br> *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|