support dynamic sequence length #320
Conversation
Thanks Reza for implementing this feature so fast! Some background motivation: since the Bing BERT validation data has a sequence length of 512, we couldn't calculate the validation loss during seq128 pretraining when using the DeepSpeed transformer kernel, because the kernel records a fixed sequence length at initialization. Reza, Elton and I discussed this and thought that supporting dynamic sequence lengths would be quite useful beyond my experiments, so Reza implemented this feature and included a unit test. I will test Reza's implementation in my pre-training experiments and let you know whether it works.
Thanks Conglong for suggesting this nice feature :-) I hope it unblocks your testing and we can add this feature to the kernel.
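To make the motivation concrete, here is a back-of-the-envelope sketch (the function name and numbers are illustrative, not DeepSpeed's actual API): the attention-probability dropout mask scales with the square of the sequence length, so a kernel constructed for seq_length=128 allocates far too little workspace for a 512-token validation batch unless the length can be changed after construction.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of mask elements the attention-probability
// dropout needs, sized at kernel construction time.
size_t attn_mask_elements(size_t batch, size_t heads, size_t seq_length)
{
    return batch * heads * seq_length * seq_length;
}
```

With batch 8 and 12 heads, a 512-token batch needs 16x the mask space of a 128-token batch (the ratio of the squared lengths), which is why a fixed-size allocation made at init time cannot serve both.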
@@ -18,6 +18,7 @@ class Dropout {
     }

     float RATIO() const { return training ? ratio : 0.0; }
+    inline void SetDim(uint32_t d) { dim = d; }
It looks like "batch" is unused in the config; remove it?
Yes, I removed it.
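A simplified sketch of the shape this change gives the Dropout helper (illustrative, not the exact DeepSpeed source): the config keeps only what the kernel needs at launch time, the ratio and the per-token dimension, and the dimension is now settable so it can track the current sequence length; the unused "batch" field is gone.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a Dropout wrapper whose dimension follows the active
// sequence length instead of being fixed at construction.
template <typename T>
class Dropout {
public:
    struct Config {
        float ratio;     // dropout probability during training
        uint32_t dim;    // per-token dimension, updated per batch
        bool training;
    };

    explicit Dropout(Config cfg) : config_(cfg) {}

    // At inference time the effective ratio is zero (no masking).
    float RATIO() const { return config_.training ? config_.ratio : 0.0f; }

    inline void SetDim(uint32_t d) { config_.dim = d; }
    inline uint32_t Dim() const { return config_.dim; }
    inline void SetTrainingMode(bool t) { config_.training = t; }

private:
    Config config_;
};
```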
csrc/includes/ds_transformer_cuda.h
Outdated
@@ -121,11 +121,17 @@ class BertTransformerLayer {

     void SetIntermediateBuffers(uint8_t* attn_prob_dropout_mask_ptr,
                                 uint8_t* attn_output_dropout_mask_ptr,
-                                uint8_t* layer_output_dropout_mask_ptr);
+                                uint8_t* layer_output_dropout_mask_ptr,
+                                T*,
Better to give each parameter a meaningful name.
@@ -28,14 +28,12 @@ class Gelu {
                  T* output,
                  cudaStream_t stream)
     {
-        launch_bias_gelu<T>(
-            input_buf, bias, output, _config.intermediate_size, bsz, _config.seq_length, stream);
+        launch_bias_gelu<T>(input_buf, bias, output, _config.intermediate_size, bsz, stream);
Can both "batch" and "seq_length" be removed from the config?
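A CPU sketch of the idea behind this signature change (a hypothetical helper, not the CUDA kernel itself): bias-add followed by GeLU is purely row-wise, so it only needs the total number of rows it is given and the intermediate size; no fixed seq_length has to live in the config.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Row-wise bias-add + GeLU over `rows` rows of width `intermediate_size`.
// The caller can pass rows = batch * current_seq_length for any batch.
void bias_gelu(const std::vector<float>& input,
               const std::vector<float>& bias,
               std::vector<float>& output,
               int intermediate_size,
               int rows)
{
    output.resize(static_cast<size_t>(rows) * intermediate_size);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < intermediate_size; ++c) {
            float x = input[r * intermediate_size + c] + bias[c];
            // tanh approximation of GeLU, as commonly used in BERT kernels
            output[r * intermediate_size + c] =
                0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
        }
    }
}
```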
     inline void SetMean(T* mean)
     {
         if (!mean) { throw std::runtime_error("Normalize mean is null."); }
Should we check config.use_mean here for consistency? Or remove "use_mean" and just check whether mean is nullptr?
No, I cannot do that. These two mean different things: use_mean may be true or false depending on the layer-norm inversion, whereas this check only verifies that the mean buffer was allocated from the outside. So SetMean should not be called when the normalize_invertible flag is true.
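A simplified sketch of the distinction described above (names are illustrative): use_mean governs whether the layer-norm keeps a mean for the backward pass, while SetMean only validates that an externally allocated buffer was actually handed in. With the invertible layer-norm there is no external mean buffer, so SetMean would never be called.

```cpp
#include <cassert>
#include <stdexcept>

// Sketch of a Normalize wrapper separating the use_mean semantics
// from the null-check performed in SetMean.
template <typename T>
class Normalize {
public:
    explicit Normalize(bool use_mean) : use_mean_(use_mean), mean_(nullptr) {}

    // Rejects a null buffer; says nothing about use_mean on purpose.
    inline void SetMean(T* mean)
    {
        if (!mean) { throw std::runtime_error("Normalize mean is null."); }
        mean_ = mean;
    }

    bool HasMean() const { return mean_ != nullptr; }
    bool UsesMean() const { return use_mean_; }

private:
    bool use_mean_;
    T* mean_;
};
```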
csrc/transformer/cublas_wrappers.cu
Outdated
@@ -34,7 +34,12 @@ int cublas_gemm_ex(cublasHandle_t handle,
                    algo);

     if (status != CUBLAS_STATUS_SUCCESS) {
-        fprintf(stderr, "!!!! kernel execution error.\n");
+        fprintf(stderr,
+                "!!!! kernel execution error. (m: %d, n: %d, k: %d, error : %d) \n",
"error :" -> "error:"
There are several of the same cases below.
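A small sketch of the error-reporting idea in this hunk (the helper and status codes here are hypothetical; the real wrapper checks cublasStatus_t directly): including the GEMM shape in the failure message makes mismatches caused by a changed sequence length immediately visible in the log.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the diagnostic string for a failed GEMM; returns "" on success
// (status 0, mirroring CUBLAS_STATUS_SUCCESS).
std::string gemm_error_message(int status, int m, int n, int k)
{
    if (status == 0) return "";
    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "!!!! kernel execution error. (m: %d, n: %d, k: %d, error: %d)",
                  m, n, k, status);
    return std::string(buf);
}
```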
deepspeed/pt/deepspeed_cuda.py
Outdated
     layer_output_dropout_mask,
     norm2_var,
     norm2_mean,
     norm3_var,
We have two norm layers; I suggest renaming all the "norm2"/"norm3" names to meaningful ones.
Great point! :-) I have been meaning to do that for a long time and kept forgetting!
     inline int GetBatchSize() const { return _batch_size; }
     inline int GetNumHeads() const { return _heads; }
     inline int GetSeqLength() const { return _seq_length; }

     void SetSeqLength(int seq_len, int bsz);
Is this used anywhere?
Yes, it is used in ds_transformer_cuda.cpp: https://github.com/microsoft/DeepSpeed/blob/reyazda/support_dynamic_seqlength/csrc/transformer/ds_transformer_cuda.cpp#L708
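A hedged sketch of how such a per-call update can look (this is not the actual linked ds_transformer_cuda.cpp code; names are illustrative): the layer compares the incoming batch's sequence length against the configured one and only updates its state, and reallocates workspace, when they differ.

```cpp
#include <cassert>

// Sketch of a transformer layer that adapts to each batch's sequence length.
class BertTransformerLayerSketch {
public:
    BertTransformerLayerSketch(int bsz, int seq_len)
        : _batch_size(bsz), _seq_length(seq_len), resize_count_(0) {}

    inline int GetSeqLength() const { return _seq_length; }

    void SetSeqLength(int seq_len, int bsz)
    {
        _seq_length = seq_len;
        _batch_size = bsz;
        ++resize_count_;  // stands in for resizing workspace buffers
    }

    void Forward(int input_seq_len, int bsz)
    {
        // Only pay the update cost when the length actually changes.
        if (input_seq_len != _seq_length) { SetSeqLength(input_seq_len, bsz); }
        // ... launch attention / gelu / layer-norm kernels here ...
    }

    int ResizeCount() const { return resize_count_; }

private:
    int _batch_size;
    int _seq_length;
    int resize_count_;
};
```

This is the pattern that lets seq128 pretraining and seq512 validation share one kernel instance: steady-state batches hit the fast path, and only a length change triggers the update.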
csrc/includes/softmax.h
Outdated
-    inline int GetSeqLength() const { return config_.seq_length; }
+    inline size_t GetSeqLength() const { return config_.seq_length; }

+    inline void SetSeqlen(size_t seq_len) { config_.seq_length = seq_len; }
I suggest naming it "SetSeqLength" to be consistent with "GetSeqLength".
Force-pushed from ea5841c to 643c33b.
It would be great if this could be merged. I'm really excited about it!
Hi @HFadeel, we are working on merging this soon. Reza
Hi @HFadeel, we have merged these changes into the master branch now. Please pull the new DeepSpeed changes to use the feature. Thank you.
Staging compression library v1 (#314)
* prototype
* add sparse/row/head pruning
* add bert test examples, not testing yet
* rm moq
* add deepspeed based glue example to test compression
* add get/set attr
* tested replacement module
* customized Linear layer accuracy checked without any compression technique
* sparse pruning tested
* head pruning tested
* row pruning tested
* enable dynamic activation quantization
* change l1 mask to buffer for better resume training
* add final model saving helper function, only for sparse pruning now
* tested sparse pruning resume training and final model saving
* row pruning resume training and final saving checked
* head pruning resume training / final model saving
* rm bert from deepspeed
* restructure the code
* add mixed-precision quantization support
* add binary/ternary support
* add weight quantization FP16 assert
* add conv2d
* add compression function
* move config generation to deepspeed side, need Elton to take a look
* add activation quantization support
* add sparse pruning support
* add row pruning
* add head pruning
* add channel pruning
* support matching patterns for module names
* update
* fix typo in fix_compression
* add compression scheduler, rm the offset scheduler from MoQ
* fix some errors in head pruning, support redundancy cleaning (naive version)
* add dim-reduction redundancy cleaning
* update linear layer
* make cnn example work
* add bn2d
* fix bias issue
* add static activation quantization
* support mpu row/column parallel linear layer
* add skip_bias_add for mpu linear layers
* make mpu compress work, remove_redundent is not tested yet
* fix several small errors
* add conv1d to linear converter function
* make dynamic activation quantization per-token or per-image
* cleaning part of the code; more is coming
* enable forward weight quantization which supports both FP32 and some tricky settings
* update readme
* Update README.md
* naming cleaning
* fix static activation loading issue
* update parameter
* Update utils.py: fix a typo
* fix typo
* replace expand_as with view
* Zheweiyao/compression library (#304)
  * add forward weight quantization constraint
  * add quantize_weight_in_forward warning: a lot of features are not supported
  * offset 0 fixing
  * fix a small issue
  * omit bias if the model does not have bias
  * add contiguous to avoid memory issue
  * add scale associated to weight, so people can quantize the weight after training
  * add fixed weight quantization, change name based on constant.py file
  * disable eigen-based MoQ
  * when a method is disabled (enable: false), we do not need to initialize its related parameters
  * weight quantization cleaning
  * fix get_quantize_enabled missing problem
  * fix redundancy-cleaning issue, make sure we either get the mask from the related module or we enable the method in config
  * sort the redundancy cleaning steps, so we always do quantization, then sparse pruning, then others
  * a lot of comment cleaning and args explanation
  * add args in config-json.md
  * fix format issue
  * fix quantization offset step=1 with FP16 optimizer
* Zheweiyao/compression library from s1 (#305)
  * add binary/ternary support for FP32 training; this is used to resolve FP16-unstable extreme compression training
  * add embedding quantization support
* Xiaoxia/compression library v1 (#307)
  * add layer reduction (Xiaoxia/Zhewei)
  * fix bug for sym activation and clean layer reduction (Xiaoxia)
  * fix compression initialization (Xiaoxia/Zhewei)
* fix format issue (#310)
* Xiaoxia/compression library v1 (#311)
  * add layer reduction
  * fix bug for sym activation and clean layer reduction
  * fix compression initialization
  * pre-commit...
* Zheweiyao/compression library from s1 (#312)
  * fix format issue
  * fix the accuracy mismatch after quantization cleaning
* fix clean_model bug and add layer_reduction configuration
* switch to deepspeed comm
* dummy tutorial
* improve config json
* Zheweiyao/compression library based on s2 (#315)
  * change the name and merge layer reduction into init_compression
  * add conv1d-to-linear test unit, fix errors introduced by merging student initialization into init_compression
* Update config-json.md
* fix for cifar10 channel pruning
* fix the block_eigenvalue is None bug
* move compression-related constants and configs to compression
* tutorial and json config

Co-authored-by: yaozhewei <[email protected]>
Co-authored-by: Elton Zheng <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Xiaoxia (Shirley) Wu <[email protected]>
Co-authored-by: xiaoxiawu <[email protected]>
I made some changes to the transformer kernel code to support dynamic sequence lengths.