[Question] Padding in ZeRO optimizer #487

e-sha · 2020-10-27T13:27:55Z

Why do you use padding here and here in the ZeRO optimizer?
Moreover, you later split the tensors as if they were not padded, for example here.
From my perspective, padding is not necessary at all.

tjruwase · 2020-10-28T22:26:16Z

We pad the tensors to align to the data parallel size to ensure load-balancing across the ranks.

The tensor splitting here is on the padded tensors.
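To make the load-balancing point concrete, here is a minimal sketch in plain Python, using lists in place of flattened tensors. It is not DeepSpeed's actual code: `pad_to_multiple`, `equal_partitions`, and the zero pad value are hypothetical names and choices made only for illustration.

```python
def pad_to_multiple(flat, world_size, pad_value=0.0):
    """Return a copy of `flat` padded so its length is a multiple of world_size."""
    padding = (world_size - len(flat) % world_size) % world_size
    return list(flat) + [pad_value] * padding


def equal_partitions(padded, world_size):
    """Split an already padded buffer into world_size equally sized partitions."""
    assert len(padded) % world_size == 0, "buffer must be padded first"
    size = len(padded) // world_size
    return [padded[rank * size:(rank + 1) * size] for rank in range(world_size)]


# 10 elements across 4 data-parallel ranks: 2 padding elements are appended,
# so every rank owns exactly 3 elements.
flat = [float(i) for i in range(10)]
padded = pad_to_multiple(flat, world_size=4)
parts = equal_partitions(padded, world_size=4)
```

Because the split runs on the padded buffer, every rank holds a partition of the same length, and only the last partition contains padding.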

e-sha · 2020-10-30T11:52:07Z

According to this line, the optimizer can split a tensor that is not aligned. Thus, padding seems to be an unnecessary feature that just makes the code more complicated and harder to maintain.

tjruwase · 2020-10-30T17:52:42Z

Apologies for the confusion, but that line is old and redundant, since the input to the function is the padded tensor. In addition to load-balancing, padding helps handle cases where tensor.numel() is smaller than the GPU count. This is rare, but we have seen it in practice, and it led to this unit test.
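The small-tensor case can be illustrated with a rough sketch, again in plain Python with lists standing in for tensors. The helpers `split_unpadded` and `split_padded` are hypothetical and only mimic the two strategies discussed in this thread; they are not taken from the DeepSpeed source.

```python
def split_unpadded(flat, world_size):
    """Naive split: the first (len % world_size) ranks get one extra element."""
    base, extra = divmod(len(flat), world_size)
    parts, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        parts.append(flat[start:start + size])
        start += size
    return parts


def split_padded(flat, world_size, pad_value=0.0):
    """Pad to a multiple of world_size, then split into equal partitions."""
    padding = (-len(flat)) % world_size
    padded = list(flat) + [pad_value] * padding
    size = len(padded) // world_size
    return [padded[rank * size:(rank + 1) * size] for rank in range(world_size)]


# A tensor with 3 elements on 8 GPUs.
tiny = [1.0, 2.0, 3.0]
unpadded_sizes = [len(p) for p in split_unpadded(tiny, 8)]  # some ranks are empty
padded_sizes = [len(p) for p in split_padded(tiny, 8)]      # every rank has 1 element
```

In the unpadded split, five of the eight ranks end up with empty partitions, which downstream code would have to special-case. With padding, every rank owns a non-empty partition of the same size.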

tjruwase · 2020-11-24T06:55:27Z

Closing for lack of activity. Please reopen as needed.


tjruwase closed this as completed Nov 24, 2020
