Remove mask if use fusion mask (#9723)
* Remove mask if use fusion mask

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Apply isort and black reformatting

Signed-off-by: hsiehjackson <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: hsiehjackson <[email protected]>
Co-authored-by: hsiehjackson <[email protected]>
27b5c47
@yaoyu-33, @xuanzic found our neva_evaluation.py not working on the main branch. After checking, we found that this commit breaks neva evaluation: in neva, `compute_attention_mask` is always assumed to be `True`, so the `attention_mask` is always computed, but this line sets `attention_mask` to `None`. To be specific: https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/multimodal/models/multimodal_llm/neva/neva_model.py#L1069
@hsiehjackson For visibility.
@Slyne I think you need to set `get_attention_mask_from_fusion: False` in your config: https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L155
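For example, a minimal sketch of that override, assuming the flag sits under the `model` section of the linked YAML (as the linked line suggests):

```python
from omegaconf import OmegaConf

# Load the training config and disable the flag before building the model.
cfg = OmegaConf.load("examples/nlp/language_modeling/conf/megatron_gpt_config.yaml")
cfg.model.get_attention_mask_from_fusion = False
```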
@hsiehjackson It can be a bit inappropriate if users set `compute_attention_mask=True` on purpose and it actually uses `False` (`get_attention_mask_from_fusion`) without giving a warning or error. Also, the `generate()` function is public and may be called by other functions. Could you check whether you can move this statement outside the `generate()` function, such as into `megatron_gpt_generate()`? Or simply move this statement to the model's forward function, setting `attention_mask` to `None` when `get_attention_mask_from_fusion` is set to `True`.
@Slyne The reason we want to set `False` is that we hit OOM when creating a giant 2D attention mask for long context. Usually we use flash attention (with built-in causal masking), so we don't need to create this tensor and occupy GPU memory. Moving this to the forward function is a good suggestion; however, the attention mask is created under the text generation strategy here. Can you check whether the attention mask is `None` before calling `.cuda()`, like MegatronGPTModel does? It would be great if you could also avoid creating the attention mask at all when you are using a causal mask with flash attention.
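Something like this minimal guard, a sketch assuming the mask is simply an optional tensor:

```python
# attention_mask may legitimately be None when get_attention_mask_from_fusion
# drops it, so guard the device transfer instead of calling .cuda() blindly.
attention_mask = None  # e.g. the fusion-mask case

if attention_mask is not None:
    attention_mask = attention_mask.cuda()
```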