[hotfix] fix parameter shape checking #6124

flybird11111 · 2024-11-11T10:02:44Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs
I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
if you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

cingtiye · 2024-11-11T11:27:23Z

colossalai/checkpoint_io/utils.py

@@ -111,11 +111,6 @@ def search_tp_partition_dim(current_shape: torch.Size, original_shape: torch.Siz
        if length > current_shape[dim]:
            partition_dim = dim
            break
-    if partition_dim is not None:
-        assert (
-            original_shape[partition_dim] == tp_size * current_shape[partition_dim]


original_shape = [128256, 4096]
current_shape = [16064, 4096]
tp_size = 8

16064 * 8 = 128512 != 128256

When I delete 114~118, the size of model.embed_tokens.weight is [16064, 4096] in the saved model .

cingtiye · 2024-11-11T11:43:48Z

colossalai/checkpoint_io/utils.py

-    if partition_dim is not None:
-        assert (
-            original_shape[partition_dim] == tp_size * current_shape[partition_dim]
-        ), f"The parameter isn't evenly distributed among tensor parallel group: \


current_shape should be [16032, 4096]

original_shape refers to the shape before padding, while tp_size * current_shape refers to the shape after padding. Therefore, the assert will throw an error.

cingtiye · 2024-11-13T06:54:07Z

why make_vocab_size_divisible_by is 64.
When tensor_parallel_size = 8, num_embedding = 128256, make_vocab_size_divisible_by * tensor_parallel_size = 64 * 8 = 512,
num_embeddings % multiple = 128256 % 512 != 0, so the num_embeddings is wrong!

class VocabParallelEmbedding1D(PaddingParallelModule):
    r"""Embedding parallelized in the vocabulary dimension.

    Args:
        num_embeddings (int): number of embeddings.
        embedding_dim (int): dimension of embedding.
        padding_idx (int, optional): If specified, the entries at padding_idx do not contribute to the gradient;
            therefore, the embedding vector at padding_idx is not updated during training,
            i.e. it remains as a fixed “pad”, defaults to None.
        dtype (:class:`torch.dtype`, optional): The dtype of parameters, defaults to None.
        weight_initializer (:class:`typing.Callable`, optional):
            he initializer of weight, defaults to normal initializer.

    The ``args`` and ``kwargs`` used in :class:``torch.nn.functional.embedding`` should contain:
    ::
        max_norm (float, optional): If given, each embedding vector with norm larger than max_norm is
                    renormalized to have norm max_norm. Note: this will modify weight in-place.
        norm_type (float, optional): The p of the p-norm to compute for the max_norm option. Default 2.
        scale_grad_by_freq (bool, optional): If given, this will scale gradients by the inverse
                    of frequency of the words in the mini-batch. Default False.
        sparse (bool, optional): If True, gradient w.r.t. weight will be a sparse tensor. Default False.

    More details about ``args`` and ``kwargs`` could be found in
    `Embedding <https://pytorch.org/docs/stable/generated/torch.nn.functional.embedding.html#torch.nn.functional.embedding>`_.

    More details about initializer please refer to
    `init <https://github.com/hpcaitech/ColossalAI/blob/main/colossalai/nn/init.py>`_.
    """

    def __init__(
        self,
        num_embeddings: int,
        embedding_dim: int,
        padding_idx: int = None,
        dtype: torch.dtype = None,
        device: torch.device = None,
        process_group: ProcessGroup = None,
        weight: Optional[nn.Parameter] = None,
        weight_initializer: Callable = init.normal_(),
        make_vocab_size_divisible_by: int = 64,
        fp8_communication: bool = False,
        *args,
        **kwargs,
    ):
        self.num_embeddings = num_embeddings
        self.embedding_dim = embedding_dim
        self.embed_args = args
        self.embed_kwargs = kwargs
        self.process_group = process_group
        self.fp8_communication = fp8_communication

        tensor_parallel_size = dist.get_world_size(group=process_group)
        tensor_parallel_rank = dist.get_rank(group=process_group)

        # generate weight and bias
        if weight is None:
            factory_kwargs = {"device": device, "dtype": dtype}
            weight = nn.Parameter(torch.empty((num_embeddings, self.embedding_dim), **factory_kwargs))
        else:
            weight.data = weight.data.to(device=device, dtype=dtype)

        # calculate new padding size
        multiple = make_vocab_size_divisible_by * tensor_parallel_size
        if num_embeddings % multiple != 0:
            self.num_embeddings = num_embeddings + multiple - (num_embeddings % multiple)

cingtiye · 2024-11-13T07:26:12Z

Propose to revise the following code:
https://github.com/hpcaitech/ColossalAI/blob/a2596519fd8f25da13da622e5188b9a18024f3c0/colossalai/shardformer/layer/embedding.py
Line 186 and Line 304

insert raise VallueError, and delete self.num_embeddings = ( num_embeddings + make_vocab_size_divisible_by - (num_embeddings % make_vocab_size_divisible_by)

if num_embeddings % make_vocab_size_divisible_by != 0:
              raise VallueError
#            self.num_embeddings = (
                num_embeddings + make_vocab_size_divisible_by - (num_embeddings % make_vocab_size_divisible_by)
            )

flybird11111 · 2024-11-18T06:26:33Z

make_vocab_size_divisible_by

The make_vocab_size_divisible_by is used to ensure that the vocabulary size is divisible by 64. This is because torch.mm will select a faster operator when the size is a multiple of 64. Later, the result will be unpadded.

cingtiye · 2024-11-18T06:53:08Z

make_vocab_size_divisible_by

The make_vocab_size_divisible_by is used to ensure that the vocabulary size is divisible by 64. This is because torch.mm will select a faster operator when the size is a multiple of 64. Later, the result will be unpadded.

The issue has been closed. Could you please look at the #6110?

fix 6108

393c31d

flybird11111 requested a review from a team as a code owner November 11, 2024 10:02

flybird11111 mentioned this pull request Nov 11, 2024

[BUG]: Llama3.1-70B-instruct save model #6108

Closed

1 task

cingtiye reviewed Nov 11, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[hotfix] fix parameter shape checking #6124

[hotfix] fix parameter shape checking #6124

flybird11111 commented Nov 11, 2024

cingtiye Nov 11, 2024

cingtiye Nov 11, 2024

flybird11111 Nov 18, 2024

cingtiye commented Nov 13, 2024

cingtiye commented Nov 13, 2024

flybird11111 commented Nov 18, 2024 •

edited

Loading

cingtiye commented Nov 18, 2024

[hotfix] fix parameter shape checking #6124

Are you sure you want to change the base?

[hotfix] fix parameter shape checking #6124

Conversation

flybird11111 commented Nov 11, 2024

📌 Checklist before creating the PR

🚨 Issue number

📝 What does this PR do?

💥 Checklist before requesting a review

⭐️ Do you enjoy contributing to Colossal-AI?

cingtiye Nov 11, 2024

Choose a reason for hiding this comment

cingtiye Nov 11, 2024

Choose a reason for hiding this comment

flybird11111 Nov 18, 2024

Choose a reason for hiding this comment

cingtiye commented Nov 13, 2024

cingtiye commented Nov 13, 2024

flybird11111 commented Nov 18, 2024 • edited Loading

cingtiye commented Nov 18, 2024

flybird11111 commented Nov 18, 2024 •

edited

Loading