Fix for `ffn_hidden_size` of 128, and better error message for incompatible ffn sizes
#108
Previously, if `ffn_hidden_size` was 128 and top-k was equal to the number of experts, the output of `nnz_per_column = ops.inclusive_cumsum(nnz_per_column, 0)` would be something like `torch.Tensor(1)` instead of `torch.Tensor([1])` -- a zero-dimensional tensor instead of a one-dimensional tensor. This was causing an error during the concatenation on the next line.

To address the bug, we simply make `nnz_per_column` a 1D tensor if it is 0D (a minimal sketch of the pattern is shown below). I added a new set of parameters to the dmoe tests that fails without this change and succeeds with it. I also successfully ran the llm-foundry `torch_dmoe` vs. `mb_dmoe` tests to verify the correctness of this change.
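A minimal, standalone sketch of the 0D-to-1D normalization described above (plain PyTorch; not the exact diff from this PR):

```python
import torch


def ensure_1d(nnz_per_column: torch.Tensor) -> torch.Tensor:
    """Promote a zero-dimensional cumsum result to a one-dimensional tensor
    so it can be concatenated with other 1D tensors."""
    if nnz_per_column.dim() == 0:
        nnz_per_column = nnz_per_column.unsqueeze(0)
    return nnz_per_column


# A 0D result such as torch.tensor(1) becomes torch.tensor([1]), which
# torch.cat can combine with a leading 1D zero without raising an error.
zero = torch.zeros(1, dtype=torch.long)
offsets = torch.cat([zero, ensure_1d(torch.tensor(1))])
print(offsets)  # tensor([0, 1])
```

The same effect can also be achieved with `torch.atleast_1d`.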
The second change is to provide better error messages for invalid `ffn_hidden_size` values to help external users (an illustrative sketch of such a check appears after the reproduction script below).

You can reproduce this error with the small script below as well:
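The script itself is not reproduced here; the following is a hedged sketch of a reproduction along these lines. It assumes the MegaBlocks `dMoE` layer and `Arguments` dataclass, a CUDA device, and half-precision inputs; the actual script and argument handling in the PR may differ:

```python
# Hypothetical reproduction sketch, not the original script from this PR.
import torch

from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=128,      # hypothetical model width chosen for the sketch
    ffn_hidden_size=128,  # the size reported to trigger the bug
    moe_num_experts=4,
    moe_top_k=4,          # top-k equal to the number of experts, per the bug report
)

# The sparse dMoE path assumes a GPU; half precision is an assumption here.
layer = dMoE(args).cuda().half()
x = torch.randn(16, 1, 128, device='cuda', dtype=torch.float16)

# Without the fix, this fails at the torch.cat that follows the
# ops.inclusive_cumsum call described above. Depending on the MegaBlocks
# version, the forward call may return a tensor or a (tensor, bias) tuple.
out = layer(x)
print(out)
```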
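For the second change, the snippet below is illustrative only of what a clearer validation error might look like; the 128-wide block-size assumption and the exact condition and wording are not taken from the PR:

```python
# Illustrative only: fail early with a descriptive message instead of deep
# inside the sparse kernels. BLOCK_SIZE and the check itself are assumptions.
BLOCK_SIZE = 128  # hypothetical block width assumed by the sparse kernels


def validate_ffn_hidden_size(ffn_hidden_size: int) -> None:
    if ffn_hidden_size <= 0 or ffn_hidden_size % BLOCK_SIZE != 0:
        raise ValueError(
            f'ffn_hidden_size={ffn_hidden_size} is incompatible with the sparse '
            f'kernels; expected a positive multiple of {BLOCK_SIZE}.')
```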