
Switch Transformers MLP Module Type Mismatch #25347

Closed
2 of 4 tasks
drunkcoding opened this issue Aug 7, 2023 · 6 comments · Fixed by #25427

@drunkcoding

System Info

  • transformers version: 4.29.2
  • Platform: Linux-5.15.0-78-generic-x86_64-with-glibc2.17
  • Python version: 3.8.16
  • Huggingface_hub version: 0.15.1
  • Safetensors version: 0.3.1
  • PyTorch version (GPU?): 2.0.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@ArthurZucker @you

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

SwitchTransformersDenseGatedActDense is defined but never used.

switch-xxl-128 uses SwitchTransformersDenseGatedActDense for every MLP, but the MLP initialization below is hard-coded to SwitchTransformersDenseActDense:

class SwitchTransformersLayerFF(nn.Module):
    r"""
    Switch Transformers Feed Forward layer module. This is a wrapper around the Mixture of Experts module.

    Parameters:
        config : ([`SwitchTransformersConfig`]): Model configuration class with all the parameters of the model.
            Initializing with a config file does not load the weights associated with the model, only the
            configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
        is_sparse (`bool`):
            Whether the MLP layer is a `Sparse` layer (contains a Mixture of Experts) or not
    """

    def __init__(self, config: SwitchTransformersConfig, is_sparse=False):
        super().__init__()
        self.is_sparse = is_sparse

        # Check if it is a sparse layer, if not then it is a dense layer
        if not self.is_sparse:
            self.mlp = SwitchTransformersDenseActDense(config)
        else:
            self.mlp = SwitchTransformersSparseMLP(config)
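
A minimal reproduction sketch (assuming the hub checkpoint id "google/switch-xxl-128"; any of the gated Switch checkpoints should hit the same mismatch):

    # Hypothetical reproduction: per the report above, the checkpoint stores gated-MLP
    # weights (wi_0 / wi_1 per expert), while SwitchTransformersLayerFF always builds
    # SwitchTransformersDenseActDense (a single wi), so the parameter names do not line
    # up and a strict state-dict load fails.
    from transformers import SwitchTransformersForConditionalGeneration

    model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-xxl-128")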

Expected behavior

Loading the state dict with strict keys raises an error, since the checkpoint's gated MLP parameters do not match the SwitchTransformersDenseActDense modules built by the code.

@amyeroberts
Collaborator

Hi @drunkcoding, thanks for raising this issue!

Indeed, it seems SwitchTransformersDenseGatedActDense isn't used anywhere - thanks for pointing this out. Would you like to open a PR to resolve this so that the correct layer is selected? That way you get the GitHub contribution.

cc @younesbelkada

@drunkcoding
Author

The config file does not have a field that indicates when the gated activation should be selected. The one related field is is_gated_act, but it is set to false in all of the configuration files, which does not match the checkpoints. This should also be fixed for the implementation to work.
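
A possible direction for a fix (a sketch only, not necessarily what the merged patch does) is to pick the MLP class from config.is_gated_act inside SwitchTransformersLayerFF, on top of correcting that field in the published config.json files:

    # Sketch of a corrected SwitchTransformersLayerFF.__init__ body, assuming
    # config.is_gated_act is set correctly in the checkpoint configs:
    if is_sparse:
        self.mlp = SwitchTransformersSparseMLP(config)
    elif config.is_gated_act:
        self.mlp = SwitchTransformersDenseGatedActDense(config)
    else:
        self.mlp = SwitchTransformersDenseActDense(config)

The experts built inside SwitchTransformersSparseMLP would presumably need the same switch applied to their expert class.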

@ArthurZucker
Collaborator

Hey! Since Younes is OOO I'll have a look!

@drunkcoding
Author

Another configuration error: in switch-xxl-128, the config gives the encoder num_layers=48 and the decoder num_decoder_layers=12, and both fields are wrong. In the checkpoint file, the encoder and the decoder each have 24 layers.
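
For reference, a quick way to inspect the published fields without downloading the weights (a hypothetical check, assuming the hub id "google/switch-xxl-128"):

    # Hypothetical check: from_pretrained on a config only fetches config.json,
    # so this prints the fields in question without touching the checkpoint shards.
    from transformers import SwitchTransformersConfig

    cfg = SwitchTransformersConfig.from_pretrained("google/switch-xxl-128")
    print(cfg.num_layers, cfg.num_decoder_layers, getattr(cfg, "is_gated_act", None))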

@ArthurZucker
Collaborator

Are you talking about the configuration_switch defaults or the config.json? The default values are not for switch-xxl-128, see here:

Instantiating a configuration with the defaults will yield a similar configuration to that of the SwitchTransformers google/switch-base-8 architecture.

@drunkcoding
Author

I mean the values in config.json.
