
Switch Transformers MLP Module Type Mismatch #25347

Closed
2 of 4 tasks
drunkcoding opened this issue Aug 7, 2023 · 6 comments · Fixed by #25427

@drunkcoding

System Info

  • transformers version: 4.29.2
  • Platform: Linux-5.15.0-78-generic-x86_64-with-glibc2.17
  • Python version: 3.8.16
  • Huggingface_hub version: 0.15.1
  • Safetensors version: 0.3.1
  • PyTorch version (GPU?): 2.0.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@ArthurZucker @you

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

SwitchTransformersDenseGatedActDense is defined but never used.

switch-xxl-128 uses SwitchTransformersDenseGatedActDense for every MLP, but the MLP initialization below is hard-coded to SwitchTransformersDenseActDense:

class SwitchTransformersLayerFF(nn.Module):
    r"""
    Switch Transformers Feed Forward layer module. This is a wrapper around the Mixture of Experts module.

    Parameters:
        config : ([`SwitchTransformersConfig`]): Model configuration class with all the parameters of the model.
            Initializing with a config file does not load the weights associated with the model, only the
            configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
        is_sparse (`bool`):
            Whether the MLP layer is a `Sparse` layer (contains a Mixture of Experts) or not
    """

    def __init__(self, config: SwitchTransformersConfig, is_sparse=False):
        super().__init__()
        self.is_sparse = is_sparse

        # Check if it is a sparse layer, if not then it is a dense layer
        if not self.is_sparse:
            self.mlp = SwitchTransformersDenseActDense(config)
        else:
            self.mlp = SwitchTransformersSparseMLP(config)
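
A minimal reproduction sketch (assuming the hub checkpoint id "google/switch-xxl-128"; any of the gated Switch checkpoints should hit the same mismatch):

    # Hypothetical reproduction: per the report above, the checkpoint stores gated-MLP
    # weights (wi_0 / wi_1 per expert), while SwitchTransformersLayerFF always builds
    # SwitchTransformersDenseActDense (a single wi), so the parameter names do not line
    # up and a strict state-dict load fails.
    from transformers import SwitchTransformersForConditionalGeneration

    model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-xxl-128")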

Expected behavior

Loading the state dict with strict keys raises an error, since the checkpoint's gated MLP parameters do not match the SwitchTransformersDenseActDense modules built by the code.

@amyeroberts
Collaborator

Hi @drunkcoding, thanks for raising this issue!

Indeed, it seems SwitchTransformersDenseGatedActDense isn't used anywhere - thanks for pointing this out. Would you like to open a PR to resolve this so that the correct layer is selected? That way you get the GitHub contribution.

cc @younesbelkada

@drunkcoding
Author

The config file does not have a field that indicates when the gated activation should be selected. The one related field is is_gated_act, but it is set to false in all of the configuration files, which does not match the checkpoints. This should also be fixed for the implementation to work.
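
A possible direction for a fix (a sketch only, not necessarily what the merged patch does) is to pick the MLP class from config.is_gated_act inside SwitchTransformersLayerFF, on top of correcting that field in the published config.json files:

    # Sketch of a corrected SwitchTransformersLayerFF.__init__ body, assuming
    # config.is_gated_act is set correctly in the checkpoint configs:
    if is_sparse:
        self.mlp = SwitchTransformersSparseMLP(config)
    elif config.is_gated_act:
        self.mlp = SwitchTransformersDenseGatedActDense(config)
    else:
        self.mlp = SwitchTransformersDenseActDense(config)

The experts built inside SwitchTransformersSparseMLP would presumably need the same switch applied to their expert class.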

@ArthurZucker
Collaborator

Hey! Since Younes is OOO I'll have a look!

@drunkcoding
Author

Another configuration error: in switch-xxl-128, the config gives the encoder num_layers=48 and the decoder num_decoder_layers=12, and both fields are wrong. In the checkpoint file, the encoder and the decoder each have 24 layers.
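
For reference, a quick way to inspect the published fields without downloading the weights (a hypothetical check, assuming the hub id "google/switch-xxl-128"):

    # Hypothetical check: from_pretrained on a config only fetches config.json,
    # so this prints the fields in question without touching the checkpoint shards.
    from transformers import SwitchTransformersConfig

    cfg = SwitchTransformersConfig.from_pretrained("google/switch-xxl-128")
    print(cfg.num_layers, cfg.num_decoder_layers, getattr(cfg, "is_gated_act", None))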

@ArthurZucker
Collaborator

Are you talking about the configuration_switch defaults or the config.json? The default values are not for switch-xxl-128, see here:

Instantiating a configuration with the defaults will yield a similar configuration to that of the SwitchTransformers google/switch-base-8 architecture.

@drunkcoding
Author

I mean the values in config.json.
