
Pass the loss from the compile call to the target_encoder instantiation #277

Open
german1608 opened this issue Jul 1, 2022 · 7 comments

@german1608

SciKeras version: 0.8.0

(I feel this is) related to #206.

I was following the MLPClassifier tutorial on the wiki page. It was great that the model function could handle both binary and multi-class classification. However, I ran into this error while executing my tests:

    ValueError: Shapes (None, 1) and (None, 6) are incompatible

My y has 6 classes. I'm using KerasClassifier directly, i.e. no subclassing. This is how I was creating the classifier:

import logging
from typing import Any, Dict

from scikeras.wrappers import KerasClassifier
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

log = logging.getLogger(__name__)


def init_model_object(**params) -> KerasClassifier:
    log.info('HP-PARAMS: %s', params)

    def get_clf_model(meta: Dict[str, Any], compile_kwargs: Dict[str, Any]) -> Sequential:
        model = Sequential(name='LSTM-cf-genie')

        model.add(
            layers.ZeroPadding1D(
                padding=3,
                name='zero-padding-layer',
                input_shape=(
                    meta['n_features_in_'],
                    1)))

        model.add(layers.Bidirectional(layers.LSTM(16, name='lstm-layer', return_sequences=True)))

        model.add(layers.LSTM(50, name='lstm-layer-2', return_sequences=False))

        if meta['target_type_'] == 'multiclass':
            n_output_units = meta['n_classes_']
            output_activation = 'softmax'
            loss = 'categorical_crossentropy'
            metrics = ['categorical_accuracy']
        elif meta['target_type_'] == 'binary':
            n_output_units = 1
            output_activation = 'sigmoid'
            loss = 'binary_crossentropy'
            metrics = ['binary_accuracy']
        else:
            raise ValueError('Model does not support target type: ' + meta['target_type_'])

        model.add(layers.Dense(n_output_units, name='output', activation=output_activation))

        model.compile(loss=loss, metrics=metrics, optimizer=compile_kwargs['optimizer'])

        model.summary()
        return model

    clf = KerasClassifier(
        model=get_clf_model,
        epochs=50,
        batch_size=500,
        verbose=1,
        # We have to set this value even for binary classification; otherwise,
        # the target encoder won't use one-hot encoding.
        # loss='categorical_crossentropy',
        optimizer='adam',
        optimizer__learning_rate=0.001,
    )
    return clf
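
For reference, this is roughly how the error shows up (a sketch; X_train and y_train stand in for my own data, where y_train has 6 classes):

clf = init_model_object()
# Raises during fit:
#   ValueError: Shapes (None, 1) and (None, 6) are incompatible
clf.fit(X_train, y_train)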

Initially, I was passing the loss as a KerasClassifier parameter, and it was training fine. But since I wanted to make my model as plug-and-play as possible, I moved the loss selection inside the model function. This is where the exception started to show up. I took a look at how SciKeras initializes the target encoder:

scikeras/scikeras/wrappers.py, lines 1395 to 1415 (d50e75a):

def target_encoder(self):
    """Retrieve a transformer for targets / y.

    For ``KerasClassifier.predict_proba`` to
    work, this transformer must accept a ``return_proba``
    argument in ``inverse_transform`` with a default value
    of False.

    Metadata will be collected from ``get_metadata`` if
    the transformer implements that method.

    Override this method to implement a custom data transformer
    for the target.

    Returns
    -------
    sklearn-transformer
        Transformer implementing the sklearn transformer
        interface.
    """
    categories = "auto" if self.classes_ is None else [self.classes_]
    return ClassifierLabelEncoder(loss=self.loss, categories=categories)

ClassifierLabelEncoder then picks the actual encoder based on that loss:

target_type = self._type_of_target(y)
keras_dtype = np.dtype(tf.keras.backend.floatx())
self._y_shape = y.shape
encoders = {
    "binary": make_pipeline(
        TargetReshaper(),
        OrdinalEncoder(dtype=keras_dtype, categories=self.categories),
    ),
    "multiclass": make_pipeline(
        TargetReshaper(),
        OrdinalEncoder(dtype=keras_dtype, categories=self.categories),
    ),
    "multiclass-multioutput": FunctionTransformer(),
    "multilabel-indicator": FunctionTransformer(),
}
if _is_categorical_crossentropy(self.loss):
    encoders["multiclass"] = make_pipeline(
        TargetReshaper(),
        OneHotEncoder(
            sparse=False, dtype=keras_dtype, categories=self.categories
        ),
    )

Before, the target was one-hot encoded because I was passing loss='categorical_crossentropy' to KerasClassifier.
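
To illustrate the behavior (a minimal sketch, not from my code; it assumes ClassifierLabelEncoder can be imported from scikeras.utils.transformers):

import numpy as np
from scikeras.utils.transformers import ClassifierLabelEncoder

y = np.array([0, 2, 1, 2, 0, 1])

# Any other loss: multiclass targets are ordinal-encoded, shape (6, 1)
ordinal = ClassifierLabelEncoder(loss='sparse_categorical_crossentropy').fit(y)
print(ordinal.transform(y).shape)

# categorical_crossentropy: multiclass targets are one-hot encoded, shape (6, 3)
one_hot = ClassifierLabelEncoder(loss='categorical_crossentropy').fit(y)
print(one_hot.transform(y).shape)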

What ended up working for me was to keep passing loss='categorical_crossentropy' to KerasClassifier. It doesn't seem to affect the scores from sklearn's cross_validate (correct me if I'm wrong), and it makes the target_encoder use one-hot instead of ordinal encoding. The drawback of this workaround is that it doesn't look clean and may confuse newcomers.

Other solutions I considered for my particular problem were:

  • Using a single output unit for the multi-class classification problem, but in my case that wasn't working very well;
  • One-hot encoding even for the binary classification problem, but every tutorial I found on the internet recommends a single output node for binary targets.

To solve this issue properly, I propose extracting the loss (and perhaps the optimizer?) from the compiled model, I suppose around these lines (I don't have any experience with this repository):

if not ((self.warm_start or warm_start) and self.initialized_):
    X, y = self._initialize(X, y)
else:
    X, y = self._validate_data(X, y)
self._ensure_compiled_model()
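
As a hypothetical sketch (none of this is actual SciKeras code; it assumes Keras exposes the compiled loss as model_.loss):

# Hypothetical: after building and compiling the model, read the loss back
# from it and re-fit the target encoder so the encoding matches the loss.
self._ensure_compiled_model()
self.loss = self.model_.loss  # the loss the model was actually compiled with
self.target_encoder_ = self.target_encoder.fit(y)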

@german1608 (Author) commented Jul 1, 2022

Errata: I don't know why it wasn't failing before, but now I get this exception when setting categorical_crossentropy:

    ValueError: loss=categorical_crossentropy but model compiled with binary_crossentropy. Data may not match loss function!

That makes sense. Still, my proposal holds. My workaround was to subclass KerasClassifier and add a custom target_encoder that always "uses" categorical_crossentropy.
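
A minimal sketch of that subclass (the class name is mine; it assumes ClassifierLabelEncoder is importable from scikeras.utils.transformers and that target_encoder is a property, as in the wrappers.py excerpt above):

from scikeras.utils.transformers import ClassifierLabelEncoder
from scikeras.wrappers import KerasClassifier


class OneHotKerasClassifier(KerasClassifier):
    @property
    def target_encoder(self):
        categories = 'auto' if self.classes_ is None else [self.classes_]
        # Pretend the loss is categorical_crossentropy so that multiclass
        # targets always get one-hot encoded, regardless of the loss the
        # model function actually compiles.
        return ClassifierLabelEncoder(
            loss='categorical_crossentropy', categories=categories
        )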

@adriangb (Owner) commented Jul 1, 2022

Thank you for the detailed issue report.

Currently the transformers are initialized and fit before the model is created, so there's no introspection possible:

self.target_encoder_ = self.target_encoder.fit(y)
target_metadata = getattr(self.target_encoder_, "get_metadata", dict)()
vars(self).update(**target_metadata)
self.feature_encoder_ = self.feature_encoder.fit(X)
feature_meta = getattr(self.feature_encoder, "get_metadata", dict)()
vars(self).update(**feature_meta)
self.model_ = self._build_keras_model()

If we switched the order, the model-building function wouldn't have access to certain metadata that is quite useful for dynamically creating models:

return {
    "classes_": self.classes_,
    "n_classes_": self.n_classes_,
    "n_outputs_": self.n_outputs_,
    "n_outputs_expected_": self.n_outputs_expected_,
}

> But since I wanted to make my model as plug-and-play as possible, I moved the loss setting inside the model function.

So your goal is to have the loss chosen automatically based on the input data, right? Currently it works the other way around: you can hardcode the loss to "categorical_crossentropy" and the input will automatically get one-hot encoded.
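
In code, roughly (a sketch reusing get_clf_model from the report above):

clf = KerasClassifier(
    model=get_clf_model,
    # Hardcoding the loss on the wrapper tells SciKeras to one-hot encode
    # multiclass targets before they reach the model.
    loss='categorical_crossentropy',
    optimizer='adam',
)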

@german1608 (Author)

> So your goal is to have the loss chosen automatically based on the input data, right?

Based on the output data, actually. That would work.

NOTE: don't feel that I'm imposing this; I'm just raising something that caught my attention. Perhaps there is a solution other than automatically setting the loss based on the output dimensions.

@adriangb (Owner) commented Jul 1, 2022

> Based on the output data, actually.

Yup, sorry, bad wording on my part: I'm referring to y, which is the output of the model but also an input in the Python function-argument sense...

Is there a problem with the loss always being "categorical_crossentropy" and y being encoded to match? IIRC that's what scikit-learn's MLPClassifier does. I guess a small performance hit?

@german1608 (Author)

> Is there a problem with the loss always being "categorical_crossentropy" and y being encoded to match? IIRC that's what scikit-learn's MLPClassifier does. I guess a small performance hit?

Even for binary classification? Would that affect how the target_encoder is initialized for binary classification?

@adriangb (Owner) commented Jul 1, 2022

I think it should still work for binary classification, yes.

But I'm looking at the MLPClassifier notebook/guide again, and it already sets the loss function dynamically. It uses "sparse_categorical_crossentropy" for multi-class targets so that they do not need to be one-hot encoded (and thus the transformer doesn't need to know about the model's loss function at all). Could you do that instead, or do you need "categorical_crossentropy" for multi-class targets?
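
Concretely, that would mean changing the multiclass branch in get_clf_model above to something like this (a sketch; the metric choice is mine):

if meta['target_type_'] == 'multiclass':
    n_output_units = meta['n_classes_']
    output_activation = 'softmax'
    # Accepts integer (ordinal-encoded) labels, so no one-hot encoding is
    # needed and the target encoder never has to know the loss.
    loss = 'sparse_categorical_crossentropy'
    metrics = ['sparse_categorical_accuracy']
elif meta['target_type_'] == 'binary':
    n_output_units = 1
    output_activation = 'sigmoid'
    loss = 'binary_crossentropy'
    metrics = ['binary_accuracy']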

@german1608 (Author)

I'll test that soon and give you feedback. Thanks for your suggestions!
