
Pass the loss from the compile call to the target_encoder instantiation #277

Open
german1608 opened this issue Jul 1, 2022 · 7 comments

@german1608

SciKeras version: 0.8.0

(I feel this is) related to #206.

I was following the MLPClassifier tutorial on the wiki page. It was great that the model function could handle both binary and multi-class classification. However, I ran into this error while executing my tests:

    ValueError: Shapes (None, 1) and (None, 6) are incompatible

My y has 6 classes. I'm using KerasClassifier directly, i.e. no subclassing. This is how I was creating the classifier:

import logging
from typing import Any, Dict

from scikeras.wrappers import KerasClassifier
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

log = logging.getLogger(__name__)


def init_model_object(**params) -> KerasClassifier:
    log.info('HP-PARAMS: %s', params)

    def get_clf_model(meta: Dict[str, Any], compile_kwargs: Dict[str, Any]) -> Sequential:
        model = Sequential(name='LSTM-cf-genie')

        model.add(
            layers.ZeroPadding1D(
                padding=3,
                name='zero-padding-layer',
                input_shape=(
                    meta['n_features_in_'],
                    1)))

        model.add(layers.Bidirectional(layers.LSTM(16, name='lstm-layer', return_sequences=True)))

        model.add(layers.LSTM(50, name='lstm-layer-2', return_sequences=False))

        if meta['target_type_'] == 'multiclass':
            n_output_units = meta['n_classes_']
            output_activation = 'softmax'
            loss = 'categorical_crossentropy'
            metrics = ['categorical_accuracy']
        elif meta['target_type_'] == 'binary':
            n_output_units = 1
            output_activation = 'sigmoid'
            loss = 'binary_crossentropy'
            metrics = ['binary_accuracy']
        else:
            raise ValueError('Model does not support target type: ' + meta['target_type_'])

        model.add(layers.Dense(n_output_units, name='output', activation=output_activation))

        model.compile(loss=loss, metrics=metrics, optimizer=compile_kwargs['optimizer'])

        model.summary()
        return model

    clf = KerasClassifier(
        model=get_clf_model,
        epochs=50,
        batch_size=500,
        verbose=1,
        # We have to set this value even for binary classification; otherwise,
        # the target encoder won't use one-hot encoding.
        # loss='categorical_crossentropy',
        optimizer='adam',
        optimizer__learning_rate=0.001,
    )
    return clf
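
For reference, this is roughly how the error shows up (a sketch; X_train and y_train stand in for my own data, where y_train has 6 classes):

clf = init_model_object()
# Raises during fit:
#   ValueError: Shapes (None, 1) and (None, 6) are incompatible
clf.fit(X_train, y_train)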

Initially, I was passing the loss as a KerasClassifier parameter, and it was training fine. But since I wanted to make my model as plug-and-play as possible, I moved the loss selection inside the model function. This is where the exception started to show up. I took a look at how SciKeras initializes the target encoder:

scikeras/scikeras/wrappers.py, lines 1395 to 1415 (d50e75a):

def target_encoder(self):
    """Retrieve a transformer for targets / y.

    For ``KerasClassifier.predict_proba`` to
    work, this transformer must accept a ``return_proba``
    argument in ``inverse_transform`` with a default value
    of False.

    Metadata will be collected from ``get_metadata`` if
    the transformer implements that method.

    Override this method to implement a custom data transformer
    for the target.

    Returns
    -------
    sklearn-transformer
        Transformer implementing the sklearn transformer
        interface.
    """
    categories = "auto" if self.classes_ is None else [self.classes_]
    return ClassifierLabelEncoder(loss=self.loss, categories=categories)

ClassifierLabelEncoder then picks the actual encoder based on that loss:

target_type = self._type_of_target(y)
keras_dtype = np.dtype(tf.keras.backend.floatx())
self._y_shape = y.shape
encoders = {
    "binary": make_pipeline(
        TargetReshaper(),
        OrdinalEncoder(dtype=keras_dtype, categories=self.categories),
    ),
    "multiclass": make_pipeline(
        TargetReshaper(),
        OrdinalEncoder(dtype=keras_dtype, categories=self.categories),
    ),
    "multiclass-multioutput": FunctionTransformer(),
    "multilabel-indicator": FunctionTransformer(),
}
if _is_categorical_crossentropy(self.loss):
    encoders["multiclass"] = make_pipeline(
        TargetReshaper(),
        OneHotEncoder(
            sparse=False, dtype=keras_dtype, categories=self.categories
        ),
    )

Before, the target was one-hot encoded because I was passing loss='categorical_crossentropy' to KerasClassifier.
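
To illustrate the behavior (a minimal sketch, not from my code; it assumes ClassifierLabelEncoder can be imported from scikeras.utils.transformers):

import numpy as np
from scikeras.utils.transformers import ClassifierLabelEncoder

y = np.array([0, 2, 1, 2, 0, 1])

# Any other loss: multiclass targets are ordinal-encoded, shape (6, 1)
ordinal = ClassifierLabelEncoder(loss='sparse_categorical_crossentropy').fit(y)
print(ordinal.transform(y).shape)

# categorical_crossentropy: multiclass targets are one-hot encoded, shape (6, 3)
one_hot = ClassifierLabelEncoder(loss='categorical_crossentropy').fit(y)
print(one_hot.transform(y).shape)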

What ended up working for me was to keep passing loss='categorical_crossentropy' to KerasClassifier. It doesn't seem to affect the scores from sklearn's cross_validate (correct me if I'm wrong), and it makes the target_encoder use one-hot instead of ordinal encoding. The drawback of this workaround is that it doesn't look clean and may confuse newcomers.

Other solutions I considered for my particular problem were:

  • Using a single output unit for the multi-class classification problem, but in my case that wasn't working very well;
  • One-hot encoding even for the binary classification problem, but every tutorial I found on the internet recommends a single output node for binary targets.

To solve this issue properly, I propose extracting the loss (and perhaps the optimizer?) from the compiled model, I suppose around these lines (I don't have any experience with this repository):

if not ((self.warm_start or warm_start) and self.initialized_):
    X, y = self._initialize(X, y)
else:
    X, y = self._validate_data(X, y)
self._ensure_compiled_model()
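
As a hypothetical sketch (none of this is actual SciKeras code; it assumes Keras exposes the compiled loss as model_.loss):

# Hypothetical: after building and compiling the model, read the loss back
# from it and re-fit the target encoder so the encoding matches the loss.
self._ensure_compiled_model()
self.loss = self.model_.loss  # the loss the model was actually compiled with
self.target_encoder_ = self.target_encoder.fit(y)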

@german1608 (Author) commented Jul 1, 2022

Errata: I don't know why it wasn't failing before, but now I get this exception when setting categorical_crossentropy:

    ValueError: loss=categorical_crossentropy but model compiled with binary_crossentropy. Data may not match loss function!

That makes sense. Still, my proposal holds. My workaround was to subclass KerasClassifier and add a custom target_encoder that always "uses" categorical_crossentropy.
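
A minimal sketch of that subclass (the class name is mine; it assumes ClassifierLabelEncoder is importable from scikeras.utils.transformers and that target_encoder is a property, as in the wrappers.py excerpt above):

from scikeras.utils.transformers import ClassifierLabelEncoder
from scikeras.wrappers import KerasClassifier


class OneHotKerasClassifier(KerasClassifier):
    @property
    def target_encoder(self):
        categories = 'auto' if self.classes_ is None else [self.classes_]
        # Pretend the loss is categorical_crossentropy so that multiclass
        # targets always get one-hot encoded, regardless of the loss the
        # model function actually compiles.
        return ClassifierLabelEncoder(
            loss='categorical_crossentropy', categories=categories
        )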

@adriangb (Owner) commented Jul 1, 2022

Thank you for the detailed issue report.

Currently the transformers are initialized and fit before the model is created, so there's no introspection possible:

self.target_encoder_ = self.target_encoder.fit(y)
target_metadata = getattr(self.target_encoder_, "get_metadata", dict)()
vars(self).update(**target_metadata)
self.feature_encoder_ = self.feature_encoder.fit(X)
feature_meta = getattr(self.feature_encoder, "get_metadata", dict)()
vars(self).update(**feature_meta)
self.model_ = self._build_keras_model()

If we switched the order, the model-building function wouldn't have access to certain metadata that is quite useful for dynamically creating models:

return {
    "classes_": self.classes_,
    "n_classes_": self.n_classes_,
    "n_outputs_": self.n_outputs_,
    "n_outputs_expected_": self.n_outputs_expected_,
}

> But since I wanted to make my model as plug-and-play as possible, I moved the loss setting inside the model function.

So your goal is to have the loss chosen automatically based on the input data, right? Currently it works the other way around: you can hardcode the loss to "categorical_crossentropy" and the input will automatically get one-hot encoded.
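
In code, roughly (a sketch reusing get_clf_model from the report above):

clf = KerasClassifier(
    model=get_clf_model,
    # Hardcoding the loss on the wrapper tells SciKeras to one-hot encode
    # multiclass targets before they reach the model.
    loss='categorical_crossentropy',
    optimizer='adam',
)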

@german1608 (Author)

> So your goal is to have the loss chosen automatically based on the input data, right?

Based on the output data, actually. That would work.

NOTE: don't feel that I'm imposing this; I'm just raising something that caught my attention. Perhaps there is a solution other than automatically setting the loss based on the output dimensions.

@adriangb (Owner) commented Jul 1, 2022

> Based on the output data, actually.

Yup, sorry, bad wording on my part: I'm referring to y, which is the output of the model but also an input in the Python function-argument sense...

Is there a problem with the loss always being "categorical_crossentropy" and y being encoded to match? IIRC that's what scikit-learn's MLPClassifier does. I guess a small performance hit?

@german1608 (Author)

> Is there a problem with the loss always being "categorical_crossentropy" and y being encoded to match? IIRC that's what scikit-learn's MLPClassifier does. I guess a small performance hit?

Even for binary classification? Would that affect how the target_encoder is initialized for binary classification?

@adriangb (Owner) commented Jul 1, 2022

I think it should still work for binary classification, yes.

But I'm looking at the MLPClassifier notebook/guide again, and it already sets the loss function dynamically. It uses "sparse_categorical_crossentropy" for multi-class targets so that they do not need to be one-hot encoded (and thus the transformer doesn't need to know about the model's loss function at all). Could you do that instead, or do you need "categorical_crossentropy" for multi-class targets?
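
Concretely, that would mean changing the multiclass branch in get_clf_model above to something like this (a sketch; the metric choice is mine):

if meta['target_type_'] == 'multiclass':
    n_output_units = meta['n_classes_']
    output_activation = 'softmax'
    # Accepts integer (ordinal-encoded) labels, so no one-hot encoding is
    # needed and the target encoder never has to know the loss.
    loss = 'sparse_categorical_crossentropy'
    metrics = ['sparse_categorical_accuracy']
elif meta['target_type_'] == 'binary':
    n_output_units = 1
    output_activation = 'sigmoid'
    loss = 'binary_crossentropy'
    metrics = ['binary_accuracy']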

@german1608 (Author)

I'll test that soon and give you feedback. Thanks for your suggestions!
