-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: better error message about one-hot encoded targets w/ loss="auto" #218
MAINT: better error message about one-hot encoded targets w/ loss="auto" #218
Conversation
This is great, good idea! A lot of my concerns are mitigated by the fact that:
|
This PR still needs a test to catch when the user mis-specifies the number of output neurons. |
That is, to check for this |
Co-authored-by: Adrian Garcia Badaracco <[email protected]>
Would you be open to, instead of raising the specific ValueError, allow the more general error below to be raised for |
Are you talking adding the message "loss='auto' is not supported for tasks of type 'multilabel-indicator'" to this ValueError? |
I'm talking about a flow like this: if loss == "auto":
activation = self.model_.layers[-1].activation.__name__
...
elif (
self.target_type_ == "multilabel-indicator"
and
hasattr(self, "n_classes_")
and
self.model_.outputs[0].shape[1] == self.n_classes_
and
activation in ("linear", "softmax")
):
use_logits = activation == "linear"
compile_kwargs["loss"] = CategoricalCrossentropy(from_logits=use_logits)
else:
raise NotImplementedError(
f'`loss="auto"` is not supported for tasks of type {self.target_type_}.'
" Instead, you must explicitly pass a loss function, for example:"
'\n clf = KerasClassifier(..., loss="categorical_crossentropy")'
) Basically, we check if the combination of the target-type and model outputs indicate with high certainty that this is a run of the mill multiclass problem with one-hot encoded labels and if not, we tell the user that we don't know how to handle this type of target (which would then be classified as `multilabel-indicator'). Full exampledef _compile_model(self, compile_kwargs: Dict[str, Any]) -> None:
if compile_kwargs["loss"] == "auto":
if len(self.model_.outputs) > 1:
raise ValueError(
'Only single-output models are supported with `loss="auto"`'
)
activation = self.model_.get_layer(self.model_.output_names[0]).activation
if hasattr(activation, "__name__"):
activation_name = activation.__name__
else:
activation_name = None
if self.target_type_ == "binary":
if self.model_.outputs[0].shape[1] != 1:
raise ValueError(
"Binary classification expects a model with exactly 1 output unit."
)
compile_kwargs["loss"] = "binary_crossentropy"
elif self.target_type_ == "multiclass":
if self.model_.outputs[0].shape[1] == 1:
raise ValueError(
"Multi-class targets require the model to have >1 output units."
)
compile_kwargs["loss"] = "sparse_categorical_crossentropy"
elif (
self.target_type_ == "multilabel-indicator"
and
hasattr(self, "n_classes_")
and
self.model_.output_shape[1:] == (self.n_classes_, )
and
activation_name in ("linear", "softmax")
):
use_logits = activation_name == "linear"
compile_kwargs["loss"] = losses_module.CategoricalCrossentropy(from_logits=use_logits)
else:
raise NotImplementedError(
f'`loss="auto"` is not supported for tasks of type {self.target_type_}.'
" Instead, you must explicitly pass a loss function, for example:"
'\n clf = KerasClassifier(..., loss="categorical_crossentropy")'
)
self.model_.compile(**compile_kwargs) |
I think checks for the
I think SciKeras already makes that check.
That sounds like a pretty clear indicator that this is a run-of-the-mill multiclass problem with one-hot encoded labels. What use case are you imagining where it fails? |
This use case: def get_model(compile):
model = keras.Sequential(
[
keras.Input((num_classes,)),
keras.layers.Dense(num_classes, activation="relu"),
keras.layers.Dense(num_classes, activation="sigmoid"),
]
)
if compile:
model.compile(loss="binary_crossentropy")
return model This is used when each class is independent and there can be more than 1 class per label, for example to characterize sentiment in text or movie genres. Full example code on fc9e7eeimport numpy as np
from scikeras.wrappers import KerasClassifier
from tensorflow import keras
num_classes = 3
n_samples = 10000
y = np.random.randint(low=0, high=2, size=(n_samples, num_classes))
X = y
def get_model(compile):
model = keras.Sequential(
[
keras.Input((num_classes,)),
keras.layers.Dense(num_classes, activation="relu"),
keras.layers.Dense(num_classes, activation="sigmoid"),
]
)
if compile:
model.compile(loss="binary_crossentropy")
return model
m1 = get_model(True)
m1.fit(X, y, epochs=10)
m2 = KerasClassifier(get_model, model__compile=False, epochs=10).fit(X, y)
assert m2.model_.loss == "categorical_crossentropy" Even worse than an error, there is no error, SciKeras silently chooses I'm not sure we should support this use case via If this sounds reasonable to you, I can push a commit that distinguishes and supports both of these use cases and passes all of the tests, with the caveat that they are distinguished by introspecting the activation of the last layer ( Proposed _compile_modeldef _compile_model(self, compile_kwargs: Dict[str, Any]) -> None:
if compile_kwargs["loss"] == "auto":
if len(self.model_.outputs) > 1:
raise ValueError(
'Only single-output models are supported with `loss="auto"`'
)
activation = self.model_.get_layer(self.model_.output_names[0]).activation
activation_name = None
if hasattr(activation, "__name__"):
activation_name = activation.__name__
if self.target_type_ == "binary":
if self.model_.outputs[0].shape[1] != 1:
raise ValueError(
"Binary classification expects a model with exactly 1 output unit."
)
compile_kwargs["loss"] = "binary_crossentropy"
elif self.target_type_ == "multiclass":
if self.model_.outputs[0].shape[1] == 1:
raise ValueError(
"Multi-class targets require the model to have >1 output units."
)
compile_kwargs["loss"] = "sparse_categorical_crossentropy"
elif (
self.target_type_ == "multilabel-indicator"
and
hasattr(self, "n_classes_")
and
self.model_.output_shape[1:] == (self.n_classes_, )
and
activation_name == "softmax"
):
# one-hot encoded multiclass problem
compile_kwargs["loss"] = "categorical_crossentropy"
elif (
self.target_type_ == "multilabel-indicator"
and
hasattr(self, "n_classes_")
and
self.model_.output_shape[1:] == (self.n_classes_, )
and
activation_name == "sigmoid"
):
# multilabel-indicator multiclass problem
compile_kwargs["loss"] = "binary_crossentropy"
else:
raise NotImplementedError(
f'`loss="auto"` is not supported for tasks of type {self.target_type_}.'
" Instead, you must explicitly pass a loss function, for example:"
'\n clf = KerasClassifier(..., loss="categorical_crossentropy")'
) What will not be supported is a linear output activation w/ a loss that takes logits. That is, SciKeras would not support the following model with def get_model(compile):
model = keras.Sequential(
[
keras.Input((num_classes,)),
keras.layers.Dense(num_classes, activation="relu"),
keras.layers.Dense(num_classes),
]
)
model.compile(loss=keras.losses.BinaryCrossentropy(from_logits=True))
return model |
Why not ensure that one-hot encoded labels are passed before setting |
What do you mean by ensure that one-hot labels are passed? What if LabelEncoder gets non one-hot labels? If Edit: I think you mean have LabelEncoder set some attribute indicating that this is a one-hot problem. We'd now have 2 |
Maybe it'd be better to raise an error and provide a some information on why a user might not want to set |
Where would this error be raised? Am I understanding correctly that you feel that this extra bit of introspection into the model is too much, and that instead we should not support any of the use cases in #218 (comment)? |
I think a warning should be issued if SciKeras thinks
I think the last paragraph should be added dynamically.
Yes. I think the user knows their model/dataset better than SciKeras, and they're also a lot smarter than SciKeras. I think SciKeras should support them (provide links/good information) in guiding their choice. |
I agree on the error message if no loss is chosen. We can discuss the specifics of what that error message should contain separately. I'd first like to settle on what is supported, that is, what you mean by |
def _compile(...):
warning = ""
loss = None
if condition:
loss = "binary_crossentropy"
elif target_type == "multiclass-indicator" and hasattr(self, "n_classes_"):
n_out = ...
if n_out == self.n_classes_:
warning = (
"The target type is 'multilabel-indicator,' there are 4 classes and the model has 4 output neurons. "
"If there is only one class per example, loss='categorical_crossentropy' might be appropriate. "
"If there are multiple classes per example, loss='binary_crossentropy' might be appropriate. "
"For more detail, see XXX."
)
if loss is None:
err_msg = "loss='auto' failed to find a suitable loss. To clear this message, set loss to be a Keras loss ..."
if len(warning):
warning = f"\n\n{warning}"
raise ValueError("{err_msg}{warning}")
else:
compile_kwargs["loss"] = loss |
Thank you for the snipped, that is very helpful. So the |
Yeah, that's too much I think. It requires too introspection, or might produce unexpected results for the user. Users are smarter than software, so let's let them decide. I see why now you were hesitant to support it. This PR should be refactored. I could use some help there. |
👍
If you don't mind me pushing to your fork, I can make the edits to have this reflect the intention in #218 (comment). How does that sound? |
Please. Thanks for asking. I have the box "allow edits by maintainers" checked for a reason. |
@stsievert the basics are here, do you want to look at this here, or as part of #210 ? |
hint = ( | ||
"For this type of problem, the following may help:" | ||
'\n - If there is only one class per example, loss="categorical_crossentropy" might be appropriate.' | ||
'\n - If there are multiple classes per example, loss="binary_crossentropy" might be appropriate.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SciKeras should have a documentation page that explains this more, and points to other resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, however I'd like to table this until we merge this PR into #210
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scikeras/wrappers.py
Outdated
f'`loss="auto"` is not supported for tasks of type {self.target_type_}.' | ||
" Instead, you must explicitly pass a loss function, for example:" | ||
"\nInstead, you must explicitly pass a loss function, for example:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit:
"\nInstead, you must explicitly pass a loss function, for example:" | |
" Instead, you must explicitly pass a loss function, for example:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are two ways to clear this error:
- Compile the model yourself (even if loss="auto")
- Provide an explicit loss yourself (not loss="auto")
Is that accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can add a mention of the first, but not more than that, users should know what to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should definitely mention it in the error message. Maybe there should be a note about how it's only suitable for advanced usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a basic mention. I think that should be enough.
@@ -214,7 +219,7 @@ def test_multiclass_single_output_unit(): | |||
def test_binary_multiple_output_units(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't binary classification work with loss="categorical_crossentropy"? If so, shouldn't that information be conveyed in the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It technically can since bce is simply a specific case of cce, but there is no advantage to that and, depending on the implementation, I can imagine there may be a performance hit. So I think this is one area where it may just be easier to enforce that loss="auto"
will not accomodate the use of a multiple output units with softmax activation by "intelligently" choosing cce, if that is what you were referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be clearly mentioned on the documentation. I think that's appropriate for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the new section in the docs: https://www.adriangb.com/scikeras/refs/pull/218/merge/advanced.html#loss-selection
Co-authored-by: Scott Sievert <[email protected]>
Codecov Report
@@ Coverage Diff @@
## default-loss-auto #218 +/- ##
==================================================
Coverage 99.72% 99.72%
==================================================
Files 6 6
Lines 730 740 +10
==================================================
+ Hits 728 738 +10
Misses 2 2
Continue to review full report at Codecov.
|
docs/source/quickstart.rst
Outdated
hidden_layer_dim=100, | ||
) | ||
|
||
clf.fit(X, y) | ||
y_proba = clf.predict_proba(X) | ||
|
||
|
||
Note that SciKeras even chooses a loss function and compiles your model | ||
(see the :ref:`Advanced Usage` section for more details)! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested change: "if you want to add Keras's [loss name], specify loss="loss_name"
".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I copied over the relevant section from the "Advanced" section instead:
To override the default loss, simply specify a loss function:
.. code-block:: diff
-KerasClassifier(model=model_build_fn)
+KerasClassifier(model=model_build_fn, loss="categorical_crossentropy")
Do you think that works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't categorical_crossentropy
the default, or one of the defaults? Maybe use loss="kl_divergence"
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's one of the unsupported cases for loss="auto"
. But at the same time, it is relatively common in Keras. So I chose it because it's unsupported but also common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then 👍 to loss="categorical_crossentropy"
. I would add a short note about when it's relevant. I think that'd be especially useful in the quick start section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some detail.
@stsievert I think we're ready to merge this back into #208 and continue there if you are okay with that |
| 1 | >=2 | one-hot | unsupported | | ||
+-----------+-----------+----------+---------------------------------+ | ||
| > 1 | -- | -- | unsupported | | ||
+-----------+-----------+----------+---------------------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also add a note here saying "if you have more than one output, loss="categorical_crossentropy"
is likely desired" with maybe some links to an MNIST example or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
If your targets are one-hot encoded,
loss="categorical_crossentropy"
is likely desired.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Advanced Usage of SciKeras Wrappers | ||
=================================== | ||
============== | ||
Advanced Usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the section on loss selection in "advanced usage"? I suspect the average Keras user knows about loss/optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what it was historically named... I only shortened the name here in an attempt to get links working (and because the of SciKeras Wrappers
part is arguably redundant). I'm happy to change it, in another PR that should not block this one.
What does this PR implement?
It adds support for one-hot encoded targets when
loss="auto"
. If the classifier has the attributedn_classes_
(whichKerasClassifier.target_encoder_
provides by default), thenloss="categorical_crossentropy"
is provided.Reference issues/PRs
#208 I guess.