-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Check input dimensions against the initialized model_ #143
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #143 +/- ##
==========================================
+ Coverage 99.52% 99.53% +0.01%
==========================================
Files 5 5
Lines 627 646 +19
==========================================
+ Hits 624 643 +19
Misses 3 3
Continue to review full report at Codecov.
|
@stsievert tagging you to take a look whenever you have a chance. This is addressing an issue you had opened (#106) |
scikeras/wrappers.py
Outdated
# generic transformers just to make them SciKeras compatible | ||
n_out_expect = getattr(self, "n_outputs_expected_", None) | ||
if n_out_expect: | ||
if n_out_expect != len(self.model_.outputs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: I'd collapse this if-statement:
n_out_expect = getattr(self, "n_outputs_expected_", None)
if n_out_expect and n_out_expect != len(self.model_.outputs):
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I'll implement that.
Do you think we should redefine n_outputs_expected_: int
-> expected_output_shape_: Union[List[int], Dict[str, int]]
so that each element/key can map to the expected output shape for each output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should redefine n_outputs_expected_: int -> expected_output_shape_: Union[List[int], Dict[str, int]] so that each element/key can map to the expected output shape for each output?
I don't think so. An integer suffices for basic use cases (regressors and classifiers). I can only see output shape being relevant for autoencoders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of multi-input or multi-output models (as to why it would be a list/dict).
As to why we'd need the expected output shape specifically, for classifiers, the output shape depends on the classes and type of problem (i.e. Dense(1, activation="sigmoid")
and Dense(2, activation="softmax")
are generally equivalent), which is why we can't just check the shape of X
like we do with the input layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of multi-input or multi-output models (as to why it would be a list/dict).
Maybe that's not a bad idea. I think the type of n_outputs_expected
should be Union[int, List[int], Dict[str, int]]
. That supports more complex models and simple user-defined transformers.
In this function that will mean something like this:
n_out = getattr(self, "n_outputs_expected", None)
if isinstance(n_out, int):
n_out = [n_out]
if n_out and n_out != len(self.model_.outputs):
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another alternative is of course to revert to c7e6723 and try to continues this conversation in it's own issue/PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I like checking the model better. But why is n_outputs_expected_
necessary?
Another alternative is of course to revert to c7e6723
Do you mean delete the release from PyPI/GitHub? If I were handling releases, I think I'd rather delay any checks on n_outputs_expected_
to a later release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not required. There's other (perhaps better) ways to achieve the same thing.
That commit is in this branch. I was suggesting we revert to that commit in this branch and table this discussion for another issue/PR since the rest of the changes here stand on their own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That commit is in this branch. I was suggesting we revert to that commit in this branch and table this discussion for another issue/PR since the rest of the changes here stand on their own.
Oh, I see. I think reverting and a separate issue is a good idea.
It's not required. There's other (perhaps better) ways to achieve the same thing.
I think those (perhaps better) ideas are worth exploring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I reverted to c7e6723 and implemented the collapsed logic in #143 (comment). I'll resolve this thread and we can continue the discussion on the rest of the PR. I also opened #148 to track the discussion surrounding output validation. Thank you for the feedback thus far!
Closes #106