-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH - Gets SciKeras script working #394
base: main
Are you sure you want to change the base?
Conversation
WIP still need to fix one test
It looks like all the pytest tests are failing with |
Should be addressed by #398 |
@lazarust Could you please solve the merge conflict? Regarding the uncovered new line: Would that be solved by adding tests for scikeras? |
@BenjaminBossan I've fixed the merge conflict. Yeah, I believe it would be but I was unsure if we wanted to have tests that included another library like that. Should I add one? |
Yes, it would be good, since that was the initial reason for the change. We have external library tests, see here: https://github.com/skops-dev/skops/blob/main/skops/io/tests/test_external.py For scikeras, it won't be possible to add a comprehensive coverage of all possible models, so a simple example should be enough. If it's possible to add a unit test independently of scikeras that explicitly tests weakrefs, that would also be good, not sure how easy it is to do. Regarding the failing tests, at first glance, it appears to be a change in the model repr in the latest sklearn version, so not related to the changes here. |
@BenjaminBossan Sorry this took me so long to get back to. I've added a test to hit that line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the added scikeras tests. I just have a small request to improve them a little bit.
skops/io/tests/test_external.py
Outdated
|
||
pipeline = Pipeline([("classifier", clf)]) | ||
|
||
dump(clf, "keras-test.skops") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of just dumping, could we please do a cycle of dumps
and loads
, similar to the other tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! I now have it dumping the model, loading it back in and comparing the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of dump
completely in favor of dumps
? That way, we also don't need to care about cleaning up any files created during the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! Sorry about that. I didn't realize the difference between dumps
and dump
🤯
@adrinjalali The tests for sklearn nightly are failing because the model repr was changed (not sure why). Normally, we could fix that by having the test check the sklearn verison, but this is a doctest. Any idea how it can be fixed, short of skipping it whole? |
skops/io/tests/test_external.py
Outdated
|
||
X, y = make_classification(1000, 20, n_informative=10, random_state=0) | ||
clf.fit(X, y) | ||
dumped = dumps(clf, "keras-test.skops") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dumped = dumps(clf, "keras-test.skops") | |
dumped = dumps(clf) |
2nd argument to dumps
is the compression level. Honestly, I'm surprised that this didn't raise an error.
Ugh, the list of trusted modules is giant now :D I guess it's related to the tensorflow change. Could you please explain why that was necessary? Also, we now get this error on CI:
|
@BenjaminBossan Yeah, sorry I realized I had marked the test method as a |
@BenjaminBossan After staring at this all day, I could use some help lol. For some reason, there's some infinite recursion happening when constructing the tree that I can't figure out why. Initially, I thought it was due to the |
I'll have to check when I'm back. Still off till end of November. But I remember |
information for CachedNodes I'm wodering after looking at the types of a lot of the CachedNodes if there's something weird happening with `None`
@adrinjalali I created an issue in tensorflow to discuss this and make sure I wasn't missing anything tensorflow/tensorflow#68194. It seems like the best way forward is to ignore the warnings for now until Tensorflow supports |
Ignoring warnings in cases where we know why they're happening and with a comment as when the ignore statement should be removed is a normal practice indeed. Thanks for the follow up work on the TF side. |
d43cdd7
to
ce11bf0
Compare
@adrinjalali This should be ready for you to look at again. The failing tests seem to just be a blip with codecov |
9ff414a
to
ce11bf0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts looking at this in more details (other than the inline comments)
- we are delegating the save / load to tensorflow? I'd like to see a test showing the user the right error when they try to load such a model without explicitly trusting including modules.
- we don't really include the inner modules in our json tree here, which means we have no idea what's inside that keras model, which means we're prone to massive exploits, this seems it beats the purpose of this format.
@lazarust you've done great work so far, let me know if you need me to have a more detailed look. I haven't personally tried to solve this project for TF, but I could have a look.
skops/io/_scikeras.py
Outdated
|
||
with tempfile.TemporaryDirectory() as temp_dir: | ||
file_name = os.path.join(temp_dir, "model.keras") | ||
obj.model.save(file_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we only save the model
attribute? This sounds odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding of https://keras.io/guides/serialization_and_saving/, it seems that Keras is compressing all the pieces of the models into the .keras
file. Should I change the name of the file to make it more clear?
@adrinjalali I think I've addressed most of your comments, and apologize it took me a bit to get back to this. For
I'm a little confused by what you mean by inner modules... could you elaborate on what you mean? Sorry this PR has been taking so long to get ironed out! |
@lazarust I went down the rabbit hole of reading the persistence code from keras, and I think it's easier if I push to this PR some changes. So I'll update this one, and you can review the work if that's okay. |
@adrinjalali Sounds good to me, let me know if there's any thing I can do to help! |
@lazarust , this is what I had in mind. However, this has a major issue: The user can use Which allows loading pickles through numpy. So we can only merge / support this, if we have a way to make sure there are no pickle objects in that zip file. |
@adrinjalali Ah the way you used tf directly makes sense to me. As for the pickle file issue, I don't think there's a good way of doing this... Do you know if the way TF loads the files only looks for |
I don't think it matters which files TF finds to load, as long as it allows loading pickle files. What we can investigate to move this further, is if we can reliably monkey-patch and disable the pickle machinery while loading a |
Yeah I agree, I guess I just thought if TF only tries to load Is monkey-patching TF something we'd want to support long term? If TF changed how model saving/loading worked in a future version we'd have to make updates anyways... |
It would be monkey patching |
Ah, yeah I don't think that would be too bad... we'd just need to monkey-patch the |
@adrinjalali Sorry I still haven't gotten to this. Have you made any progress on patching |
No I need to get to it soon |
WIP still need to fix one test
Reference Issues/PRs
Fixes #388
What does this implement/fix? Explain your changes.
This fixes a recursion error that was happening when dumping a scikeras model.
Any other comments?