first version of save_to_remote for HF from FarmReader #2618
@TuanaCelik I recently merged a drastic change to the CI, so I recommend you merge master before proceeding with this one. Thanks 😊
@TuanaCelik do you have some test notebook / code to try this out?
So usage would be:
from haystack.nodes import FARMReader
reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True)
data_dir = "data"
reader.train(data_dir=data_dir, train_filename="squad_small.json", use_gpu=True, n_epochs=1, save_dir="my_model", batch_size=4)
reader.save_to_remote(model_name='model_from_haystack', private=True)
And reloading the model from HF:
new_reader = FARMReader(model_name_or_path='HF_USER_NAME/model_from_haystack', use_auth_token=True)
Looks already pretty good! Let's not save FARM-related stuff and add some docstrings. Then it's good to go for a first quick-win version.
If the repo already exists, it will overwrite the model there. So as a next iteration, adding a version parameter would be an option.
Thanks @tstadel - I'll add the docstrings and change the self.save as suggested. 2 things I would like to clarify:
In short, these could be smaller PRs and changes later. What would be best? Do all in one, or merge this simple version with just FARMReader first (after some minor changes + docstrings)?
I'd say let's do all of the extensions in separate PRs. This PR would already create value if we released it tomorrow.
@TuanaCelik do you have time to work on this? If not I could take over from here...
@tstadel - working on it this week so can be left with me :)
@ZanSara @tstadel - I've made changes based on the comments. 2 things:
…haystack into hf-integration-tuana # Conflicts: # haystack/nodes/reader/farm.py
Good that pylint spotted this during CI: we should avoid positional args when calling create_repo. Please switch to keyword args.
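For illustration, a minimal sketch of what the keyword-only call could look like. The helper names `build_create_repo_kwargs` and `create_remote_repo` are hypothetical and not part of this PR; the argument names `repo_id`, `token`, `private` and `exist_ok` are the ones I believe `huggingface_hub.create_repo` accepts from v0.5 on, so please double-check them against the installed version.

```python
from typing import Any, Dict, Optional


def build_create_repo_kwargs(
    model_name: str, private: bool = False, token: Optional[str] = None
) -> Dict[str, Any]:
    """Collect the create_repo arguments by name (hypothetical helper)."""
    return {
        "repo_id": model_name,  # assumes huggingface-hub >= 0.5
        "token": token,         # None lets the library fall back to the cached login
        "private": private,
        "exist_ok": True,       # do not fail if the repo already exists
    }


def create_remote_repo(model_name: str, private: bool = False, token: Optional[str] = None):
    """Create the remote repo using keyword arguments only (hypothetical helper)."""
    # Imported lazily so the helper above can be used without huggingface_hub installed.
    from huggingface_hub import create_repo

    return create_repo(**build_create_repo_kwargs(model_name, private, token))
```

Passing everything by name keeps the call stable if the positional order of `create_repo` changes between releases, which seems to be what pylint was flagging.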
Sorry to bother you again. I executed my test script again and I got the following warning:
Let's briefly discuss whether we want to use
Please also have a look at this comment on storing models in FARM format on the model hub and failing loading: https://github.com/deepset-ai/haystack-hub-api/issues/977#issuecomment-1173439438
@julian-risch we should be safe here as we first convert the model to transformers and then upload.
huggingface-hub package would have to be v0.5 or higher - checking how to handle with Thomas
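One possible way to handle this at runtime, sketched under assumptions: the function names below are hypothetical, and the simple release parsing is a stand-in for a proper version parser. It only checks whether the installed huggingface-hub is at least 0.5 before relying on `repo_id`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MIN_HUB_VERSION = (0, 5)  # minimum release mentioned in this PR for repo_id


def parse_release(version_string: str) -> Tuple[int, ...]:
    """Extract the leading numeric release segments, e.g. '0.5.1rc1' -> (0, 5, 1)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric segment such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def supports_repo_id(version_string: str) -> bool:
    """Return True if the given version is at least MIN_HUB_VERSION."""
    return parse_release(version_string) >= MIN_HUB_VERSION


def installed_hub_supports_repo_id() -> bool:
    """Check the installed huggingface-hub package, if any."""
    try:
        return supports_repo_id(version("huggingface-hub"))
    except PackageNotFoundError:
        return False
```

Pinning the dependency in the package requirements is the simpler option; a check like this would only matter if older versions had to stay supported.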
…haystack into hf-integration-tuana
We changed the huggingface-hub version requirements in
LGTM!
* first version of save_to_remote for HF from FarmReader
* Update Documentation & Code Style
* Changes based on comments
* Update Documentation & Code Style
* imports order
* making small changes to pydoc
* indent fix
* Update Documentation & Code Style
* keyword arguments instead of positional
* Changing to repo_id
  huggingface-hub package would have to be v0.5 or higher - checking how to handle with Thomas
* Update Documentation & Code Style
* adding huggingface-hub dependency 0.5 or above
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <[email protected]>
Fixes #2416
This draft PR introduces the save_to_remote() function to the FARMReader. It converts a model to the transformer models format using the convert_to_transformer method from the AdaptiveModel class.
I have tested this by fine-tuning a model and then calling save_to_remote, then loading that same model back into FARMReader and using it in an ExtractiveQuestionAnswering pipeline.
(PS: sentence-transformers implements this in their own repo and I was able to reuse their method of tracking files for lfs)
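As a rough sketch of the lfs tracking idea (not the code in this PR): walk the directory of converted model files and collect everything above a size threshold, so those paths can be tracked with git-lfs before pushing. The function name `find_large_files` and the 5 MB threshold are assumptions for illustration.

```python
import os
from typing import List

LFS_THRESHOLD_BYTES = 5 * 1024 * 1024  # assumed threshold, adjust as needed


def find_large_files(directory: str, threshold: int = LFS_THRESHOLD_BYTES) -> List[str]:
    """Return paths (relative to `directory`) of files larger than `threshold` bytes."""
    large_files = []
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            path = os.path.join(root, filename)
            if os.path.getsize(path) > threshold:
                large_files.append(os.path.relpath(path, directory))
    return sorted(large_files)
```

The resulting list could then be handed to something like `Repository.lfs_track` from huggingface_hub before committing and pushing; whether that matches the final implementation here is not confirmed.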
To do and test
* The create_repo method from huggingface_hub has an exist_ok argument, which I've set to True, but this still needs testing
* DensePassageRetriever and TableTextRetriever