
first version of save_to_remote for HF from FarmReader #2618

Merged · 18 commits · Jul 4, 2022
Changes from 11 commits
20 changes: 20 additions & 0 deletions docs/_src/api/api/reader.md
@@ -333,6 +333,26 @@ Saves the Reader model so that it can be reused at a later point in time.

- `directory`: Directory where the Reader model should be saved

<a id="farm.FARMReader.save_to_remote"></a>

#### FARMReader.save\_to\_remote

```python
def save_to_remote(model_name: str, hf_organization: Optional[str] = None, private: Optional[bool] = None, commit_message: str = "Add new model to Hugging Face.")
```

Saves the Reader model to the Hugging Face Model Hub with the given model_name. For this to work:

- Be logged in to Hugging Face on your machine via transformers-cli
- Have git-lfs installed (https://packagecloud.io/github/git-lfs/install); you can test the installation by running `git lfs --version`

**Arguments**:

- `model_name`: Repository name of the model you want to save to Hugging Face
- `hf_organization`: The name of the organization you want to save the model to (you must be a member of this organization)
- `private`: Set to True to make the model repository private
- `commit_message`: Commit message to use when saving the model to Hugging Face
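As a sketch of how the two naming arguments interact: a personal repository lives under your username, while passing `hf_organization` places it under that organization. The helper and all model/organization names below are hypothetical, and the actual `save_to_remote` call (shown commented out) requires hub credentials.

```python
from typing import Optional

def target_repo_id(model_name: str, hf_organization: Optional[str] = None) -> str:
    """Hypothetical helper: derive the hub repository id from the arguments.
    With no organization, the repo lives under your own account."""
    return f"{hf_organization}/{model_name}" if hf_organization else model_name

# With credentials configured, the call itself would look like:
# reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
# commit_url = reader.save_to_remote(
#     model_name="my-qa-model", hf_organization="my-org", private=True
# )

print(target_repo_id("my-qa-model"))            # my-qa-model
print(target_repo_id("my-qa-model", "my-org"))  # my-org/my-qa-model
```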

<a id="farm.FARMReader.predict_batch"></a>

#### FARMReader.predict\_batch
63 changes: 63 additions & 0 deletions haystack/nodes/reader/farm.py
@@ -4,8 +4,12 @@
import multiprocessing
from pathlib import Path
from collections import defaultdict
import os
import tempfile
from time import perf_counter

import torch
from huggingface_hub import create_repo, HfFolder, Repository

from haystack.errors import HaystackError
from haystack.modeling.data_handler.data_silo import DataSilo, DistillationDataSilo
@@ -688,6 +692,65 @@ def save(self, directory: Path):
        self.inferencer.model.save(directory)
        self.inferencer.processor.save(directory)

    def save_to_remote(
        self,
        model_name: str,
        hf_organization: Optional[str] = None,
        private: Optional[bool] = None,
        commit_message: str = "Add new model to Hugging Face.",
    ):
        """
        Saves the Reader model to the Hugging Face Model Hub with the given model_name. For this to work:

        - Be logged in to Hugging Face on your machine via transformers-cli
        - Have git-lfs installed (https://packagecloud.io/github/git-lfs/install); you can test the installation by running `git lfs --version`

        :param model_name: Repository name of the model you want to save to Hugging Face
        :param hf_organization: The name of the organization you want to save the model to (you must be a member of this organization)
        :param private: Set to True to make the model repository private
        :param commit_message: Commit message to use when saving the model to Hugging Face
        """
        # Note: This function was inspired by the save_to_hub function in the sentence-transformers repo
        # (https://github.com/UKPLab/sentence-transformers/), especially for git-lfs tracking.

        token = HfFolder.get_token()
        if token is None:
            raise ValueError(
                "To save this reader model to Hugging Face, make sure you log in to the hub on this computer by typing `transformers-cli login`."
            )

        repo_url = create_repo(
            token, model_name, organization=hf_organization, private=private, repo_type=None, exist_ok=True
        )

        transformer_models = self.inferencer.model.convert_to_transformers()

        with tempfile.TemporaryDirectory() as tmp_dir:
            repo = Repository(tmp_dir, clone_from=repo_url)

            self.inferencer.processor.tokenizer.save_pretrained(tmp_dir)

            # convert_to_transformers (above) creates one model per prediction head.
            # As FARMReader models only have one head (QA), we save the first model.
            transformer_models[0].save_pretrained(tmp_dir)

            # Collect files larger than 5 MB so they can be tracked with git-lfs.
            large_files = []
            for root, dirs, files in os.walk(tmp_dir):
                for filename in files:
                    file_path = os.path.join(root, filename)
                    rel_path = os.path.relpath(file_path, tmp_dir)

                    if os.path.getsize(file_path) > (5 * 1024 * 1024):
                        large_files.append(rel_path)

            if len(large_files) > 0:
                logger.info("Track files with git lfs: %s", ", ".join(large_files))
                repo.lfs_track(large_files)

            logger.info("Push model to the hub. This might take a while.")
            commit_url = repo.push_to_hub(commit_message=commit_message)

        return commit_url
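The large-file scan inside the method can be exercised on its own. Below is a self-contained sketch of the same logic — walk a directory tree and collect paths (relative to the root) of files above a size threshold. The 5 MB cutoff matches the method; here it is a parameter so the demo can use tiny files, and the function name `find_large_files` is illustrative, not part of the Haystack API.

```python
import os
import tempfile

def find_large_files(directory: str, threshold_bytes: int = 5 * 1024 * 1024):
    """Illustrative helper mirroring save_to_remote's scan: return paths
    (relative to `directory`) of files larger than `threshold_bytes`."""
    large_files = []
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            if os.path.getsize(file_path) > threshold_bytes:
                large_files.append(os.path.relpath(file_path, directory))
    return large_files

# Demo with a 10-byte threshold instead of 5 MB:
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "small.txt"), "wb") as f:
        f.write(b"tiny")  # 4 bytes, below the threshold
    with open(os.path.join(tmp_dir, "big.bin"), "wb") as f:
        f.write(b"x" * 100)  # 100 bytes, above the threshold
    result = find_large_files(tmp_dir, threshold_bytes=10)

print(result)  # ['big.bin']
```

In the method itself, only the files flagged this way are registered with `repo.lfs_track` before the push, so ordinary small files (configs, tokenizer JSON) stay as plain git blobs.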

    def predict_batch(
        self,
        queries: List[str],