
Loading a .save()'d model with revision and trust_remote_code re-downloads code at runtime? #2613

Open
thoughtpolice opened this issue Apr 25, 2024 · 4 comments


@thoughtpolice

thoughtpolice commented Apr 25, 2024

Hello,

Thanks for this library, it's pretty nice! I have a project using it that basically exposes sentence embeddings as a stateless HTTP API. In order to make distribution easy and consistent, it packages supported models directly with the code inside the distributed docker images I build. I use .save() in order to save all the models in a first step, then put them in the docker image in a second step. The intent is that the server will never need to download any new files and can run completely isolated from the public internet if you use the docker image, and all model versions will always be consistent.

https://github.com/thoughtpolice/embedding-server

This is currently using sentence-transformers 2.6.1

One of the models I'm using is nomic-ai/nomic-embed-text-v1, which requires trust_remote_code=True. In order to keep things hermetic, I also provide a revision argument for the HF repository to pin it to a specific revision.

However, it seems like the .save() call does not actually save the remote modeling/configuration code, so every subsequent run of the docker container immediately re-downloads the remote code and runs it.

For example, try the following command:

docker run --rm -it -p 5000:5000 ghcr.io/thoughtpolice/embedding-server:1172b14b6fd594b9b4f2eeb1aa4abfd67f158c71

You'll see this:
 

Loading models from '/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data'...
configuration_hf_nomic_bert.py: 100%|██████████| 1.96k/1.96k [00:00<00:00, 12.1MB/s]
A new version of the following files was downloaded from https://huggingface.co/nomic-ai/nomic-embed-text-v1:
- configuration_hf_nomic_bert.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_hf_nomic_bert.py: 100%|██████████| 52.6k/52.6k [00:00<00:00, 6.79MB/s]
A new version of the following files was downloaded from https://huggingface.co/nomic-ai/nomic-embed-text-v1:
- modeling_hf_nomic_bert.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
<All keys matched successfully>
[2024-04-25 00:31:53 +0000] [1] [INFO] Running on http://0.0.0.0:5000 (CTRL + C to quit)

Looking at the provided directory, neither of these files is present:

$ find /nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/modules.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/model.safetensors
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/README.md
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/vocab.txt
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/config_sentence_transformers.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/tokenizer.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/special_tokens_map.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/2_Normalize
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/sentence_bert_config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/1_Pooling
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/1_Pooling/config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/tokenizer_config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/modules.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/model.safetensors
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/README.md
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/vocab.txt
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/config_sentence_transformers.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/tokenizer.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/special_tokens_map.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/2_Normalize
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/sentence_bert_config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/1_Pooling
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/1_Pooling/config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/tokenizer_config.json

So the saved model directory is incomplete: .save() does not write out all the needed files, in particular the two .py files in the upstream HF repo: https://huggingface.co/nomic-ai/nomic-embed-text-v1/tree/main

This is especially misleading because the warning message suggests using revision to pin the version of the file, but that apparently doesn't apply in the .save() case (maybe there are some call sites where the revision argument was forgotten?)

I actually tried to fix this, to no avail. Just putting the .py files in the data directory next to the weights isn't enough: the library re-downloads those files to $HF_HOME instead and loads them from there, so this fix does not work.

I'm afraid I'm not very good at Python and don't know if this is a bug in sentence-transformers, my code, or the distributed nomic HF model.

The code for saving and loading models is here; it basically just calls .save() or the constructor with the provided model arguments, and the build system passes --save-models-to to put them all in a directory. But I could have gotten it wrong: https://github.com/thoughtpolice/embedding-server/blob/main/embedding-server.py#L116-L148

This isn't a huge deal for me right now, but it would be nice if this could work since I did like that the server could run 100% offline. Thanks.

@tomaarsen
Collaborator

Hello!

I see you've already gone through a lot of steps to figure out what's happening here. Indeed, when saving a model whose modeling/configuration/tokenizers are custom and stored in a remote repository, these will not be saved as "local" in the resulting directory. Instead, the configuration files in this directory just point to the remote repository where these files can be found. This inhibits easy "out of the box" offline mode with these models, indeed.

This is a design decision in transformers (i.e., to not download those files and update the config.json to point to the local files instead), and I don't think it's one that will be changed.

As for revision: I assume that it only refers to the repository we're actively downloading from, not the repository from which the modeling/config/tokenizer files are pulled. I agree that the "To avoid downloading new versions of the code file, you can pin a revision." warning is not correct in this example. This warrants an issue in https://github.com/huggingface/transformers/

For your specific case, we can resolve the problem like so:

  1. Download a model of your choice with model name $M$.
  2. Save the model in an output directory $O$.
  3. Check if it has an auto_map option in config.json or tokenizer_config.json.
  4. If it does:
    4a. Split each value on --. If you get 2 parts, the first is the repository and the second is the path to a class. If you get 1 part, that is just the path to a class and you should take $M$ as the repository.
    4b. Use hf_hub_download to download the files and store them in $O$.
    4c. Update config.json and tokenizer_config.json to remove the repository if it existed, so only the (local) path to the class remains.
  5. If it does not, do nothing.
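Step 4a is the easiest part to get wrong; a minimal sketch of that split logic (the helper name and example values here are illustrative, not from any library):

```python
def parse_auto_map_entry(value: str, default_repo: str) -> tuple[str, str]:
    """Split an auto_map value into (repository, class_path).

    Two parts separated by '--' mean the custom code is hosted in a separate
    repository; a single part means the code lives in the model repository
    itself, so the model's own name is used as the repository.
    """
    parts = value.split("--")
    if len(parts) == 1:
        return default_repo, parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"unexpected auto_map value: {value!r}")


# Example with an external code repository (illustrative values):
print(parse_auto_map_entry(
    "nomic-ai/nomic-bert-2048--modeling_hf_nomic_bert.NomicBertModel",
    "nomic-ai/nomic-embed-text-v1",
))
# Example with code hosted in the model repo itself:
print(parse_auto_map_entry("modeling.MyModel", "org/model"))
```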

With that recipe, you'll be able to "fully" download a model. After that, I think you should be able to load it with local_files_only=True to prevent it from making any calls (this is not yet released, so you'd have to use pip install git+https://github.com/UKPLab/sentence-transformers.git@993437ab8514dca149adb4bf90f217224c07aff3, although I'm not sure you can specify a commit like that).

Also, there are only two commonly used models that rely on custom modeling files, Jina and Nomic, so you won't encounter this often at all. Hopefully you can apply this recipe to get it working nicely for your use case.
A feature request to get this behaviour automatically when calling save would be best suited for transformers, as all of the saving/loading functionality for custom models lives there, and any changes/improvements should then work (nearly) out of the box in Sentence Transformers.

  • Tom Aarsen

thoughtpolice added a commit to thoughtpolice/embedding-server that referenced this issue Apr 25, 2024
Due to a design choice in Transformers, remote-code embedded in a model on HF is
downloaded at runtime and this cannot be disabled, despite our attempts to
download it at build time. A practical result of this is that the built Docker
container image needs networking access to download the remote code, which is
not ideal.

This fix is a workaround to download the remote code at build time, and embed it
into the model by updating the `config.json` file appropriately. This should
only be needed for Nomic-style models, apparently, since they are some of the
only embedding models that use remote code.

Many thanks to Tom Aarsen for providing me with the steps to fix this.

Relevant issue: UKPLab/sentence-transformers#2613 (comment)
@thoughtpolice
Author

Really appreciate the help and pointers; with your steps I was quickly able to fix this using sed + gron to update the config: thoughtpolice/embedding-server@2c3968b

I'll try to take some time and write up a FR for transformers when I get the chance, so that other people in the future don't need to do this. Thank you!

@VladKha

VladKha commented Nov 17, 2024

Adding a simple code snippet implementing @tomaarsen's steps from above:

import json
from pathlib import Path
from huggingface_hub import snapshot_download
from sentence_transformers import SentenceTransformer

model_name = 'jinaai/jina-embeddings-v3'

# steps from https://github.com/UKPLab/sentence-transformers/issues/2613#issuecomment-2076964416
# 1. download model
model = SentenceTransformer(model_name, trust_remote_code=True)

# 2. save the model
local_model_path = Path('hf-models') / model_name.replace('/', '-')
model.save(str(local_model_path))

# 3. check config files
for f_path in local_model_path.iterdir():    
    if f_path.name in ['config.json', 'tokenizer_config.json']:
        with open(f_path, 'r') as f:
            config_dict = json.load(f)

        # 4a. get repository and class_path from configs
        auto_map_configs = config_dict.get('auto_map', {})
        for config_name, config_value in auto_map_configs.items():
            splits = config_value.split('--')

            if len(splits) == 2:
                repository, class_path = splits
            elif len(splits) == 1:
                repository = model_name
                class_path = splits[0]
            else:
                raise ValueError(f'strange config {config_value}')
        
            # 4b. download required configs from hf hub
            snapshot_download(repo_id=repository, local_dir=local_model_path)

            # 4c. update config files
            config_dict['auto_map'][config_name] = class_path

        with open(f_path, 'w') as f:
            json.dump(config_dict, f)
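The auto_map rewrite (step 4c) can also be sanity-checked on its own, without touching the Hub; a self-contained sketch with fabricated example values (the helper name and config contents are illustrative):

```python
import json


def strip_auto_map_repos(config: dict) -> dict:
    # Step 4c: drop the 'repo--' prefix from each auto_map entry so only the
    # local class path remains; transformers then resolves the .py files that
    # were copied next to the weights. Returns a new dict; input is unchanged.
    out = dict(config)
    out["auto_map"] = {
        name: value.split("--")[-1]
        for name, value in config.get("auto_map", {}).items()
    }
    return out


# Fabricated example, modeled on the repos discussed in this thread:
example = {
    "auto_map": {
        "AutoModel": "nomic-ai/nomic-bert-2048--modeling_hf_nomic_bert.NomicBertModel",
        "AutoConfig": "configuration_hf_nomic_bert.NomicBertConfig",  # already local
    }
}
print(json.dumps(strip_auto_map_repos(example)["auto_map"], indent=2))
```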

Sample test code for offline usage (recommended to test in a separate fresh environment, with newly installed transformers & sentence-transformers):

import os
os.environ['HF_HUB_OFFLINE'] = '1'

from pathlib import Path
from sentence_transformers import SentenceTransformer

model_name = 'jinaai/jina-embeddings-v3'
local_model_path = Path('hf-models') / model_name.replace('/', '-')
model = SentenceTransformer(str(local_model_path), trust_remote_code=True, local_files_only=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)

Note: hf_hub_download wasn't enough for step 4b, since the classes referenced in the configs can import code from other files in the same repo (as with the jinaai embeddings above), so the whole repo needs to be downloaded with snapshot_download.


@tomaarsen, thank you very much for describing the steps to implement this!
Do you believe it still makes sense to open a separate FR in the huggingface/transformers for this?

I would be happy to raise it there properly if so.
I believe some variation of this was submitted before in huggingface/transformers#22260, but it didn't get traction since it required more clarification and instructions to reproduce.

I have a simple, real, popular use case: Kaggle Notebook environments used in competitions.
Some Kaggle competitions require submitting code in Kaggle Notebooks, which are later run on private data without internet access.
Practically, this means you must prepare all models in advance and upload them as dependencies to the submission notebook. So having transformers try to reach out to the HF Hub (even when the model is already pre-downloaded) is not an option, and it simply disqualifies a group of models from use.

@tomaarsen
Collaborator

Nice work @VladKha!

I think a feature request could still make some sense, but perhaps it's considered a bit too niche; very few models use third-party repositories to host their code.

  • Tom Aarsen
