Downloading of sentence transformers #105

Open
MNLubov opened this issue Jul 18, 2022 · 8 comments

Comments

@MNLubov commented Jul 18, 2022

Basic downloading doesn't work for sentence-transformers. Is it possible to download model using direct url or url for config.json?

@chengchingwen (Owner):

Which code are you testing with?

@MNLubov (Author) commented Jul 19, 2022

I tested the `hgf"model_name:model_type"` approach described in the README.
I also tried another approach:

```julia
model_name = "all-MiniLM-L6-v2"
file_name = "config.json"
path = "d:/models/sentence transformers/all-MiniLM-L6-v2"
file_hash = Transformers.HuggingFace.find_or_register_hgf_file_hash(path, model_name, file_name)
```

which allows me to register the config file. Then:

```julia
cfg = load_config(model_name)
model_cons = Transformers.HuggingFace.get_model_type((Val ∘ Symbol ∘ lowercase)(model_type), (Val ∘ Symbol ∘ lowercase)(item))
model = load_model(model_cons, model_name; config=cfg)
```

This gives me the following error: `Model Val{:model_task}() doesn't support this kind of task: Val{:model_task}()`. I tried different model tasks. I also used "BertModel" as the task, which is given in the `architectures` field of the config file.
As far as I understand, there are no model types/model tasks for sentence transformers.

@chengchingwen (Owner):

Generally, it might be better to wait for #103.

> Which gives me following error: `Model Val{:model_task}() doesn't support this kind of task: Val{:model_task}()`. I tried different model tasks. I also used "BertModel" as a task, which is given in architectures field of config file.

For "BertModel", you would need to pass `Val(:model)` to the function, or one of the following:

```julia
julia> map(first, HuggingFace.get_model_type(:bert))
(:model, :forpretraining, :lmheadmodel, :formaskedlm, :fornextsentenceprediction, :forsequenceclassification, :formultiplechoice, :fortokenclassification, :forquestionanswering)
```
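Put together with the snippets earlier in this thread, that suggests a sketch like the following (assuming the config and model files were already registered with `find_or_register_hgf_file_hash` as shown above, and that `get_model_type`/`load_model` keep the signatures quoted in this thread — verify against your installed version):

```julia
using Transformers
using Transformers.HuggingFace

# Model name as registered earlier in this thread.
model_name = "all-MiniLM-L6-v2"
cfg = HuggingFace.load_config(model_name)

# "BertModel" in the config's `architectures` field corresponds to the
# plain :model task for the :bert model type:
model_cons = HuggingFace.get_model_type(Val(:bert), Val(:model))
model = HuggingFace.load_model(model_cons, model_name; config = cfg)
```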

@chengchingwen (Owner):

It is now merged and released in v0.1.19; please give it a try.

Let me know if there are any problems, feature requests, or inconveniences when using it.

@MNLubov (Author) commented Aug 23, 2022

Hi, Peter. I've tested downloading sentence transformers. The approach described above works fine with BERT-like sentence-transformers. In this approach I download and register the config file and the model:

```julia
path = "d:/models/sentence transformers/all-MiniLM-L6-v2"
model_name = "all-MiniLM-L6-v2"
config_file = "config.json"
model_file = "pytorch_model.bin"

config_hash = Transformers.HuggingFace.find_or_register_hgf_file_hash(path, model_name, config_file)
model_hash = Transformers.HuggingFace.find_or_register_hgf_file_hash(path, model_name, model_file)

cfg = Transformers.HuggingFace.load_config(model_name)
model_cons = Transformers.HuggingFace.HGFBertModel
model = Transformers.HuggingFace.load_model(model_cons, model_name; config=cfg)
```

Using the macro doesn't work for sentence transformers, only for "token" transformers. The macro works only if I have already downloaded and registered the config file and model file. If I use the macro without pre-downloading the sentence-transformer:

```julia
model_hfapi = hgf"all-distilroberta-v1:model"
```

I get:

```
[ Info: No local config.json found. downloading...
ERROR: HTTP.Exceptions.StatusError(404, "GET", "/all-distilroberta-v1-config.json", HTTP.Messages.Response:
"""
HTTP/1.1 404 Not Found
Content-Type: application/xml
Transfer-Encoding: chunked
Connection: keep-alive
Date: Tue, 23 Aug 2022 14:18:41 GMT
Server: AmazonS3
X-Cache: Error from cloudfront
Via: 1.1 bccded73b8b9a1d038e5d874cf586402.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: HEL50-C1
X-Amz-Cf-Id: dldXOXWkBtTB714zBVOMwiL4xIDRiGrjzc5BB46_yPYXker_lPo4dA== """)
```

If I use the macro with the 'sentence-transformers' prefix added to the model name:

```
[ Info: No local config.json found. downloading...
ERROR: HTTP.Exceptions.ConnectError("https://cdn.huggingface.co\\sentence-transformers/all-distilroberta-v1\\config.json", DNSError: cdn.huggingface.co\sentence-transformers, unknown node or service (EAI_NONAME))
```

@chengchingwen (Owner):

You seem to be using an old version; please update to the latest one. It should be used like this:

```julia
using Transformers
using Transformers.Basic
using Transformers.HuggingFace

textenc = hgf"sentence-transformers/all-MiniLM-L6-v2:tokenizer"
model = hgf"sentence-transformers/all-MiniLM-L6-v2:model"

sentences = ["This is an example sentence", "Each sentence is converted"]
a = encode(textenc, sentences)
model_outputs = model(a.input.tok; token_type_ids = a.input.segment, attention_mask = a.mask)
```
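To turn the token-level outputs into a single sentence embedding, sentence-transformers models conventionally apply mask-weighted mean pooling over the hidden states. A rough sketch of that pooling step in plain Julia (the shapes below are assumptions — it presumes the hidden states come out as a `hidden_size × seq_len × batch` array and that the attention mask can be materialized as a `seq_len × batch` 0/1 matrix; check the actual return types of your Transformers.jl version):

```julia
# Mask-weighted mean pooling over tokens (assumed shapes, see above):
#   h    :: hidden_size × seq_len × batch
#   mask :: seq_len × batch (1 for real tokens, 0 for padding)
function mean_pool(h, mask)
    m = reshape(mask, 1, size(mask)...)               # 1 × seq_len × batch
    summed = dropdims(sum(h .* m; dims = 2); dims = 2) # sum over real tokens
    counts = dropdims(sum(m; dims = 2); dims = 2)      # tokens per sentence
    return summed ./ counts                            # hidden_size × batch
end
```

Each column of the result is then one sentence embedding, matching what the Python sentence-transformers library produces for these models.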

@MNLubov (Author) commented Aug 29, 2022

@chengchingwen, how could I load a HuggingFace model from a file stored locally? Is loading it from the HuggingFace portal and then saving it as a .bson file the best way to do it?

@chengchingwen (Owner):

It really depends on the scenario. You would need to expand on that (e.g. what you have and what you want to achieve).
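For the .bson route the question mentions, one possible sketch using BSON.jl's `@save`/`@load` macros (an assumption, not an officially recommended workflow — whether a reloaded model round-trips cleanly depends on the model's field types, so verify on your model):

```julia
using BSON
using Transformers
using Transformers.HuggingFace

# Download once from the Hub, then save to disk
# (model name taken from earlier in this thread):
model = hgf"sentence-transformers/all-MiniLM-L6-v2:model"
BSON.@save "all-MiniLM-L6-v2.bson" model

# Later, restore from disk without hitting the network:
BSON.@load "all-MiniLM-L6-v2.bson" model
```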
