Downloading of sentence transformers #105

Open
MNLubov opened this issue Jul 18, 2022 · 8 comments

Comments

@MNLubov commented Jul 18, 2022

Basic downloading doesn't work for sentence-transformers. Is it possible to download model using direct url or url for config.json?

@chengchingwen (Owner):

Which code are you testing with?

@MNLubov (Author) commented Jul 19, 2022

I tested the `hgf"model_name:model_type"` approach described in the README.
I also tried another approach:

```julia
model_name = "all-MiniLM-L6-v2"
file_name = "config.json"
path = "d:/models/sentence transformers/all-MiniLM-L6-v2"
file_hash = Transformers.HuggingFace.find_or_register_hgf_file_hash(path, model_name, file_name)
```

which allows me to register the config file. Then:

```julia
cfg = load_config(model_name)
model_cons = Transformers.HuggingFace.get_model_type((Val ∘ Symbol ∘ lowercase)(model_type), (Val ∘ Symbol ∘ lowercase)(item))
model = load_model(model_cons, model_name; config=cfg)
```

This gives me the following error: `Model Val{:model_task}() doesn't support this kind of task: Val{:model_task}()`. I tried different model tasks. I also used "BertModel" as the task, which is given in the `architectures` field of the config file.
As far as I understand, there are no model types/model tasks for sentence transformers.

@chengchingwen (Owner):

Generally, it might be better to wait for #103.

> Which gives me following error: `Model Val{:model_task}() doesn't support this kind of task: Val{:model_task}()`. I tried different model tasks. I also used "BertModel" as a task, which is given in architectures field of config file.

For "BertModel", you would need to pass `Val(:model)` to the function, or one of the following:

```julia
julia> map(first, HuggingFace.get_model_type(:bert))
(:model, :forpretraining, :lmheadmodel, :formaskedlm, :fornextsentenceprediction, :forsequenceclassification, :formultiplechoice, :fortokenclassification, :forquestionanswering)
```
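Put together with the snippets earlier in this thread, that suggests a sketch like the following (assuming the config and model files were already registered with `find_or_register_hgf_file_hash` as shown above, and that `get_model_type`/`load_model` keep the signatures quoted in this thread — verify against your installed version):

```julia
using Transformers
using Transformers.HuggingFace

# Model name as registered earlier in this thread.
model_name = "all-MiniLM-L6-v2"
cfg = HuggingFace.load_config(model_name)

# "BertModel" in the config's `architectures` field corresponds to the
# plain :model task for the :bert model type:
model_cons = HuggingFace.get_model_type(Val(:bert), Val(:model))
model = HuggingFace.load_model(model_cons, model_name; config = cfg)
```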

@chengchingwen (Owner):

It is now merged and released in v0.1.19; please give it a try.

Let me know if there are any problems, feature requests, or inconveniences when using it.

@MNLubov (Author) commented Aug 23, 2022

Hi, Peter. I've tested downloading sentence transformers. The approach described above works fine with BERT-like sentence-transformers. In this approach I download and register the config file and the model:

```julia
path = "d:/models/sentence transformers/all-MiniLM-L6-v2"
model_name = "all-MiniLM-L6-v2"
config_file = "config.json"
model_file = "pytorch_model.bin"

config_hash = Transformers.HuggingFace.find_or_register_hgf_file_hash(path, model_name, config_file)
model_hash = Transformers.HuggingFace.find_or_register_hgf_file_hash(path, model_name, model_file)

cfg = Transformers.HuggingFace.load_config(model_name)
model_cons = Transformers.HuggingFace.HGFBertModel
model = Transformers.HuggingFace.load_model(model_cons, model_name; config=cfg)
```

Using the macro doesn't work for sentence transformers, only for "token" transformers. The macro works only if I have already downloaded and registered the config file and model file. If I use the macro without pre-downloading the sentence-transformer:

```julia
model_hfapi = hgf"all-distilroberta-v1:model"
```

I get:

```
[ Info: No local config.json found. downloading...
ERROR: HTTP.Exceptions.StatusError(404, "GET", "/all-distilroberta-v1-config.json", HTTP.Messages.Response:
"""
HTTP/1.1 404 Not Found
Content-Type: application/xml
Transfer-Encoding: chunked
Connection: keep-alive
Date: Tue, 23 Aug 2022 14:18:41 GMT
Server: AmazonS3
X-Cache: Error from cloudfront
Via: 1.1 bccded73b8b9a1d038e5d874cf586402.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: HEL50-C1
X-Amz-Cf-Id: dldXOXWkBtTB714zBVOMwiL4xIDRiGrjzc5BB46_yPYXker_lPo4dA== """)
```

If I use the macro with the 'sentence-transformers' prefix added to the model name:

```
[ Info: No local config.json found. downloading...
ERROR: HTTP.Exceptions.ConnectError("https://cdn.huggingface.co\\sentence-transformers/all-distilroberta-v1\\config.json", DNSError: cdn.huggingface.co\sentence-transformers, unknown node or service (EAI_NONAME))
```

@chengchingwen (Owner):

You seem to be using an old version; please update to the latest one. It should be used like this:

```julia
using Transformers
using Transformers.Basic
using Transformers.HuggingFace

textenc = hgf"sentence-transformers/all-MiniLM-L6-v2:tokenizer"
model = hgf"sentence-transformers/all-MiniLM-L6-v2:model"

sentences = ["This is an example sentence", "Each sentence is converted"]
a = encode(textenc, sentences)
model_outputs = model(a.input.tok; token_type_ids = a.input.segment, attention_mask = a.mask)
```
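To turn the token-level outputs into a single sentence embedding, sentence-transformers models conventionally apply mask-weighted mean pooling over the hidden states. A rough sketch of that pooling step in plain Julia (the shapes below are assumptions — it presumes the hidden states come out as a `hidden_size × seq_len × batch` array and that the attention mask can be materialized as a `seq_len × batch` 0/1 matrix; check the actual return types of your Transformers.jl version):

```julia
# Mask-weighted mean pooling over tokens (assumed shapes, see above):
#   h    :: hidden_size × seq_len × batch
#   mask :: seq_len × batch (1 for real tokens, 0 for padding)
function mean_pool(h, mask)
    m = reshape(mask, 1, size(mask)...)               # 1 × seq_len × batch
    summed = dropdims(sum(h .* m; dims = 2); dims = 2) # sum over real tokens
    counts = dropdims(sum(m; dims = 2); dims = 2)      # tokens per sentence
    return summed ./ counts                            # hidden_size × batch
end
```

Each column of the result is then one sentence embedding, matching what the Python sentence-transformers library produces for these models.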

@MNLubov (Author) commented Aug 29, 2022

@chengchingwen, how could I load a HuggingFace model from a file stored locally? Is loading it from the HuggingFace portal and then saving it as a .bson file the best way to do it?

@chengchingwen (Owner):

It really depends on the scenario. You would need to expand on that (e.g. what you have and what you want to achieve).
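For the .bson route the question mentions, one possible sketch using BSON.jl's `@save`/`@load` macros (an assumption, not an officially recommended workflow — whether a reloaded model round-trips cleanly depends on the model's field types, so verify on your model):

```julia
using BSON
using Transformers
using Transformers.HuggingFace

# Download once from the Hub, then save to disk
# (model name taken from earlier in this thread):
model = hgf"sentence-transformers/all-MiniLM-L6-v2:model"
BSON.@save "all-MiniLM-L6-v2.bson" model

# Later, restore from disk without hitting the network:
BSON.@load "all-MiniLM-L6-v2.bson" model
```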
