Some plan #83
The new tokenizer API (using TextEncodeBase) is basically finished and included in the 0.1.16 release, though the GPT part is ignored for now. For the next step, I will be fixing the huggingface download issue with HuggingFaceApi.jl. Rewriting the attention layer might be breaking, so that will probably be the last one to do. Some other issues that might also need to be tracked: …

@chengchingwen
@MNLubov Are you looking for a specific model from HuggingFace? I'm trying to fix the huggingface module this month, so if everything goes well, it should be workable again before August. Just to clarify: even if the huggingface module is fixed, it's still possible that we don't have an implementation for that model type (by model type, I mean something like …
@chengchingwen Thanks for the clarification. Currently I am testing different sentence-transformers models from HuggingFace; as a temporary solution, I use PyCall to find the one most suitable for my purposes.
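That temporary PyCall route could look roughly like the sketch below. This is only an illustration, assuming PyCall.jl is configured against a Python environment that already has the `sentence-transformers` package installed; the model id is just one example.

```julia
# Hedged sketch: drive the Python sentence-transformers package from Julia
# via PyCall. Requires a Python env with `sentence-transformers` installed.
using PyCall

st = pyimport("sentence_transformers")
model = st.SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# `encode` returns one embedding per input sentence (one row per sentence)
emb = model.encode(["first candidate sentence", "second candidate sentence"])
```

From here the embeddings can be compared with cosine similarity on the Julia side to pick the best model.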
@MNLubov Yes. I haven't investigated the sentence-transformers implementation, but it seems it can also be done with the normal huggingface interface. Take this one, https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2: it's a …
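For reference, the extra step such sentence-transformers models add on top of a plain BERT encoder is essentially attention-mask-aware mean pooling of the last hidden states. A self-contained sketch in plain Julia (illustrative names only, not Transformers.jl API):

```julia
# Mean pooling over token hidden states, ignoring padding positions.
# `hidden` is hidden_dim × seq_len; `mask` is 1.0 for real tokens, 0.0 for padding.
function mean_pool(hidden::AbstractMatrix, mask::AbstractVector)
    summed = hidden * mask                       # sum of the non-padded token vectors
    summed ./ max(sum(mask), one(eltype(mask)))  # divide by the real-token count
end

hidden = [1.0 3.0 100.0;
          2.0 4.0 100.0]   # 2-dim states for 3 tokens; the 3rd token is padding
mask = [1.0, 1.0, 0.0]
mean_pool(hidden, mask)    # → [2.0, 3.0]
```

The padding column contributes nothing to the sentence embedding because its mask entry zeroes it out before the average is taken.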
Here is some stuff I'm going to rewrite for the new release:

- Replace `Basic.Vocabulary` with `TextEncodeBase.Vocab`.
- Use StructWalk.jl to transform the `state_dict`.
- Remove the `Pretrain` submodule and use the huggingface one.

Feel free to add comments.
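The StructWalk.jl part of that plan might look roughly like this. It is only a sketch of the idea (walking a loaded `state_dict` and rewriting its array leaves, e.g. transposing 2-D weights into Julia's column-major convention); I'm assuming a `postwalk(f, x)` entry point, and the exact walk style StructWalk.jl expects may differ between versions.

```julia
# Hedged sketch: walk a state_dict and rewrite its array leaves.
# Assumes StructWalk.jl provides `postwalk(f, x)`; details may vary by version.
using StructWalk

state_dict = Dict(
    "encoder.layer.0.weight" => rand(Float32, 3, 4),
    "encoder.layer.0.bias"   => rand(Float32, 3),
)

converted = postwalk(state_dict) do x
    x isa AbstractMatrix ? permutedims(x) : x   # only touch 2-D weight arrays
end
```

The appeal of a structural walk here is that the same transformation applies uniformly however deeply the tensors are nested, without hand-writing per-model conversion code.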