[Model request] Helsinki-NLP/opus-mt-ru-en (marian) #63
lol it's in the readme okeyyyyyyyyy
Haha, no worries :) However, the model you want to run is of type "marian", which currently isn't supported. Fortunately, it is very similar to some existing models, so it should be quite simple to add. I'll reopen the issue, and then close it once I add support for it.
Ha! Thank you! I've tried, but I don't really know what I am doing. It failed with an error:
Make sure you're on the dev branch of https://github.com/huggingface/optimum - this was fixed in the past few days:
But even if you get the conversion working, since the model isn't supported yet, it will throw an error. No worries though! I'll get it supported :)
While I look into this, can you provide some sample code for running the model (and its expected output)?
Something like that; the expected output is an English sentence :)
Thanks 👍 I think I've got a basic version working. I'll do some testing tomorrow, and hopefully have a release out either tomorrow or the day afterwards 🔥
Wow, thank you, man!
Hi! I just finished some tests with your implementation. Technically, it sounds good, but the quality of the translation is really far from the one I get at Hugging Face with the same model, i.e. https://t.ly/g7f_ I've
Then
Then, run this code
Crazy output :)
If I run the node code twice, it crashes.
Any suggestions?
Hey 👋 Yeah, I'm still running more tests before pushing a release. I did recently fix an issue where it wouldn't use the correct default sampling settings (I will push this to main shortly). In the meantime, you can set the sampling settings like this:

let output = await pipe("Впервые в истории США экс-президент становится фигурантом уголовного дела. Большое жюри Манхэттена в четверг вечером проголосовало за предъявление Дональду Трампу обвинения по делу о предполагаемых выплатах порнозвезде Сторми Дэниелс. Решение жюри стало неожиданностью как для экс-президента, так и для его команды, считавшей, что прокурор откажется от этого дела из-за его крайней запутанности. Теперь бывший президент должен явиться в суд для дачи показаний.", {
    top_k: 0,
    do_sample: false,
    num_beams: 1
});

Running your code with these new changes produces the following text:
Although the English isn't perfect, it definitely is better than what you are seeing haha. One potential reason for the slight inaccuracies is the quantization, so if you are able to avoid quantizing the model, you would probably get better results (you can do this by removing the --quantize flag when converting). EDIT: Could you send the input text for the output of the message which contains "Republic of Korea"? You seem to have sent the original text twice.
I removed the
Got this. Better than the quantized one, but your result is more accurate.
And then, I can't run the nodejs code twice:
I checked the differences in the model's files just after the first run of the code, and then again afterwards: no diff. I need to convert the Hugging Face model each time. I'm always using the same text for testing, i.e.
I don't know why "Republic of Korea" appears in the translation. In fact, I'm starting to be really afraid about that; I don't want Kim Jong-un to be on my computer right now huhu :)
OK, forget the error "Error: no available backend found. ERR: ". It was related to my #62 cache implementation.
Oh goodness haha! Then, there is definitely something else wrong lol. I've pushed the latest changes so you can test on the same codebase as me. So, can you do the following things (in this order):
👍
Done. The result is... let's say... I died laughing.
Let me try with your remote model files.
👀👀👀 Well... that is... unfortunate 🤣 I have received sponsorships before (see under my profile, it lists 3 names), so it must be a GitHub bug 👀
Yep. But you accept koffee, so... done! And it's just perfect with remote model files.
Because I plan to use lots of Helsinki-NLP models for other languages, I'd really like to know exactly how you converted the Helsinki-NLP files to transformers.js files. You succeeded with quantized files. Let me test again: python convert with --quantize and local models.
Sorry for the delay 🤣 ... I've just been downloading the models to my local folder to test before I share the script.
Hmmm... Looks like the model from Google Colab is different from mine 😅 ... I'll see what the problem is and fix it!
Hmm... after debugging for the past few hours, I'm unable to find the issue 😓 ... It's got something to do with the latest update(s) to optimum, which completely broke the conversion process... I'll look into it more tomorrow :/
After a ton of debugging, the fix was actually very simple! 4868918 Let me know if that works for you :) I will also release the other model files for other translations.
I'm uploading a few of the 1440 models (https://huggingface.co/Xenova/transformers.js/tree/main/quantized/Helsinki-NLP). I just chose the most popular ones, so there will be some missing, but you can just convert your own models. If you want, you can also make a PR to add those models to the base repo too :) Here is some example usage code:

let pipe1 = await pipeline('translation', 'Helsinki-NLP/opus-mt-ru-en');
let output1 = await pipe1("Привет, как дела?", {
top_k: 0,
do_sample: false,
num_beams: 1,
});
console.log(output1);
// Outputs: [{translation_text: "Hey, how's it going?"}]
let pipe2 = await pipeline('translation', 'Helsinki-NLP/opus-mt-de-en');
let output2 = await pipe2("Hallo, wie geht's dir?", {
top_k: 0,
do_sample: false,
num_beams: 1
});
console.log(output2);
// Outputs: [{translation_text: 'Hello, how are you?'}]

I will close this issue once I publish the next release (v1.4.0) 👍
Marian models are now officially supported in v1.4.0 🥳 Closing the issue! 😄
Huge thanks, man. I'll test as soon as possible.
Hi @xenova! Hope you are well. I'm testing right now, but I can't. Error:
Code:
package.json
Node version:
I tried the things below; same error:
Are you able to reproduce?
It's fine when using a remote model.
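For anyone hitting the same remote-vs-local difference: in transformers.js, where models are loaded from is controlled through the library's `env` object. The option names below (`allowRemoteModels`, `localModelPath`) come from the current @xenova/transformers API and may not match the v1.x release discussed in this thread, so treat this as a sketch rather than the actual fix:

```javascript
// Sketch only: env option names are taken from the current
// @xenova/transformers API and may differ in the v1.x era of this thread.
import { env, pipeline } from '@xenova/transformers';

env.allowRemoteModels = false;     // never fall back to downloading from the Hub
env.localModelPath = './models/';  // directory containing the converted files

// Expects ./models/Helsinki-NLP/opus-mt-ru-en/ to hold the converted model.
const translate = await pipeline('translation', 'Helsinki-NLP/opus-mt-ru-en');
```

With `allowRemoteModels` disabled, a missing or malformed local file fails loudly at load time instead of silently falling back to the remote copy, which makes the kind of local-conversion problem described above easier to spot.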
Sorry for this noob question, but can somebody give me some kind of guideline for converting and using
https://huggingface.co/Helsinki-NLP/opus-mt-ru-en/tree/main
Thank you!