
[Bug] Helsinki Multilingual models errors when using required >>id<< tokens #70

Closed
takoyaro opened this issue Apr 6, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@takoyaro

takoyaro commented Apr 6, 2023

Describe the bug

Helsinki multilingual models (Helsinki-NLP/opus-mt-en-mul and Helsinki-NLP/opus-mt-mul-en) require a specific token of the form >>id<<, as per the models' documentation. However, using those tokens causes models.js to throw an error:

models.js:73 An error occurred during model execution: "Error: failed to call OrtRun(). error code = 6.".

How to reproduce

const pipe = await pipeline('translation', 'Helsinki-NLP/opus-mt-en-mul');
const result = await pipe(">>jpn<< I love transformer.js, it's a wonderful library");

Expected behavior

I expect to get the translated text.

Logs/screenshots
(Screenshot attached, 2023-04-06 11:27:58.)

Environment

  • Transformers.js version: 1.4.0
  • Browser (if applicable): Arc 0.96.0
  • Operating system (if applicable): macOS Ventura 13.2
  • Other:

Additional context

It should be noted that the models do not error when the >>id<< token is omitted and run just fine without it (although the translation is obviously wrong, because the source/target language wasn't specified).

@takoyaro takoyaro added the bug Something isn't working label Apr 6, 2023
@xenova
Collaborator

xenova commented Apr 6, 2023

You're right! I didn't account for that, my bad! Let me work on that.

Here's the part of the code I missed:
https://github.com/huggingface/transformers/blob/12d51db243a00726a548a43cc333390ebae731e3/src/transformers/models/marian/tokenization_marian.py#L204-L213
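For context, the linked Python code strips the >>id<< language code out of the text before normal tokenization so it can be kept as a single special token. A rough JavaScript sketch of that logic (the regex and `removeLanguageCode` are illustrative only, not the actual transformers.js API):

```javascript
// Matches Marian-style target-language tokens such as >>jpn<<.
const LANGUAGE_CODE_RE = />>.+?<</g;

function removeLanguageCode(text) {
  // Collect any >>id<< tokens, then tokenize the remaining text normally;
  // the collected codes are prepended as whole tokens afterwards.
  const codes = text.match(LANGUAGE_CODE_RE) ?? [];
  const remainder = text.replace(LANGUAGE_CODE_RE, '').trimStart();
  return { codes, remainder };
}
```

For example, `removeLanguageCode(">>jpn<< I love pizza")` yields `{ codes: [">>jpn<<"], remainder: "I love pizza" }`.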

@ldenoue
Copy link

ldenoue commented Apr 6, 2023

Could this be related to #66?

@xenova
Collaborator

xenova commented Apr 6, 2023

> Could this be related to #66?

Although the error message is the same, I think that's just a general error that onnxruntime throws when something goes wrong :/ That's the reason I added the extra logging statements so that we can perhaps see what the actual issue is.

One thing that might be causing it is an out-of-memory issue, which can happen if too many beams are being run (the default config, inherited from the Python library, has num_beams set to something > 1). I'll have to look into it a bit more, though.

@xenova
Collaborator

xenova commented Apr 6, 2023

12163dd should fix this 👍 Let me know if it works!

Here's some example code:

const pipe = await pipeline('translation', 'Helsinki-NLP/opus-mt-en-mul');
const result = await pipe(">>jpn<< I love pizza", {
    do_sample: false,
    num_beams: 1
});
// Outputs: [{translation_text: 'ピザを愛してる'}]

I will close the issue once I publish a full release.

@takoyaro
Author

takoyaro commented Apr 6, 2023

> You're right! I didn't account for that, my bad! Let me work on that.
>
> Here's the part of the code I missed: https://github.com/huggingface/transformers/blob/12d51db243a00726a548a43cc333390ebae731e3/src/transformers/models/marian/tokenization_marian.py#L204-L213

Good catch!

12163dd indeed fixes the issue, thank you!

@xenova
Collaborator

xenova commented Apr 6, 2023

Awesome! Closing now 😄
