
Smart execution providers #35

Merged (6 commits merged into huggingface:main on Mar 22, 2023)

Conversation

DavidGOrtega (Contributor) commented Mar 20, 2023

The purpose of this PR is:

  • Best effort to use the GPU over the CPU; if neither is available, fall back to WASM.
  • Use onnxruntime-node when it is installed as a dependency in a Node app using transformers.js (roughly 5x faster than WASM). If it is not installed, fall back to the WASM provider.
  • Add the workaround for this issue by setting numThreads to 1.
  • Use the ONNX executionProviders fallback chain for faster inference in the browser when the model supports it (see the sketch after this list).
  • Wrap the multiple onnxruntime-web require dependencies under a single file.
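
Roughly, the browser-side idea looks like this (a simplified sketch using the standard onnxruntime-web API, not the exact code in this PR; the model path and provider list are placeholders):

const ort = require('onnxruntime-web');

// Workaround for the linked issue: force single-threaded WASM.
ort.env.wasm.numThreads = 1;

async function loadModel(modelPath) {
    // ONNX Runtime walks the provider list in order and falls back to the
    // next entry if a provider cannot be initialised.
    return ort.InferenceSession.create(modelPath, {
        executionProviders: ['webgl', 'wasm'],
    });
}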

DavidGOrtega marked this pull request as draft on March 20, 2023 at 18:47
DavidGOrtega (Contributor, Author) commented:

@xenova This should do the trick. I still need to test the web build and check that the fallback works correctly.

Collaborator commented:

dependency is correct :)

xenova (Collaborator) commented Mar 20, 2023

This is great! Thanks for putting the time in to get it working. I'll test on my side and merge as soon as I can :)

Collaborator commented:

Does the execution provider switch to wasm if the webgl backend fails? If so, then this is alright. If not, then I am slightly worried about using backends (webgl/cuda/webgpu) that do not fully support the necessary operations.

Can you confirm?

DavidGOrtega (Contributor, Author) commented Mar 21, 2023

ONNX will fall back to the next provider if one fails. However, this behaviour is flaky, and because of that I have not included all the backends. With the ones selected we should be OK.

Collaborator commented:
I see you're accessing ONNX through the tensor utils file, but I think it would be better if we create a separate file (e.g., backend.js or onnx.js) which handles the loading and fallbacks of the various imports.
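
Something along these lines, perhaps (just a rough sketch of the idea, not actual code from this repo):

// backend.js (hypothetical): resolve the ONNX runtime once and re-export it.
let ONNX;
try {
    // Native node bindings, if the consuming app has installed them.
    ONNX = require('onnxruntime-node');
} catch (err) {
    // Browser / WASM fallback.
    ONNX = require('onnxruntime-web');
}
module.exports = { ONNX };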

DavidGOrtega (Contributor, Author) commented:

I had the same feeling, but I did not want the PR to grow too big or go out of scope.
I also think we need to review tensor_utils; e.g. I do not think we have to implement softmax ourselves.

xenova (Collaborator) left a review comment:

Looks good overall - just some questions about how fallbacks are handled and a few organization details.

DavidGOrtega (Contributor, Author) commented:

@xenova I'm not totally happy with the PR unless we remove all the backends and allow the user to install the node bindings, which are much faster than WASM.

Also, the fallback does not work if the model does not support a layer 🤦

I think we should provide a way to expose the desired execution provider, so I would change the PR to do this:

let ONNX;
let executionProviders = [ 'wasm' ];

try {
    // Prefer the native node bindings when they are installed as a dependency.
    ONNX = require('onnxruntime-node');
    executionProviders = [ 'cuda', 'cpu' ];
} catch (err) {
    // Otherwise fall back to the web/WASM runtime.
    ONNX = require('onnxruntime-web');
    if (typeof process === 'object') {
        // https://github.com/microsoft/onnxruntime/issues/10311
        ONNX.env.wasm.numThreads = 1;
    }
}

With the code above, we have at least fixed a rough edge and allow the user to use the node bindings if desired.
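
For completeness, the resolved ONNX handle and executionProviders list would then feed into session creation roughly like this (just a sketch, not part of this PR; modelPath is a placeholder):

async function createSession(modelPath) {
    // Both onnxruntime-node and onnxruntime-web accept the same session options,
    // so the provider list chosen above can be passed straight through.
    return await ONNX.InferenceSession.create(modelPath, { executionProviders });
}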

What do you think?

xenova (Collaborator) commented Mar 21, 2023

Yeah I agree 👍 Once WebGPU releases, I'll be more focused on getting GPU support working (both for node and browser).

I'll merge main into this PR, make some edits, then merge the PR back into main (hopefully soon haha). Thanks again for your contributions!

xenova (Collaborator) commented Mar 22, 2023

I ran some tests to see what kind of speedup these changes make, and it's amazing!

Task | Speedup
Text classification | 1400%
Question answering | 600%
Image-to-text | 400%
Text-to-text generation | 350%
Code generation | 350%
Embeddings | 325%
Masked language modelling | 300%
Translation | 225%
Summarization | 200%
Zero-shot image classification | 200%
Image classification | 175%
Text generation | 150%

🎉

I'm doing some final merging and will hopefully get it published soon :)

xenova (Collaborator) commented Mar 22, 2023

@DavidGOrtega Can you grant me write access to your fork? I would like to push the changes without having to create a new fork and make a PR. (I think you can just add me as a collaborator?)

xenova added a commit that referenced this pull request Mar 22, 2023
Smart execution providers (Merges #35 into main)
xenova merged commit cd6aafe into huggingface:main on Mar 22, 2023
xenova (Collaborator) commented Mar 22, 2023

Got it working, PR merged! 🎉 Thanks again for your contributions!

Development

Successfully merging this pull request may close these issues.

Current use of execution providers is suboptimal