
[Feature request] Add Support for vicuna-13b-delta-v1.1 #96

Open
DK013 opened this issue Apr 21, 2023 · 13 comments
Labels
enhancement New feature or request

Comments

@DK013

DK013 commented Apr 21, 2023

Support for vicuna-13b-delta-v1.1
NOTE: It's not listed in the transformers supported-models list, but it does work with transformers.

Reason for request
With the upcoming WebGPU support in ONNX Runtime, I believe it'll be really helpful to have LLM support for browser-based applications, and this repo is the best solution we have so far.

Additional context
I've been working on an AI assistant built with Electron and Cordova for desktop and mobile platforms respectively. I'm already using Transformers.js with Whisper for speech-to-text. I intend to switch to WebGPU with JSEP as soon as it's available, so I can leverage GPU compute to run larger models. I'm trying to build the project with as many open-source resources as possible, and having LLM support would be really nice instead of relying on the OpenAI APIs. This keeps the project cost-free for users, and users' data privacy is another benefit. I'm really looking forward to seeing whether this will be possible. I'm willing to contribute as much as I can, being a complete novice in the ML community.
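For context, here is a minimal sketch of the kind of Whisper speech-to-text setup described above, using the Transformers.js pipeline API. The model id and audio source below are placeholders, not necessarily what the project uses:

import { pipeline } from '@xenova/transformers';

// Load a Whisper checkpoint for automatic speech recognition.
// 'Xenova/whisper-small' is just an example model id.
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');

// Transcribe an audio file (a URL or path the browser can fetch).
const result = await transcriber('audio.wav');
console.log(result.text);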

Thanks in advance

@DK013 DK013 added the enhancement New feature or request label Apr 21, 2023
@xenova
Collaborator

xenova commented Apr 21, 2023

I believe it'll be really helpful to have LLM support for browser-based applications, and this repo is the best solution we have so far.

Agreed! WebGPU for onnxruntime-web is almost here (see microsoft/onnxruntime#14579), and Transformers.js will support it when ready! There will be a massive announcement when it does drop! As for now, it is just a matter of waiting 😅 ...

@DK013
Author

DK013 commented Apr 25, 2023

Agreed! WebGPU for onnxruntime-web is almost here (see microsoft/onnxruntime#14579), and Transformers.js will support it when ready! There will be a massive announcement when it does drop! As for now, it is just a matter of waiting 😅 ...

It's here!! They just merged the [js/web] WebGPU backend via JSEP #14579 into the main branch a few hours ago. No official release yet. Looks like @fs-eire opened another pull request for code cleanup and some small fixes. But we can build from the main branch and start coding 😄

@fs-eire
Contributor

fs-eire commented Apr 25, 2023

It takes some time and effort to go from enabling building from source, to including it in the NPM package, to releasing it as an experimental feature, and then to the final release.

I will keep working on the stability, performance, and operator coverage of the WebGPU backend implementation in ort-web. This is going to be long-term work. Please feel free to "@" me or submit GitHub issues to onnxruntime for feedback.

@xenova
Collaborator

xenova commented Apr 27, 2023

@fs-eire I have been following the build instructions from here, and I think I've sorted out most issues regarding dependencies, versions, etc.

However, when running in the browser, I get the error JS execution provider is not supported in this build. Understandably, the docs have not yet been properly updated, so would it be possible for you to provide the steps to build from source, including which build arguments I should use? Thanks!

@DK013
Author

DK013 commented Apr 27, 2023

I've been looking at the updated files, and in <ORT_ROOT>/tools/ci_build/build.py I can see there's a --use_jsep argument, but from what I can tell from the CI yml configs it only works together with --build_wasm.
So I guess the build command will look something like this: ./build.bat --build_wasm --enable_wasm_simd --use_jsep --target onnxruntime_webassembly.
However, if you've cloned the repo as instructed here, chances are you don't have the latest source and the --use_jsep argument will fail. Simply download the zip of the main branch and replace your local files with the latest versions.
@xenova Give it a try

[UPDATE]: Here's what worked for me...
Build 4 times with these args:

  • ./build.bat --config Release --build_wasm --use_jsep --skip_tests --disable_wasm_exception_catching --disable_rtti
  • ./build.bat --config Release --build_wasm --use_jsep --skip_tests --disable_wasm_exception_catching --disable_rtti --enable_wasm_threads
  • ./build.bat --config Release --build_wasm --use_jsep --skip_tests --disable_wasm_exception_catching --disable_rtti --enable_wasm_simd
  • ./build.bat --config Release --build_wasm --use_jsep --skip_tests --disable_wasm_exception_catching --disable_rtti --enable_wasm_threads --enable_wasm_simd

Then copy files as instructed in the documentation and build npm package.

I'm attaching my successful npm package build zip here, just in case you wanna be lazy. 😘
ort.zip
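For anyone following along, here is a rough sketch of how one might smoke-test such a local build from a web app. It assumes the locally built package is wired in as onnxruntime-web (e.g. via npm link) and that the JSEP build registers its execution provider under the name 'webgpu'; treat both the model path and the EP name as assumptions:

import { InferenceSession } from 'onnxruntime-web'; // locally built package, e.g. linked via npm link

// './model.onnx' is a placeholder for whatever small test model you have on hand.
const session = await InferenceSession.create('./model.onnx', {
    executionProviders: ['webgpu'], // assumed EP name for the JSEP/WebGPU backend
});

// If session creation succeeds, the EP was registered; inspect the model's I/O names.
console.log('inputs:', session.inputNames, 'outputs:', session.outputNames);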

@xenova
Collaborator

xenova commented Apr 27, 2023

However, if you've cloned the repo as instructed here, chances are you don't have the latest source and the --use_jsep argument will fail. Simply download the zip of the main branch and replace your local files with the latest versions.

That was it! Commit history says that change was made yesterday, so I was one day out of date haha.

The package is being built now, and I will hopefully have more updates later today 🎉

I'm attaching my successful npm package build zip here, just in case you wanna be lazy. 😘
ort.zip

This will come in handy! Thanks!

@xenova
Collaborator

xenova commented Apr 27, 2023

So, I did all that, but it doesn't seem to run :/ Here's the error I get: microsoft/onnxruntime#15719

@DK013 If you're able to, could you try running the model linked here with the following input:

let input = {
    attention_mask: new Tensor(
        'int64',
        new BigInt64Array([1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n]),
        [1, 12]
    ),
    input_ids: new Tensor(
        'int64',
        new BigInt64Array([13959n, 1566n, 12n, 2379n, 10n, 8774n, 6n, 149n, 33n, 25n, 58n, 1n]),
        [1, 12]
    )
}

or see here for a full demo to test with.
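For completeness, here's a hedged sketch of what feeding that input to the model could look like with the locally built onnxruntime-web package. The model path and the 'webgpu' execution-provider name are assumptions for illustration, not confirmed details:

import { Tensor, InferenceSession } from 'onnxruntime-web';

// Placeholder path; substitute the ONNX file of the model linked above.
const session = await InferenceSession.create('./encoder_model.onnx', {
    executionProviders: ['webgpu'],
});

// Same feeds as the snippet above, with the attention mask built via fill().
const input = {
    attention_mask: new Tensor('int64', new BigInt64Array(12).fill(1n), [1, 12]),
    input_ids: new Tensor(
        'int64',
        new BigInt64Array([13959n, 1566n, 12n, 2379n, 10n, 8774n, 6n, 149n, 33n, 25n, 58n, 1n]),
        [1, 12]
    ),
};

const output = await session.run(input);
console.log(output);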

@fs-eire
Contributor

fs-eire commented Apr 28, 2023

@DK013 please use the workaround described in microsoft/onnxruntime#15719 (comment). I am working on a solution to fix the issue.

@DK013
Author

DK013 commented May 5, 2023

@xenova Since I can see fs-eire is working on the issues we encountered earlier, and I'm running behind schedule on my own project, I'm implementing transformers.js with the CPU backend in my code for now. Mainly I need Whisper (hopefully a little more than the base model) working right now, and a suitable LLM model later on. So I'm going to go ahead and complete the basic code for testing now, and wait until you guys are done polishing WebGPU.
If you need any input from me, just let me know. 😉

@fs-eire
Contributor

fs-eire commented May 5, 2023

@DK013 The PR mentioned above is merged, and another bugfix is in PR: microsoft/onnxruntime#15819.

@matthoffner

Catching up on this issue, does this mean there is conversational model support with the onnxruntime PRs? The README shows it as not supported yet. Thanks for any clarification!

@xenova
Collaborator

xenova commented May 31, 2023

does this mean there is conversational model support with the onnxruntime PRs?

The vicuna-13b-delta-v1.1 is categorized as a text-generation model (not a conversational model), which is supported by Transformers.js. The distinction (which mainly lies in how they are used) is subtle, as both can be used for "conversations". For more information, see:
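To illustrate the call shape of the text-generation task in Transformers.js, here is a rough sketch only: the model id below is a small placeholder checkpoint (a converted 13B Vicuna would be far too large for the browser today), and the chat-style prompt format is just an example.

import { pipeline } from '@xenova/transformers';

// 'Xenova/gpt2' stands in for whatever converted text-generation model is actually used.
const generator = await pipeline('text-generation', 'Xenova/gpt2');

// A text-generation model can still be prompted in a conversational style.
const out = await generator('USER: Hello, who are you?\nASSISTANT:', {
    max_new_tokens: 50,
});
console.log(out[0].generated_text);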

@DK013
Author

DK013 commented Jun 2, 2023

@xenova Heyo, I've been a bit busy with my own projects and running the business and all. What's the status of WebGPU? How are your tests going?
