Draft: support faster inference methods #8
Conversation
Signed-off-by: peter szemraj <[email protected]>
After further thought, I'm going to update the UI/app in a later round of improvements because I don't have time for it these days.
Changes look fine from my self-review of what I did. I'll clone into a Colab and test the basics to make sure.
This would be awesome! Does it handle the conversion of a Hugging Face model to optimized ONNX pretty much automatically? Or do you need to convert manually first, with this just supporting loading an ONNX model?
Thanks for your interest!

Yep, it's pretty automatic (for ONNX Runtime at least); inference then works basically the same as with standard models. One caveat though: this PR "enables" ONNX support, but that doesn't mean I have gone through and validated that ONNX inference itself matches standard inference, so any issues with the base conversion/support in ONNX would still show up here. My take is that some issues may pop up that need to be resolved in the upstream ONNX code, simply because ONNX inference with long-context models probably wasn't happening much in the past. For example: LongT5 is officially supported by ONNX, but when testing with long-t5-base, the last batch of tokens I run inference on decodes with roughly 20% of the letters being "d" for no apparent reason. Earlier batches are fine, so YMMV; test and validate before relying on it. I'll try to add a note about that somewhere.
#9 will now be handled by this PR.
Okay, finally happy with it.
support faster and advanced inference methods, e.g. torch.compile
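For the torch.compile path, a minimal sketch of how it is typically applied to a seq2seq model for faster inference (requires PyTorch 2.0+; the checkpoint ID and settings are assumptions for illustration, not necessarily what this PR uses):

```python
# Sketch: torch.compile for inference speedups (PyTorch >= 2.0).
# The checkpoint ID is illustrative, not necessarily what this PR uses.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id).eval()

# compile once; the first forward pass is slow (graph capture),
# subsequent passes with similar shapes run faster
model = torch.compile(model)

inputs = tokenizer("Long input text goes here...", return_tensors="pt")
with torch.inference_mode():
    summary_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Note that compilation overhead is paid up front, so torch.compile pays off mainly for repeated inference with stable input shapes rather than one-off runs.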