Draft: support faster inference methods #8

Merged
merged 36 commits on Jul 8, 2023

pszemraj
Owner

Support faster and more advanced inference methods:

  • torch.compile
  • optimum onnx
  • use fire for the CLI
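
As a rough illustration of the first item, opt-in compilation could be sketched as below. The helper name `maybe_compile` and its `compile_model` flag are hypothetical and not taken from this PR; the only library call assumed is `torch.compile`, which exists in PyTorch 2.0 and later.

```python
def maybe_compile(model, compile_model: bool = False):
    """Return a compiled model when requested, else the model unchanged.

    Hypothetical helper for illustration; not the code from this PR.
    """
    if not compile_model:
        return model

    # Imported lazily so callers that never compile do not need torch here.
    import torch

    # torch.compile was introduced in PyTorch 2.0; fall back on older versions.
    if not hasattr(torch, "compile"):
        return model
    return torch.compile(model)
```

Keeping compilation behind a flag means the default path behaves exactly as before, which matters because the first call to a compiled model pays a one-time compilation cost.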

pszemraj added 30 commits April 4, 2023 00:20
Signed-off-by: peter szemraj <[email protected]>
@pszemraj pszemraj added documentation Improvements or additions to documentation enhancement New feature or request labels Jun 16, 2023
@pszemraj pszemraj self-assigned this Jun 16, 2023
@pszemraj pszemraj marked this pull request as draft June 16, 2023 13:34
@pszemraj
Owner Author

After further thought, I am going to update the UI/app in a later round of improvements because I don't have time for it right now.

Owner Author

@pszemraj pszemraj left a comment

Changes look fine to me after self-reviewing what I did. I will clone this into a Colab and test the basics to make sure.

@lefnire

lefnire commented Jun 16, 2023

This would be awesome! Does it handle the conversion of a huggingface model to optimized onnx pretty automatically? Or do you need to convert manually first, and it supports loading in an onnx model?

@pszemraj
Owner Author

pszemraj commented Jun 19, 2023

Thanks for your interest ☺️ I will try to have it merged this weekend at the latest.

Does it handle the conversion of a huggingface model to optimized onnx pretty automatically?

Yep, it's pretty automatic (for ONNX Runtime at least). It works basically the same as using ONNX models for inference elsewhere.
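
To make "pretty automatic" concrete, here is a minimal sketch of how a loader could switch between the standard and ONNX paths. The function names `loader_spec` and `load_model` are hypothetical and not from this PR. The sketch assumes the `optimum` package's `ORTModelForSeq2SeqLM.from_pretrained(..., export=True)`, which converts a Hugging Face checkpoint to ONNX at load time, and `transformers`' `AutoModelForSeq2SeqLM` for the standard path.

```python
import importlib


def loader_spec(use_onnx: bool):
    """Pick the module, class name, and extra kwargs for loading a model.

    Hypothetical helper for illustration; not the code from this PR.
    """
    if use_onnx:
        # export=True asks optimum to convert the checkpoint to ONNX on load.
        return ("optimum.onnxruntime", "ORTModelForSeq2SeqLM", {"export": True})
    return ("transformers", "AutoModelForSeq2SeqLM", {})


def load_model(model_name: str, use_onnx: bool = False):
    """Load a seq2seq model, optionally through ONNX Runtime."""
    module_name, class_name, kwargs = loader_spec(use_onnx)
    # Import lazily so optimum is only required when ONNX is requested.
    module = importlib.import_module(module_name)
    model_cls = getattr(module, class_name)
    return model_cls.from_pretrained(model_name, **kwargs)
```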

One caveat, though: this PR "enables" ONNX support, but that doesn't mean I have validated that ONNX inference itself matches standard inference, so any issues with the 'base' conversion/support in ONNX will still show up here. My take is that some issues may pop up that need to be resolved upstream in the ONNX code, simply because ONNX inference with long-context models probably wasn't happening much in the past. For example: LongT5 is officially supported by ONNX, but when testing with long-t5-base I find that the last batch of tokens I run inference on results in something like 20% of the decoded letters being the letter "d" for seemingly no reason. Earlier batches are fine, so YMMV; test and validate before using.

I'll try to add a note about that somewhere.
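
One cheap way to catch the kind of degenerate output described above is to check whether a single letter dominates a decoded batch. The sketch below is only an illustrative sanity check, not part of this PR: the function names and the 0.2 threshold are assumptions, and short texts can exceed that threshold legitimately.

```python
from collections import Counter


def dominant_letter_share(text: str):
    """Return (letter, share) for the most frequent letter in text."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    if not letters:
        return ("", 0.0)
    letter, count = Counter(letters).most_common(1)[0]
    return (letter, count / len(letters))


def looks_degenerate(text: str, threshold: float = 0.2) -> bool:
    """Flag text where one letter exceeds `threshold` of all letters.

    The threshold is an assumed value for illustration; tune it per model.
    """
    _, share = dominant_letter_share(text)
    return share > threshold
```

Running this on each decoded batch from both the standard and the ONNX model makes it easy to spot a batch that only goes wrong on the ONNX path.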

@pszemraj
Owner Author

pszemraj commented Jul 7, 2023

#9 will now be handled by this

@pszemraj pszemraj marked this pull request as ready for review July 7, 2023 23:14
pszemraj added 2 commits July 8, 2023 02:51
Signed-off-by: peter szemraj <[email protected]>
@pszemraj pszemraj linked an issue Jul 8, 2023 that may be closed by this pull request
@pszemraj
Owner Author

pszemraj commented Jul 8, 2023

oook finally happy with it

@pszemraj pszemraj merged commit d51c4cd into main Jul 8, 2023
@pszemraj pszemraj deleted the streamline-compile branch July 8, 2023 00:58
Development

Successfully merging this pull request may close these issues.

txtsum web UI error (+ fix)
2 participants