Draft: support faster inference methods #8
Conversation
Signed-off-by: peter szemraj <[email protected]>
After further thought, I'm going to update the UI/app in a later round of improvements because I don't have time for it these days.
Changes look fine from my self-review of what I did. I'll clone into a Colab and test the basics to make sure.
This would be awesome! Does it handle the conversion of a Hugging Face model to optimized ONNX pretty much automatically? Or do you need to convert manually first, with this just supporting loading an ONNX model?
Thanks for your interest!

Yep, it's pretty automatic (for ONNX Runtime at least); inference then works basically the same as with standard models. One caveat though: this PR "enables" ONNX support, but that doesn't mean I have gone through and validated that ONNX inference itself matches standard inference, so any issues with the base conversion/support in ONNX would still show up here. My take is that some issues may pop up that need to be resolved in the upstream ONNX code, simply because ONNX inference with long-context models probably wasn't happening much in the past. For example: LongT5 is officially supported by ONNX, but when testing with long-t5-base, the last batch of tokens I run inference on decodes with roughly 20% of the letters being "d" for no apparent reason. Earlier batches are fine, so YMMV; test and validate before relying on it. I'll try to add a note about that somewhere.
#9 will now be handled by this PR.
Okay, finally happy with it.
support faster and advanced inference methods, e.g. torch.compile
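For the torch.compile path, a minimal sketch of how it is typically applied to a seq2seq model for faster inference (requires PyTorch 2.0+; the checkpoint ID and settings are assumptions for illustration, not necessarily what this PR uses):

```python
# Sketch: torch.compile for inference speedups (PyTorch >= 2.0).
# The checkpoint ID is illustrative, not necessarily what this PR uses.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id).eval()

# compile once; the first forward pass is slow (graph capture),
# subsequent passes with similar shapes run faster
model = torch.compile(model)

inputs = tokenizer("Long input text goes here...", return_tensors="pt")
with torch.inference_mode():
    summary_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Note that compilation overhead is paid up front, so torch.compile pays off mainly for repeated inference with stable input shapes rather than one-off runs.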