Integrate Streaming #10
Conversation
ifrit98 commented on Sep 15, 2023
- Adds a template for streaming servers
- Adds documentation for it
- Also fixes licensing headers
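For context, a rough sketch of the token-streaming pattern the template is built around (the GPT2Tokenizer choice and the stand-in model/prompt_forward names are illustrative here, not necessarily the template's actual API):

import bittensor as bt
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def model(input_ids):
    # Stand-in for real inference: just echo the prompt tokens back as decoded text.
    return (tokenizer.decode(token_id) for token_id in input_ids)

async def prompt_forward(text: str, send) -> None:
    # Tokenize the prompt, then stream each decoded token as its own HTTP body chunk.
    input_ids = tokenizer(text, return_tensors="pt").input_ids.squeeze()
    for token in model(input_ids):
        await send({"type": "http.response.body", "body": (token + '\n').encode('utf-8'), "more_body": True})
        bt.logging.trace(f"Streamed token: {token}")
    # Close the stream once all tokens have been sent.
    await send({"type": "http.response.body", "body": b"", "more_body": False})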
LGTM! Left a few questions.
await send({"type": "http.response.body", "body": (token + '\n').encode('utf-8'), "more_body": True})
bt.logging.trace(f"Streamed token: {token}")
# Sleep to show the streaming effect
await asyncio.sleep(1)
is sleep here necessary for the final implementation?
Crap, no that was for testing. Nice catch!
# Simulate model inference
input_ids = tokenizer(text, return_tensors="pt").input_ids.squeeze()
# Iterate over the decoded tokens and send them back to the client.
for token in model(input_ids):
so we are sending token by token?
out of curiosity, how would it change the speed when we send 2 tokens each time?
It would probably be better; I will test this and amend.
Fixed
# Iterate over the decoded tokens and send them back to the client.
for token in model(input_ids):
    # Send token back to the client
    await send({"type": "http.response.body", "body": (token + '\n').encode('utf-8'), "more_body": True})
is the end line necessary after each token?
The send() is required for every chunk, but it would be up to the miners to implement their own buffering logic and determine an appropriate chunk size. Assume they'd buffer more than one token at a time and send in chunks.
I'll make an update to reflect this point. Thanks!
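For illustration, a minimal sketch of what that buffering could look like on the miner side (the chunk size and the stream_in_chunks helper name are hypothetical, not part of the template):

async def stream_in_chunks(tokens, send, chunk_size: int = 8) -> None:
    # Buffer decoded tokens and emit them as multi-token HTTP body chunks.
    buffer = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) >= chunk_size:
            await send({"type": "http.response.body", "body": "".join(buffer).encode('utf-8'), "more_body": True})
            buffer = []
    # Flush whatever is left and close the stream.
    await send({"type": "http.response.body", "body": "".join(buffer).encode('utf-8'), "more_body": False})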