# feat: support stream api #479

## Conversation
@yuvipanda please help review this feature

@ganisback Thanks for your contribution! To help with review, could you expand the PR description to explain what this does and why it's needed? We'll need some tests if this is going to be merged, but those can be added later if we agree this feature should be added.

@manics description updated

Ah interesting! I think @jmunroe also ran into this recently. If my understanding is right, the existing code doesn't stream but waits for the entire upstream request to be completed before passing it on. And with this PR, we're special casing requests with `Accept: text/event-stream` so those responses are streamed as they arrive.

Can't do this, see #479 (comment)
jupyter_server_proxy/handlers.py (outdated):

```python
def dump_headers(headers_raw):
    for line in headers_raw:
        r = re.match('^([a-zA-Z0-9\-_]+)\s*\:\s*([^\r\n]+)[\r\n]*$', line)
```
Can you use https://www.tornadoweb.org/en/stable/httputil.html#tornado.httputil.HTTPHeaders.parse_line here, and if that fails assume it's a status code? Regexes in places like this always worry me :D
parse_line is an internal method of HTTPHeaders; we cannot use it directly.
Aaah, I see: it doesn't return it but updates it in place.

The regex here still worries me though. Is it coming from somewhere specific (like an RFC, or an existing implementation)? If so, I'd appreciate a link to the source as a comment. If not, I'd like us to find some way to farm this out. I would avoid writing code that parses HTTP in any form if possible.
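(For reference, a minimal sketch of that in-place behavior, using tornado's documented `HTTPHeaders` API; the header values here are made up:)

```python
from tornado.httputil import HTTPHeaders

h = HTTPHeaders()
h.parse_line("Set-Cookie: a=1")  # returns None; h is updated in place
h.parse_line("Set-Cookie: b=2")
print(h.get_list("Set-Cookie"))  # ['a=1', 'b=2']: repeated headers are kept

# the classmethod parses a whole block, but neither call accepts the
# status line ("HTTP/1.1 200 OK"), which is not a header at all
h2 = HTTPHeaders.parse("Content-Type: text/html\r\nCache-Control: no-cache\r\n")
```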
Ah damn, I just realized that won't work, because of our …
As for testing, I suggest: … This can be simpler than websockets, because eventstreams are simpler.
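(A rough sketch of that kind of test, assuming a `/stream` endpoint that emits one event per second; every name here is hypothetical rather than the PR's actual test code:)

```python
import time
from tornado.httpclient import AsyncHTTPClient, HTTPRequest

async def read_event_stream(url):
    """Collect chunks and inter-chunk intervals from an SSE endpoint."""
    intervals, chunks = [], []
    last = time.monotonic()

    def on_chunk(chunk):
        nonlocal last
        now = time.monotonic()
        intervals.append(now - last)
        last = now
        chunks.append(chunk)

    await AsyncHTTPClient().fetch(
        HTTPRequest(url, headers={"Accept": "text/event-stream"},
                    streaming_callback=on_chunk)
    )
    return intervals, chunks

# a buffering proxy delivers everything in one burst at the end (intervals
# collapse to ~0); a streaming proxy yields chunks roughly a second apart
```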
```python
        headers_raw = []

        def dump_headers(headers_raw):
```
Let's move this function outside `_proxy_progressive`. `headers_raw` as used inside this function is an arg, while there's also a local variable with the same name; moving it outside cleans this up a little. (Illustrated in the sketch below.)
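(A minimal illustration of that shadowing; the structure follows the quoted diff, with the parsing elided:)

```python
def _proxy_progressive():
    headers_raw = []                    # outer local the handler builds up

    def dump_headers(headers_raw):      # parameter shadows the outer name
        for line in headers_raw:
            ...                         # header parsing elided

    dump_headers(headers_raw)           # moving dump_headers to module scope
                                        # removes the ambiguity
```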
jupyter_server_proxy/handlers.py
Outdated
r = re.match('^([a-zA-Z0-9\-_]+)\s*\:\s*([^\r\n]+)[\r\n]*$', line) | ||
if r: | ||
k,v = r.groups([1,2]) | ||
if k not in ('Content-Length', 'Transfer-Encoding', |
This is a different list than the one in line 531. Is there a specific reason? If not, I think we should move this to a constant at the top of the file and reference it from both places so it doesn't get out of date.
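(A sketch of that refactor; the constant name is an assumption, and only Content-Length and Transfer-Encoding are visible in the quoted diff:)

```python
# hypothetical module-level constant, shared by both proxy code paths so the
# two exclusion lists can't drift apart; extend with whatever the other call
# site currently excludes
SKIP_HEADERS = frozenset({
    "Content-Length",
    "Transfer-Encoding",
})

def copy_headers(items, set_header):
    # set_header would be tornado's RequestHandler.set_header at the call site
    for k, v in items:
        if k not in SKIP_HEADERS:
            set_header(k, v)
```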
jupyter_server_proxy/handlers.py (outdated):

```python
        if response.body:  # Likewise, should already be chunked out and flushed
            self.write(response.body)

    async def _proxy_normal(self, host, port, proxied_path, body):
        if self.unix_socket is not None:
```
This unix socket check should probably be common to both the proxying methods, and should be moved up.
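(One way to hoist it; the dispatcher shape and the Accept-based predicate are assumptions for illustration, not the PR's code:)

```python
from tornado import web

class ProxyHandler(web.RequestHandler):  # stand-in for the real handler class
    async def _proxy(self, host, port, proxied_path):
        # do the unix-socket handling once, before dispatching to either path
        if self.unix_socket is not None:
            ...  # build the unix-socket client URI here, shared by both paths

        if self.request.headers.get("Accept") == "text/event-stream":
            return await self._proxy_progressive(host, port, proxied_path)
        return await self._proxy_normal(
            host, port, proxied_path, self.request.body
        )
```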
ok, I've done one round of review @ganisback! Hope this helps. @manics I think we should definitely add this feature! Ideally, I'd have liked us to make everything streaming, but that's incompatible with …

OK, will address your comments over the weekend
Hey @ganisback! Just wanted to check to see if there's anything else I can do to help you move this forward :) I just ran into this again in a different project!

Hi @yuvipanda, I am blocked on the test case; can you help add it?

@ganisback it was slightly tricky, but I think I have a useful test for you at ganisback#1! It fails on main but passes on this branch! LMK what you think of it. I haven't had time to look at your changes yet unfortunately, but will do this coming week!

@yuvipanda I increased the response time and added response-data checking; the tests pass now.
Almost there! Can you address https://github.com/jupyterhub/jupyter-server-proxy/pull/479/files#r1645273654 and https://github.com/jupyterhub/jupyter-server-proxy/pull/479/files#r1645272938, and the couple of other style things? Then I think it's ready to land.

Thank you for your contribution and patience, @ganisback!
jupyter_server_proxy/handlers.py (outdated):

```python
                # some headers appear multiple times, eg 'Set-Cookie'
                self.set_header(k, v)
            else:
                r = re.match('^HTTP[^\s]* ([0-9]+)', line)
```
Same as my comment in https://github.com/jupyterhub/jupyter-server-proxy/pull/479/files#r1645266065
Here is an example of the headers for a stream response:

```
HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: no-cache
Connection: keep-alive
```

`httputil.HTTPHeaders.parse_line` cannot parse the first line to get the HTTP status, so we use a regex to solve this. The source code comes from https://github.com/ideonate/jhsingle-native-proxy/blob/3e26a4ee0e7318970b7cf6abbd7d88455a9ac621/jhsingle_native_proxy/proxyhandlers.py#L265
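(A quick check of what that status-line regex extracts, using the example above:)

```python
import re

status_line = "HTTP/1.1 200 OK"
m = re.match(r"^HTTP[^\s]* ([0-9]+)", status_line)
assert m is not None and m.group(1) == "200"  # the status code, as a string
```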
tests/test_proxies.py (outdated):

```python
await client.fetch(
    url,
    headers={"Accept": "text/event-stream"},
    request_timeout=22,
)
```
I think explicitly setting the timeout can be removed now, right?
fixed
tests/test_proxies.py (outdated):

```python
assert times_called == limit
print(stream_read_intervals)
assert all([0.45 < t < 3.0 for t in stream_read_intervals])
print(stream_data)
```
Let's get rid of these extra print statements
fixed
@yuvipanda any other comments?

Excited for this! Spent too much of my Saturday debugging why SSE was connected but not sending anything.

@yuvipanda @manics is there anything else that needs to be done here to get it merged?

Hi folks, just circling back here, is there any way I could help with this PR?

I'm just waiting for them to merge this PR.

Sorry for dropping this, travel / vacation time :( I'll try to spend an hour on this later this week.
It looks like pre-commit is failing because this PR is behind main. As per https://results.pre-commit.ci/run/github/71295164/1723882819.1ODn2EWuTr2D6HI5NFt0pQ
I opened a PR against this branch (ganisback#2) to fix pre-commit checks |
Force-pushed from ac7ecff to 63a4bb3 (pre-commit.ci auto fixes; for more information, see https://pre-commit.ci)
I still don't like using regexes for parsing headers, but I think this is a clear improvement over the status quo. So I've rebased this, and will merge. Apologies for the long time this one took, @ganisback! I've opened #498 to handle removing the regexes.
Amazing, thanks so much! Do you plan to cut a new release with it now, or only after #498 gets addressed?

#498 doesn't need to block a release!

If you make a PR following https://github.com/jupyterhub/jupyter-server-proxy/blob/main/RELEASE.md I'm happy to merge that :)

Although I'm off from Monday!
## PR description

Currently jupyter-server-proxy does not support the text/event-stream API. When we install https://github.com/hiyouga/LLaMA-Factory in a notebook instance and access it via http://xxx.xxx/proxy/7860/, normal requests work fine, but the stream API never responds.

Reference code: https://github.com/ideonate/jhsingle-native-proxy/blob/3e26a4ee0e7318970b7cf6abbd7d88455a9ac621/jhsingle_native_proxy/proxyhandlers.py#L217
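(A minimal way to reproduce the symptom behind the proxy; this tornado app is illustrative rather than from the report, with the port matching the example URL above:)

```python
import asyncio
from tornado import ioloop, web

class SSEHandler(web.RequestHandler):
    async def get(self):
        self.set_header("Content-Type", "text/event-stream")
        self.set_header("Cache-Control", "no-cache")
        for i in range(5):
            self.write(f"data: tick {i}\n\n")
            await self.flush()       # a buffering proxy holds these back
            await asyncio.sleep(1)   # until the whole response completes

if __name__ == "__main__":
    web.Application([(r"/stream", SSEHandler)]).listen(7860)
    ioloop.IOLoop.current().start()
```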