feat: support stream api #479

Merged: 12 commits into jupyterhub:main on Aug 29, 2024

Conversation

@ganisback (Contributor) commented Jun 18, 2024

Currently jupyter-server-proxy does not support the text/event-stream API. When we install https://github.com/hiyouga/LLaMA-Factory in a notebook instance and access it via http://xxx.xxx/proxy/7860/, normal requests work fine, but the stream API never responds.
Reference code: https://github.com/ideonate/jhsingle-native-proxy/blob/3e26a4ee0e7318970b7cf6abbd7d88455a9ac621/jhsingle_native_proxy/proxyhandlers.py#L217
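
For context (an illustration, not part of the original report): a text/event-stream response holds the connection open and emits events incrementally, so a proxy that buffers the entire upstream body before forwarding it will appear to hang:

    HTTP/1.1 200 OK
    Content-Type: text/event-stream
    Cache-Control: no-cache

    data: first chunk

    data: second chunk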

@ganisback changed the title from "support stream" to "support stream api" on Jun 18, 2024
@ganisback (Contributor, Author):

@yuvipanda please help review this feature

@ganisback changed the title from "support stream api" to "feat: support stream api" on Jun 18, 2024
@manics (Member) commented Jun 18, 2024

@ganisback Thanks for your contribution! To help with review please could you expand the PR description to explain what this does, and why it's needed? We'll need some tests if this is going to be merged, but those can be added later if we agree this feature should be added.

@ganisback (Contributor, Author):

@manics description updated

@yuvipanda (Contributor) commented Jun 18, 2024

Ah interesting! I think @jmunroe also ran into this recently.

If my understanding is right, the existing code doesn't stream but waits for the entire upstream request to be completed before passing it on. And with this PR, we're special-casing text/event-stream responses to be streamed.

Instead, what do you think about modifying the existing code (what you have as _proxy_normal) to be streaming? I don't think there's any reason to not stream all requests - I think we initially waited for requests to complete simply because that was the easier thing to do. That would also help with tests - the current tests should catch that and if they pass, we can probably merge that.

So I propose that instead of only streaming eventstreams, we stream everything.

Edit: can't do this, see #479 (comment).


    def dump_headers(headers_raw):
        for line in headers_raw:
            r = re.match('^([a-zA-Z0-9\-_]+)\s*\:\s*([^\r\n]+)[\r\n]*$', line)
@yuvipanda (Contributor) commented on the diff:
Can you use https://www.tornadoweb.org/en/stable/httputil.html#tornado.httputil.HTTPHeaders.parse_line here, and if that fails assume it's a status code? regexes in places like this always worry me :D

Reply from @ganisback (Contributor, Author):

parse_line is an internal method of HTTPHeaders; we cannot use it directly.

Reply from @yuvipanda (Contributor):
Aaah, I see it doesn't return it but updates it.

The regex here still worries me though. Is it coming from somewhere specific (like an RFC, or an existing implementation)? If so, I'd appreciate a link to the source as a comment. If not, I'd like us to find some way to farm this out. I would avoid writing code that parses HTTP in any form if possible.
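
A minimal sketch of the direction suggested here (illustrative only, not the code that landed; parse_raw_headers is a made-up name): let Tornado's httputil parse header lines, and retry any line that fails as a status line via parse_response_start_line.

    from tornado import httputil

    def parse_raw_headers(lines):
        """Hypothetical helper: split raw header lines into (status, headers)."""
        headers = httputil.HTTPHeaders()
        status_code = None
        for line in lines:
            if not line.strip():
                continue  # skip the blank line terminating the header block
            try:
                # parse_line() mutates `headers` in place instead of returning
                headers.parse_line(line)
            except httputil.HTTPInputError:
                # No "Name: value" colon, so treat it as e.g. "HTTP/1.1 200 OK"
                status_code = httputil.parse_response_start_line(line).code
        return status_code, headers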

@yuvipanda (Contributor):

Ah damn, I just realized that won't work, because of our RewriteableResponse implementation :( That implementation's signature depends on being able to buffer requests, rather than being able to stream them. Oof :(

@yuvipanda (Contributor):

As for testing, I suggest:

  1. Create a simple eventstream emitting server in https://github.com/jupyterhub/jupyter-server-proxy/tree/main/tests/resources (similar to the websocket one perhaps)
  2. Create tests in test_proxy similar to the websocket tests (e.g. async def test_server_proxy_websocket_messages).

This can be simpler than websockets, because eventstreams are simpler.
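
A rough sketch of what such a fixture could look like (handler name, event count, and timing are illustrative, not necessarily what the PR ended up with):

    import asyncio
    from tornado import web

    class EventStreamHandler(web.RequestHandler):
        """Emit a few server-sent events, one per second."""

        async def get(self):
            self.set_header("Content-Type", "text/event-stream")
            self.set_header("Cache-Control", "no-cache")
            for i in range(5):
                self.write(f"data: event {i}\n\n")
                await self.flush()  # force each event onto the wire immediately
                await asyncio.sleep(1)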


    headers_raw = []

    def dump_headers(headers_raw):
@yuvipanda (Contributor) commented on the diff:

Let's move this function outside _proxy_progressive. headers_raw as used inside this function is an arg, while there's also a local variable with the same name. Moving it outside cleans this up a little.
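
A sketch of the suggested refactor (FILTERED_HEADERS is a hypothetical name, see the next thread): the helper moves to module level and takes the handler explicitly, so its argument no longer shadows the enclosing local.

    import re

    def dump_headers(handler, headers_raw):
        # Module-level now; nothing is captured from _proxy_progressive's scope.
        for line in headers_raw:
            r = re.match(r'^([a-zA-Z0-9\-_]+)\s*:\s*([^\r\n]+)[\r\n]*$', line)
            if r:
                k, v = r.group(1), r.group(2)
                if k not in FILTERED_HEADERS:  # hypothetical shared constant
                    handler.set_header(k, v)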

    r = re.match('^([a-zA-Z0-9\-_]+)\s*\:\s*([^\r\n]+)[\r\n]*$', line)
    if r:
        k, v = r.groups([1, 2])
        if k not in ('Content-Length', 'Transfer-Encoding',
@yuvipanda (Contributor) commented on the diff:

This is a different list than the one in line 531. Is there a specific reason? If not, I think we should move this to a constant at the top of the file and reference it from both places so it doesn't get out of date.
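
A sketch of that suggestion (the constant's name is hypothetical, and only 'Content-Length' and 'Transfer-Encoding' appear in the diff above; the other entries are assumed):

    # Top of the file: one list shared by both proxying code paths.
    FILTERED_HEADERS = ('Content-Length', 'Transfer-Encoding', 'Content-Encoding', 'Connection')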

        if response.body:  # Likewise, should already be chunked out and flushed
            self.write(response.body)

    async def _proxy_normal(self, host, port, proxied_path, body):
        if self.unix_socket is not None:
@yuvipanda (Contributor) commented on the diff:

This unix socket check should probably be common to both the proxying methods, and should be moved up.

@yuvipanda (Contributor):

ok, I've done one round of review @ganisback! Hope this helps.

@manics I think we should definitely add this feature! Ideally, I'd have liked us to make everything streaming, but that's incompatible with RewriteableResponse :( So making at least eventstreams (which by definition need to be streaming) streaming (and hence incompatible with RewriteableResponse) seems ok to me.

@ganisback (Contributor, Author):

OK, will address your comments over the weekend.

@yuvipanda (Contributor):

Hey @ganisback! Just wanted to check to see if there's anything else I can do to help you move this forward :) I just ran into this again in a different project!

@ganisback (Contributor, Author):

Hi @yuvipanda, I am blocked on the test case; can you help add it?

@yuvipanda (Contributor):

@ganisback it was slightly tricky, but I think I have a useful test for you at ganisback#1! It fails on main but passes on this branch!

LMK what you think of it. I haven't had time to look at your changes yet unfortunately but will do this coming week!

@ganisback (Contributor, Author):

@yuvipanda I increased the response time and added response-data checking; the tests now pass.

@yuvipanda (Contributor) left a review:

Almost there! Can you address https://github.com/jupyterhub/jupyter-server-proxy/pull/479/files#r1645273654 and https://github.com/jupyterhub/jupyter-server-proxy/pull/479/files#r1645272938, and the couple of other style things? Then it's ready to land, I think.

Thank you for your contribution and patience, @ganisback!



            # some header appear multiple times, eg 'Set-Cookie'
            self.set_header(k, v)
        else:
            r = re.match('^HTTP[^\s]* ([0-9]+)', line)
Reply from @ganisback (Contributor, Author):

Here is an example of a stream response header:

    HTTP/1.1 200 OK
    Content-Type: text/html
    Cache-Control: no-cache
    Connection: keep-alive

httputil.HTTPHeaders.parse_line cannot parse the first line to get the HTTP status, so we use a regex to solve this issue.
The source code comes from https://github.com/ideonate/jhsingle-native-proxy/blob/3e26a4ee0e7318970b7cf6abbd7d88455a9ac621/jhsingle_native_proxy/proxyhandlers.py#L265
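
To make the status-line branch concrete, here is that regex applied to the first line above (an illustration, not code from the PR):

    import re

    line = "HTTP/1.1 200 OK"
    r = re.match(r'^HTTP[^\s]* ([0-9]+)', line)
    if r:
        status = int(r.group(1))  # -> 200
        # the linked implementation then feeds this to self.set_status(...)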

    await client.fetch(
        url,
        headers={"Accept": "text/event-stream"},
        request_timeout=22,
@yuvipanda (Contributor) commented on the diff:

I think explicitly setting the timeout can be removed now, right?

Reply from @ganisback (Contributor, Author):

fixed

    assert times_called == limit
    print(stream_read_intervals)
    assert all([0.45 < t < 3.0 for t in stream_read_intervals])
    print(stream_data)
@yuvipanda (Contributor) commented on the diff:

Let's get rid of these extra print statements

Reply from @ganisback (Contributor, Author):

fixed

@ganisback (Contributor, Author):

@yuvipanda any other comments?

@krassowski (Collaborator):

Excited for this! Spent too much of my Saturday debugging why SSE was connected but not sending anything.

@krassowski (Collaborator):

@yuvipanda @manics is there anything else that needs to be done here to get it merged?

@krassowski (Collaborator):

Hi folks, just circling back here, is there any way I could help with this PR?

@ganisback (Contributor, Author):

> Hi folks, just circling back here, is there any way I could help with this PR?

I'm just waiting for them to merge this PR.

@yuvipanda (Contributor):

Sorry for dropping this, travel / vacation time :( I'll try to spend an hour on this later this week.

@krassowski (Collaborator):

It looks like pre-commit is failing because this PR is behind the main branch as of now; for example, it does not include the rawsocket.py file which was added last month (https://github.com/jupyterhub/jupyter-server-proxy/blob/main/jupyter_server_proxy/rawsocket.py).

As per https://results.pre-commit.ci/run/github/71295164/1723882819.1ODn2EWuTr2D6HI5NFt0pQ


@krassowski (Collaborator):

I opened a PR against this branch (ganisback#2) to fix pre-commit checks

@yuvipanda (Contributor):

I still don't like using regexes for parsing headers, but I think this is a clear improvement over the status quo. So I've rebased this and will merge. Apologies for the long time this one took, @ganisback!

I've opened #498 to handle removing the regexes.

@yuvipanda merged commit 7f78e1b into jupyterhub:main on Aug 29, 2024 (15 checks passed)
@krassowski (Collaborator):

Amazing, thanks so much! Do you plan to cut a new release with it now or only after #498 gets addressed?

@yuvipanda (Contributor):

#498 doesn't need to block a release!

@yuvipanda (Contributor):

If you make a PR following https://github.com/jupyterhub/jupyter-server-proxy/blob/main/RELEASE.md I'm happy to merge that :)

@yuvipanda (Contributor):

Although I'm off from Monday!
