
[serve] Use pickle.dumps for proxy->replica messages #49539

Merged
18 commits merged into ray-project:master
Jan 6, 2025

Conversation

@edoakes (Contributor) commented Jan 2, 2025

Why are these changes needed?

Fully skips the Ray cloudpickle serialization path for proxy -> replica communication.

The main change is passing the proxy actor's name, rather than its handle, to the replica and fetching the actor handle on the replica side instead. I've done this inside of `StreamingHTTPRequest` to avoid widespread changes.
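The pattern can be sketched without Ray: because the message now carries the proxy actor's name (a plain string) rather than an actor handle, the standard library's `pickle` can serialize it directly, with no cloudpickle involvement. The class and field names below are illustrative stand-ins, not Ray Serve's actual schema:

```python
import pickle
from dataclasses import dataclass


@dataclass
class StreamingHTTPRequestSketch:
    """Illustrative stand-in for the real StreamingHTTPRequest: every
    field is a builtin type, so plain pickle handles the message."""
    proxy_actor_name: str  # actor *name*, not a handle
    body: bytes


# Proxy side: serialize with the standard library's pickle.
msg = pickle.dumps(StreamingHTTPRequestSketch("SERVE_PROXY_node1", b"GET /"))

# Replica side: deserialize, then resolve the handle lazily, e.g.
# proxy_actor = ray.get_actor(request.proxy_actor_name)
request = pickle.loads(msg)
```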

Before:

```
Concurrency Level:      100
Time taken for tests:   12.873 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1910000 bytes
HTML transferred:       120000 bytes
Requests per second:    776.81 [#/sec] (mean)
Time per request:       128.732 [ms] (mean)
Time per request:       1.287 [ms] (mean, across all concurrent requests)
Transfer rate:          144.89 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0       3
Processing:     6  128  33.2    128     268
Waiting:        4  126  33.1    127     265
Total:          6  128  33.1    129     268

Percentage of the requests served within a certain time (ms)
  50%    129
  66%    140
  75%    147
  80%    152
  90%    167
  95%    187
  98%    209
  99%    222
 100%    268 (longest request)
```

After:

```
Concurrency Level:      100
Time taken for tests:   11.489 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1910000 bytes
HTML transferred:       120000 bytes
Requests per second:    870.40 [#/sec] (mean)
Time per request:       114.890 [ms] (mean)
Time per request:       1.149 [ms] (mean, across all concurrent requests)
Transfer rate:          162.35 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0       3
Processing:     6  114  27.9    116     212
Waiting:        4  113  27.9    115     208
Total:          6  114  27.8    117     213

Percentage of the requests served within a certain time (ms)
  50%    117
  66%    123
  75%    130
  80%    136
  90%    150
  95%    161
  98%    174
  99%    182
 100%    213 (longest request)
```
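For reference, the deltas between the two `ab` runs above work out to roughly a 12% throughput gain and an 11% drop in mean latency:

```python
# Numbers taken from the ab output above.
before_qps, after_qps = 776.81, 870.40
before_ms, after_ms = 128.732, 114.890

throughput_gain = after_qps / before_qps - 1  # relative QPS improvement
latency_drop = 1 - after_ms / before_ms       # relative mean-latency reduction

print(f"throughput: +{throughput_gain:.1%}")  # +12.0%
print(f"mean latency: -{latency_drop:.1%}")   # -10.8%
```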

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Edward Oakes <[email protected]>
@edoakes edoakes requested review from GeneDer and zcin January 2, 2025 17:05
@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Jan 2, 2025
@GeneDer (Contributor) left a comment

LGTM!

```python
@property
def receive_asgi_messages(self) -> Callable[[RequestMetadata], Awaitable[bytes]]:
    if self._receive_asgi_messages is None:
        proxy_actor = ray.get_actor(self._proxy_actor_name)
```
A reviewer (Contributor) commented:

Probably unlikely to happen, and there's nothing we can do about it, but I feel there is a case where this get_actor call might fail to fetch the proxy actor, leaving us unable to call receive_asgi_messages, e.g. if the node or the proxy actor has died.

A reviewer (Contributor) commented:

What happens if the GCS goes down before the proxy sends its first request to replica A? Will ray.get_actor(self._proxy_actor_name) go through?

@edoakes (Contributor, Author) replied:

Good question, I'm not actually sure of the behavior. It should be essentially the same as what happens currently when the actor handle is deserialized, but now it might block the event loop (not good). Let me investigate.

@edoakes

edoakes commented Jan 3, 2025

Turns out the ray.get_actor caching behavior wasn't actually working... fixed by calling a pong method on the proxy when its handle is received. Updated the benchmark numbers in the PR description to match (~825 qps -> ~870 qps).
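The caching behavior being discussed can be illustrated without Ray. In this hypothetical sketch, `FakeActorRegistry.get_actor` stands in for `ray.get_actor`, and `warm_up` plays the role of the eager pong call: it forces the by-name lookup once when the handle is received, so later requests reuse the cached handle instead of blocking on a lookup:

```python
class FakeActorRegistry:
    """Stand-in for ray.get_actor, used only to make the lookup count
    observable; the real fix lives in Ray Serve's replica code."""

    def __init__(self):
        self.lookups = 0

    def get_actor(self, name):
        self.lookups += 1
        return f"<handle:{name}>"


class ReplicaProxyClient:
    """Hypothetical sketch: resolve the proxy handle by name once,
    cache it, and resolve eagerly up front via warm_up()."""

    def __init__(self, registry, proxy_actor_name):
        self._registry = registry
        self._proxy_actor_name = proxy_actor_name
        self._proxy_handle = None

    def warm_up(self):
        # Analogous to calling a pong method on the proxy when its
        # handle is first received: forces resolution eagerly.
        _ = self.proxy_handle

    @property
    def proxy_handle(self):
        if self._proxy_handle is None:
            self._proxy_handle = self._registry.get_actor(self._proxy_actor_name)
        return self._proxy_handle


registry = FakeActorRegistry()
client = ReplicaProxyClient(registry, "SERVE_PROXY_node1")
client.warm_up()
for _ in range(100):  # subsequent requests reuse the cached handle
    _ = client.proxy_handle
print(registry.lookups)  # 1
```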

Signed-off-by: Edward Oakes <[email protected]>
@edoakes edoakes enabled auto-merge (squash) January 3, 2025 19:09
@github-actions github-actions bot disabled auto-merge January 6, 2025 14:17
Signed-off-by: Edward Oakes <[email protected]>
@edoakes edoakes enabled auto-merge (squash) January 6, 2025 15:53
Signed-off-by: Edward Oakes <[email protected]>
@github-actions github-actions bot disabled auto-merge January 6, 2025 19:27
@edoakes edoakes enabled auto-merge (squash) January 6, 2025 19:29
@edoakes edoakes merged commit 4149b27 into ray-project:master Jan 6, 2025
6 checks passed
roshankathawate pushed a commit to roshankathawate/ray that referenced this pull request Jan 7, 2025
…9539)


Signed-off-by: Edward Oakes <[email protected]>
roshankathawate pushed a commit to roshankathawate/ray that referenced this pull request Jan 9, 2025
…9539)


Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Roshan Kathawate <[email protected]>
anyadontfly pushed a commit to anyadontfly/ray that referenced this pull request Feb 13, 2025
…9539)


Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Puyuan Yao <[email protected]>
Labels
go add ONLY when ready to merge, run all tests

3 participants