[FEATURE] Replace blocking httpclient with async httpclient in remote inference #1839
Comments
@model-collapse @ylwu-amzn @dhrubo-os @austintlee Please chime in.
Could you please explain why? In your provided link, I can see
A mistake here: the async HttpClient also has connection pools, and the 100 is an example value.
Looking forward to this improvement! The expected improvements are very promising.
@zane-neo Can you help test whether fine-tuning the thread pool size helps the sync client?
@ylwu-amzn, fine-tuning the thread pool size can definitely improve the sync HTTP client's performance, but it is not optimal: threads consume system resources, and more threads increase context-switch overhead, so in the end we would reach a new performance bottleneck. An async HttpClient does not tie up a thread for each in-flight request and can sustain much higher throughput, so I think we should go this way.
Is your feature request related to a problem?
A community user brought up a performance issue here, which reveals a performance problem in the HttpClient used for remote inference. The flow of the prediction can be illustrated as below:
There are two major issues here:
1. The HttpClient connection pool is limited, which leads to the `timeout waiting for connection` error described here: [FEATURE] Performance issue in CloseableHttpClient #1537.
2. The predict thread pool size is 2 * num of vCPUs. This isn't a big value, which is fine for local model prediction because that is a CPU-bound operation. Remote inference, however, is an IO-bound operation, and for it the thread pool size is relatively small.

For issue 1, we can let users update the max_connections configuration to handle more parallel predict requests.
For issue 2, we can increase the predict thread pool size to increase parallelism, but this is not optimal because more threads cause more context switches and degrade overall system performance.
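To make issue 2 concrete, here is a minimal sketch of how a fixed thread pool caps the number of blocking calls that can be in flight. It is not the plugin's code: `BlockingPoolSketch` and `peakInFlight` are hypothetical names, and `Thread.sleep` stands in for a blocking HTTP call. Only the `2 * vCPUs` pool size comes from the description above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BlockingPoolSketch {

    // Runs `requests` simulated blocking remote calls on a fixed pool and
    // returns the highest number of calls that were in flight at the same time.
    static int peakInFlight(int poolSize, int requests, long latencyMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    // Assumption: a sleep stands in for a blocking HTTP round trip.
                    Thread.sleep(latencyMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for the queued calls to drain before reading the peak.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Pool size mirrors the "2 * num of vCPUs" sizing mentioned in the issue.
        int poolSize = 2 * Runtime.getRuntime().availableProcessors();
        int peak = peakInFlight(poolSize, 10 * poolSize, 50);
        System.out.println("pool size = " + poolSize + ", peak in-flight calls = " + peak);
    }
}
```

However many requests are queued, the peak never exceeds the pool size, because every waiting call holds a thread for its full latency. Raising the pool size raises the cap, at the cost of more threads and context switches.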
What solution would you like?
Replace the blocking HttpClient with an async HttpClient. With an async HttpClient, both issues above can be handled: there's no connection pool in the async HttpClient, and we don't need to change the default predict thread pool size, since an async HttpClient performs well with only a few threads.
AWS async HttpClient: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/http-configuration-crt.html
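As a rough illustration of the non-blocking style, the sketch below sends several predict-like requests without holding one thread per request. It deliberately swaps in the JDK's built-in `java.net.http.HttpClient` (Java 11+) instead of the AWS CRT client linked above, only because it needs no extra dependencies. The class name, the `/predict` path, the JSON payload, and the local stub endpoint are all assumptions made for the example, not the plugin's implementation.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class AsyncPredictSketch {

    // Hypothetical stand-in for a remote model endpoint: a local stub
    // that answers every request to /predict with HTTP 200.
    static HttpServer startStubEndpoint() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/predict", exchange -> {
                exchange.getRequestBody().readAllBytes(); // drain the request body
                byte[] body = "{\"ok\":true}".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Issues all requests up front with sendAsync, then collects the status codes.
    // The calling thread is not blocked while each individual request is in flight.
    static List<Integer> predictAll(URI endpoint, int requests) {
        HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .build();
        List<CompletableFuture<Integer>> pending = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"input\":" + i + "}"))
                .build();
            CompletableFuture<Integer> status = client
                .sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(response -> response.statusCode());
            pending.add(status);
        }
        List<Integer> statuses = new ArrayList<>();
        for (CompletableFuture<Integer> future : pending) {
            statuses.add(future.join()); // wait only once everything has been sent
        }
        return statuses;
    }

    public static void main(String[] args) {
        HttpServer server = startStubEndpoint();
        try {
            URI endpoint = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/predict");
            System.out.println(predictAll(endpoint, 20));
        } finally {
            server.stop(0);
        }
    }
}
```

The sketch joins the futures at the end only to keep the example short. In a real integration the response would more likely be handled in a callback chained onto the future, so that no predict thread waits on the remote call at all.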
What alternatives have you considered?
Increasing the predict thread pool size and making it a user-configurable system setting.
Do you have any additional context?
NA