Dapr Invoke Method has very low throughput compared to calling API directly #709
Comments
We need to revisit the HTTP call solution to remove the dependency on OkHttp, in addition to fixing this issue. Thanks for reporting this in such detail. In the meantime, you can still use Dapr for service invocation via one of the URL solutions documented here: https://docs.dapr.io/developing-applications/building-blocks/service-invocation/howto-invoke-discover-services/#additional-url-formats |
This problem is related to the runtime and not the SDK. The code sample above compares calling a service with the Dapr Java SDK + Dapr runtime vs. calling it with a native HTTP client without the runtime. If I use the HTTP client to call into the Dapr runtime, the behavior is the same as with the Dapr Java SDK + runtime.
HttpRequest request = HttpRequest.newBuilder()
    .POST(HttpRequest.BodyPublishers.ofString("message"))
    .uri(URI.create("http://localhost:3500/say"))
    .header("dapr-app-id", "invokedemo")
    .build();
I am moving this item to dapr runtime. |
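For anyone who wants to try the header-based call above without a sidecar, here is a minimal sketch. It assumes a local stub server standing in for daprd (the stub, the class name `HeaderInvokeSketch`, and the echo reply format are made up for illustration); against a real sidecar you would target `http://localhost:3500` as in the snippet above.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class HeaderInvokeSketch {

    // Hypothetical stand-in for the daprd HTTP endpoint: it only echoes the
    // dapr-app-id header and the request body, it does not route anything.
    static HttpServer startStub() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/say", exchange -> {
            String appId = exchange.getRequestHeaders().getFirst("dapr-app-id");
            String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            byte[] reply = (appId + ":" + body).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            exchange.getResponseBody().write(reply);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Same request shape as in the comment above: POST to /say with the
    // target app id carried in the dapr-app-id header.
    static String invoke(int port, String appId, String message) throws Exception {
        HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .build();
        HttpRequest request = HttpRequest.newBuilder()
            .POST(HttpRequest.BodyPublishers.ofString(message))
            .uri(URI.create("http://127.0.0.1:" + port + "/say"))
            .header("dapr-app-id", appId)
            .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer stub = startStub();
        try {
            System.out.println(invoke(stub.getAddress().getPort(), "invokedemo", "message"));
        } finally {
            stub.stop(0);
        }
    }
}
```

The stub only shows that the app id travels in a header rather than in the URL path; it says nothing about the throughput of a real sidecar.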
I talked to @yaron2 about this, and it could still be an SDK or JVM config issue. It still needs further investigation. |
This is a very tricky bug... I spent a lot of time debugging it. As shown in the picture above, by default, with no specific configuration, the service invocation flow looks like this.
Now I have found three issues:
mDNS
The mDNS name resolver is very slow (responses take more than 10 seconds) under multi-threaded load. In daprd, the name resolver is called for each request without a cache or lock:
func (d *directMessaging) getRemoteApp(appID string) (remoteApp, error) {
id, namespace, err := d.requestAppIDAndNamespace(appID)
if err != nil {
return remoteApp{}, err
}
request := nr.ResolveRequest{ID: id, Namespace: namespace, Port: d.grpcPort}
address, err := d.resolver.ResolveID(request)
if err != nil {
return remoteApp{}, err
}
return remoteApp{
namespace: namespace,
id: id,
address: address,
}, nil
}
I tried using a sync.Once (only for verification, since in the test case I only have one app to look up), and it worked well.
fasthttp (Update: fasthttp is OK)
Update: from the captured TCP packets I saw that the HTTP requests are not sent from the SDK to daprd on the client side.
So it seems that the Java SDK has a concurrency limit of 5 threads? Let me check it.
Update again: there is a config named maxRequestsPerHost, and it defaults to 5! Defaults to 5! Defaults to 5!
slow response
Even for a single request without sleep in the server, the total response time is more than 300 ms in my test env.
== server request received: id=1, start=1647454876799, received=1647454877161, interval=362
In the log above, the request from client to server took 362 ms, and the response from server to client took 19 ms. This is unbelievable. |
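The sync.Once check above amounts to "resolve once, then reuse the address". A rough Java analogue of that idea, purely for illustration: the class name `CachedResolverSketch`, the lookup function, and the address are all hypothetical, and this is not how daprd or its mDNS component is implemented.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CachedResolverSketch {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> slowLookup;

    CachedResolverSketch(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // computeIfAbsent runs the lookup at most once per app id, even when
    // many threads ask for the same id at the same time.
    String resolve(String appId) {
        return cache.computeIfAbsent(appId, slowLookup);
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger lookups = new AtomicInteger();
        CachedResolverSketch resolver = new CachedResolverSketch(id -> {
            lookups.incrementAndGet();
            return "10.0.0.1:50001"; // made-up address, stands in for a real lookup
        });

        Thread[] threads = new Thread[20];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> resolver.resolve("invokedemo"));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("lookups=" + lookups.get());
    }
}
```

This sketch never expires entries, so it only suits a verification run like the one described, where a single app is looked up.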
@skyao thanks for the detailed breakdown - can I just query your point on the mDNS name resolution - the mDNS name resolution does use a cache internally (see here) so it should only browse the network on the first occurrence of an appID until the cached address expires. You said the responses from the name resolver took 10 seconds - is that the first resolution or consistently? The timeout on the first browse is 1 second - so you should see errors if it cannot find the app within 1 second - are you seeing retries? |
Yes, I agree with you that mDNS should work like this. But in my test, I just used the test case @yschneider-bosch described above, with a log around the ResolveID() method in file
func (d *directMessaging) getRemoteApp(appID string) (remoteApp, error) {
id, namespace, err := d.requestAppIDAndNamespace(appID)
if err != nil {
return remoteApp{}, err
}
request := nr.ResolveRequest{ID: id, Namespace: namespace, Port: d.grpcPort}
var time1 = time.Now().UnixMilli()
address, err := d.resolver.ResolveID(request)
if err != nil {
return remoteApp{}, err
}
var time2 = time.Now().UnixMilli()
log.Errorf("getRemoteApp time used: ResolveID()=%d, appid=%s, address=%s \n", time2-time1, appID, address)
return remoteApp{
namespace: namespace,
id: id,
address: address,
}, nil
}
Then I reran the test case with and without sleep(1000) in the DemoServiceController.
The log without sleep(1000):
The log with sleep(1000):
|
From the log we can see that most of the time mDNS works well and the response time of ResolveID() is 0 or near 0, except that there are some strange records with response times of 3 / 6 / 9 / 12 seconds. 3 / 6 / 9 / 12 seconds: they are so regular. I added a test with 100 requests to see whether the series keeps increasing or not.
The log with sleep(1000) and 100 requests:
Only 3 / 6 / 9 / 12 seconds. So does it mean that after these 4 exceptions, mDNS works well? |
Thanks @skyao I will see if I can repro this behaviour in a test of the mDNS name resolution on its own. Will report back as soon as I can. |
OK, you can just add the log as I showed above, rebuild daprd and replace the one in ~/.dapr/bin, and rerun your test case. I tested it on Ubuntu 20.04 with an Intel x86 CPU. The branches of the dapr repo and the java-sdk repo are both master. |
Repeating one update from above in case someone missed it.
fasthttp (Update: fasthttp is OK)
Update: from the captured TCP packets I saw that the HTTP requests are not sent from the SDK to daprd on the client side.
So it seems that the Java SDK has a concurrency limit of 5 threads? Let me check it. |
I suspect there is a bug in the mDNS resolver too - I'll dig into it. |
Update for okhttp: I checked the Dapr Java SDK code and found that there are some settings in okhttp which we don't set in the Dapr Java SDK, so the default values are used:
After changing maxRequestsPerHost from the default of 5 to a large value, the problem is resolved. @artursouza This is why our performance test is so fast while the throughput in this issue's test case is so low:
So in this test case, where each request takes 1 second on the server, the TPS is limited to 5. In the performance test, the response from the server is fast enough (less than 1 ms) that the TPS is still high with 5 connections. |
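To make the effect of a per-host cap concrete, here is a small simulation using only the JDK. It is a sketch under assumptions: a `Semaphore` stands in for the client's per-host limit and `Thread.sleep` for the server's processing time. In OkHttp itself the limit lives on `okhttp3.Dispatcher` (`setMaxRequestsPerHost`), which queues the excess calls instead of blocking threads as this model does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PerHostLimitSketch {

    // Fires `requests` simulated calls at once and returns
    // {elapsed milliseconds, peak number of calls in flight}.
    static long[] run(int requests, int maxPerHost, long serverMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        Semaphore perHost = new Semaphore(maxPerHost); // models the per-host cap
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);

        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                try {
                    perHost.acquire();
                    try {
                        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                        Thread.sleep(serverMillis); // simulated server processing time
                    } finally {
                        inFlight.decrementAndGet();
                        perHost.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        pool.shutdown();
        return new long[] {elapsed, peak.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] capped = run(50, 5, 100);
        long[] uncapped = run(50, 50, 100);
        System.out.printf("limit=5:  %d ms, peak in flight=%d%n", capped[0], capped[1]);
        System.out.printf("limit=50: %d ms, peak in flight=%d%n", uncapped[0], uncapped[1]);
    }
}
```

With a cap of 5, the 50 calls run in about ten waves, so the total time grows with the server's processing time. With a cap of 50 they all overlap. That is consistent with the fast performance test, where the server replies in under 1 ms and the waves are too short to notice.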
Rerunning the test case after some updates:
Code of the InvokeClient class:
public static void main(String[] args) throws Exception {
var executor = Executors.newFixedThreadPool(20);
try (DaprClient client = (new DaprClientBuilder()).build()) {
// warm up for mDNS
System.out.println("warm up for mDNS");
doInvoke(client);
Thread.sleep(2 * 1000);
// warm up for HTTP connections
System.out.println("warm up for HTTP connections");
for (var i = 0; i < 20; i++) {
executor.execute(() -> {
doInvoke(client);
});
}
Thread.sleep(2 * 1000);
// do real test
System.out.println("warm up OK! Begin to do test!");
for (var i = 0; i < 10; i++) {
executor.execute(() -> {
for (var j = 0; j < 100; j++) {
doInvoke(client);
}
});
}
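// wait for the submitted tasks to finish before the client is closed
executor.shutdown();
executor.awaitTermination(5, java.util.concurrent.TimeUnit.MINUTES);
}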
}
private static void doInvoke(DaprClient client) {
var timestamp = System.currentTimeMillis();
var time1 = System.nanoTime();
// use invokeMethod
int id = counter.getAndIncrement();
String request = "id=" + id + ",start=" + timestamp;
//System.out.println(String.format("begin to send request: %s", request));
client.invokeMethod(SERVICE_APP_ID, "say", request,
HttpExtension.POST, null,
byte[].class).block();
var used = (double)(System.nanoTime() - time1)/ 1000 / 1000;
System.out.println(String.format("client invoke done: id=%d, start=%d, used=%.2f ms", id, timestamp, used));
}
After these changes, the total end-to-end latency is about 1 - 2 ms. And if we sleep 1 second in the server app, the total end-to-end latency is about 1004 - 1005 ms. |
@skyao I believe I have got to the bottom of the mDNS behaviour you were seeing - please refer to dapr/components-contrib#1591. Thanks for flagging it. |
@skyao Thanks, I am moving this issue back to Java SDK since that is where the fix is in. The mDNS finding was also great, thanks. |
Closing issue. It is planned for next release. |
Expected Behavior
invokeMethod sends requests asynchronously and in parallel. The number of connections, threads, etc. can be configured. The performance should match calling the API endpoint directly.
Actual Behavior
We started investigating after noticing that under load, requests are processed almost sequentially, leading to very long processing times if the target method takes a long time to complete.
There is no way to control the underlying HTTP client or configure anything to mitigate this problem, which we could only work around by increasing the number of instances.
Compared to calling the REST endpoint directly, invokeMethod is about 16 times slower.
Steps to Reproduce the Problem
Please find a modified version of the invoke example from the sdk. Request processing at the server side is 1 second to demonstrate the effect. HTTP Client call is commented out, please uncomment it and comment the invoke method for a comparison.
Also find modified logs of both executions, with the httpclient and with dapr invokeMethod.
As you can see, execution with the httpclient takes around 1 second for all 50 requests while with dapr invoke it takes around 16 seconds.
with-dapr-invoke.txt
with-http-client.txt
DemoServiceController.java
DemoServiceController.java
add this to the application properties:
server.tomcat.threads.max=400
Release Note
RELEASE NOTE: