Duration option causes Unavailable status if transport closed when request is active #121

sanyokbig · 2019-08-12T10:00:58Z

Proto file(s)

service SleepService {
	rpc SleepFor(SleepRequest) returns (SleepResponse);
}

message SleepRequest {
	int64 Milliseconds = 1;
}

message SleepResponse {}

Command line arguments / config

{
  "proto": "./service.proto",
  "call": "main.SleepService.SleepFor",
  "concurrency": 1,
  "insecure": true,
  "connections": 1,
  "data": {
    "Milliseconds":150
  },
  "duration": "1s",
  "host": "localhost:8080",
  "format": "html",
  "output": "result.html"
}

Describe the bug
When ghz used with Duration option, any active request will fail at Duration end as connections are closed immediately, resulting in Unavailable code and rpc error: code = Unavailable desc = transport is closing error.

To Reproduce

Download project https://github.com/sanyokbig/ghz-fail
Run its server with go run .
Run ghz with projects config ghz --config config.json
Notice that 7th request failed with Unavailable code.

Expected behavior
7th result should return OK

Environment

OS:Arch 5.2.7
ghz: 0.40.0

Possible Solution
Wait for all workers to exit before closing connections. We made this change locally and now it works as expected, i can provide PR with same changes is needed.

The text was updated successfully, but these errors were encountered:

bojand · 2019-08-12T23:50:45Z

Hello thanks for using ghz and inquiring about this. This behaviour happens when using the duration option because when the timeout happens we perform a stop which closes all connections. This causes any in-flight calls to terminate with either Canceled desc = grpc: the client connection is closing or Unavailable desc = transport is closing errors. One workaround for this would be to wait for any in-flight operations to complete but that would technically mean we performed the test for (potentially significantly) longer than the specified duration, and might change the meaning of the test and results depending on the configuration. I think usually when using the duration option the intention is that we really want to know the results of completed requests within the duration. Hope this clears things up.

sanyokbig · 2019-08-13T08:56:47Z

Hello, thanks for response.

I get the reason for closing transport to prevent long calls, it makes perfect sense, but in that case results are a bit dirty with couple of errors at the end, which we would like to avoid as there are not real errors.

If i may, these are a couple of thoughts of how to solve this inconvenience (for us at least):

Simple one is what we've done locally - close connections only after all workers are done, but it can take a while, like you said.
Make 1. opt-in, leaving default behavior as is.
Add extra config to limit await time after duration end, duration-timeout for example. This will result in max test duration of duration + duration-timeout. Not a fan of this idea honestly, as this adds another option and pretty much collides with current timeout option.
Make max-duration work with duration providing same functionality as duration-timeout above.
For example, duration: 10m; max-duration:10m20s will be same as duration:10m; duration-timeout:20s, giving workers 20s to complete.
There is no extra option, but this could make some existing configurations run test longer, so probably a bad idea.

Overall i think 2. can be a good solution: it introduces a new optional behavior that matches ours (and possibly others) expectations, while not changing how current configurations are run.

About concerns of longer test runs, existing timeout can take care of it (given it is not disabled), making test run duration + timeout in the worst case. No extra code should be required on this part.

sanyokbig · 2019-08-14T11:07:01Z

Another thought is to ignore results after duration end, for example by adding stop channel to reporter and using it in Run() with signal coming from requester.Stop(), right before close loop.

This way the reporter should exit right before connections are closed so no failed requests are reported.

Benefit of this approach is that not extra configuration required, but after trying to implement it myself looks like this requires a bit more code changes than i expected, so probably not a best solution.

bojand · 2019-08-14T12:05:21Z

Thanks for all the input and suggestions. I agree this can be improved. I'll think about the best approach, but right now I think I am leaning towards the ignore suggestion as it solves the issue of errors in results and accuracy of timing. But i'll think about it some more.

bojand · 2019-10-23T01:16:39Z

@sanyokbig Please see #135 for a possible solution for this issue.

sanyokbig · 2019-10-23T08:33:55Z

@bojand This would be great, just what we need.

bojand · 2019-10-23T13:25:10Z

Released in 0.42.0. If there are any problems please feel free to create a new issue.

bojand self-assigned this Aug 12, 2019

bojand added CLI Issues for ghz CLI question Further information is requested labels Aug 12, 2019

bojand added enhancement New feature or request and removed question Further information is requested labels Aug 14, 2019

bojand mentioned this issue Oct 23, 2019

Duration stop action #135

Merged

bojand closed this as completed Oct 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Duration option causes Unavailable status if transport closed when request is active #121

Duration option causes Unavailable status if transport closed when request is active #121

sanyokbig commented Aug 12, 2019 •

edited

Loading

bojand commented Aug 12, 2019

sanyokbig commented Aug 13, 2019

sanyokbig commented Aug 14, 2019

bojand commented Aug 14, 2019

bojand commented Oct 23, 2019

sanyokbig commented Oct 23, 2019

bojand commented Oct 23, 2019

Duration option causes Unavailable status if transport closed when request is active #121

Duration option causes Unavailable status if transport closed when request is active #121

Comments

sanyokbig commented Aug 12, 2019 • edited Loading

bojand commented Aug 12, 2019

sanyokbig commented Aug 13, 2019

sanyokbig commented Aug 14, 2019

bojand commented Aug 14, 2019

bojand commented Oct 23, 2019

sanyokbig commented Oct 23, 2019

bojand commented Oct 23, 2019

sanyokbig commented Aug 12, 2019 •

edited

Loading