Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duration option causes Unavailable status if transport closed when request is active #121

Closed
sanyokbig opened this issue Aug 12, 2019 · 7 comments
Assignees
Labels
CLI Issues for ghz CLI enhancement New feature or request

Comments

@sanyokbig
Copy link

sanyokbig commented Aug 12, 2019

Proto file(s)

service SleepService {
	rpc SleepFor(SleepRequest) returns (SleepResponse);
}

message SleepRequest {
	int64 Milliseconds = 1;
}

message SleepResponse {}

Command line arguments / config

{
  "proto": "./service.proto",
  "call": "main.SleepService.SleepFor",
  "concurrency": 1,
  "insecure": true,
  "connections": 1,
  "data": {
    "Milliseconds":150
  },
  "duration": "1s",
  "host": "localhost:8080",
  "format": "html",
  "output": "result.html"
}

Describe the bug
When ghz used with Duration option, any active request will fail at Duration end as connections are closed immediately, resulting in Unavailable code and rpc error: code = Unavailable desc = transport is closing error.

To Reproduce

  1. Download project https://github.com/sanyokbig/ghz-fail
  2. Run its server with go run .
  3. Run ghz with projects config ghz --config config.json
  4. Notice that 7th request failed with Unavailable code.

Expected behavior
7th result should return OK

Environment

  • OS:Arch 5.2.7
  • ghz: 0.40.0

Possible Solution
Wait for all workers to exit before closing connections. We made this change locally and now it works as expected, i can provide PR with same changes is needed.

@bojand
Copy link
Owner

bojand commented Aug 12, 2019

Hello thanks for using ghz and inquiring about this. This behaviour happens when using the duration option because when the timeout happens we perform a stop which closes all connections. This causes any in-flight calls to terminate with either Canceled desc = grpc: the client connection is closing or Unavailable desc = transport is closing errors. One workaround for this would be to wait for any in-flight operations to complete but that would technically mean we performed the test for (potentially significantly) longer than the specified duration, and might change the meaning of the test and results depending on the configuration. I think usually when using the duration option the intention is that we really want to know the results of completed requests within the duration. Hope this clears things up.

@bojand bojand self-assigned this Aug 12, 2019
@bojand bojand added CLI Issues for ghz CLI question Further information is requested labels Aug 12, 2019
@sanyokbig
Copy link
Author

Hello, thanks for response.

I get the reason for closing transport to prevent long calls, it makes perfect sense, but in that case results are a bit dirty with couple of errors at the end, which we would like to avoid as there are not real errors.

If i may, these are a couple of thoughts of how to solve this inconvenience (for us at least):

  1. Simple one is what we've done locally - close connections only after all workers are done, but it can take a while, like you said.
  2. Make 1. opt-in, leaving default behavior as is.
  3. Add extra config to limit await time after duration end, duration-timeout for example. This will result in max test duration of duration + duration-timeout. Not a fan of this idea honestly, as this adds another option and pretty much collides with current timeout option.
  4. Make max-duration work with duration providing same functionality as duration-timeout above.
    For example, duration: 10m; max-duration:10m20s will be same as duration:10m; duration-timeout:20s, giving workers 20s to complete.
    There is no extra option, but this could make some existing configurations run test longer, so probably a bad idea.

Overall i think 2. can be a good solution: it introduces a new optional behavior that matches ours (and possibly others) expectations, while not changing how current configurations are run.

About concerns of longer test runs, existing timeout can take care of it (given it is not disabled), making test run duration + timeout in the worst case. No extra code should be required on this part.

@sanyokbig
Copy link
Author

Another thought is to ignore results after duration end, for example by adding stop channel to reporter and using it in Run() with signal coming from requester.Stop(), right before close loop.

This way the reporter should exit right before connections are closed so no failed requests are reported.

Benefit of this approach is that not extra configuration required, but after trying to implement it myself looks like this requires a bit more code changes than i expected, so probably not a best solution.

@bojand
Copy link
Owner

bojand commented Aug 14, 2019

Thanks for all the input and suggestions. I agree this can be improved. I'll think about the best approach, but right now I think I am leaning towards the ignore suggestion as it solves the issue of errors in results and accuracy of timing. But i'll think about it some more.

@bojand bojand added enhancement New feature or request and removed question Further information is requested labels Aug 14, 2019
@bojand
Copy link
Owner

bojand commented Oct 23, 2019

@sanyokbig Please see #135 for a possible solution for this issue.

@sanyokbig
Copy link
Author

@bojand This would be great, just what we need.

@bojand
Copy link
Owner

bojand commented Oct 23, 2019

Released in 0.42.0. If there are any problems please feel free to create a new issue.

@bojand bojand closed this as completed Oct 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLI Issues for ghz CLI enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants