Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CLI flag for aggregation timeout #686

Merged
merged 1 commit into from
Apr 26, 2019
Merged

Add CLI flag for aggregation timeout #686

merged 1 commit into from
Apr 26, 2019

Conversation

johnSchnake
Copy link
Contributor

What this PR does / why we need it:
One of the most common config settings that may change between
different environments and test cases is the amount of time
Sonobuoy will wait for results. Currently, this value is configurable
but it is in a nested config structure and not terribly well
documented at this time.

Personally, in the past, I had projects where we had to start
using a sonobuoy config-based flow instead of flags due solely
to this field.

This change adds the flag to make it more easy to set.

Signed-off-by: John Schnake [email protected]

Which issue(s) this PR fixes
Fixes #482

Release note:

Added --timeout flag to set the number of seconds Sonobuoy will wait for plugins to complete.

@johnSchnake johnSchnake requested a review from stevesloka April 22, 2019 18:58
@johnSchnake
Copy link
Contributor Author

// @ceridwen

@codecov-io
Copy link

codecov-io commented Apr 22, 2019

Codecov Report

Merging #686 into master will increase coverage by 0.1%.
The diff coverage is 75%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #686     +/-   ##
=========================================
+ Coverage   42.44%   42.55%   +0.1%     
=========================================
  Files          62       62             
  Lines        3409     3419     +10     
=========================================
+ Hits         1447     1455      +8     
- Misses       1864     1866      +2     
  Partials       98       98
Impacted Files Coverage Δ
pkg/plugin/aggregation/run.go 0% <0%> (ø) ⬆️
cmd/sonobuoy/app/gen.go 48.36% <100%> (+1.3%) ⬆️
pkg/config/config.go 57.89% <100%> (ø) ⬆️
cmd/sonobuoy/app/args.go 88.88% <100%> (+0.37%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1b56c31...52c65ee. Read the comment docs.

@stevesloka
Copy link
Contributor

So now there are 2 flags which determine "wait". There's --wait and now --timeout (added in this PR). Should those do the same thing and be combined?

@stevesloka
Copy link
Contributor

This could be a future design PR, but is there a way to define the timeout per plugin? For instance, the e2e tests may take a long time, but maybe my systemd_logs only takes 2mins? This flag could be the default of all and then later maybe let plugin developers further define?

func AddTimeoutFlag(flag *int, flags *pflag.FlagSet) {
flags.IntVar(
flag, timeoutFlag, config.DefaultAggregationServerTimeoutSeconds,
"How long (in seconds) Sonobuoy will wait for plugins to complete before exiting. 0 indicates no timeout.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be cool to make this a duration type and let users enter 1m or 2h`, etc. Otherwise, you've got to do the math to figure out the number of seconds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe print the default to the help so users know what it is if not defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cobra adds the default; this was copied from my console just now:

... before exiting. 0 indicates no timeout. (default 10800)

@johnSchnake
Copy link
Contributor Author

So now there are 2 flags which determine "wait". There's --wait and now --timeout (added in this PR). Should those do the same thing and be combined?

So there are two flags but they really do indicate different things and could be used in various ways.

The --timeout flag indicates the server timeout and when that is hit, everything turns off and explodes (more or less).

The --wait flag is a CLI-only flag that just indicates how long you are willing to block in the terminal for. It won't err the run just because you hit that limit.

The reason I don't want to sync the two is that a pretty reasonable use case is the following:

  • run and allow the server to timeout after 2hr
  • you really dont think it'll take that long so you just have your CLI stop waiting and return after an hour.
  • after the hour is up you can check the status; you assume it should be done but if not (and it is still running) you can investigate everything while it is still running.

@johnSchnake
Copy link
Contributor Author

This could be a future design PR, but is there a way to define the timeout per plugin? For instance, the e2e tests may take a long time, but maybe my systemd_logs only takes 2mins? This flag could be the default of all and then later maybe let plugin developers further define?

So I agree something like that may eventually be implemented. There are definitely impacts of this overall timeout and per-plugin timeout that play a role in #627 My POC implementation of plugin ordering also showed me that it could be very easy to adjust some of these details by adding them into the plugin manifest metadata.

But yes, I agree it is for another day since this is currently coded for just one big timeout and this was just to make it easier to set.

@stevesloka
Copy link
Contributor

Yeah that holds up, but I can still sonobuoy run --wait, then CTRL-C and then sonobuoy status and all works.

The --wait flag is a CLI-only flag that just indicates how long you are willing to block in the terminal for. It won't err the run just because you hit that limit.

But should it? If you had CI running and this happened you'd get a false positive wouldn't you?

I'm just trying to challenge the overall design to make sure we're solving the problem and not just throwing flags at it. Is there a scenario where you'd want the client to wait 5 mins and the server wait 2 hours?

@johnSchnake
Copy link
Contributor Author

Yeah that holds up, but I can still sonobuoy run --wait, then CTRL-C and then sonobuoy status and all works.

True, though you may want to do something like this even in CI. Before the --wait commands existed thats what I had seen done when people were rolling their own polling logic. Wait while it runs for a while, then when we hit our own timeout logic go and gather logs and maybe run some checks while it Sonobuoy is still running, then err out ourselves and clean up.

But should it? If you had CI running and this happened you'd get a false positive wouldn't you?

As the example above says, I don't think forcing the server to error and stop running because we hit a CLI timeout is always the right thing to do. It prevents a user (including CI systems) from investigating if they want to.

Just because sonobuoy run --wait returns doesn't guarantee anything about whether or not anything succeeded; do you mean false positive in that you might think it is done? That is partially why the default time for --wait is much longer than any typical run, currently at a day.

I'm just trying to challenge the overall design to make sure we're solving the problem and not just throwing flags at it.

Absolutely 👍 , IMO that is the discussion that we had when deciding to put this ticket into the milestone though. I appreciate the design questions though and am happy to defend the choice to make the new flag.

Is there a scenario where you'd want the client to wait 5 mins and the server wait 2 hours?

Yes but I also think your comment about CTRL+C plays a role here too; personally, I run checks all the time where this is the [somewhat] the case. If I am iterating on some plugin (in my case, e2e test) and I think it will just take 5m or less and I run it as such. If it runs and I find out it actually took 7m, I really didn't want the server to err at 5m. But the CLI --wait gave me an opportunity to see that 5m has passed and it still isn't done. Again, then I investigate the situation and usually restart my experiment whenever I'm done.

TL;DR; I think that in most situations your sentiment is right, you'd just want the server to pass/fail on its own and then the CLI wait for that terminal status. But I can think of situations where you want to ensure the CLI stops and gives you time to investigate before cleaning anything up. I think some of these things are also related to when/how/if plugin resources are cleaned up. In the end though, this is already a customizable value, just one that multiple people edit regularly and want it to be more simple.

If we decide to prioritize per-plugin timeouts, cleanup logic, etc then I think we should bring up these ideas again though because I do acknowledge there are surely some improvements that could be made.

One of the most common config settings that may change between
different environments and test cases is the amount of time
Sonobuoy will wait for results. Currently, this value is configurable
but it is in a nested config structure and not terribly well
documented at this time.

Personally, in the past, I had projects where we had to start
using a sonobuoy config-based flow instead of flags due solely
to this field.

This change adds the flag to make it more easy to set.

Fixes #482

Signed-off-by: John Schnake <[email protected]>
@johnSchnake johnSchnake merged commit aaa5e13 into vmware-tanzu:master Apr 26, 2019
@johnSchnake johnSchnake deleted the timeoutFlag branch April 26, 2019 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make sonobuoy timeout configurable from the CLI
3 participants