kubernetes_connect: add timeout settings #10
@cben Cannot apply the following labels because they are not recognized: fine/yes (actually we plan a 5.7.1 hotfix but I suppose we also want normal backports, so customers upgrading will not lose hotfix functionality?)
https://bugzilla.redhat.com/show_bug.cgi?id=1440950
@cben will our gem dependency in euwe bring in the new gem automatically or will we need to upgrade it?
LGTM
@cben once the gem is released, put in a PR to https://github.com/ManageIQ/manageiq-gems-pending to change https://github.com/ManageIQ/manageiq-gems-pending/blob/6ddfbb62b1178d54bd16b020d171c679bd602c5a/manageiq-gems-pending.gemspec#L42 so it will pick up the new gem.
we have strict
@cben any luck getting a new gem released?
@cben @agrare gem is released here: https://rubygems.org/gems/kubeclient/versions/2.4.0
config/settings.yml
Outdated
@@ -3,6 +3,8 @@
:ems_kubernetes:
:event_handling:
:event_groups:
:open_timeout: 60.seconds
:read_timeout: 60.seconds
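For readers following the diff, here is a minimal sketch of how these two settings could be turned into a timeouts hash for the client. It is not the code from this PR: `duration_to_seconds` and `kubeclient_timeouts` are hypothetical helpers, the parsing of "60.seconds" strings is a simplification, and the `timeouts: { open:, read: }` option is my reading of the kubeclient README for 2.4.

```ruby
# Hypothetical helpers for illustration only; not ManageIQ or kubeclient code.
UNIT_SECONDS = { 'second' => 1, 'minute' => 60, 'hour' => 3600 }.freeze

# Converts a settings value such as "60.seconds", "2.minutes" or 60 into seconds.
def duration_to_seconds(value)
  return value if value.is_a?(Numeric)

  match = /\A(\d+)\.(second|minute|hour)s?\z/.match(value.to_s)
  raise ArgumentError, "unrecognized duration: #{value.inspect}" unless match

  match[1].to_i * UNIT_SECONDS.fetch(match[2])
end

# Builds the hash that would be handed to the client.
# Assumption: kubeclient >= 2.4 accepts it as
#   Kubeclient::Client.new(uri, version, timeouts: timeouts)
def kubeclient_timeouts(ems_kubernetes_settings)
  {
    open: duration_to_seconds(ems_kubernetes_settings.fetch(:open_timeout)),
    read: duration_to_seconds(ems_kubernetes_settings.fetch(:read_timeout))
  }
end

settings = { open_timeout: '60.seconds', read_timeout: '60.seconds' }
timeouts = kubeclient_timeouts(settings)
```

Keeping the conversion in one small function means an unparseable value fails loudly at connect time instead of silently falling back to the library default.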
The default read_timeout in ruby / restclient has always been 60 seconds. (open_timeout was infinite in ruby 2.2, but afaict the euwe appliance was already on ruby 2.3.)
I was never able to account for the "2 minutes" number.
- "60 + 60 = 120" is not a plausible explanation: it doesn't take a minute to establish TCP + TLS.
- Curl showed the server takes >2min, but that doesn't mean anything: the client may time out after 1min.
- IIRC the source for us believing we time out at 2 minutes is log lines ~2min apart. But we don't have per-request log lines, we have something like "start of refresh – error = 2min"... and there are many requests before images.
Darn. Customer log strongly suggests timeout was 2min.
It contains >70 timeouts, and times are very stable: 9–12sec from first connect (/api) to second connect (/oapi) then 126–128sec to timeout.
This agrees perfectly with per-request timing in VCR: all /api requests total 11sec, /oapi requests without images total 7sec.
Gonna simulate/reproduce a slow server and measure actual timeout before & after patch...
=> It takes 2 minutes on euwe with the old kubeclient too. Nothing changed here.
The reason turned out to be that ruby's Net::HTTP unconditionally retries, once, requests that are supposed to be idempotent (e.g. GET, DELETE but not POST), so a 60sec read_timeout only surfaces as an error after 2 × 60sec = 2min.
[ankane/the-ultimate-guide-to-ruby-timeouts#8, https://bugs.ruby-lang.org/issues/10674]
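A rough way to observe this retry locally is sketched below. It assumes a throwaway TCP server on 127.0.0.1 that accepts connections and never answers; `probe_silent_server` is a hypothetical helper, not part of kubeclient or ManageIQ. It counts how many connections Net::HTTP opens before giving up and how long that takes.

```ruby
require 'net/http'
require 'socket'

# Hypothetical helper: issues one request against a local server that accepts
# connections but never replies, and returns [connection_count, elapsed_seconds].
def probe_silent_server(http_method, timeout)
  server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
  port = server.addr[1]
  accepted = []
  acceptor = Thread.new { loop { accepted << server.accept } }

  http = Net::HTTP.new('127.0.0.1', port, nil) # nil: ignore any proxy set in ENV
  http.open_timeout = timeout
  http.read_timeout = timeout

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  begin
    http.send_request(http_method, '/')
  rescue Net::ReadTimeout
    # Expected: the server never sends a response.
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  sleep 0.05 # give the acceptor thread a moment to record the last connection
  [accepted.size, elapsed]
ensure
  acceptor&.kill
  accepted&.each(&:close)
  server&.close
end
```

With the default retry behaviour, `probe_silent_server('GET', 0.3)` should report two connections and roughly twice the timeout, while `probe_silent_server('POST', 0.3)` should report one. On Ruby 2.5 and later the retry count is exposed as `Net::HTTP#max_retries` (default 1); on the Ruby 2.3 discussed here it was hard-coded.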
@simon3z ready for merge.
@cben let me know when this is ready.
This pull request is not mergeable. Please rebase and repush.
config/settings.yml
Outdated
:http_proxy:
:kubernetes:
:host:
:password:
:port:
:user:
:container_scanning:
:scanning_job_timeout: 20.minutes
@cben really? 😮 I guess you needed a more careful rebase 😊
whoa thanks fixed.
need to revise my mergetool config, meld without --auto-merge is error prone...
Relies on kubeclient 2.4 bumped in ManageIQ/manageiq-gems-pending#156.
Checked commit cben@e74b251 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
ManageIQ/manageiq-providers-kubernetes#10 from cben/kubeclient-timeout
kubernetes_connect: add timeout settings (cherry picked from merge commit ManageIQ/manageiq-providers-kubernetes@1ee90b5)
openshift_connect: use kubernetes timeout settings (cherry picked from unmerged ManageIQ/manageiq-providers-openshift#8 - unnecessary on master but required in backports)
Requires kubeclient >= 2.4.0
bump kubeclient ~> 2.4.0 (ported from manageiq-gems-pending.gemspec to gems/pending/Gemfile)
- Merge ManageIQ/manageiq-providers-kubernetes#10 kubernetes_connect: add timeout settings (cherry picked from merge commit ManageIQ/manageiq-providers-kubernetes@1ee90b5)
- openshift_connect: use kubernetes timeout settings (cherry picked from unmerged ManageIQ/manageiq-providers-openshift#8 - unnecessary on master but required in backports)
@cben Marking as
@miq-bot remove-label euwe/conflict
Backported to Euwe via ManageIQ/manageiq#15188
Backported to Fine via ManageIQ/manageiq#15090
Ability to control kubeclient timeouts from settings.yml
openshift half:
ManageIQ/manageiq-providers-openshift#8 unnecessary on master given ManageIQ/manageiq-providers-openshift#7, but will add it in backports.
Tested:
https://bugzilla.redhat.com/show_bug.cgi?id=1440950
@miq-bot add-label euwe/yes, fine/yes (actually we plan a 5.7.1 hotfix but I suppose we also want normal backports, so customers upgrading will not lose hotfix functionality?)
@moolitayer @agrare Please review.