[Flaky test] Tests in vendor/k8s.io/apiserver/pkg/server/genericapiserver_graceful_termination_test.go are flaking #114145
@Rajalakshmi-Girish: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. |
@Rajalakshmi-Girish I can't seem to find the commit; which PR is that? |
Sorry for the confusion. It is the golang commit golang/go@8a81fdf |
Does this have an impact on HTTPS? |
How close were the ppc64le tests to the 100ms timeout before? Linux and darwin are nowhere remotely close to the timeout on go1.19 or go master with this diff:
There's a tiny slowdown of the dial on linux, and a small slowdown on darwin:
What does that show before/after on ppc64le? |
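As a rough way to get those before/after numbers, here is a minimal sketch (not from the thread; the package and names are illustrative) of a loopback dial benchmark that could be run with the Go toolchains before and after golang/go@8a81fdf, with and without -race:

```go
// Hedged sketch: times raw loopback TCP dials so the numbers can be compared
// across Go toolchains, architectures, and with/without the race detector.
package dialtiming

import (
	"net"
	"testing"
)

func BenchmarkLoopbackDial(b *testing.B) {
	// Listen on an ephemeral loopback port.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		b.Fatal(err)
	}
	defer ln.Close()

	// Accept and immediately close connections so the dials below succeed.
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			c.Close()
		}
	}()

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			b.Fatal(err)
		}
		c.Close()
	}
}
```

Comparing ns/op from `go test -bench=BenchmarkLoopbackDial` and `go test -race -bench=BenchmarkLoopbackDial` on the two toolchains would show whether the dial itself regressed on ppc64le.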
cc @Rajalakshmi-Girish on the question about existing ppc64le timing in #114145 (comment) |
cc @kubernetes/sig-scalability - are we running scalability tests on go tip? did we see any increases in CPU or latency or decreases in throughput in the last two weeks? |
cc @marseel |
@liggitt we run scalability tests on go tip (using a fixed version of K8s), but only using the x64 architecture for masters and nodes in the cluster. We're also using Kubemark for scale testing go tip in K8s, so the results might not show the regression clearly. No visible change in the pod throughput in our tests:
I looked at some of the most popular <resource, subresource, scope, verb> tuples for the API call latency we measure in that test, but I don't see any visible change there either. There might be an increase in apiserver CPU usage judging by the chart below, but I'm not sure:
That said, I'll edit the performance dashboard's config to include more runs to see whether the CPU usage bump is indeed a relatively fresh thing or not. |
These failures occur only when run with -race. But with -race enabled, the request is taking ~200ms on ppc64le. With the commit just before golang/go@8a81fdf, the tests pass. |
Do these tests run with the -race flag enabled? |
This is an end-to-end scalability test, not a test that one can run through go test. I also doubt we use the -race flag there.
The tests this issue mentions are the unit tests run using the Makefile, which by default has -race enabled.
@liggitt |
definitely not, -race is not used for the scalability tests |
if the impact is limited to unit tests with race detection enabled, there's not a notable production impact, but I still asked if the impact was expected or will be optimized in https://go-review.git.corp.google.com/c/go/+/326012/comments/e0d180a5_e40016bc |
looks like there's also a revert CL open at https://go-review.git.corp.google.com/c/go/+/452255 in case it was needed for performance reasons, so I asked there if this level of impact to race detection code was expected |
opened golang/go#56980 upstream |
that seems reasonable, since the point of this is not testing performance... though 200ms doesn't seem long enough given the comment in #114145 (comment) |
True :( |
what timeout is used for this same request in other places in this test? it looks like a 1 second timeout is used |
True, the same request is using a 1-second timeout at other places in this test! |
@liggitt As the same request uses a 1-second timeout at other places, can we increase it to 1s at https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/genericapiserver_graceful_termination_test.go#L860
that seems fine to me. |
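For illustration only, a hedged and self-contained sketch of the kind of change being discussed; the real call site is the request around line 860 of genericapiserver_graceful_termination_test.go, and the httptest server plus the names below are stand-ins rather than the actual test code:

```go
// Hedged illustration of the proposed change: give the request a 1s deadline
// (previously 100ms), matching the timeout the other requests in the real
// test already use. The httptest server stands in for the GenericAPIServer.
package graceful_test

import (
	"context"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"
)

func TestRequestWithRaisedTimeout(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	// Was 100*time.Millisecond at the flaky call site.
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	if err != nil {
		t.Fatal(err)
	}
	resp, err := srv.Client().Do(req)
	if err != nil {
		t.Fatalf("request failed before the 1s deadline: %v", err)
	}
	resp.Body.Close()
}
```

Raising only this one deadline keeps the test's graceful-termination assertions intact and should remove the sensitivity to slower dials under -race on ppc64le.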
…imeout-fail Fixes the issue #114145
Which jobs are flaking?
https://prow.ppc64le-cloud.org/job-history/s3/ppc64le-prow-logs/logs/postsubmit-master-golang-kubernetes-unit-test-ppc64le
Which tests are flaking?
Tests in vendor/k8s.io/apiserver/pkg/server/genericapiserver_graceful_termination_test.go are flaking when run with master golang on ppc64le.
Since when has it been flaking?
After the commit golang/go@8a81fdf
Testgrid link
No response
Reason for failure (if possible)
The request to the API server is timing out at https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/genericapiserver_graceful_termination_test.go#L861
The tests pass when the timeout value is increased to 200ms.
The PASS output after increasing the timeout:
Anything else we need to know?
Seeing this flakiness only on the ppc64le architecture and when run with golang versions after the commit 8a81fdf165facdcefa06531de5af98a4db343035.
Relevant SIG(s)
/sig testing