Add a simple e2e test #323
Conversation
Hi @xueweiz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @wangzhen127
/cc @krzyzacy
/ok-to-test
Thanks for adding e2e tests! I left a few comments.
(force-pushed from 9b5b2ed to b7fb568)
Hi Zhen, thanks for the review! I just addressed your comments.
(force-pushed from 6599e5d to f04baab)
Hi Zhen, I just fixed the above problems and added some printing for debugging (useful when the test fails).
(force-pushed from 31fc50f to 872727b)
(force-pushed from 12bedf4 to 66621dd)
(force-pushed from f0563f1 to 222246b)
I just added some retry logic in the test, so that it will give NPD enough time to fully start.
test/e2e/metriconly/metrics_test.go (outdated diff):

```go
})

ginkgo.AfterEach(func() {
	npdMetrics := instance.RunCommand("curl http://localhost:20257/metrics")
```
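For context, the assertion this output feeds is roughly the following. This is a minimal sketch assuming direct HTTP access to the metrics endpoint; the real test shells the curl through instance.RunCommand because the test binary runs outside the VM, and the function name here is hypothetical.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"strings"
)

// checkHostUptimeMetric scrapes the NPD metrics endpoint and verifies that
// the host_uptime metric is present in the Prometheus text output.
func checkHostUptimeMetric() error {
	resp, err := http.Get("http://localhost:20257/metrics")
	if err != nil {
		return fmt.Errorf("failed to scrape NPD metrics: %v", err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return fmt.Errorf("failed to read metrics response: %v", err)
	}
	if !strings.Contains(string(body), "host_uptime") {
		return fmt.Errorf("host_uptime metric not found in output:\n%s", body)
	}
	return nil
}

func main() {
	if err := checkHostUptimeMetric(); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("host_uptime metric found")
}
```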
Are this and the journal-log printing below for debugging? Would they make the output look too verbose?
Yes, these prints are for debugging. I think it's reasonable to print out the NPD-reported metrics, because after all they are the key subject that we are asserting against (e.g. NPD has this metric and that metric...).
In my opinion, the output looks OK for now. See an example at https://gubernator-internal.googleplex.com/build/gke-prow/logs/ci-npd-e2e-cos-experimental/1161702249425014784/
I printed out the journal logs mainly to catch invalid-config problems, i.e. the case where someone changes a config or argument and makes NPD panic right at startup.
However, I totally agree that it's best if we do not print the journal logs. They are not the tested subject, and are merely used for debugging the tested subject. Most importantly, they CAN potentially get large as the test grows more complicated.
I think the best practice for uploading large debugging text in testing is to tar these texts and upload them to some test artifact storage. Do you know whether this can be done with Prow? Can you point me to some code sample?
How about putting those into artifacts? For example, the current NPD k8s e2e test stores node-problem-detector.log as one of the artifacts. See the following artifacts from a CI job:
https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-npd-e2e-kubernetes-gce-gci/1161681249476022272/artifacts/e2e-3e359a827e-7abb6-minion-group-7pkc/
@krzyzacy may know how to set that up.
Thanks!
I believe this is how the artifact upload was set up:
https://github.com/kubernetes/test-infra/blob/master/prow/pod-utilities.md#what-the-test-container-can-expect
Experimenting now.
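For reference, the pattern from that doc boils down to writing files under the directory named by the ARTIFACTS environment variable; Prow's sidecar uploads that directory after the test finishes. A short sketch (the function name, file name, and fallback directory are illustrative assumptions):

```go
package e2e

import (
	"io/ioutil"
	"os"
	"path/filepath"
)

// saveArtifact writes data into the Prow artifacts directory so the sidecar
// uploads it with the job results. Outside Prow, it falls back to the system
// temp directory.
func saveArtifact(name string, data []byte) error {
	dir := os.Getenv("ARTIFACTS")
	if dir == "" {
		dir = os.TempDir() // not running under Prow; keep the file locally
	}
	return ioutil.WriteFile(filepath.Join(dir, name), data, 0644)
}
```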
Nice, it worked :)
See this example: https://gubernator-internal.googleplex.com/build/gke-prow/logs/ci-npd-e2e-cos-experimental/1161754845946843136/
test/e2e/metriconly/metrics_test.go (outdated diff):

```go
}

// Wait for NPD to be ready for a maximum of 120 seconds.
err = retry.Do(verifyUptimeMetric,
```
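For readers following along: the retry.Do call matches the API of github.com/avast/retry-go, so the 120-second wait might be assembled like the sketch below. The package identity, attempt count, and delay are assumptions; only the retry.Do(verifyUptimeMetric, ...) shape comes from the diff.

```go
package e2e

import (
	"time"

	"github.com/avast/retry-go"
)

// waitForUptimeMetric polls verifyUptimeMetric until it succeeds.
// 24 attempts with a fixed 5s delay bounds the wait at roughly 120 seconds.
func waitForUptimeMetric(verifyUptimeMetric func() error) error {
	return retry.Do(verifyUptimeMetric,
		retry.Attempts(24),
		retry.Delay(5*time.Second),
		retry.DelayType(retry.FixedDelay),
	)
}
```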
I think we need some generic way of waiting for NPD to come up; put that into some function and call it after npd.SetupNPD.
If you think waiting for the uptime metric is the easiest way to check that NPD is up, then put this into that function, too.
SG. Added a WaitForNPD() function.
I don't want to hard-code the "ready signal" (e.g. the host_uptime metric) in the function, mainly because NPD is configurable. We might test it under different configurations, which will yield different ready signals.
I'm also thinking that maybe we should use the systemd service property as another ready signal. But in my opinion, those are minor details, and we can improve them along the way.
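A minimal sketch of the shape being discussed, with the ready signal passed in rather than hard-coded; the signature and the one-second poll interval are assumptions, and the PR's actual WaitForNPD() may differ.

```go
package e2e

import (
	"fmt"
	"time"
)

// WaitForNPD polls a caller-supplied readiness check until it passes or the
// timeout expires. Different NPD configurations can supply different checks,
// e.g. "the host_uptime metric is being exported".
func WaitForNPD(isReady func() error, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	var err error
	for time.Now().Before(deadline) {
		if err = isReady(); err == nil {
			return nil
		}
		time.Sleep(time.Second) // poll once per second
	}
	return fmt.Errorf("NPD not ready after %v: %v", timeout, err)
}
```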
(force-pushed from 3555124 to c143b99)
Hi Zhen, I just added the wait function, and uploaded the debugging data as test artifacts. Could you help take another look? Thanks!
/retest
Just some nits on naming
The first test is a very simple one: it installs NPD on a VM, and then verifies that NPD reports the metric host_uptime in Prometheus format.
Hi Zhen, I just fixed the artifact names :) Could you help take another look? Thanks!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: wangzhen127, xueweiz. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.
/retest
This is the above failure:
I feel like it's an infra problem. Retrying.
This PR adds an e2e test to verify that NPD will report the metric host_uptime in Prometheus format on a clean VM. This is part of #296.
I verified this test locally:
I also have an internal Prow job running this test to verify that it works in the Prow environment.
See the continuous Prow jobs here, a sample run result here, and a testgrid setup here.
I will later publish the Prow pipeline setup publicly after we have this PR merged, so that we will have a public CI job and testgrid.