Add a simple e2e test #323

xueweiz · 2019-08-10T00:22:31Z

This PR adds an e2e test to verify that NPD reports the host_uptime metric in Prometheus format on a clean VM. This is part of #296.

I verified this test locally:

ZONE=us-central1-a PROJECT=xueweiz-experimental \
VM_IMAGE=cos-73-11647-217-0 IMAGE_PROJECT=cos-cloud \
SSH_USER=${USER} SSH_KEY=~/.ssh/id_rsa make e2e-test

I also have an internal Prow job running this test to verify that it works in the Prow environment.
See the continuous Prow jobs here, a sample run result here, and a testgrid setup here.

I will later publish the Prow pipeline setup publicly after we have this PR merged, so that we will have a public CI job and testgrid.

k8s-ci-robot · 2019-08-10T00:22:39Z

Hi @xueweiz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

xueweiz · 2019-08-10T00:23:48Z

/assign @wangzhen127
/cc @Random-Liu
/cc @andyxning

xueweiz · 2019-08-10T00:28:29Z

/cc @krzyzacy
And huge thanks to Sen for helping out with the test design & setup!

test/e2e/containers/prow-npd-e2e/Makefile

test/e2e/containers/prow-npd-e2e/script.sh

andyxning · 2019-08-11T01:51:47Z

/ok-to-test

wangzhen127

Thanks for adding e2e tests! I left a few comments.

config/systemd/node-problem-detector-standalone.service

Makefile

test/install.sh

test/e2e/lib/gce/gce.go

test/e2e/lib/npd/npd.go

test/e2e/standalone/basic_metrics_test.go

test/e2e/standalone/e2e_npd_test.go

test/install.sh

Makefile

test/e2e/containers/prow-npd-e2e/Dockerfile

xueweiz · 2019-08-13T00:13:50Z

Hi Zhen, thanks for the review! I just addressed your comments.

Makefile

pkg/util/metrics/helpers.go

test/e2e/lib/gce/gce.go

test/e2e/lib/npd/npd.go

xueweiz · 2019-08-13T20:12:29Z

Hi Zhen, I just fixed the above problems and added some printing for debugging (useful when the test fails).

xueweiz · 2019-08-14T00:53:47Z

I just added some retry logic in the test, so that it will give NPD enough time to fully start.
Can you help take a look again? Thanks! :)

wangzhen127 · 2019-08-14T11:04:32Z

test/e2e/metriconly/metrics_test.go

+	})
+
+	ginkgo.AfterEach(func() {
+		npdMetrics := instance.RunCommand("curl http://localhost:20257/metrics")


Are this and the lines below printing the journal log for debugging? Would that make the output too verbose?

Yes, these prints are for debugging. I think it's reasonable to print out the metrics reported by NPD because, after all, they are the key subject that we are asserting against (e.g. NPD has this metric and that metric...).

In my opinion, the output looks OK for now. See example at https://gubernator-internal.googleplex.com/build/gke-prow/logs/ci-npd-e2e-cos-experimental/1161702249425014784/

I printed out the journal logs mainly to catch invalid config file problems, i.e. the case where someone changed a config or an argument and made NPD panic right at startup.

However, I totally agree that it's best if we do not print out journal logs. They are not the tested subject, and are merely used for debugging the tested subject. And most importantly, they CAN potentially be large when the test gets more complicated.

I think the best practice for uploading large debugging text in testing is to tar these texts and upload them to some test artifact storage. Do you know whether this can be done with Prow? Can you point me to a code sample?

How about putting those into artifacts? For example, the current NPD k8s e2e test stores node-problem-detector.log as one of its artifacts. See the following artifacts from a CI job.
https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-npd-e2e-kubernetes-gce-gci/1161681249476022272/artifacts/e2e-3e359a827e-7abb6-minion-group-7pkc/

@krzyzacy may know how to set that up.

Thanks!
I believe this is how the artifact was set up:
https://github.com/kubernetes/test-infra/blob/master/prow/pod-utilities.md#what-the-test-container-can-expect

Experimenting now.

Nice, it worked :)
See this example: https://gubernator-internal.googleplex.com/build/gke-prow/logs/ci-npd-e2e-cos-experimental/1161754845946843136/
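For readers following along, here is a minimal sketch of the artifact approach discussed above, not the code from this PR. It assumes the Prow pod utilities expose the upload directory through the ARTIFACTS environment variable, as described in the linked pod-utilities doc. The helper names `artifactDir` and `saveArtifact`, the file name, and the temp-dir fallback for local runs are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// artifactDir returns the directory where debugging output should be stored.
// Assumption: Prow sets ARTIFACTS for the test container. The fallback to a
// temp directory for local runs is a choice made for this sketch only.
func artifactDir() string {
	if dir := os.Getenv("ARTIFACTS"); dir != "" {
		return dir
	}
	return filepath.Join(os.TempDir(), "npd-e2e-artifacts")
}

// saveArtifact writes content to <artifact dir>/<name> and returns the path.
func saveArtifact(name, content string) (string, error) {
	dir := artifactDir()
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", fmt.Errorf("creating artifact dir: %w", err)
	}
	path := filepath.Join(dir, name)
	if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
		return "", fmt.Errorf("writing artifact %s: %w", name, err)
	}
	return path, nil
}

func main() {
	// Hypothetical file name and content, standing in for the journal log.
	path, err := saveArtifact("npd-journal.log", "example journal output\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("saved", path)
}
```

Writing to a directory instead of stdout keeps the test output short while still preserving the journal log for debugging.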

wangzhen127 · 2019-08-14T11:10:45Z

test/e2e/metriconly/metrics_test.go

+			}
+
+			// Wait for NPD to be ready for a maximum of 120 seconds.
+			err = retry.Do(verifyUptimeMetric,


I think we need a generic way of waiting for NPD to come up: put that into a function and call it after npd.SetupNPD.

If you think waiting for uptime metric is the easiest way to check NPD is up, then put this into that function, too.

SG. Added WaitForNPD() function.

I don't want to hard code the "ready signal" (e.g. host_uptime metric) in the function. Mainly because NPD is configurable. We might test it under different configurations, which will yield different ready signals.

I'm also thinking that maybe we should use the systemd service property as another ready signal. But in my opinion, those are minor details, and we can improve them along the way.
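To illustrate the idea of a wait function that does not hard code the ready signal, here is a rough sketch using only the standard library. It is not the WaitForNPD() added in this PR (the test itself uses retry.Do). The names `waitForReady` and `errNotReady` are hypothetical, and the caller supplies whatever check fits the NPD configuration under test.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a placeholder error used by this sketch.
var errNotReady = errors.New("NPD not ready yet")

// waitForReady polls check every interval until it returns nil, or gives up
// once another sleep would pass the timeout. The ready signal is whatever
// check verifies (a metric, a systemd property, ...).
func waitForReady(check func() error, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := check()
		if err == nil {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return fmt.Errorf("not ready after %v: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated ready signal: succeeds on the third attempt.
	attempts := 0
	check := func() error {
		attempts++
		if attempts < 3 {
			return errNotReady
		}
		return nil
	}
	if err := waitForReady(check, 2*time.Second, 10*time.Millisecond); err != nil {
		fmt.Println("failed:", err)
		return
	}
	fmt.Println("ready after", attempts, "attempts")
}
```

Passing the check as a function means a test under a different configuration can wait on a different signal without changing the wait logic.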

xueweiz · 2019-08-14T21:51:48Z

Hi Zhen, I just added the wait function, and uploaded these debugging data as test artifacts. Could you help take another look? Thanks!

xueweiz · 2019-08-14T22:01:07Z

The above test failures (1 2) seem to be a test infra problem. Perhaps the docker-in-docker container has a bug in it.

Let's ignore it for now.

xueweiz · 2019-08-15T18:26:07Z

/retest

wangzhen127

Just some nits on naming

test/e2e/lib/npd/npd.go

The first test is a very simple test. It installs NPD on a VM, and then verifies that NPD reports metric host_uptime in Prometheus format.
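As an illustration of what such a check could look like, here is a minimal sketch that scans Prometheus text-format output for a metric name. It is not the assertion used in this PR. The helper `hasMetric` and the sample output (including its label names and values) are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasMetric reports whether Prometheus text-format output contains at least
// one sample line for the metric called name. Comment lines (# HELP, # TYPE)
// are skipped, so a metric that is only declared does not count.
func hasMetric(exposition, name string) bool {
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// A sample line looks like `metric{labels} value` or `metric value`,
		// so the name ends at the first '{' or whitespace character.
		end := strings.IndexAny(line, "{ \t")
		if end < 0 {
			continue
		}
		if line[:end] == name {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical output of `curl http://localhost:20257/metrics`.
	sample := `# HELP host_uptime The uptime of the operating system
# TYPE host_uptime gauge
host_uptime{kernel_version="4.14.127+",os_version="cos 73"} 120
`
	fmt.Println(hasMetric(sample, "host_uptime"))
}
```

Matching the full name up to the first `{` or whitespace avoids false positives from metrics that merely share a prefix.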

xueweiz · 2019-08-16T20:02:31Z

Hi Zhen, I just fixed the artifact names :) Could you help take another look? Thanks!

wangzhen127 · 2019-08-16T20:37:31Z

/lgtm

k8s-ci-robot · 2019-08-16T20:37:44Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wangzhen127, xueweiz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [wangzhen127]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

xueweiz · 2019-08-16T21:22:10Z

/retest

This is the above failure:

$ git fetch https://github.com/kubernetes/node-problem-detector.git master
fatal: unable to access 'https://github.com/kubernetes/node-problem-detector.git/': Could not resolve host: github.com
# Error: exit status 128

I feel like it's an infra problem. Retrying.

k8s-ci-robot added the cncf-cla: yes and needs-ok-to-test labels Aug 10, 2019

k8s-ci-robot added the size/XXL label Aug 10, 2019

k8s-ci-robot requested review from dchen1107 and wangzhen127 August 10, 2019 00:22

k8s-ci-robot assigned wangzhen127 Aug 10, 2019

k8s-ci-robot requested review from andyxning and Random-Liu August 10, 2019 00:23

xueweiz mentioned this pull request Aug 10, 2019

vendor changes for e2e tests #324

Closed

k8s-ci-robot requested a review from krzyzacy August 10, 2019 00:28

xueweiz commented Aug 10, 2019

xueweiz force-pushed the test branch from a85fd36 to fb9a856 Compare August 10, 2019 01:12

k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Aug 11, 2019

wangzhen127 reviewed Aug 12, 2019

xueweiz force-pushed the test branch 2 times, most recently from 9b5b2ed to b7fb568 Compare August 13, 2019 00:12

wangzhen127 reviewed Aug 13, 2019

xueweiz force-pushed the test branch 2 times, most recently from 6599e5d to f04baab Compare August 13, 2019 20:11

xueweiz force-pushed the test branch 4 times, most recently from 31fc50f to 872727b Compare August 13, 2019 21:30

xueweiz force-pushed the test branch 4 times, most recently from 12bedf4 to 66621dd Compare August 13, 2019 23:17

vendor changes for e2e tests

db2dbd1

xueweiz force-pushed the test branch 2 times, most recently from f0563f1 to 222246b Compare August 14, 2019 00:36

wangzhen127 reviewed Aug 14, 2019

xueweiz force-pushed the test branch 2 times, most recently from 3555124 to c143b99 Compare August 14, 2019 21:21

xueweiz force-pushed the test branch from c143b99 to 68d3d50 Compare August 14, 2019 21:53

wangzhen127 reviewed Aug 15, 2019

Add e2e test for NPD

f9b5e60

The first test is a very simple test. It installs NPD on a VM, and then verifies that NPD reports metric host_uptime in Prometheus format.

xueweiz force-pushed the test branch from 68d3d50 to f9b5e60 Compare August 16, 2019 20:01

k8s-ci-robot added the lgtm label Aug 16, 2019

k8s-ci-robot added the approved label Aug 16, 2019

k8s-ci-robot merged commit 424b864 into kubernetes:master Aug 16, 2019

xueweiz mentioned this pull request Sep 18, 2019

Add e2e test for NPD metric-only mode #296

Closed

6 tasks
