
Integration test harness survey and notes #2222

Closed
jpeach opened this issue Feb 12, 2020 · 4 comments
Assignees: jpeach
Labels: area/testing (Issues or PRs related to tests or testing tools)

jpeach (Contributor) commented Feb 12, 2020

This issue captures notes, requirements and proposals about an integration test harness.

jpeach added the area/testing label Feb 12, 2020
jpeach self-assigned this Feb 12, 2020
jpeach (Contributor, Author) commented Feb 12, 2020

xref #1155

jpeach (Contributor, Author) commented Feb 12, 2020

Survey of Existing Ingress Controller Integration Tests

Knative Serving

  • Relatively fat helper library infrastructure driven by Go tests.

  • Ad-hoc testing. That is, no structured way to write tests. Tests are
    driven from the Kubernetes API.

  • No support for capturing debug information.

Nginx Ingress

  • Tests are initiated from the shell script ./test/e2e/run.sh, which
    deploys to kind and starts testing:

    • Test runner is build/run-e2e-suite.sh. This runs the "e2e" task from
      the "nginx-ingress-controller:e2e" container image with "kubectl run".

    • Uses kustomize to configure deployment types.

    • Ultimately, all this scaffolding runs a stand-alone Go binary that
      contains a Ginkgo test suite (builds with "ginkgo build").

    • Terrible operator UX, inherited from Ginkgo. No way to list tests
      or explore what the test suite will do. Running in a bad dev
      environment just spews unreadable failure messages everywhere.

  • Need to enable Docker experimental features for the "buildx" subcommand.

  • Unable to get the test environment tooling to work on macOS. It is
    unclear why the e2e run script deploys to kind while the "dev-env"
    make target uses minikube.

  • Tests are written in Ginkgo, with a library of local helpers. The
    framework.Framework struct contains common APIs, helpers and
    test expectations. It also gathers diagnostics on test failure.

    • The test framework hides boilerplate deployments for echo server, GRPC
      server and other useful paraphernalia.

    • Test checks include scraping config from inside the NGINX pods,
      which seems pretty dubious, but is perhaps required by the
      pass-through config approach.

    • Ginkgo lets tests be relatively cleanly formed and gives them a
      predictable structure.

Gloo

  • Tests for ingress, knative and Gloo gateway scenarios in test/kube2e.

  • Very small number of tests, consisting of some shell scripts
    wrapped around Ginkgo test code.

  • No significant diagnostics.

Kourier

https://github.com/3scale/kourier

  • Uses the Knative integration tests.

Ambassador

https://github.com/datawire/ambassador

  • Uses KAT (Kubernetes Acceptance Test) framework. Written in Python
    and driven by py.test.

    https://github.com/datawire/ambassador/blob/master/docs/kat/tutorial.rs,
    https://docs.pytest.org/en/latest/

  • Test cases consist of the following components:

    • Initialisation: Tests are Python subclasses so they can set
      themselves up in the constructor.

    • Manifests: A chunk of YAML to apply to the cluster. The test
      library has a set of common default YAML constants for tests to
      use. Manifests are Python string templates that are expanded at
      application time by the test.

    • Config: The config method also emits a chunk of YAML, but its
      purpose is to configure a deployed Ambassador.

    • Requirements: A list of Kubernetes resources that need to be ready
      before the actual test can start.

    • Queries: This method returns a list of HTTP Query objects. The
      harness performs the specified HTTP requests and tracks the
      results, which are available to the check. Since queries are
      objects, they can specify arbitrary parameters (TLS, SNI, URL,
      timeout, etc.). Queries can be grouped into phases, with a
      harness delay between phases.

    • Checks: The check method is used to run arbitrary unstructured
      tests against the test state, typically using the Python "assert"
      keyword. Checks run after the queries.

  • In most cases, the test methods return generators. Maybe this is just
    conventionally Pythonic, but it is an interesting approach to keeping
    the test driver simple while giving the test flexibility.

  • Tests can be parameterized and composed (by aggregation). So,
    given tests named A and B, it is possible to compose a new test, C,
    consisting of A(1), A(2), B(3).

  • Uses httpbin as a backend Service. There are additional kat-client
    and kat-server commands, which are packaged into container images but
    are not used in tests AFAICT. Interesting that kat-server serves both
    HTTP and GRPC.

  • Developer instructions are not especially clear and the user experience
    is weak. Without some Unix build system and Python experience it will
    be very hard to run the tests.

    $ make pytest DEV_KUBECONFIG=/Users/jpeach/.kube/config DEV_REGISTRY=docker.io/jpeach

    https://gist.github.com/jpeach/fd53248a9b76cbf54fcac7b655975542

  • Poor user experience for debugging test failures. You get a Python
    RuntimeError exception when kubectl exits with a non-zero status and
    are left to pick up the pieces.

  • Since pytest is the test driver, the project assumes a lot of pytest
    knowledge from the user, which is a hurdle.

CSI Conformance testing

Most relevant example for testing core Kubernetes API extension points.

https://kubernetes-csi.github.io/docs/functional-testing.html

kube-bench (CIS checks)

  • Probably better to think of this as running "checks" rather than
    "tests", but I'll forget and use the terms interchangeably.

  • Tests are grouped and uniquely numbered (in the spec). Seems pretty
    helpful to have a unique ID for tests. Could be used to link testable
    statements from the docs.

  • kube-bench has to run on the host it is checking (i.e. on a master
    or node host). It doesn't embed the check config, which needs to be
    distributed along with the binary.

  • Checks are skipped by editing the YAML definition. This seems oriented
    towards people forking the repo and committing site-specific changes.

  • The check config YAML is unmarshalled to an internal controls.Controls
    type, which is an uncomfortable agglomeration of data format and API.

  • Actual checks are defined in YAML. The check itself is a shell command
    that is specified in the "audit" parameter. The output of this shell
    command is fed into the subsequent tests, which are a series of string
    matchers defined in YAML. It is actually surprisingly clunky, though
    the YAML is relatively readable.

  • The whole policies suite is marked "manual", because there's no real
    capability to inspect the Kubernetes API directly. Also, some of the
    policies aren't testable (e.g. minimize access to foo).

@jpeach
Copy link
Contributor Author

jpeach commented Feb 12, 2020

There are many harnesses that run end-to-end tests against Kubernetes
clusters. This note collects my thoughts about how a Kubernetes test
harness should work.

Tests should be Declarative

Some projects write tests directly in Go code. The tests are driven by
the Go test runner and rely on a suite of internal helper APIs to reduce
the amount of boilerplate code. There are three primary problems with
this approach:

  1. Test development is only accessible to project programmers
  2. Too much time is spent learning internal frameworks
  3. It's too hard to understand what tests are doing

Problem (1) reduces the audience of people who are likely to build
tests. Problem (2) increases the barriers to entry further, since
contributors have to learn bespoke internal APIs to make progress.
Problem (3) means that reviewers and operators cannot easily tell what
behavior a test exercises, which makes failures harder to triage.

A better approach is to express the test in a declarative DSL or data
format. A special-purpose tool should execute the test and deliver
results. The separation of the tool from the tests allows multiple
projects to develop test suites independently (obviously this assumes
the tool has good compatibility standards).
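
As a rough illustration only, a declarative test document could
unmarshal into a small set of Go types along these lines. The names and
fields here are hypothetical, not an existing format:

    package harness

    // Step is a single unit of test execution: either an action that
    // mutates the cluster or a check that verifies expected state.
    // Exactly one of Apply, Delete or Check would be set.
    type Step struct {
        Name   string `yaml:"name"`
        Apply  string `yaml:"apply,omitempty"`  // chunk of Kubernetes YAML to apply
        Delete string `yaml:"delete,omitempty"` // reference to an object to delete
        Check  string `yaml:"check,omitempty"`  // declarative check, e.g. a Rego expression
    }

    // Test is a named sequence of steps that the harness runs in order.
    type Test struct {
        Name  string `yaml:"name"`
        Steps []Step `yaml:"steps"`
    }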

Tests should be Stepped

It is common for open-coded tests to just run actions and perform checks
with no formal separation between stages. This results in an open-ended
debugging process since it is not possible to stop the test at a desired
point, nor to easily add instrumentation or additional checks.

Instead, if the test is expressed as a sequence of steps, the harness
can pause or stop running the test at any step. Steps can be executed at
an arbitrary rate, or reordered at runtime (subject to data dependencies).

Test steps can either be actions (applying an observable change to the
cluster) or checks (verifying an expected state in the cluster).
Steps can easily be reported to a variety of outputs (test log,
CLI, web UI) so the operator can observe status.
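
A step-oriented driver can then be very small. The sketch below assumes
the hypothetical Test and Step types above; the point is that each step
is a natural place to report progress and to stop on failure:

    package harness

    import (
        "fmt"
        "log"
    )

    // Run executes the steps of a test in order, reporting each step as
    // it starts and stopping at the first failure so that diagnostics
    // can be captured at that point.
    func Run(t Test, exec func(Step) error) error {
        for i, step := range t.Steps {
            log.Printf("test %q: step %d/%d: %s", t.Name, i+1, len(t.Steps), step.Name)
            if err := exec(step); err != nil {
                return fmt.Errorf("step %q failed: %w", step.Name, err)
            }
        }
        return nil
    }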

Tests should be Debuggable

It is typical to debug Kubernetes end-to-end tests by hacking the test
code and supporting APIs to log additional state and progress. The Go
test runner has particularly weak support for logging (usually no logs
are emitted at all until the test has completed with a failure).

A test framework should be able to inspect the state of tests enough
that it can capture and emit information that can help developers triage
test failures. This information might include the state of Kubernetes
API objects, logs from important pods, HTTP requests and responses,
the outcome of specific checks, and so on. Information capture is much
easier when tests are structured as sequences of steps since the steps
create natural capture boundaries.
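
As a sketch of what failure-time capture could look like with client-go,
assuming the harness labels everything it creates (the
"harness.example/test" label key is hypothetical):

    package harness

    import (
        "context"
        "log"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // captureDiagnostics dumps the state and logs of the pods that belong
    // to a test run, so that a failure can be triaged after the fact.
    func captureDiagnostics(ctx context.Context, cs kubernetes.Interface, ns, testID string) {
        pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
            LabelSelector: "harness.example/test=" + testID,
        })
        if err != nil {
            log.Printf("capture: listing pods: %v", err)
            return
        }
        for _, p := range pods.Items {
            log.Printf("pod %s/%s phase=%s", p.Namespace, p.Name, p.Status.Phase)
            raw, err := cs.CoreV1().Pods(ns).GetLogs(p.Name, &corev1.PodLogOptions{}).DoRaw(ctx)
            if err != nil {
                log.Printf("capture: logs for %s: %v", p.Name, err)
                continue
            }
            log.Printf("logs for %s:\n%s", p.Name, raw)
        }
    }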

Test steps should be Observable

There are many kinds of observability. The questions that a test harness
really needs to be able to answer are "what is happening now" and
"what went wrong". Usually, the harness is executing either an action
or a check, and this status can be reported to the user. If a step fails,
the harness needs visibility into checks and actions so that it can
generate enough information to illuminate the failure.

Some kinds of test runners have little insight into what the test is
doing, e.g. they only observe the exit status of a child process. That
is not sufficient for our purposes here. If the harness cannot observe
what a test step is doing, then it is hobbled when it needs to collect
debug information. So the requirement here is that the test harness
should deeply understand the actions taken at each step.

Test Action Types

Test actions are steps that are intended to alter the state of the
Kubernetes cluster. The most direct alterations can be made by using
the Kubernetes API, but we can imagine actions that operate on the
underlying infrastructure (e.g. kill a machine) or operate on external
state (e.g. create a target for an externalName service).

Focusing on the Kubernetes API, the harness should be able to perform
the following actions:

  • Create an object
  • Delete an object
  • Update an object

The obvious way to declaratively express creating a Kubernetes object is
with a chunk of YAML. There are various approaches (see kapp, kustomize,
kubectl) to applying YAML and checking for status.
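
For example, a harness could decode a manifest chunk into an
unstructured object and create it with the dynamic client. This is only
a sketch: it assumes a namespaced resource and that the caller already
knows the GroupVersionResource (a real implementation would derive it
with a RESTMapper):

    package harness

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "k8s.io/apimachinery/pkg/runtime/schema"
        "k8s.io/client-go/dynamic"
        "sigs.k8s.io/yaml"
    )

    // applyYAML decodes a single-document manifest and creates the object.
    func applyYAML(ctx context.Context, dyn dynamic.Interface, gvr schema.GroupVersionResource, manifest string) (*unstructured.Unstructured, error) {
        obj := &unstructured.Unstructured{}
        if err := yaml.Unmarshal([]byte(manifest), &obj.Object); err != nil {
            return nil, fmt.Errorf("decoding manifest: %w", err)
        }
        return dyn.Resource(gvr).Namespace(obj.GetNamespace()).Create(ctx, obj, metav1.CreateOptions{})
    }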

Test actions can be expected to either succeed or fail. Successfully
applying YAML is the normal case, but it is reasonable to expect failure
so that boundary conditions can be tested. Failures may be a direct API
server response (e.g. validation failure) or a subsequent failure that
is externally observable.

To test the result of a Kubernetes API action, tooling needs to understand
something about the type in order to know the status of a Kubernetes
object. This means that status detection needs to be built as a library
that knows (in principle) about all the types under test. The kustomize
kstatus library may be a good start, and we may be able to develop common
rules for known API groups (e.g. anything Knative).
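
Assuming the kstatus library is the right fit, a generic readiness check
could be as small as the following sketch, which computes a status for
an arbitrary unstructured object:

    package harness

    import (
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "sigs.k8s.io/cli-utils/pkg/kstatus/status"
    )

    // isReady reports whether kstatus considers the object to have
    // reached its desired state ("Current").
    func isReady(obj *unstructured.Unstructured) (bool, error) {
        res, err := status.Compute(obj)
        if err != nil {
            return false, err
        }
        return res.Status == status.CurrentStatus, nil
    }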

Since YAML can be verbose, the test suite could support a library of
predefined objects. This, unfortunately, implies that object names need
to be uniquified and then propagated to subsequent object references. This
risks wading into the swamp of Kubernetes YAML-wrangling tools.

There are a number of ways to update existing objects. In many cases,
a Kubernetes strategic merge patch is enough to express the object
update. However, as the fact that kustomize also supports RFC 6902 JSON
patches shows, strategic merge doesn't support every useful kind of update.
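
To make the contrast concrete, here is a sketch of both patch flavors
applied to a Deployment with client-go (the object name and the patched
fields are illustrative):

    package harness

    import (
        "context"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/types"
        "k8s.io/client-go/kubernetes"
    )

    func patchExamples(ctx context.Context, cs kubernetes.Interface) error {
        // A strategic merge patch is enough for a simple field update.
        smp := []byte(`{"spec":{"replicas":3}}`)
        if _, err := cs.AppsV1().Deployments("default").Patch(ctx, "echo",
            types.StrategicMergePatchType, smp, metav1.PatchOptions{}); err != nil {
            return err
        }

        // An RFC 6902 JSON patch can express updates that a strategic merge
        // cannot, such as removing a specific list element by index.
        jp := []byte(`[{"op": "remove", "path": "/spec/template/spec/containers/0/env/0"}]`)
        _, err := cs.AppsV1().Deployments("default").Patch(ctx, "echo",
            types.JSONPatchType, jp, metav1.PatchOptions{})
        return err
    }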

There's no existing YAML syntax to delete objects.

Test Check Types

In the most general case, checks are arbitrary tests executed against
the running cluster. Since checks are arbitrary, they could be just
raw Go code, but we can make them more declarative by using the Rego
language. This is a declarative syntax that allows the test harness to
provide built-in functions and data. There are already a number of tools
that apply Rego to Kubernetes objects.
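
As a sketch of how a Rego check could be evaluated from Go with the Open
Policy Agent API (the rule shown is illustrative and uses classic,
pre-v1.0 Rego syntax):

    package harness

    import (
        "context"

        "github.com/open-policy-agent/opa/rego"
    )

    // checkWithRego evaluates a tiny Rego policy against an arbitrary
    // input object (for example, a Kubernetes object fetched by the harness).
    func checkWithRego(ctx context.Context, input interface{}) (bool, error) {
        r := rego.New(
            rego.Query("data.check.ok"),
            rego.Module("check.rego", `
                package check

                ok { input.status.phase == "Running" }
            `),
            rego.Input(input),
        )
        rs, err := r.Eval(ctx)
        if err != nil {
            return false, err
        }
        return rs.Allowed(), nil
    }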

For Ingress controllers, it is essential that checks are able to be
expressed as HTTP requests. This could be implicit (as part of the Rego
execution environment) or explicit (i.e. a declarative HTTP request). HTTP
requests also need to be expressible as sequences so that tests such as
"service F receives 20% of requests" can be implemented.

Check Timing Issues

All the systems involved in a Kubernetes cluster are eventually
consistent, so the checks need to be resilient to changes in timing. For
example, a check that probes for a certain HTTP response may initially
fail because the underlying service is not yet ready. Testing the status
of a Kubernetes object will fail immediately after an action, but the
check should eventually converge to success. The test harness needs to
be cognizant of this and implicitly retry the checks with a time bound.
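
A minimal sketch of that retry-with-a-time-bound behaviour, independent
of any particular check:

    package harness

    import (
        "fmt"
        "time"
    )

    // eventually retries a check until it converges or the timeout expires.
    func eventually(timeout, interval time.Duration, check func() (bool, error)) error {
        deadline := time.Now().Add(timeout)
        for {
            ok, err := check()
            if ok {
                return nil
            }
            if time.Now().After(deadline) {
                if err != nil {
                    return fmt.Errorf("check did not converge within %s: %w", timeout, err)
                }
                return fmt.Errorf("check did not converge within %s", timeout)
            }
            time.Sleep(interval)
        }
    }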

In some cases, there may be deterministic conditions that can be tested
after applying an action. In these cases, we can synchronize on the
condition before applying the checks. Synchronizing on a condition
could be implicit, or it could be expressed as a check itself.

In other cases, we are testing some delayed or emergent effect of an
action. We need to be able to write a check that will succeed but is
tolerant of some initial failure. For example, an HTTP request to service
A succeeds within some timeout. These checks need to be careful of false
positives where it is possible for the check to run before the action
has been processed.

Test Context

To be able to write tests in a generic way, the harness needs to be able
to inject various kinds of test context. For example, a unique test ID
that can be used to generate HTTP requests. This mixture of static and
dynamic metadata could be used in Go templating, or directly injected
into runtime evaluations of Rego expressions or HTTP requests.
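
As a sketch of context injection with Go templating (the TestContext
fields and the template placeholders are hypothetical):

    package harness

    import (
        "bytes"
        "text/template"
    )

    // TestContext carries the static and dynamic metadata injected into tests.
    type TestContext struct {
        TestID    string
        Namespace string
    }

    // renderManifest expands a manifest template such as
    // "name: echo-{{ .TestID }}" with the test context.
    func renderManifest(tmpl string, ctx TestContext) (string, error) {
        t, err := template.New("manifest").Parse(tmpl)
        if err != nil {
            return "", err
        }
        var buf bytes.Buffer
        if err := t.Execute(&buf, ctx); err != nil {
            return "", err
        }
        return buf.String(), nil
    }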

Kubernetes Test metadata

The test harness should annotate any Kubernetes objects that it creates
with a standard set of metadata. At minimum, we need to know that the
object was created by the harness. The specific test and test run may
also be useful metadata.

Any standard objects that are created as side-effects of the test harness
also need to be labeled. This means that the harness should recurse into
pod spec templates and inject test annotations.
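
A sketch of that labelling, applied to an unstructured object and (when
present) its pod template; the label key is hypothetical:

    package harness

    import (
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    )

    const testLabel = "harness.example/test"

    // injectTestLabel labels the object itself and, if the object embeds a
    // pod template (Deployments, DaemonSets, Jobs, ...), labels that too.
    func injectTestLabel(obj *unstructured.Unstructured, testID string) error {
        labels := obj.GetLabels()
        if labels == nil {
            labels = map[string]string{}
        }
        labels[testLabel] = testID
        obj.SetLabels(labels)

        if _, found, _ := unstructured.NestedMap(obj.Object, "spec", "template"); !found {
            return nil
        }
        tmplLabels, found, err := unstructured.NestedStringMap(obj.Object, "spec", "template", "metadata", "labels")
        if err != nil || !found {
            tmplLabels = map[string]string{}
        }
        tmplLabels[testLabel] = testID
        return unstructured.SetNestedStringMap(obj.Object, tmplLabels, "spec", "template", "metadata", "labels")
    }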

There are a number of possible uses for Kubernetes metadata:

- clean up state after test runs
- examine objects for test triage
- use as input for checks

Sample Tests

This test ensures that an HTTP service is resilient to the
termination of its underlying pods.

Action
- Deploy Service A with 2 pods (round robin load balancing)
- Deploy an HTTPProxy targeting Service A
Check
- HTTP response from pod A.1
- HTTP response from pod A.2
Action
- Kill a Service pod
Check
- HTTP response from pod A.1 only
Action
- Wait for 2nd pod to reschedule
Check
- HTTP response from pod A.1
- HTTP response from pod A.3

This test ensures that traffic weighting works as expected.

Action
- Deploy Service A
- Deploy Service B
- Deploy an HTTPProxy targeting A for 80% and B for 20%
Check
- Run 100 HTTP requests
- Verify weighting from responses
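
The weighting check in the second test could be implemented along these
lines, assuming each backend identifies itself in a response header (the
"X-Backend" header name is hypothetical). With 100 requests, a tolerant
check would then assert that the counts are close to 80/20 rather than
exact:

    package harness

    import "net/http"

    // measureWeights sends n requests and counts which backend answered.
    func measureWeights(url string, n int) (map[string]int, error) {
        counts := map[string]int{}
        for i := 0; i < n; i++ {
            resp, err := http.Get(url)
            if err != nil {
                return nil, err
            }
            counts[resp.Header.Get("X-Backend")]++
            resp.Body.Close()
        }
        return counts, nil
    }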

Resources

jpeach (Contributor, Author) commented Mar 25, 2020

Closing, since there's no associated action here.

jpeach closed this as completed Mar 25, 2020