This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

automated system tests #229

Open
rade opened this issue Nov 24, 2014 · 8 comments
@rade
Member

rade commented Nov 24, 2014

We should have some integration / full-system tests. A few things to cover, vaguely in order of priority:

  1. basic weave <command> execution. Designed to catch silly mistakes we may make in the weave script, which has happened before. A single node suffices.
  2. weave network testing. This does require multiple nodes. The most basic scenario is firing up two containers across hosts and making sure they can ping each other, but ultimately we should have tests for all weave features. We also need to include some qualitative performance tests, to catch things like forgetting to run ethtool and hence ending up with a working but abysmally slow network.
  3. weavedns testing. ditto.
  4. kernel versions. Would be good to test all the above across different kernel versions so we get alerted when we inadvertently introduce functionality that depends on specific kernels. We've been caught out by this already.
  5. docker versions. ditto.
  6. library versions. We should have tests that check our code works with the latest version of the various libs we depend on. We have been caught out by this. See also reproducible builds #228.
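As a rough illustration of item 1, a single-node smoke runner can start very small: drive a sequence of commands and fail loudly on any non-zero exit. This is a sketch only; the weave subcommands shown are placeholders, and the real suite would enumerate the script's actual subcommands and arguments.

```python
# Minimal smoke-test runner sketch. The weave steps below are
# illustrative placeholders, not the script's real test matrix.
import subprocess

def run_step(cmd, timeout=60):
    """Run one command; return (ok, combined output)."""
    try:
        res = subprocess.run(cmd, capture_output=True, text=True,
                             timeout=timeout)
        return res.returncode == 0, res.stdout + res.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)

def smoke(steps):
    """Run all steps, collecting (cmd, output) for each failure."""
    failures = []
    for cmd in steps:
        ok, out = run_step(cmd)
        if not ok:
            failures.append((cmd, out))
    return failures

# Hypothetical sequence for a single node:
SMOKE_STEPS = [
    ["weave", "launch"],
    ["weave", "status"],
    ["weave", "stop"],
]
```

A CI wrapper would then just run `smoke(SMOKE_STEPS)` and report the failure list.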

I see this specific issue as an umbrella for ideas; testing tends to be open-ended, and we can always come up with ideas for improvement. So let's record them here and then split out "actionable" pieces into their own issues.

Also, note that this issue is not about CI.

@rade rade added the chore label Nov 24, 2014
@squaremo
Contributor

I have started by assuming just two VMs running docker that I can copy the weave images to, with scripting via ssh. This can get as far as smoke tests, though it's pretty slow (every test will probably involve starting and stopping the docker daemon).

@inercia
Contributor

inercia commented Nov 24, 2014

For more advanced testing environments, I would try to establish a set of testing topologies with some open-source network simulation platform. I have used VNX in the past with success, creating a topology of KVM nodes that can use multiple root FSs, but things like Clownix also look good...

Then you could run the tests with something like Fabric (or any other tool that could establish multiple ssh connections and run commands)...

@rade
Member Author

rade commented Nov 24, 2014

Ah yes, I forgot one item on my list...

  • network topologies, firewalls and other networking-related craziness. This partially touches on (2), since coping with these is a feature of weave.

@hesco

hesco commented Dec 4, 2014

All of this sounds like an important investment towards retiring technical debt. But I'm curious why you exclude the idea of CI automation. I can't imagine you want to build up this extensive regression suite only for it never to be used because it comes in the form of a recipe that must be performed manually each time.

I'm not saying that this expensive suite of integration tests ought to be run on every commit; that is for linting, static code checks and unit tests. But these integration tests should be run on any commit promoted for potential tagging and release, and they should be available to any developer working on changes suspected of impacting integration and introducing regressions.

@bboreham
Contributor

A variant of number 1: also test error conditions.

@rade
Member Author

rade commented Jan 16, 2015

It occurs to me that we could test all kinds of weave functionality by running a bunch of weave router containers on the same machine. This is lightweight, allowing us to simulate large(ish) networks and iterate quickly through different configurations.

Things like connection handling, topology and gossip can be tested that way: basically anything that doesn't actually require capturing/injecting packets. We could test most of the IP assignment that way too, though we would have to simulate the request/release.

Local iptables rules could simulate various network topologies and failures. We could have a generator for overlay topologies, i.e. a bunch of peers and the possible connectivity between them (TCP, UDP one way, UDP the other), and translate that into iptables rules.
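The generator could start as something like this sketch: take a declarative connectivity spec and emit a DROP rule for every pair/protocol not explicitly allowed. The rule form is an assumption (6783 is weave's default port, but real rules would need more care, e.g. handling established connections).

```python
# Hypothetical topology-to-iptables generator. The rule template and
# single port are assumptions, not weave's actual firewalling setup.
WEAVE_PORT = 6783  # weave's default port (used for both TCP and UDP)

def deny_rules(peers, allowed):
    """peers: list of peer IPs.
    allowed: set of (src, dst, proto) triples that SHOULD connect.
    Returns DROP rules for every other ordered pair and protocol."""
    rules = []
    for src in peers:
        for dst in peers:
            if src == dst:
                continue
            for proto in ("tcp", "udp"):
                if (src, dst, proto) not in allowed:
                    rules.append(
                        "iptables -A INPUT -s %s -d %s -p %s --dport %d -j DROP"
                        % (src, dst, proto, WEAVE_PORT))
    return rules

# e.g. TCP works one way only, UDP is fully blocked:
rules = deny_rules(["10.0.1.1", "10.0.1.2"],
                   {("10.0.1.1", "10.0.1.2", "tcp")})
```

A topology generator would then enumerate such specs (full mesh, partitions, one-way UDP) and apply/remove the rules around each test.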

@dpw
Contributor

dpw commented Jan 16, 2015

My experience of testing distributed systems is that most of the work is in testing failure conditions (and that any recovery happens as expected). These tend to involve timeouts. So to test with adequate coverage, you need to be able to test with simulated time. (Bryan did mention that he has already done something like this in the tests for the new gossip protocol.)

If we are doing thorough testing, I'd strongly recommend investing effort in that direction. I.e., simulate the network and time, and run multiple instances of weave (or some subset of its components) within a single process.
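A simulated clock along those lines can be tiny. This is a hedged sketch, not weave's actual test harness: timers sit in a heap and fire only when the test explicitly advances virtual time, so a 30-second heartbeat timeout runs deterministically and instantly.

```python
import heapq

class SimClock:
    """Virtual time: timers fire when the test advances the clock,
    so timeout/recovery paths run deterministically and instantly."""
    def __init__(self):
        self.now = 0.0
        self._timers = []  # heap of (deadline, seq, callback)
        self._seq = 0      # tie-breaker so callbacks never get compared

    def after(self, delay, callback):
        """Schedule callback to fire `delay` virtual seconds from now."""
        self._seq += 1
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))

    def advance(self, dt):
        """Advance virtual time by dt, firing any timers that come due."""
        end = self.now + dt
        while self._timers and self._timers[0][0] <= end:
            deadline, _, cb = heapq.heappop(self._timers)
            self.now = deadline
            cb()  # may schedule further timers
        self.now = end

# e.g. declare a peer dead after 30s of silence (hypothetical timeout):
clock = SimClock()
dead = []
clock.after(30, lambda: dead.append("peer1"))
clock.advance(29)  # not yet expired
clock.advance(2)   # timeout fires in virtual time
```

The same pattern extends to simulating the network: a queue of in-flight messages with virtual delivery times, drained by the same `advance` loop.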

At the other end of the spectrum, we have the smoke tests. We should continue to enhance these, because they are a full system test, including the shell script. But I expect they will always be "wide but shallow".

@rade
Member Author

rade commented Feb 17, 2015

> Also, note that this issue is not about CI.

#397 is.
