This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

automated system tests #229

Open
rade opened this issue Nov 24, 2014 · 8 comments
@rade
Member

rade commented Nov 24, 2014

We should have some integration / full-system tests. A few things to cover, vaguely in order of priority:

  1. basic weave <command> execution. Designed to catch silly mistakes we may make in the weave script, which has happened before. A single node suffices.
  2. weave network testing. This does require multiple nodes. The most basic scenario is firing up two containers across hosts and making sure they can ping each other, but ultimately we should have tests for all weave features. We also need to include some qualitative performance tests, to catch things like forgetting to run ethtool and hence ending up with a working but abysmally slow network.
  3. weavedns testing. ditto.
  4. kernel versions. Would be good to test all the above across different kernel versions so we get alerted when we inadvertently introduce functionality that depends on specific kernels. We've been caught out by this already.
  5. docker versions. ditto.
  6. library versions. We should have tests that check our code works with the latest version of the various libs we depend on. We have been caught out by this. See also reproducible builds #228.
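As a rough illustration of item 1, a single-node smoke runner can start very small: drive a sequence of commands and fail loudly on any non-zero exit. This is a sketch only; the weave subcommands shown are placeholders, and the real suite would enumerate the script's actual subcommands and arguments.

```python
# Minimal smoke-test runner sketch. The weave steps below are
# illustrative placeholders, not the script's real test matrix.
import subprocess

def run_step(cmd, timeout=60):
    """Run one command; return (ok, combined output)."""
    try:
        res = subprocess.run(cmd, capture_output=True, text=True,
                             timeout=timeout)
        return res.returncode == 0, res.stdout + res.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)

def smoke(steps):
    """Run all steps, collecting (cmd, output) for each failure."""
    failures = []
    for cmd in steps:
        ok, out = run_step(cmd)
        if not ok:
            failures.append((cmd, out))
    return failures

# Hypothetical sequence for a single node:
SMOKE_STEPS = [
    ["weave", "launch"],
    ["weave", "status"],
    ["weave", "stop"],
]
```

A CI wrapper would then just run `smoke(SMOKE_STEPS)` and report the failure list.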

I see this specific issue as an umbrella for ideas; testing tends to be open-ended, and we can always come up with ideas for improvement. So let's record them here and then split out "actionable" pieces into their own issues.

Also, note that this issue is not about CI.

@rade rade added the chore label Nov 24, 2014
@squaremo
Contributor

I have started by assuming just two VMs running docker that I can copy the weave images to, with scripting via ssh. This can get as far as smoke tests, though it's pretty slow (every test will probably involve starting and stopping the docker daemon).

@inercia
Contributor

inercia commented Nov 24, 2014

For more advanced testing environments, I would try to establish a set of testing topologies with some open-source network simulation platform. I have used VNX in the past with success, creating a topology of KVM nodes that can use multiple root FSs, but things like Clownix also look good...

Then you could run the tests with something like Fabric (or any other tool that could establish multiple ssh connections and run commands)...

@rade
Member Author

rade commented Nov 24, 2014

Ah yes, I forgot one item on my list...

  • network topologies, firewalls and other networking-related craziness. This partially touches on (2), since coping with these is a feature of weave.

@hesco

hesco commented Dec 4, 2014

All of this sounds like an important investment towards retiring technical debt. But I'm curious why you exclude the idea of CI automation. I can't imagine you want to build up this extensive regression suite only for it never to be used because it comes in the form of a recipe that must be performed manually each time.

I'm not saying that this expensive suite of integration tests ought to be run on every commit; that is for linting, static code checks and unit tests. But these integration tests should be run on any commit promoted for potential tagging and release, and they should be available to any developer working on changes suspected of impacting integration and introducing regressions.

@bboreham
Contributor

A variant of number 1: also test error conditions.

@rade
Member Author

rade commented Jan 16, 2015

It occurs to me that we could test all kinds of weave functionality by running a bunch of weave router containers on the same machine. This is lightweight, allowing us to simulate large(ish) networks and iterate quickly through different configurations.

Things like connection handling, topology and gossip can be tested that way: basically anything that doesn't actually require capturing/injecting packets. We could test most of the IP assignment that way too, though we would have to simulate the request/release.

Local iptables rules could simulate various network topologies and failures. We could have a generator for overlay topologies, i.e. a bunch of peers and the possible connectivity between them (TCP, UDP one way, UDP the other), and translate that into iptables rules.
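The generator could start as something like this sketch: take a declarative connectivity spec and emit a DROP rule for every pair/protocol not explicitly allowed. The rule form is an assumption (6783 is weave's default port, but real rules would need more care, e.g. handling established connections).

```python
# Hypothetical topology-to-iptables generator. The rule template and
# single port are assumptions, not weave's actual firewalling setup.
WEAVE_PORT = 6783  # weave's default port (used for both TCP and UDP)

def deny_rules(peers, allowed):
    """peers: list of peer IPs.
    allowed: set of (src, dst, proto) triples that SHOULD connect.
    Returns DROP rules for every other ordered pair and protocol."""
    rules = []
    for src in peers:
        for dst in peers:
            if src == dst:
                continue
            for proto in ("tcp", "udp"):
                if (src, dst, proto) not in allowed:
                    rules.append(
                        "iptables -A INPUT -s %s -d %s -p %s --dport %d -j DROP"
                        % (src, dst, proto, WEAVE_PORT))
    return rules

# e.g. TCP works one way only, UDP is fully blocked:
rules = deny_rules(["10.0.1.1", "10.0.1.2"],
                   {("10.0.1.1", "10.0.1.2", "tcp")})
```

A topology generator would then enumerate such specs (full mesh, partitions, one-way UDP) and apply/remove the rules around each test.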

@dpw
Contributor

dpw commented Jan 16, 2015

My experience of testing distributed systems is that most of the work is in testing failure conditions (and that any recovery happens as expected). These tend to involve timeouts. So to test with adequate coverage, you need to be able to test with simulated time. (Bryan did mention that he has already done something like this in the tests for the new gossip protocol.)

If we are doing thorough testing, I'd strongly recommend investing effort in that direction. I.e., simulate the network and time, and run multiple instances of weave (or some subset of its components) within a single process.
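A simulated clock along those lines can be tiny. This is a hedged sketch, not weave's actual test harness: timers sit in a heap and fire only when the test explicitly advances virtual time, so a 30-second heartbeat timeout runs deterministically and instantly.

```python
import heapq

class SimClock:
    """Virtual time: timers fire when the test advances the clock,
    so timeout/recovery paths run deterministically and instantly."""
    def __init__(self):
        self.now = 0.0
        self._timers = []  # heap of (deadline, seq, callback)
        self._seq = 0      # tie-breaker so callbacks never get compared

    def after(self, delay, callback):
        """Schedule callback to fire `delay` virtual seconds from now."""
        self._seq += 1
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))

    def advance(self, dt):
        """Advance virtual time by dt, firing any timers that come due."""
        end = self.now + dt
        while self._timers and self._timers[0][0] <= end:
            deadline, _, cb = heapq.heappop(self._timers)
            self.now = deadline
            cb()  # may schedule further timers
        self.now = end

# e.g. declare a peer dead after 30s of silence (hypothetical timeout):
clock = SimClock()
dead = []
clock.after(30, lambda: dead.append("peer1"))
clock.advance(29)  # not yet expired
clock.advance(2)   # timeout fires in virtual time
```

The same pattern extends to simulating the network: a queue of in-flight messages with virtual delivery times, drained by the same `advance` loop.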

At the other end of the spectrum, we have the smoke tests. We should continue to enhance these, because they are a full system test, including the shell script. But I expect they will always be "wide but shallow".

@rade
Member Author

rade commented Feb 17, 2015

> Also, note that this issue is not about CI.

#397 is.
