functional-test: add advance network failure cases #6918
add more network failures such as packet corruption, reordering, loss, and network partition. resolve etcd-io#5614
}

// SetPacketReordering reorders packets. rp% of packets (with a correlation of cp%) gets send immediately. The rest will be delayed for ms millisecond
func SetPacketReordering(rp int, cp int, ms int) error {
This only tests the TCP stack; etcd will still see everything in order, so why have it?
}

// SetPackLoss randomly drop packet at p% probability
func SetPackLoss(p int) error {
How is this any different from injecting random latencies? The TCP stack will retransmit the lost packets.
@@ -19,10 +19,5 @@ tester:
- /etcd-tester
- -agent-endpoints
- "172.20.0.2:9027,172.20.0.3:9027,172.20.0.4:9027"
- -limit
why change this?
Should the functional-tester run in the Docker image mirror the one we run using goreman?
@@ -1,6 +1,6 @@
FROM alpine
RUN apk update
RUN apk add -v iptables sudo
RUN apk --update add iptables bash iproute2
why add bash?
Probably don't need it.
slowNetworkLatency = 500 // 500 millisecond
randomVariation = 50 | ||
snapshotCount = 10000 | ||
slowNetworkLatency = 500 // 500 millisecond |
Probably use 500 * time.Millisecond (same for the others) instead of having to comment about the units.
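A minimal sketch of that suggestion, assuming `randomVariation` is also meant to be in milliseconds (the diff does not say). The helper `toMillis` is hypothetical and only shows how a caller that needs a plain integer could still get one.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// The unit is part of the type, so no trailing comment is needed.
	slowNetworkLatency = 500 * time.Millisecond
	// Assumption: the original bare 50 was also in milliseconds.
	randomVariation = 50 * time.Millisecond
)

// toMillis converts a duration to whole milliseconds for callers that
// still need an integer. Hypothetical helper, not part of the PR.
func toMillis(d time.Duration) int {
	return int(d / time.Millisecond)
}

func main() {
	fmt.Println(slowNetworkLatency, randomVariation)
	fmt.Println(toMillis(slowNetworkLatency), toMillis(randomVariation))
}
```

Typed constants also let the compiler reject accidental mixing of a latency with an unrelated count such as `snapshotCount`.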
@@ -41,6 +41,82 @@ func RecoverPort(port int) error {
return err
}

// SetPacketCorruption corrupts packets at p%
func SetPacketCorruption(p int) error {
Most of this will be caught by TCP checksums and fixed by retransmission. For the packets that aren't, I don't see how etcd would be able to pass its checks (e.g., suppose a lease key is corrupted, and when the lease checker looks for the intended key, it's gone).
Some of these faults make sense but not by manipulating data frames with
Since TCP corrects most of the faults, these work better at the TCP level:
There's already a small proxy that does the above, but it's not wired to the functional-tester:
See #5614 (comment). I agree with @heyitsanthony. The more interesting test is reordering between multiple connections. You can do this at the packet level, but most of the time you will reorder packets within one TCP connection. @heyitsanthony aggressive packet loss, corruption, and reordering might create some interesting corner cases randomly, but I am not convinced we should prioritize this now.
add more network failures such as packet corruption, reordering, loss, and network partition.
resolve #5614