Switch pod to world testing from cilium.io to one.one.one.one #511

Merged
merged 1 commit into master from pr/switch-to-one-one-one-one
Sep 3, 2021

Conversation

nbusseneau
Member

Tentative fix for #367.

We have two main hypotheses about why connections to external domains are flaky:

  • The external domain itself is unreliable.
  • The external domain works fine, but CoreDNS is unreliable.

cilium.io is hosted on a single non-HA EC2 instance, which is definitely not the most robust thing out there. We propose switching to
one.one.one.one as a quick fix to check whether that helps with reliability.

If yes, we will evaluate moving to an HA system that we control ourselves (e.g. a DNS zone hosted at a major provider).

If not, we will investigate the CoreDNS hypothesis.

@nbusseneau nbusseneau added the area/CI Continuous Integration testing issue or flake label Sep 1, 2021
@nbusseneau nbusseneau requested a review from a team as a code owner September 1, 2021 18:03
@nbusseneau nbusseneau temporarily deployed to ci September 1, 2021 18:03 Inactive
@nbusseneau nbusseneau force-pushed the pr/switch-to-one-one-one-one branch from 558b09b to a705db7 Compare September 1, 2021 18:15
@nbusseneau nbusseneau temporarily deployed to ci September 1, 2021 18:15 Inactive
@ldelossa
Contributor

ldelossa commented Sep 1, 2021

Will "client-egress-to-fqdns-cilium-io.yaml" need to be updated?

client-egress-to-fqdns-cilium-io.yaml
1:apiVersion: cilium.io/v2
5:  name: client-egress-to-fqdns-cilium-io
20:    - matchName: "cilium.io"

@nbusseneau
Member Author

Will "client-egress-to-fqdns-cilium-io.yaml" need to be updated?

You are right, I missed that one.
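
For illustration, a minimal sketch of what the renamed policy could look like after the switch. Only `apiVersion: cilium.io/v2` and the `matchName` field come from the excerpt quoted above; the new policy name, the `kind: client` selector, and the DNS rule targeting kube-dns are assumptions made for this example, not the contents of the actual file in this PR.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  # Hypothetical name, mirroring the old client-egress-to-fqdns-cilium-io.
  name: client-egress-to-fqdns-one-one-one-one
spec:
  endpointSelector:
    matchLabels:
      kind: client  # assumed label for the test client pods
  egress:
  # Allow egress only to the new external test domain.
  - toFQDNs:
    - matchName: "one.one.one.one"
  # toFQDNs rules depend on Cilium observing DNS responses, so DNS
  # traffic to the cluster resolver must be allowed and inspected.
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
```

If the DNS rule is missing, the FQDN rule never learns any IPs for `one.one.one.one`, and the test would fail regardless of how reliable the external domain is.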

Tentative fix for #367.

We have two main hypotheses about why connections to external domains
are flaky:

- The external domain itself is unreliable.
- The external domain works fine, but CoreDNS is unreliable.

`cilium.io` is hosted on a single non-HA EC2 instance, which is definitely
not the most robust thing out there. We propose switching to
`one.one.one.one` as a quick fix to check whether that helps with
reliability.

If yes, we will evaluate moving to an HA system that we control ourselves
(e.g. a DNS zone hosted at a major provider).

If not, we will investigate the CoreDNS hypothesis.

Signed-off-by: Nicolas Busseneau <[email protected]>
@nbusseneau nbusseneau force-pushed the pr/switch-to-one-one-one-one branch from a705db7 to 6d4e16a Compare September 2, 2021 09:56
@nbusseneau nbusseneau temporarily deployed to ci September 2, 2021 09:56 Inactive
Contributor

@ldelossa ldelossa left a comment

I do not believe this will fix our flakes, as you're aware from my recent update to the flake issue itself, but I do think this is a good idea nonetheless.

@tklauser tklauser merged commit 455b031 into master Sep 3, 2021
@tklauser tklauser deleted the pr/switch-to-one-one-one-one branch September 3, 2021 07:05
Labels
area/CI Continuous Integration testing issue or flake