In-cluster building doesn't work on DigitalOcean (doks) #877

Closed
edvald opened this issue Jun 24, 2019 · 15 comments

@edvald (Collaborator) commented Jun 24, 2019

Bug

Current Behavior

Currently, in-cluster building doesn't work with DigitalOcean doks clusters.

The garden-system services all deploy as normal, but the cluster is unable to pull from the in-cluster registry. The proximate cause appears to be that hostPort pods (deployed through DaemonSets) can't be reached (connection refused), and this affects other services as well, not just the registry proxy. I've tried all manner of things and am stuck on fixing this.
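
The symptom can be reproduced in isolation with a throwaway hostPort pod, e.g. (the pod names and image here are illustrative, not part of Garden's manifests):

# Run a pod that exposes a hostPort:
kubectl run hostport-test --image=nginx --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"containers":[{"name":"hostport-test","image":"nginx","ports":[{"containerPort":80,"hostPort":8080}]}]}}'
# Find the IP of the node the pod landed on:
kubectl get pod hostport-test -o jsonpath='{.status.hostIP}'
# From another pod in the cluster, this gets "connection refused" on doks:
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -sv http://<node-ip>:8080/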

Expected behavior

In-cluster building should work, just as it does on GKE, AKS etc.

Reproducible example

Try configuring a doks cluster environment in the demo-project and deploying the project. It will eventually fail with ImagePullBackOff because the cluster is unable to reach the in-cluster registry.
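
Concretely, something like this (the environment name is hypothetical):

# From the demo-project directory, with a doks environment configured:
garden deploy --env doks
# The deploy eventually fails, and the pods show ImagePullBackOff:
kubectl get pods --all-namespaces | grep ImagePullBackOff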

Workaround

Use the default local-docker build mode when deploying to doks clusters.

Suggested solution(s)

We need to reach out to DO to figure out why hostPort services refuse connections.

Your environment

Latest master.

@clems71 commented Jun 28, 2019

Hey there!

Same thing on my side with a kops-provisioned cluster on AWS, using default kops settings and no fancy networking stuff. I'm basically getting the same error. Everything deployed properly, and cluster-init ran fine as well. It's only when I run deploy or dev that it fails with:

Error deploying backend: ImagePullBackOff - Back-off pulling image "127.0.0.1:5000/demo-project/backend:v-5b72e91a7c"

@clems71 commented Jun 28, 2019

By digging further into the logs of the registry-proxy DaemonSet, I found that requests are being filtered out, mostly due to the range parameter passed to socat. Here is a sample log line I got:

2019/06/28 08:42:51 socat[8] W refusing connection from AF=2 100.96.3.1:40244 due to range option

Initial command in registry-proxy:

socat -d TCP-LISTEN:5000,fork,range=10.0.0.0/8 TCP:garden-docker-registry.garden-system.svc.cluster.local:5000

For the moment I've made it work by removing the range param (I'm sure there are security implications I'm not aware of, but at least it makes things work):

socat -d TCP-LISTEN:5000,fork TCP:garden-docker-registry.garden-system.svc.cluster.local:5000
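
To apply the same change on a live cluster, patching the DaemonSet in place should also work, roughly like this (the DaemonSet name and container layout are assumptions, so check them first):

# Check the actual DaemonSet name in the garden-system namespace:
kubectl -n garden-system get daemonsets
# Then override the container command to drop the range option
# (or just `kubectl -n garden-system edit ds <name>` and edit by hand):
kubectl -n garden-system patch daemonset garden-registry-proxy --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/command","value":["socat","-d","TCP-LISTEN:5000,fork","TCP:garden-docker-registry.garden-system.svc.cluster.local:5000"]}]'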

HTH,
Cheers

@valerauko

I'm facing the same issue. Sadly, removing the range option didn't resolve it (or maybe I removed it from the wrong spec).

@eddiezane

👋 Eddie from the DigitalOcean DevRel team here.

TLDR: This should be fixed with a new DOKS image shipping this week.

This is due to the version of our CNI (Cilium) that we currently use not supporting hostPort out of the box. A newer Cilium version adds a flag that makes it easy to enable, and the new DOKS version shipping this week turns it on.

I've been told this can be used as a workaround for the time being: https://github.com/snormore/cilium-portmap.

@edvald (Collaborator, Author) commented Jul 1, 2019

@clems71 Ah, we probably need to work out the correct address range dynamically. The key thing was to make sure we're not accidentally allowing outside traffic as a side effect of our little trickery to get the in-cluster registry going. I'll dig into this and see how we might best solve it across the board.
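
For instance, we could read the pod CIDRs off the nodes at install time; just a sketch, the actual fix may end up different:

# Each node advertises its pod CIDR; the socat range option could be derived
# from these instead of being hardcoded to 10.0.0.0/8:
kubectl get nodes -o jsonpath='{range .items[*]}{.spec.podCIDR}{"\n"}{end}'
# On some clusters the cluster-wide CIDR also shows up in the controller-manager flags:
kubectl cluster-info dump | grep -m 1 cluster-cidr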

@eddiezane Thanks for the quick response! Once it's released, I expect I'll need to update my existing cluster(s)?

@eddiezane

@edvald once it lands, you should be able to upgrade your cluster to the latest patch version via the console or the automatic maintenance window. The fix should be baked into all images/minor versions.
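
Upgrading via the CLI should also work (the cluster name below is illustrative):

# List available upgrade versions, then upgrade to the desired one:
doctl kubernetes cluster get-upgrades my-cluster
doctl kubernetes cluster upgrade my-cluster --version <new-patch-version>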

@edvald (Collaborator, Author) commented Jul 1, 2019

@eddiezane That's awesome, thanks again for the fast response!

@clems71 your issue is something we need to fix on our side; we'll figure it out for our next patch release this week.

eysi09 added this to the 0.10.1 milestone Jul 2, 2019
@clems71 commented Jul 3, 2019

Awesome, thanks!

eysi09 self-assigned this Jul 3, 2019
@edvald (Collaborator, Author) commented Jul 7, 2019

@clems71 I believe #930 solves your issue. I just tried it myself on a kops cluster and it seems to do the trick. It'll be in v0.10.1, so you can get rid of the workaround then :)

@clems71 commented Jul 7, 2019 via email

eysi09 removed their assignment Jul 8, 2019
@edvald (Collaborator, Author) commented Jul 12, 2019

@eddiezane I just checked with the latest version (1.14.3-do.0) and still have the same issue. Do you have an issue filed that we could track?

@eddiezane

@edvald ack. Just pinged the team again.

edvald removed this from the 0.10.1 milestone Jul 16, 2019
@timoreimann

@edvald we recently created https://github.com/digitalocean/DOKS to allow DOKS users to create issues and generally get in touch with our team. Feel free to file a bug report so that we can keep you posted on any updates. (I know some of my colleagues are already looking into the issue.)

Thanks!

@solomonope (Contributor)

Hello,

I tested this with a DigitalOcean Kubernetes cluster, version 1.16.2-do.0, and was able to build and deploy.

@edvald (Collaborator, Author) commented Dec 6, 2019

Cool! Could you take a quick look at #995 as well, @solomonope?
