OpenVPN DNS issues with Canal >= v3.0.0 #6068

Closed
jonasrmichel opened this issue Nov 12, 2018 · 7 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@jonasrmichel

1. What kops version are you running? The command kops version will display
this information.

$ kops version
Version 1.11.0-alpha.1 (git-a95f3b9cb)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

# create a cluster
$ kops create cluster \
  --name mycluster.mydomain.com \
  --zones us-east-1a \
  --master-zones us-east-1a \
  --networking canal

# install tiller
$ helm init

# add the "stable" helm chart repository
$ helm repo add stable https://kubernetes-charts.storage.googleapis.com

# install the OpenVPN helm chart with its default config (version 3.10.0)
$ helm install stable/openvpn

# follow the instructions on the OpenVPN chart's Usage docs to create a new client cert
# https://github.com/helm/charts/tree/master/stable/openvpn#usage

# connect to the VPN using the client cert

5. What happened after the commands executed?

After successfully connecting to the VPN:

  • no DNS (nslookups of public domains time out)
  • can ping the OpenVPN pod by IP, but no other services/pods within the cluster or public IPs

6. What did you expect to happen?

Full network access to public domains and my cluster's pods and services.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-11-12T16:21:14Z
  name: mycluster.mydomain.com
spec:
  additionalPolicies:
    master: '[ { "Effect": "Allow", "Action": [ "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:DescribeAutoScalingInstances", "autoscaling:DescribeTags", "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup" ], "Resource": ["*"] }, {
      "Effect": "Allow", "Action": ["route53:ChangeResourceRecordSets"], "Resource":
      ["arn:aws:route53:::hostedzone/*"] }, { "Effect": "Allow", "Action": [ "route53:ListHostedZones",
      "route53:ListResourceRecordSets" ], "Resource": ["*"] } ]'
    node: '[ { "Effect": "Allow", "Action": [ "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:DescribeAutoScalingInstances", "autoscaling:DescribeTags", "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup" ], "Resource": ["*"] }, {
      "Effect": "Allow", "Action": ["route53:ChangeResourceRecordSets"], "Resource":
      ["arn:aws:route53:::hostedzone/*"] }, { "Effect": "Allow", "Action": [ "route53:ListHostedZones",
      "route53:ListResourceRecordSets" ], "Resource": ["*"] } ]'
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://kubernetes-mydomain.com-state-store/mycluster.mydomain.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    authorizationMode: RBAC
    enableAdmissionPlugins:
    - Initializers
    - NamespaceLifecycle
    - LimitRanger
    - ServiceAccount
    - PersistentVolumeLabel
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - NodeRestriction
    - Priority
    - ResourceQuota
    - PodPreset
    - MutatingAdmissionWebhook
    - ValidatingAdmissionWebhook
    runtimeConfig:
      admissionregistration.k8s.io/v1alpha1: "true"
      autoscaling/v2beta1: "true"
      rbac.authorization.k8s.io/v1alpha1: "true"
      settings.k8s.io/v1alpha1: "true"
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.11.2
  masterInternalName: api.internal.mycluster.mydomain.com
  masterPublicName: api.mycluster.mydomain.com
  networkCIDR: 172.20.0.0/16
  networking:
    canal: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-east-1a
    type: Public
    zone: us-east-1a
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-11-12T16:21:15Z
  labels:
    kops.k8s.io/cluster: mycluster.mydomain.com
  name: master-us-east-1a
spec:
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: c5.xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1a
  role: Master
  subnets:
  - us-east-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-11-12T16:21:15Z
  labels:
    kops.k8s.io/cluster: mycluster.mydomain.com
  name: nodes
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: "yes"
    kubernetes.io/cluster/mycluster.mydomain.com: "true"
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: c5.xlarge
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-east-1a

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

These symptoms seem to be related to the Canal networking plugin, specifically to changes in the quay.io/calico/node and quay.io/calico/cni images.

There are no VPN issues using Canal <= 2.6.12. After experiencing these symptoms with Canal 3.2.3 (the version installed by kops 1.11.0-alpha.1), with some experimentation I discovered they persist all the way down to Canal 3.0.0.

@jonasrmichel
Author

@gambol99 -- Tagging you here as this issue seems to be related to changes introduced by #5927. Any insight you could offer would be much appreciated.

@peterbosalliandercom

It seems that IP forwarding has been disabled in the container. I made OpenVPN a privileged container and temporarily set it to 1 with echo 1 > /proc/sys/net/ipv4/ip_forward, and then it works again. This still needs some testing, and I don't want a privileged container. Can somebody help with that?
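
For anyone wanting to confirm the same diagnosis, here is a minimal sketch of a check that could be run inside the OpenVPN container. The helper name ip_forward_status is made up for illustration; the only thing taken from the comment above is the /proc/sys/net/ipv4/ip_forward path.

```shell
#!/bin/sh
# Report whether IPv4 forwarding is enabled, based on the kernel flag file.
# ip_forward_status is a hypothetical helper name, not part of any tool.
# The path defaults to the procfs location quoted in the comment above;
# a different path can be passed in to exercise the logic on a test file.
ip_forward_status() {
  flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
  if [ ! -r "$flag_file" ]; then
    echo "unknown"
    return 2
  fi
  value="$(cat "$flag_file")"
  if [ "$value" = "1" ]; then
    echo "enabled"
  else
    echo "disabled"
    return 1
  fi
}

# Inside the OpenVPN container one would simply call:
#   ip_forward_status
# The temporary fix described above only works in a privileged container:
#   echo 1 > /proc/sys/net/ipv4/ip_forward
```

If this prints "disabled" in the OpenVPN pod, that would match the symptoms reported in this issue: the VPN client can reach the pod itself, but nothing is forwarded beyond it.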

@peterbosalliandercom

In addition to my previous post: could this be a structural solution? https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#setting-sysctls-for-a-pod
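
A rough sketch of what the linked docs page describes, applied to this case, might look like the following. The function name, pod name, and image are placeholders and are not taken from the OpenVPN chart. Whether a pod-level sysctl survives the CNI plugin's own handling of ip_forward has not been verified here.

```shell
#!/bin/sh
# Print a hypothetical Pod manifest that requests net.ipv4.ip_forward=1 through
# the pod-level securityContext.sysctls field described in the linked docs.
# render_ip_forward_pod, the pod name, and the image are illustrative only.
render_ip_forward_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: openvpn-sysctl-example
spec:
  securityContext:
    sysctls:
    - name: net.ipv4.ip_forward
      value: "1"
  containers:
  - name: openvpn
    image: example/openvpn:placeholder
EOF
}

# Assumed usage:
#   render_ip_forward_pod | kubectl apply -f -
# net.ipv4.ip_forward is not in the default safe sysctl set, so the kubelet on
# the target node would also need --allowed-unsafe-sysctls=net.ipv4.ip_forward.
```

The appeal of this approach is that the container itself would not need to be privileged. The cost is that the unsafe sysctl has to be allowed on the kubelet of every node the pod may be scheduled on.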

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 27, 2019
@gambol99
Contributor

@johanhubens

No doubt the issue has been resolved by now, but yes, this hit us too, taking our VPN down for a while. I believe the issue was down to projectcalico/cni-plugin@b4b3746#diff-c6517b83d7f7154fe1226d90607e1696. Making the container privileged so that /proc is writable, and then enabling IP forwarding, fixed the issue. Not ideal, but it is a workaround.

@gambol99
Contributor

/close

@k8s-ci-robot
Contributor

@gambol99: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


5 participants