This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

azure-cni-networkmonitor pods are scheduled on Windows #3404

Closed
adelina-t opened this issue Jul 3, 2018 · 9 comments

@adelina-t
Contributor

ISSUE:


What version of acs-engine?:
v0.19.0

Kubernetes 1.11.0

azure-cni-networkmonitor pods are scheduled on Windows nodes, where they cannot start, so they get stuck in the ContainerCreating state:

Events:
  Type     Reason       Age                From                   Message
  ----     ------       ----               ----                   -------
  Warning  FailedMount  6m (x179 over 6h)  kubelet, 36822k8s9003  Unable to mount volumes for pod "azure-cni-networkmonitor-8gmlf_kube-system(ee9ec03a-7aa9-11e8-a9cc-000d3a069e0e)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"azure-cni-networkmonitor-8gmlf". list of unmounted volumes=[ebtables-rule-repo]. list of unattached volumes=[log ebtables-rule-repo default-token-wbl6x]
  Warning  FailedMount  1m (x210 over 6h)  kubelet, 36822k8s9003  MountVolume.SetUp failed for volume "ebtables-rule-repo" : hostPath type check failed: /var/run/ is not a directory

apimodel file (as minimally and precisely as possible):

{
  "apiVersion": "vlabs",
  "location": "westus2",
  "tags": null,
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.11",
      "kubernetesConfig": {
        "customHyperkubeImage": "atuvenie/hyperkube-amd64:1006092042008268800",
        "customWindowsPackageURL": "http://k8szipstorage.blob.core.windows.net/mystoragecontainer/1006092042008268800.zip"
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "k8s-ed-1-10-deployment-3",
      "vmSize": "Standard_D2s_v3"
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 4,
        "vmSize": "Standard_D2s_v3",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows",
        "preProvisionExtension": {
          "name": "node_setup",
          "singleOrAll": "all"
        }
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": 
          }
        ]
      }
    },
    "windowsProfile": {
      "adminUsername": "azureuser",
      "adminPassword": "Passw0rdAdmin"
    },
    "extensionProfiles": [
      {
        "name": "node_setup",
        "version": "v1",
        "rootURL": "http://taeduard.go.ro/testing/",
        "script": "node_setup.ps1"
      }
    ],
    "servicePrincipalProfile": {

    }
  }
}
@PatrickLang
Contributor

I will add a test case to make sure there are no failed pods after the Windows pod is scheduled.
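
A check along those lines could look like the minimal sketch below. It is not the test that was added to acs-engine. It assumes the pod list comes from `kubectl get pods --all-namespaces -o json`; the function name `find_unhealthy_pods` and the second pod in the sample data are hypothetical.

```python
# Phases that mean a pod is not (or not yet) healthy. A pod stuck in
# ContainerCreating is reported with phase "Pending".
UNHEALTHY_PHASES = {"Pending", "Failed", "Unknown"}


def find_unhealthy_pods(pod_list):
    """Return (namespace, name, node, phase) for each pod in an unhealthy phase.

    `pod_list` is the parsed JSON from
    `kubectl get pods --all-namespaces -o json` (assumed input).
    """
    unhealthy = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase in UNHEALTHY_PHASES:
            meta = pod.get("metadata", {})
            unhealthy.append((
                meta.get("namespace", ""),
                meta.get("name", ""),
                pod.get("spec", {}).get("nodeName", ""),
                phase,
            ))
    return unhealthy


# Sample data with only the fields used above. The first pod and node name
# come from the events in this issue; the second pod is hypothetical.
SAMPLE_POD_LIST = {
    "items": [
        {
            "metadata": {"namespace": "kube-system",
                         "name": "azure-cni-networkmonitor-8gmlf"},
            "spec": {"nodeName": "36822k8s9003"},
            "status": {"phase": "Pending"},
        },
        {
            "metadata": {"namespace": "kube-system",
                         "name": "kube-proxy-example"},
            "spec": {"nodeName": "k8s-master-example"},
            "status": {"phase": "Running"},
        },
    ]
}

for namespace, name, node, phase in find_unhealthy_pods(SAMPLE_POD_LIST):
    print(f"{namespace}/{name} on {node}: {phase}")
```

A real test would probably poll for a while before failing, since pods are legitimately Pending for a short time right after they are scheduled.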

PatrickLang pushed a commit to PatrickLang/acs-engine that referenced this issue Jul 3, 2018
@PatrickLang
Contributor

Should be fixed by #3407, which also adds a test.
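
This thread does not show what #3407 changed. One common way to keep a Linux-only DaemonSet off Windows nodes is a `nodeSelector` on the OS node label, and the sketch below illustrates that idea on a manifest loaded as a Python dict. The helper names `restrict_to_linux` and `is_linux_only` and the sample manifest are hypothetical; `beta.kubernetes.io/os` is the OS label used by the Kubernetes releases discussed here.

```python
import copy

# OS node label set by the kubelet in the Kubernetes releases in this thread.
OS_LABEL = "beta.kubernetes.io/os"


def restrict_to_linux(daemonset):
    """Return a copy of a DaemonSet manifest whose pods only schedule on Linux nodes."""
    patched = copy.deepcopy(daemonset)
    pod_spec = (
        patched.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    # Keep any existing selector entries and add the OS constraint.
    pod_spec.setdefault("nodeSelector", {})[OS_LABEL] = "linux"
    return patched


def is_linux_only(daemonset):
    """Report whether the manifest's pod template selects Linux nodes only."""
    pod_spec = daemonset.get("spec", {}).get("template", {}).get("spec", {})
    selector = pod_spec.get("nodeSelector") or {}
    return selector.get(OS_LABEL) == "linux"


# Hypothetical, heavily trimmed manifest for illustration only.
manifest = {
    "kind": "DaemonSet",
    "metadata": {"name": "azure-cni-networkmonitor", "namespace": "kube-system"},
    "spec": {"template": {"spec": {"containers": []}}},
}

patched = restrict_to_linux(manifest)
print(is_linux_only(manifest), is_linux_only(patched))
```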

@Cherishty

I also experience this issue on acs-engine 0.19.1 + kubernetes 1.10.5, as well as on acs-engine 0.19.0 + kubernetes 1.9.8.
Additionally, maybe as a result of the above issue, after I deploy several pods in this cluster and assign them to the Windows node, they all have NO access to the internet.

@adelina-t
Contributor Author

@PatrickLang Tested this fix on our env. No more cni-monitor pods are scheduled on Windows, but we do experience the issue @Cherishty has seen. A simple ping from within the pod won't work.

@Cherishty

Cherishty commented Jul 4, 2018

@adelina-t @PatrickLang
I used acs-engine v0.18.3 to create a k8s v1.9.8 cluster with WS1803 as the worker node.
Obviously the azure-cni-networkmonitor issue does not reproduce this time, but inside the pod the container still can NOT access the internet. Strangely, when I try to ping bing.com, it resolves the real IP but eventually times out.
I deployed with:
acs-engine deploy --subscription-id <my_subscription_id> --dns-prefix <new_resource_group_name> --location japanwest --api-model kubernetes.json

where kubernetes.json is quite simple:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.9.5"
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "",
      "vmSize": "Standard_D2_v2"
    },
    "agentPoolProfiles": [
      {
        "name": "windowspool2",
        "count": 1,
        "vmSize": "Standard_D2_v2",
        "availabilityProfile": "AvailabilitySet",
        "osType": "Windows"
      }
    ],
    "windowsProfile": {
      "adminUsername": "azureuser",
      "adminPassword": "replacepassword1234$"
    },
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "",
      "secret": ""
    }
  }
}

The pod deployment is the one provided by our docs: https://github.com/Microsoft/SDN/blob/master/Kubernetes/flannel/l2bridge/manifests/simpleweb.yml

Could you give any clue, suggestion, or workaround? Maybe there is some extra setup I need to do?

@sharmasushant
Contributor

@Cherishty please don't use ping (most servers block it). Use wget www.bing.com instead.

@adelina-t
Contributor Author

@sharmasushant Tested with wget as well:
wget : Unable to connect to the remote server

I tried both the DNS name and the IP; neither works.
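
To tell a DNS failure apart from blocked egress (the ping report above suggests names resolve but traffic does not get out), the two steps can be checked separately. The sketch below is one way to do that, assuming a Python interpreter is available in the container, which the thread does not say. The function name `check_egress` and its injectable `resolve` and `connect` parameters are hypothetical, added so the logic can be exercised without a network.

```python
import socket


def check_egress(host, port=80, timeout=5.0,
                 resolve=socket.gethostbyname,
                 connect=socket.create_connection):
    """Classify outbound connectivity as 'dns_failed', 'connect_failed', or 'ok'."""
    # Step 1: name resolution only.
    try:
        address = resolve(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return "dns_failed"

    # Step 2: TCP connection to the resolved address.
    try:
        conn = connect((address, port), timeout=timeout)
    except OSError:  # covers timeouts and refused or unreachable connections
        return "connect_failed"

    conn.close()
    return "ok"


# Example call from inside a pod (performs real network access):
#     print(check_egress("www.bing.com"))
```

A result of `connect_failed` with a host name, and again when an IP address is passed as `host`, would match the behaviour reported here: resolution succeeds but outbound TCP traffic does not.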

@PatrickLang
Contributor

@adelina-t @Cherishty can you open a new issue for the failed egress traffic? I think we should close this one as the stuck pod is fixed.

@PatrickLang
Contributor

The network monitor pods aren't scheduled on Windows in recent versions, so I'm closing this. If you have other network issues, please file new issues. Thanks!
