EKS Windows Nodes #69

Closed
ofiliz opened this issue Dec 16, 2018 · 59 comments
Labels
EKS Amazon Elastic Kubernetes Service Windows Windows containers

Comments

@ofiliz

ofiliz commented Dec 16, 2018

EKS Windows worker nodes to run Windows containers.

Update – 10/8/2019
Amazon EKS now fully supports Windows containers and Windows worker nodes. #69 (comment)

Get started by looking at the EKS documentation: https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html

@ofiliz ofiliz added the EKS Amazon Elastic Kubernetes Service label Dec 16, 2018
@vicpada

vicpada commented Dec 17, 2018

Hi, is there any ETA on this issue?

@ofiliz
Author

ofiliz commented Dec 17, 2018

We are targeting a public beta in early 2019. Windows Server Containers is a beta feature in Kubernetes and we intend to support Windows following the same guidelines.

Please +1 this issue and tell us what you'd like to see (your preferred Windows Server version, Kubernetes version, features...) so we can plan accordingly! :)

@tonysneed

Would like to see support for Windows Server 2019, which brings Windows containers much closer to feature parity with Linux containers. See http://stefanscherer.github.io/docker-on-windows-server-2019/.

@vicpada

vicpada commented Dec 19, 2018

> Would like to see support for Windows Server 2019, which brings Windows containers much closer to feature parity with Linux containers. See http://stefanscherer.github.io/docker-on-windows-server-2019/.

Also AWS announced recently that WS2019 is supported:
https://aws.amazon.com/about-aws/whats-new/2018/11/Windows-Server-1809/

@jsamuel1

Windows Server 2019 containers on a "current" (1.12/1.13) version of kubernetes would be great. Looks like 1.14 may have WS2019 as a minimum according to Kubernetes Sig Windows notes:
https://docs.google.com/document/d/1Tjxzjjuy4SQsFSUVXZbvqVb64hjNAG5CQX8bK7Yda9w/edit#

@csdhome

csdhome commented Jan 2, 2019

> Windows Server 2019 containers on a "current" (1.12/1.13) version of kubernetes would be great. Looks like 1.14 may have WS2019 as a minimum according to Kubernetes Sig Windows notes:
> https://docs.google.com/document/d/1Tjxzjjuy4SQsFSUVXZbvqVb64hjNAG5CQX8bK7Yda9w/edit#

I agree, we are looking to use Windows containers largely for a CI/CD workload on Kubernetes and would love to see this in place for EKS rather than having to manage our own K8s cluster.

@stsukrov

stsukrov commented Feb 8, 2019

Agree.
Need Windows containers for CI/CD.

@trimbleAdam

This is strongly desired.

@netlancer2012

windows server 2019 with kubernetes 1.13.+

@stsukrov

stsukrov commented Feb 18, 2019

> windows server 2019 with kubernetes 1.13.+

Do you mean, you got it working?
Or is it just a request?

@netlancer2012

netlancer2012 commented Feb 19, 2019

It's a request.

@coreyjohnston

We'd love to see this feature.

@msuiche

msuiche commented Mar 9, 2019

Yes. AFAIK, Azure Container Services does not support hybrid containers either. It would be very interesting to see AWS supporting this before Azure actually.

@tabern tabern added the Developer Preview This issue has an open developer preview label Mar 27, 2019
@tabern tabern changed the title EKS Windows Nodes EKS Windows Nodes (preview) Mar 27, 2019
@tabern
Contributor

tabern commented Mar 27, 2019

Hi all,
Amazon EKS now supports Windows containers and Windows worker nodes as a public preview.

Learn more and get started here: https://github.com/aws/containers-roadmap/tree/master/preview-programs/eks-windows-preview

Please leave feedback and comments on the preview using this ticket.

@mike-mosher

Found an issue trying to launch the 'amazon-eks-cfn-quickstart-windows.yaml' template.

There are three nested stacks in this template: 'EKSVPCStack', 'EKSLinuxWorkerStack', 'EKSWindowsWorkerStack'. The first two of these are providing a TemplateURL property pointing to an S3 URL, but the third nested stack (EKSWindowsWorkerStack) is pointing to a github url. Here is the resource:

  EKSWindowsWorkerStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://raw.githubusercontent.com/aws/containers-roadmap/master/preview-programs/eks-windows-preview/amazon-eks-windows-nodegroup.yaml
      ...

The documentation states that these URLs can only be S3 URLs. This causes the stack to fail and roll back with the following error:

	CREATE_FAILED	AWS::CloudFormation::Stack	EKSWindowsWorkerStack	TemplateURL must be an Amazon S3 URL.

I have verified that the stack can create successfully if that template is put in an S3 bucket and the TemplateURL property is replaced with this S3 URL.
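A quick pre-flight check along these lines could catch the problem before launching the stack. This is only a sketch: `is_s3_url` is a hypothetical helper using a hostname heuristic (not CloudFormation's own validation), and the two S3 URLs in the example are made-up placeholders; only the GitHub URL comes from the template above.

```python
from urllib.parse import urlparse


def is_s3_url(url: str) -> bool:
    """Heuristic check (an assumption, not CloudFormation's real validator):
    an HTTPS URL whose host looks like an Amazon S3 endpoint."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    suffix = ".amazonaws.com"
    if parsed.scheme != "https" or not host.endswith(suffix):
        return False
    # Accept path-style hosts (s3.amazonaws.com, s3-us-west-2.amazonaws.com)
    # and virtual-hosted-style hosts (bucket.s3.us-west-2.amazonaws.com).
    labels = host[: -len(suffix)].split(".")
    return any(label == "s3" or label.startswith("s3-") for label in labels)


def non_s3_template_urls(nested_stacks: dict) -> dict:
    """Return the nested stacks whose TemplateURL does not look like an S3 URL."""
    return {name: url for name, url in nested_stacks.items() if not is_s3_url(url)}


if __name__ == "__main__":
    stacks = {
        # Hypothetical placeholder URLs for the two stacks that already use S3.
        "EKSVPCStack": "https://s3-us-west-2.amazonaws.com/example-bucket/vpc.yaml",
        "EKSLinuxWorkerStack": "https://s3-us-west-2.amazonaws.com/example-bucket/linux.yaml",
        # URL taken from the template quoted above.
        "EKSWindowsWorkerStack": "https://raw.githubusercontent.com/aws/containers-roadmap/master/preview-programs/eks-windows-preview/amazon-eks-windows-nodegroup.yaml",
    }
    for name, url in non_s3_template_urls(stacks).items():
        print(f"{name}: TemplateURL is not an S3 URL: {url}")
```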

Here is the quick create link I used to launch the stack so that this issue can be reproduced (just need to replace '<keyname>' with a valid keypair name):

https://us-west-2.console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/create/review?filter=active&templateURL=https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fcf-templates-2nak5ih76ymi-us-west-2%2F2019087WcT-test.yml&stackName=test-eks-windows&param_ClusterName=test-eks-windows&param_LinuxNodeImageId=ami-0ed0fe5ff74520950&param_WindowsNodeAutoScalingGroupDesiredCapacity=3&param_WindowsNodeAutoScalingGroupMaxSize=4&param_WindowsNodeAutoScalingGroupMinSize=1&param_WindowsNodeImageId=ami-047f9f0be88cb9b8b&param_WindowsNodeInstanceType=m5a.large&param_KeyName=<keyname>

@tabern
Contributor

tabern commented Mar 28, 2019

@mike-mosher good call out - this was not right. I just updated the readme instructions to document how to download the YAML and upload to S3 so this works. We’re in the process of getting this into our service S3 buckets to simplify the setup as well.

@cdenneen

Opened #227: I couldn’t get the Windows example to run, and a few kube-system DaemonSets won’t run.

@tabern
Contributor

tabern commented Mar 28, 2019

We’ve added the windows-nodegroup and QuickStart YAML files to our production S3 buckets and updated the readme for provisioning the Windows worker nodes. This simplifies the setup process.

@cdenneen

Are any 1.12 AMIs available for the README?

@tabern
Contributor

tabern commented Mar 29, 2019

@cdenneen we're working on making Windows AMIs for v1.12 available

@nigel-decosta-rft

nigel-decosta-rft commented Jul 3, 2019

I get frequent network-related errors when deploying to EKS Windows nodes. The typical symptom is that pods cannot reach ClusterIP addresses within the cluster, although they can reach pod IP addresses. To check, I run nslookup in the Windows pod; when the node is faulty, it times out trying to connect to the CoreDNS ClusterIP.

The workaround is to restart the Windows nodes and the vpc-resource-controller. This may resolve the issue temporarily (a few hours at best). More recently, I am finding that the fix lasts only a few minutes.

Is anybody else having this problem?
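For anyone who wants to reproduce the check without nslookup, a minimal sketch of a TCP reachability probe is below. It assumes Python is available in the pod and that the DNS service also answers on TCP port 53; `can_connect` and the address `10.100.0.10` are hypothetical and must be replaced with the ClusterIP of your own DNS service.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder ClusterIP; substitute the address of your DNS service.
    cluster_dns_ip = "10.100.0.10"
    print("DNS ClusterIP reachable:", can_connect(cluster_dns_ip, 53))
```

Running the same probe against a pod IP and against a ClusterIP should show the difference described above: the first connects, the second times out.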

@cmboughey

To expand on the previous poster's report, these are my symptoms.

I have seen this issue and cannot overcome it; any suggestions would be greatly appreciated.

Once a Windows machine has been deployed, let's say I have 5 slots for pods, each with its own IP.
The networking appears to be valid: I can reach any ClusterIP that the pod is allowed to use. At the moment those services are most likely running on Linux (CoreDNS and Jenkins).

After I reach the deployment limit, I lose the ability to connect to those ClusterIPs and internal networking fails. External connectivity still works.

This can be demonstrated by deleting pods and waiting for them to be recreated: if I then try to connect, it fails. No port is available to connect to. The only solution is to redeploy the EC2 instance.

Is this a known issue? Is there a better workaround?

@nigel-decosta-rft

@cmboughey - This does sound similar to the issues I have been facing. What exactly do you mean when you say "Once a Window machine has been deployed, let us say I have 5 slots for pods, each with their own IP"?

I have a script which reboots the EC2 instances which does make it a bit easier. Still a pain.

@cmboughey

cmboughey commented Jul 22, 2019

@nigeldecosta - Basically, depending on the size of your compute instance, there is a limit on the number of IP addresses available to the instance. As part of the CNI configuration, AWS uses elastic network interfaces (ENIs) and assigns them to the machine.
As for the script, that's a pain! It may be usable if you're just deploying pods which don't need to be recreated often, but it is not usable for Jenkins.

@nigel-decosta-rft

@cmboughey Is this related to the primary + secondary private IPs on the Windows EC2 instances? I am currently using EC2 type m4.16xlarge. Could I expect the limit to be higher on other types? I couldn't see where such limits are listed. If you have a link that would help. Thanks.

@cmboughey

cmboughey commented Jul 22, 2019

@nigeldecosta

@realrill

realrill commented Aug 22, 2019

Hi all. I've hit the same issue that @anjanitsip described:
...network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address

I added the required label to the Windows IIS sample YAML with a random IP from the subnet where the nodes are. I also restarted the Windows instance, the vpc-resource-controller, and the aws-node DaemonSet.

It did not solve the issue. See the updates at the bottom.

The vpc-* pods and aws-node are up, running, and healthy. All logs look OK, so I don't know where the label or the IP should come from.

vpc-resource-* container log:

I0822 08:02:30.396621       1 ipaddress.go:77] IPAddressProvider initialized instance yyyyyyyyyyyy resource pool {Capacity:5 InUse:map[xxxxxxxxx:node] Warm:[xxxxxxxxx xxxxxxxxx xxxxxxxxx] Pending:0}.
I0822 08:02:30.396826       1 manager.go:190] Node manager advertising resource vpc.amazonaws.com/PrivateIPv4Address quantity 5 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.400701       1 watcher.go:121] Pod watcher cache synced.
I0822 08:02:30.400773       1 manager.go:88] Node manager is starting.
I0822 08:02:30.400787       1 controller.go:155] Controller started.
I0822 08:02:30.400863       1 watcher.go:130] Pod watcher worker 1 started.
I0822 08:02:30.407539       1 manager.go:141] Node manager added node {name:kkkkkkkk.us-east-2.compute.internal instanceID:yyyyyyyyyyyy instanceType:t3.medium os:windows managed:true}.
I0822 08:02:30.407567       1 watcher.go:190] Node watcher completed processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.407713       1 watcher.go:190] Pod watcher ignoring pod coredns-54989b8657-b894j on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407811       1 watcher.go:190] Pod watcher ignoring pod spotinst-kubernetes-cluster-controller-linux-785d945579-25287 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407871       1 watcher.go:190] Pod watcher ignoring pod vpc-resource-controller-85c8f9475d-jpgcf on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407885       1 watcher.go:190] Pod watcher ignoring pod aws-node-htjkv on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407942       1 watcher.go:190] Pod watcher ignoring pod kube-proxy-n9jkc on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408001       1 watcher.go:190] Pod watcher ignoring pod vpc-admission-webhook-deployment-67bd7fb7d5-54c9k on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408020       1 watcher.go:190] Pod watcher ignoring pod coredns-54989b8657-jgjtt on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408085       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-z2q7h on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.408096       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-z2q7h.
I0822 08:02:30.408146       1 watcher.go:190] Pod watcher ignoring pod spotinst-kubernetes-cluster-controller-windows-75d57fd74c-2jqw2 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:40.401067       1 reconciler.go:30] Node manager reconciler started.
I0822 08:02:40.401118       1 reconciler.go:102] Reconciler worker 1 starting processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401145       1 reconciler.go:123] Reconciler checking resource vpc.amazonaws.com/ENI warmpool size 0 desired 0 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401153       1 reconciler.go:123] Reconciler checking resource vpc.amazonaws.com/PrivateIPv4Address warmpool size 3 desired 3 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401159       1 reconciler.go:106] Reconciler worker 1 completed processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:05:40.094352       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-z2q7h on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:23:55.899838       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-fn9n6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:23:55.899868       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-fn9n6.
I0822 08:24:40.098314       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-fn9n6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:32:01.610958       1 watcher.go:247] Pod watcher processing deleted pod aws-node-htjkv on node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:32:31.641403       1 watcher.go:190] Pod watcher ignoring pod aws-node-zlnt7 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:51:55.080317       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-8xj7m on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:51:55.080341       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-8xj7m.
I0822 08:53:10.102789       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-8xj7m on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:58:34.807919       1 watcher.go:194] Pod watcher processing pod windows-server-iis-b4b96d88c-smn64 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:58:34.807955       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-b4b96d88c-smn64.
I0822 09:14:37.721407       1 watcher.go:194] Pod watcher processing pod bash-77ccdf87d9-khxf6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 09:14:37.721430       1 watcher.go:236] Pod watcher completed processing pod bash-77ccdf87d9-khxf6.
I0822 09:16:20.106958       1 watcher.go:247] Pod watcher processing deleted pod bash-77ccdf87d9-khxf6 on node kkkkkkkk.us-east-2.compute.internal.
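As a troubleshooting aid, a short script can pair the "processing pod" and "processing deleted pod" entries in a log like the one above to show how long each pod survived. This is a sketch under assumptions: the regular expressions are inferred from the log lines shown here, `pod_lifetimes` is a hypothetical helper, and the year is passed in because these log headers do not carry one.

```python
import re
from datetime import datetime

# Header shape inferred from the lines above: Lmmdd hh:mm:ss.uuuuuu pid file:line] msg
KLOG = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)$")
POD_EVENT = re.compile(r"Pod watcher processing (deleted )?pod (\S+) on node")


def pod_lifetimes(lines, year=2019):
    """Map pod name -> seconds between its 'processing' and 'deleted' entries."""
    started = {}
    lifetimes = {}
    for line in lines:
        header = KLOG.match(line)
        if not header:
            continue
        mmdd, clock, message = header.groups()
        event = POD_EVENT.search(message)
        if not event:
            continue
        timestamp = datetime.strptime(f"{year}{mmdd} {clock}", "%Y%m%d %H:%M:%S.%f")
        deleted, pod = event.groups()
        if deleted:
            # Ignore deletions of pods whose start was never logged.
            if pod in started:
                lifetimes[pod] = (timestamp - started.pop(pod)).total_seconds()
        else:
            started.setdefault(pod, timestamp)
    return lifetimes


if __name__ == "__main__":
    import sys

    # Usage: python pod_lifetimes.py < vpc-resource-controller.log
    for pod, seconds in pod_lifetimes(sys.stdin).items():
        print(f"{pod}: {seconds:.1f}s")
```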

aws-node container log

===== Starting installing AWS-CNI =========
===== Starting amazon-k8s-agent ===========

vpc-admission* container log

I0821 15:32:00.477762       1 main.go:64] Initializing vpc-admission-webhook version beta.
I0821 15:32:00.478603       1 main.go:76] Webhook Server started.

I am more than happy to provide more logs for investigation; just tell me what you need.

Update: It does not solve the issue. The windows-server-iis container is in CrashLoopBackOff...

Update: The container goes into the Error state and then restarts itself. While it is running I am able to exec into it, but I can't see the reason for the error.

Update: It looks like the manually attached IPv4Address label does not affect container networking. I am investigating as much as I can but am slowly running out of ideas. See the error below.

kubectl logs windows-server-iis-b4b96d88c-lnc6n

Success Restart Needed Exit Code      Feature Result
------- -------------- ---------      --------------
True    No             Success        {Common HTTP Features, Default Documen...
Invoke-WebRequest : The remote name could not be resolved:
'dotnetbinaries.blob.core.windows.net'
At line:1 char:32
+ ... Web-Server; Invoke-WebRequest -UseBasicParsing -Uri 'https://dotnetbi ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:Htt
   pWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShe
   ll.Commands.InvokeWebRequestCommand

C:\ServiceMonitor.exe : The term 'C:\ServiceMonitor.exe' is not recognized as
the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is
correct and try again.
At line:1 char:311
+ ... ml>' > C:\inetpub\wwwroot\default.html; C:\ServiceMonitor.exe 'w3svc' ...
+                                             ~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\ServiceMonitor.exe:String) [
   ], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

A few logs from the Windows node:
kubelet

E0822 12:52:51.499796    3428 remote_runtime.go:115] StopPodSandbox "98eed946e8662ac4a6fb5f76a24ce1a517e13dcdeefd4fc92ee3e6330abbf7fe" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "windows-server-iis-7fb74d9fc-fn9n6_default" network: failed to parse Kubernetes args: failed to get pod windows-server-iis-7fb74d9fc-fn9n6: pods "windows-server-iis-7fb74d9fc-fn9n6" not found

Update: The flanneld service is missing from the node. The question now is why, and which step skipped the installation.
Update: #273 looks like it could be the source of my issue. Could someone senior please confirm or refute this?

Final update: Solved. It was mostly user error: I had provisioned the instances in an environment with strict network policies, and a few ports were blocked.

@dcopestake
Contributor

dcopestake commented Sep 4, 2019

Should ENIs be dynamically allocated to Windows nodes like they are with Linux nodes?

The reason I ask is that I've got two nodes in my cluster, one on Linux and one on Windows, however the Windows node only seems to be able to run 5 pods at a time (both instances are t3.medium) whereas the Linux node can handle 17. I can see that the Linux node has 3 ENIs and 18 total private IPs, however the Windows node seems to only have a single ENI and a single private IP.

Update: @vsiddharth kindly responded via email and confirmed that ENIs are in fact not dynamically allocated for the Windows nodes at the moment.
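The numbers above are consistent with the usual IP arithmetic, sketched below. The t3.medium limits (3 ENIs, 6 IPv4 addresses per ENI) are taken from the published EC2 limits and should be verified for your instance type; the Windows formula is an assumption based on the single-ENI behaviour described in this comment, and both function names are hypothetical.

```python
def linux_max_pods(enis: int, ips_per_eni: int) -> int:
    """Pods on a Linux node using the VPC CNI: the primary IP of each ENI is
    not handed to pods, plus 2 for host-network pods."""
    return enis * (ips_per_eni - 1) + 2


def windows_max_pods(ips_per_eni: int) -> int:
    """Assumed model for Windows preview nodes: one ENI, and only its
    secondary IPs are available to pods."""
    return ips_per_eni - 1


if __name__ == "__main__":
    # t3.medium: 3 ENIs, 6 IPv4 addresses per ENI (verify against the EC2 docs).
    enis, ips_per_eni = 3, 6
    print("total private IPs:", enis * ips_per_eni)                # 18
    print("Linux max pods:", linux_max_pods(enis, ips_per_eni))    # 17
    print("Windows max pods:", windows_max_pods(ips_per_eni))      # 5
```

Under this model, a larger instance type raises the Windows limit only through its per-ENI address count, not through its number of ENIs.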

@rparsonsbb

Is there an ETA on windows nodes for 1.14?

@dcopestake
Contributor

Submitted pull request #453 - which adds versions of the quickstart shell scripts written in PowerShell - just in case anyone wanted to get going with the preview but didn't have access to a bash shell.

@smiron

smiron commented Sep 14, 2019

> Final update: Solved. It was mostly user error: I had provisioned the instances in an environment with strict network policies, and a few ports were blocked.

How did you solve it in the end? Please share.

@realrill

Hi all. I've hit the same issue that @anjanitsip described:
...network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address
I added the required label to the Windows IIS sample YAML with a random IP from the subnet where the nodes are. I also restarted the Windows instance, the vpc-resource-controller, and the aws-node DaemonSet.
It did not solve the issue. See the updates at the bottom.
The vpc-* pods and aws-node are up, running, and healthy. All logs look OK, so I don't know where the label or the IP should come from.
vpc-resource-* container log:

I0822 08:02:30.396621       1 ipaddress.go:77] IPAddressProvider initialized instance yyyyyyyyyyyy resource pool {Capacity:5 InUse:map[xxxxxxxxx:node] Warm:[xxxxxxxxx xxxxxxxxx xxxxxxxxx] Pending:0}.
I0822 08:02:30.396826       1 manager.go:190] Node manager advertising resource vpc.amazonaws.com/PrivateIPv4Address quantity 5 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.400701       1 watcher.go:121] Pod watcher cache synced.
I0822 08:02:30.400773       1 manager.go:88] Node manager is starting.
I0822 08:02:30.400787       1 controller.go:155] Controller started.
I0822 08:02:30.400863       1 watcher.go:130] Pod watcher worker 1 started.
I0822 08:02:30.407539       1 manager.go:141] Node manager added node {name:kkkkkkkk.us-east-2.compute.internal instanceID:yyyyyyyyyyyy instanceType:t3.medium os:windows managed:true}.
I0822 08:02:30.407567       1 watcher.go:190] Node watcher completed processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.407713       1 watcher.go:190] Pod watcher ignoring pod coredns-54989b8657-b894j on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407811       1 watcher.go:190] Pod watcher ignoring pod spotinst-kubernetes-cluster-controller-linux-785d945579-25287 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407871       1 watcher.go:190] Pod watcher ignoring pod vpc-resource-controller-85c8f9475d-jpgcf on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407885       1 watcher.go:190] Pod watcher ignoring pod aws-node-htjkv on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.407942       1 watcher.go:190] Pod watcher ignoring pod kube-proxy-n9jkc on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408001       1 watcher.go:190] Pod watcher ignoring pod vpc-admission-webhook-deployment-67bd7fb7d5-54c9k on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408020       1 watcher.go:190] Pod watcher ignoring pod coredns-54989b8657-jgjtt on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:30.408085       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-z2q7h on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:30.408096       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-z2q7h.
I0822 08:02:30.408146       1 watcher.go:190] Pod watcher ignoring pod spotinst-kubernetes-cluster-controller-windows-75d57fd74c-2jqw2 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:02:40.401067       1 reconciler.go:30] Node manager reconciler started.
I0822 08:02:40.401118       1 reconciler.go:102] Reconciler worker 1 starting processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401145       1 reconciler.go:123] Reconciler checking resource vpc.amazonaws.com/ENI warmpool size 0 desired 0 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401153       1 reconciler.go:123] Reconciler checking resource vpc.amazonaws.com/PrivateIPv4Address warmpool size 3 desired 3 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:02:40.401159       1 reconciler.go:106] Reconciler worker 1 completed processing node kkkkkkkk.us-east-2.compute.internal.
I0822 08:05:40.094352       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-z2q7h on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:23:55.899838       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-fn9n6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:23:55.899868       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-fn9n6.
I0822 08:24:40.098314       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-fn9n6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:32:01.610958       1 watcher.go:247] Pod watcher processing deleted pod aws-node-htjkv on node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:32:31.641403       1 watcher.go:190] Pod watcher ignoring pod aws-node-zlnt7 on unmanaged node aaaaaaaaaaaaaaaaa.us-east-2.compute.internal.
I0822 08:51:55.080317       1 watcher.go:194] Pod watcher processing pod windows-server-iis-7fb74d9fc-8xj7m on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:51:55.080341       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-7fb74d9fc-8xj7m.
I0822 08:53:10.102789       1 watcher.go:247] Pod watcher processing deleted pod windows-server-iis-7fb74d9fc-8xj7m on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:58:34.807919       1 watcher.go:194] Pod watcher processing pod windows-server-iis-b4b96d88c-smn64 on node kkkkkkkk.us-east-2.compute.internal.
I0822 08:58:34.807955       1 watcher.go:236] Pod watcher completed processing pod windows-server-iis-b4b96d88c-smn64.
I0822 09:14:37.721407       1 watcher.go:194] Pod watcher processing pod bash-77ccdf87d9-khxf6 on node kkkkkkkk.us-east-2.compute.internal.
I0822 09:14:37.721430       1 watcher.go:236] Pod watcher completed processing pod bash-77ccdf87d9-khxf6.
I0822 09:16:20.106958       1 watcher.go:247] Pod watcher processing deleted pod bash-77ccdf87d9-khxf6 on node kkkkkkkk.us-east-2.compute.internal.

aws-node container log

===== Starting installing AWS-CNI =========
===== Starting amazon-k8s-agent ===========

vpc-admission* container log

I0821 15:32:00.477762       1 main.go:64] Initializing vpc-admission-webhook version beta.
I0821 15:32:00.478603       1 main.go:76] Webhook Server started.

I am more than happy to provide more logs for investigation; just tell me what you need.
Update: It does not solve the issue. The windows-server-iis container is in CrashLoopBackOff...
Update: The container goes to the Error state and then restarts itself. While it is running I am able to exec into it, but I can't see the reason for the error.
Update: It looks like the manually attached IPv4Address label does not affect the container networking. I am investigating as much as I can, but I am slowly running out of ideas. See the error below.

kubectl logs windows-server-iis-b4b96d88c-lnc6n

Success Restart Needed Exit Code      Feature Result
------- -------------- ---------      --------------
True    No             Success        {Common HTTP Features, Default Documen...
Invoke-WebRequest : The remote name could not be resolved:
'dotnetbinaries.blob.core.windows.net'
At line:1 char:32
+ ... Web-Server; Invoke-WebRequest -UseBasicParsing -Uri 'https://dotnetbi ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:Htt
   pWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShe
   ll.Commands.InvokeWebRequestCommand

C:\ServiceMonitor.exe : The term 'C:\ServiceMonitor.exe' is not recognized as
the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is
correct and try again.
At line:1 char:311
+ ... ml>' > C:\inetpub\wwwroot\default.html; C:\ServiceMonitor.exe 'w3svc' ...
+                                             ~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\ServiceMonitor.exe:String) [
   ], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

A few logs from the Windows node:
kubelet

E0822 12:52:51.499796    3428 remote_runtime.go:115] StopPodSandbox "98eed946e8662ac4a6fb5f76a24ce1a517e13dcdeefd4fc92ee3e6330abbf7fe" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "windows-server-iis-7fb74d9fc-fn9n6_default" network: failed to parse Kubernetes args: failed to get pod windows-server-iis-7fb74d9fc-fn9n6: pods "windows-server-iis-7fb74d9fc-fn9n6" not found

Update: The flanneld service is missing from the node. The question now is why, and which step missed the installation.
Update: #273 looks like it could be the source of my issue. Could someone senior please confirm or refute this?
Final update: Solved. It was mostly user error: I had provisioned the instances in an environment with strict network policies, and a few ports were blocked.

How did you solve it in the end? Please share.

IIRC port 443 was blocked, which caused the malfunction on the Windows worker/pod side.
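
For anyone hitting the same symptom, here is a minimal sketch of a reachability check that could be run from inside the network in question. It is not what was actually used in this thread; the function name `can_connect` and the 3-second timeout are my own choices, and it only tells you whether a TCP connection can be opened, not which firewall rule or network policy is responsible.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`."""
    try:
        # create_connection resolves the host name first, so a DNS failure
        # (socket.gaierror, a subclass of OSError) is also reported as False.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable names.
        return False
```

Calling `can_connect("dotnetbinaries.blob.core.windows.net")` targets the host from the failing Invoke-WebRequest call in the pod log above. A `False` result is consistent with blocked outbound 443 or broken DNS, but it does not distinguish between the two.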

@dcopestake
Contributor

Is the EKS Windows preview still actually running/being developed? There doesn't seem to be a huge amount of activity here (other than people raising issues) and today I got an email from AWS saying that 1.11 is going to be deprecated in early November, making it impossible (presumably) to actually run a 1.11 cluster with Windows nodegroups, so not sure what the plan is?

@nigel-decosta-rft

I thought the public release of Windows EKS was imminent. Not pleased that 1.11 will be deprecated before we get a supported version of Windows EKS.

@emohammad

How can I get Istio installed? I am getting "failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address".

@mikestef9
Contributor

EKS Windows will reach GA before 1.11 is no longer supported. This issue will be updated as soon as we make the GA announcement.

@bcmedeiros

@mikestef9 thanks for that! Where can we find the most up-to-date Windows node AMIs? Should we still use the 1.11 ones?

@mikestef9 changed the title from "EKS Windows Nodes (preview)" to "EKS Windows Nodes" on Oct 8, 2019
@mikestef9
Contributor

Amazon EKS now fully supports adding Windows nodes as worker nodes and scheduling Windows containers.

Windows workloads are supported with Amazon EKS clusters running Kubernetes version 1.14 or later.

Please note that as of today, at least one Linux worker node is required in the cluster to support Windows node and container networking. We recommend two for high availability.

Learn how to configure a cluster for Windows support and add Windows worker nodes in the EKS documentation.
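
As a rough illustration of the Linux node requirement above, the sketch below counts nodes per operating system from the parsed output of `kubectl get nodes -o json` and flags clusters with fewer Linux nodes than recommended. The function names and message wording are hypothetical. It assumes nodes carry the `kubernetes.io/os` label, with the older `beta.kubernetes.io/os` label as a fallback.

```python
from collections import Counter

# Node labels that identify the operating system; the beta form is the older one.
OS_LABELS = ("kubernetes.io/os", "beta.kubernetes.io/os")


def count_nodes_by_os(node_list: dict) -> Counter:
    """Count nodes per OS in a parsed `kubectl get nodes -o json` document."""
    counts = Counter()
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels") or {}
        os_name = next((labels[k] for k in OS_LABELS if k in labels), "unknown")
        counts[os_name] += 1
    return counts


def linux_node_warning(counts: Counter, recommended: int = 2):
    """Return a warning string, or None if enough Linux nodes are present."""
    linux = counts.get("linux", 0)
    if linux == 0:
        return "no Linux worker node found: at least one is required for Windows networking"
    if linux < recommended:
        return f"only {linux} Linux worker node(s); {recommended} are recommended for high availability"
    return None
```

One way to use it is to pipe `kubectl get nodes -o json` into a small script that calls `count_nodes_by_os(json.load(sys.stdin))` and prints whatever `linux_node_warning` returns.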

@mikestef9 added the Windows (Windows containers) label and removed the Developer Preview (This issue has an open developer preview) label on Oct 21, 2019
@mikestef9
Contributor

Interested to hear any feedback so far on EKS Windows support. Feel free to leave comments on this issue. Thanks!

@bcmedeiros

@mikestef9 my path trying to use Windows worker nodes has been pretty rocky so far...

First I bumped into an issue where I could not have IPs assigned to my pods: eksctl-io/eksctl#1512

After manually creating another routing table and finally overcoming this first issue, I more or less had Windows nodes working, but then Linux pods started being scheduled onto Windows nodes (yes, this is documented, but it's very hard to set node selectors on all my deployments, including "internal" ones such as ingress and other things). So I decided to add a taint and bumped into what seems to be another eksctl bug: eksctl-io/eksctl#1590

Manually tainting the node (and tolerating the taint in Windows pods) seems to work, but I cannot roll this out to production until everything is fully automated, so I guess I have to wait a bit until someone at eksctl looks into the issues I reported.
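
To make the taint-plus-toleration workaround concrete, here is a minimal sketch of how a Deployment manifest (as a Python dict) could be patched. The taint `os=windows:NoSchedule` is an assumed example, not necessarily the one used here, and `pin_to_os` is a hypothetical helper. With such a taint on the Windows nodes, Linux pods need no changes, while Windows pods need both the toleration and a node selector so they do not land on Linux nodes.

```python
import copy

# Well-known node label for the operating system; older clusters may only
# carry the deprecated "beta.kubernetes.io/os" form.
OS_LABEL = "kubernetes.io/os"

# Assumed taint, applied to the Windows nodes with something like:
#   kubectl taint nodes <node-name> os=windows:NoSchedule
WINDOWS_TOLERATION = {
    "key": "os",
    "operator": "Equal",
    "value": "windows",
    "effect": "NoSchedule",
}


def pin_to_os(deployment: dict, os_name: str) -> dict:
    """Return a copy of a Deployment manifest pinned to nodes of one OS."""
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]
    # Restrict scheduling to nodes whose OS label matches.
    pod_spec.setdefault("nodeSelector", {})[OS_LABEL] = os_name
    if os_name == "windows":
        # Windows pods must also tolerate the taint on the Windows nodes.
        tolerations = pod_spec.setdefault("tolerations", [])
        if WINDOWS_TOLERATION not in tolerations:
            tolerations.append(dict(WINDOWS_TOLERATION))
    return patched
```

The patched dict can be serialized with `json.dumps` and passed to `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML.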

@rparsonsbb

@mikestef9 we're using EKS mixed OS clusters in production right now.

Aside from some hiccups with the vpc-resource-controller evicting itself after writing 500GB of logs (which might not be Windows related, but I am working on pulling logs and opening an AWS support case), my BIGGEST gripe hands down is that none of the tooling that AWS provides surrounding EKS supports mixed OS clusters right now.

New node termination handler: aws/aws-node-termination-handler#8

Container Insights: #503

This is certainly more widespread than just AWS, and I understand that some of this can be provided by the OSS community, but I'd really like to see more support surrounding mixed OS EKS, not just spitting out AMIs that interact with the EKS masters.

@mikestef9
Contributor

@brunojcm I escalated the Windows node group taints to eksctl team and they have created a Pull Request that will land in the next eksctl release.

@rparsonsbb thanks for opening those issues. We will continue to work on improving tooling in the Kubernetes Windows ecosystem.

@bcmedeiros

@mikestef9 thanks for that! I got the notification there; I was wondering whether you had somehow helped make it happen :)

Labels
EKS Amazon Elastic Kubernetes Service Windows Windows containers