diff --git a/src/content/advanced/connectivity/ingress/configuration/index.md b/src/content/advanced/connectivity/ingress/configuration/index.md index 6ddb127f04..701bd22ec9 100644 --- a/src/content/advanced/connectivity/ingress/configuration/index.md +++ b/src/content/advanced/connectivity/ingress/configuration/index.md @@ -26,7 +26,7 @@ user_questions: - How can I use Ingress NGINX Controller as a Web Application Firewall? - How can I protect my workload from malicious requests? - How can I enable & configure ModSecurity inside of the Ingress NGINX Controller? -last_review_date: 2023-11-07 +last_review_date: 2023-11-23 aliases: - /guides/advanced-ingress-configuration/ - /advanced/ingress/configuration/ @@ -267,11 +267,11 @@ This functionality is based on the [auth_request](https://nginx.org/en/docs/http ### CORS -To enable Cross-Origin Resource Sharing (CORS) in an Ingress rule add the annotation `ingress.kubernetes.io/enable-cors: "true"`. +To enable Cross-Origin Resource Sharing (CORS) in an Ingress rule add the annotation `nginx.ingress.kubernetes.io/enable-cors: "true"`. ### Rewrite -In some scenarios the exposed URL in the backend service differs from the specified path in the Ingress rule. Without a rewrite any request will return 404. To circumvent this you can set the annotation `ingress.kubernetes.io/rewrite-target` to the path expected by the service. +In some scenarios the exposed URL in the backend service differs from the specified path in the Ingress rule. Without a rewrite any request will return 404. To circumvent this you can set the annotation `nginx.ingress.kubernetes.io/rewrite-target` to the path expected by the service. This can for example be used together with path based routing, when the application expects to be on `/`: @@ -297,11 +297,9 @@ spec: number: SERVICE_PORT ``` -If the application contains relative links it is possible to add an additional annotation `ingress.kubernetes.io/add-base-url` that will prepend a [`base` tag](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base) in the header of the returned HTML from the backend. - ### Rate limiting -The annotations `ingress.kubernetes.io/limit-connections` and `ingress.kubernetes.io/limit-rps` define a limit on the connections that can be opened by a single client IP address. This can be used to mitigate [DDoS Attacks](https://www.nginx.com/blog/mitigating-ddos-attacks-with-nginx-and-nginx-plus). +The annotations `nginx.ingress.kubernetes.io/limit-connections` and `nginx.ingress.kubernetes.io/limit-rps` define a limit on the connections that can be opened by a single client IP address. This can be used to mitigate [DDoS Attacks](https://www.nginx.com/blog/mitigating-ddos-attacks-with-nginx-and-nginx-plus). `nginx.ingress.kubernetes.io/limit-connections`: Number of concurrent connections allowed from a single IP address. @@ -351,10 +349,6 @@ The annotation `nginx.ingress.kubernetes.io/affinity` enables and sets the affin If you use the `cookie` type you can also specify the name of the cookie that will be used to route the requests with the annotation `nginx.ingress.kubernetes.io/session-cookie-name`. The default is to create a cookie named `route`. -The annotation `nginx.ingress.kubernetes.io/session-cookie-hash` defines which algorithm will be used to hash the used upstream. Default value is `md5` and possible values are `md5`, `sha1` and `index`. - -The `index` option is not hashed, an in-memory index is used instead, it's quicker and the overhead is shorter. 
Warning: The matching against the upstream servers list is inconsistent. So, at reload, if upstreams servers have changed, index values are not guaranted to correspond to the same server as before! Use with caution and only if you need to! - This feature is implemented by the third party module [nginx-sticky-module-ng](https://bitbucket.org/nginx-goodies/nginx-sticky-module-ng). The workflow used to define which upstream server will be used is explained in the [module documentation (PDF)](https://bitbucket.org/nginx-goodies/nginx-sticky-module-ng/raw/08a395c66e425540982c00482f55034e1fee67b6/docs/sticky.pdf). ### Configuration snippets diff --git a/src/content/advanced/connectivity/ingress/multi-nginx-ic/index.md b/src/content/advanced/connectivity/ingress/multi-nginx-ic/index.md index c5c0e5ace8..0eb2d39517 100644 --- a/src/content/advanced/connectivity/ingress/multi-nginx-ic/index.md +++ b/src/content/advanced/connectivity/ingress/multi-nginx-ic/index.md @@ -12,7 +12,7 @@ user_questions: - How do I configure Ingress NGINX Controller for internal traffic? - How do I override the NodePorts on KVM Ingresses? - How do I configure Ingress NGINX Controller to allow weak ciphers? -last_review_date: 2023-11-07 +last_review_date: 2023-11-23 aliases: - /guides/multi-nginx/ - /advanced/multi-nginx/ @@ -23,7 +23,7 @@ owner: Ingress NGINX Controller handles [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) resources, routing traffic from outside the Kubernetes cluster to services within the cluster. Starting with [Ingress NGINX Controller v1.8.0](/changes/managed-apps/nginx-ingress-controller-app/v1.8.0/), one can install multiple Ingress NGINX Controllers in a Kubernetes cluster. The optional Ingress NGINX Controller can be [installed as an App on your cluster]({{< relref "/content/getting-started/ingress-controller/index.md" >}}). -[Ingress NGINX Controller v2.2.0](/changes/managed-apps/nginx-ingress-controller-app/v2.2.0/) will start installing a IngressClass with default name `nginx` and controller value `k8s.io/ingress-nginx`. +[Ingress NGINX Controller v2.2.0](/changes/managed-apps/nginx-ingress-controller-app/v2.2.0/) will start installing an IngressClass with default name `nginx` and controller value `k8s.io/ingress-nginx`. Some use cases for this might be: @@ -33,18 +33,18 @@ Some use cases for this might be: Most Ingress NGINX Controller configuration options have controller-wide defaults. They can also be overriden on a per-Ingress resource level. -In each case below, one installs a second Ingress NGINX Controller with a different global-only configuration and separate IngressClass. Ingress resources managed by this Ingress NGINX Controller installation cannot be customized on a per-Ingress resource level. +In each case below, one installs a second Ingress NGINX Controller with a different global-only configuration and a separate IngressClass. Ingress resources managed by this Ingress NGINX Controller installation can still be customized on a per-Ingress resource level. Further information on configuring Ingress NGINX Controller can be found on the [Advanced ingress configuration]({{< relref "/advanced/connectivity/ingress/configuration/index.md" >}}) page. ## Quick installation instructions for a second Ingress NGINX Controller -1. Install a second Ingress NGINX Controller app (and subsequent apps) with a different global-only configuration. 
Ingress resources managed by this Ingress NGINX Controller installation cannot be customized on a per-Ingress resource level. -2. Change the ingressClassName to the appropriate IngressClass name. Make sure the IngressClass name and controller value of each Ingress Controller do not collide with each other. +1. Install a second Ingress NGINX Controller App (and subsequent apps) with a different global-only configuration. +2. Change the `ingressClassName` to the appropriate IngressClass name. Make sure the IngressClass name and controller value of each Ingress Controller do not collide with each other. ## Set the ingressClassName of each Ingress -__Note__ that if you are running multiple Ingress Controllers you need to use the appropriate ingressClassName in your Ingress resources, e.g. +__Note__ that if you are running multiple Ingress Controllers you need to use the appropriate `ingressClassName` in your Ingress resources, e.g. ```yaml ... @@ -62,9 +62,9 @@ spec: ... ``` -Not specifying the ingressClassName will lead to no Ingress Controller claiming your Ingress. Specifying a value which does not match the class of any existing Ingress Controllers will result in all Ingress Controllers ignoring the ingress. +Not specifying the `ingressClassName` will lead to no Ingress Controller claiming your Ingress. Specifying a value which does not match the class of any existing Ingress Controller will result in all Ingress Controllers ignoring the Ingress. -Additionally, please ensure the Ingress Class of each of your Ingress Controllers do not collide with each other and with the [preinstalled Ingress Controllers in legacy clusters]({{< relref "/platform-overview/cluster-management/releases/index.md#apps" >}}). For the community supported Ingress NGINX Controller this is described in the [official documentation](https://kubernetes.github.io/ingress-nginx/user-guide/multiple-ingress/). +Additionally, please ensure the IngressClass of each of your Ingress Controllers does not collide with each other and with the [preinstalled Ingress Controllers in legacy clusters]({{< relref "/platform-overview/cluster-management/releases/index.md#apps" >}}). For the community supported Ingress NGINX Controller this is described in the [official documentation](https://kubernetes.github.io/ingress-nginx/user-guide/multiple-ingress/). ## Separating public from internal ingress traffic @@ -75,45 +75,32 @@ This is how one can achieve it by using multiple Ingress NGINX Controllers: - Deploy one Ingress NGINX Controller App using default settings, for making selected cluster services accessible via public internet - Deploy second internal Ingress NGINX Controller App. Use these user configuration overrides: - - on AWS and Azure - - ```yaml - controller: - ingressClass: nginx-internal - ingressClassResource: - name: nginx-internal - controllerValue: k8s.io/ingress-nginx-internal - service: - public: false - subdomain: "*.ingress-internal" - ``` - - - on KVM - - ```yaml - controller: - ingressClassResource: - name: nginx-internal - controllerValue: k8s.io/ingress-nginx-internal - service: - public: false - subdomain: ingress-internal - nodePorts: - http: 31010 - https: 31011 - ``` +```yaml +controller: + ingressClassResource: + name: nginx-internal + controllerValue: k8s.io/ingress-nginx-internal + service: + public: false + subdomain: ingress-internal + # Required for KVM only. + # Do not set on AWS, Azure & others. 
+    nodePorts:
+      http: 30012
+      https: 30013
+```
 
-Each Ingress NGINX Controller App installation has to have an unique IngressClass. Ingress resources can then declare which Ingress Controller should be handling their route definition by referencing the respective IngressClass in the `ingressClassName` spec field. Default Ingress NGINX Controller IngressClass is `nginx`. In the above example we configure `nginx-internal` as the second Ingress NGINX Controller installation's IngressClass.
+Each Ingress NGINX Controller App installation has to have a unique IngressClass. Ingress resources can then declare which Ingress Controller should handle their route definition by referencing the respective IngressClass in the `ingressClassName` field. The default Ingress NGINX Controller IngressClass is `nginx`. In the above example, we configure `nginx-internal` as the IngressClass of the second Ingress NGINX Controller installation.
 
-On AWS and Azure, Ingress NGINX Controller `LoadBalancer` service is fronted by the cloud provider's managed load balancer service. By default, Ingress NGINX Controller will have a public load balancer. Changing `controller.service.public` flag to `false` declares that internal load balancer should be created instead.
+On AWS and Azure, the Ingress NGINX Controller's `LoadBalancer` Service is fronted by the cloud provider's managed load balancer service. By default, Ingress NGINX Controller gets a public load balancer. Setting the `controller.service.public` flag to `false` declares that an internal load balancer should be created instead.
 
-Similarly, cloud load balancer created for each Ingress NGINX Controller installation on AWS and Azure has to have unique host name associated with it. Host name suffix is common for all, and equals to the workload cluster's base domain name. Prefix is configurable via `controller.service.subdomain` configuration property and defaults to `ingress`. In example configuration for second internal Ingress NGINX Controller, it is overriden to `ingress-internal`.
+Similarly, the cloud load balancer created for each Ingress NGINX Controller installation on AWS and Azure has to have a unique hostname associated with it. The hostname suffix is common to all installations and equals the workload cluster's base domain name. The prefix is configurable via the `controller.service.subdomain` configuration property and defaults to `ingress`. In the example configuration for the second, internal Ingress NGINX Controller, it is overridden to `ingress-internal`.
 
-For Ingress NGINX Controller running on on-prem (KVM) workload clusters there's no out-of-the-box `LoadBalancer` service type support. Therefore, Ingress NGINX Controller Service type defaults to `NodePort`. For every Ingress NGINX Controller installation, one must assign a set of unique http and https node ports. The default Ingress NGINX Controller http and https node ports are `30010` and `30011`. The example sets `31010` and `31011` as overrides for the internal Ingress NGINX Controller.
+For Ingress NGINX Controllers running on on-prem (KVM) workload clusters there's no out-of-the-box `LoadBalancer` service type support. Therefore, the Ingress NGINX Controller Service type defaults to `NodePort`. For every Ingress NGINX Controller installation, one must assign a set of unique HTTP and HTTPS node ports. The default Ingress NGINX Controller HTTP and HTTPS node ports are `30010` and `30011`. The example sets `30012` and `30013` as overrides for the internal Ingress NGINX Controller.
 
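+
+An Ingress resource that should be served by this internal controller then references its IngressClass. The following is a minimal sketch; the host, namespace, service name and port are placeholders:
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: internal-only
+  namespace: example
+spec:
+  # Must match the IngressClass name configured for the internal controller above.
+  ingressClassName: nginx-internal
+  rules:
+    - host: internal.example.com
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service:
+                name: example-service
+                port:
+                  number: 8080
+```
+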
-More information on this topic can be found in document [Services of type LoadBalancer]({{< relref "/content/advanced/connectivity/ingress/service-type-loadbalancer/index.md" >}}).
+More information on this topic can be found in the document [Services of type LoadBalancer]({{< relref "/content/advanced/connectivity/ingress/service-type-loadbalancer/index.md" >}}).
 
-It is also possible to only install a single Ingress NGINX Controller and to delegate both internal and external traffic to it. Here is a minimal working example on how to achieve this goal.
+It is also possible to install only a single Ingress NGINX Controller and delegate both external and internal traffic to it. Here is a minimal working example of how to achieve this goal.
 
 ```yaml
 controller:
@@ -125,33 +112,33 @@ controller:
     subdomain: ingress-internal # default value
 ```
 
-In other words, it is sufficient to set `controller.service.internal.enabled` to `true` to create two services: one for public traffic and one for private one. On cloud providers, the Services we create will be of type `LoadBalancer`; on premise, depending on the platform, they might be either of type `LoadBalancer` or `NodePort`.
+In other words, it is sufficient to set `controller.service.internal.enabled` to `true` to create two Services: one for public traffic and one for internal traffic. On cloud providers, the Services we create will be of type `LoadBalancer`; on premise, depending on the platform, they might be either of type `LoadBalancer` or `NodePort`.
 
 ## Using weak ciphers for legacy clients
 
-In [Ingress NGINX Controller v1.2.0](https://github.com/giantswarm/ingress-nginx-app/blob/main/CHANGELOG.md#120-2020-01-21), there was a notable security improvement: weak SSL ciphers were removed from the default configuration. Some older clients (like web browsers, http libraries in apps) could no longer establish secure connections with cluster services exposed via new Ingress NGINX Controller. This is because these clients only supported SSL ciphers that got removed.
+In [Ingress NGINX Controller v1.2.0](https://github.com/giantswarm/ingress-nginx-app/blob/main/CHANGELOG.md#120-2020-01-21), there was a notable security improvement: weak SSL ciphers were removed from the default configuration. Some older clients (like web browsers or HTTP libraries in apps) could no longer establish secure connections with cluster services exposed via the new Ingress NGINX Controller. This is because these clients only supported the SSL ciphers that had been removed.
 
-With single Ingress NGINX Controller, one could restore weak SSL ciphers configuration in order to support services with older clients until clients get upgraded. Problem with this approach, since SSL ciphers are global settings, was that changing default SSL ciphers back by restoring weak ciphers would apply to all Ingresses and service behind them, not just the one with old clients.
+With a single Ingress NGINX Controller, one could restore the weak SSL ciphers configuration in order to support services with older clients until those clients get upgraded. The problem with this approach is that SSL ciphers are a global setting, so restoring weak ciphers would apply to all Ingresses and the services behind them, not just the ones with old clients.
 
 With multiple Ingress NGINX Controllers, one can have separate Ingress NGINX Controller installations with different SSL ciphers configuration. This allows one to limit restoring weak ciphers only to services used by legacy clients.
 
 Here is how this can be achieved:
 
 - Deploy one Ingress NGINX Controller using default settings
-- Deploy second internal Ingress NGINX Controller. Use these user configuration overrides:
+- Deploy a second Ingress NGINX Controller. Use these user configuration overrides:
 
 ```yaml
 controller:
+  config:
+    ssl-ciphers: ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
   ingressClassResource:
     name: nginx-weak
     controllerValue: k8s.io/ingress-nginx-weak
   service:
     subdomain: ingress-weak
-  configmap:
-    ssl-ciphers: ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
 ```
 
-For the second Ingress NGINX Controller installation, ingress class and host name subdomain are customized for uniqueness. Additionally, default SSL ciphers are overriden to include weak ciphers for legacy clients.
+For the second Ingress NGINX Controller installation, the IngressClass name and the hostname subdomain are customized for uniqueness. Additionally, the default SSL ciphers are overridden to include weak ciphers for legacy clients.
 
 ## Additional resources
diff --git a/src/content/advanced/connectivity/ingress/service-type-loadbalancer/index.md b/src/content/advanced/connectivity/ingress/service-type-loadbalancer/index.md
index 565ea1ba2d..31cbe80ac4 100644
--- a/src/content/advanced/connectivity/ingress/service-type-loadbalancer/index.md
+++ b/src/content/advanced/connectivity/ingress/service-type-loadbalancer/index.md
@@ -13,7 +13,7 @@ user_questions:
 - How do I configure an internal Load Balancer on AWS?
 - How do I configure an internal Load Balancer on Azure?
 - How do I configure an internal Load Balancer on GCP?
-last_review_date: 2023-11-07
+last_review_date: 2023-11-23
 aliases:
 - /guides/services-of-type-loadbalancer-and-multiple-ingress-controllers/
 - /advanced/ingress/service-type-loadbalancer-multi-ic/
@@ -96,7 +96,7 @@ If you want the AWS ELB to be available only within your VPC (can be extended to
 metadata:
   name: my-service
   annotations:
-    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
+    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
 ```
 
 On Azure you can configure internal Load Balancers like this.
@@ -163,7 +163,7 @@ metadata:
   annotations:
     service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: true
     # The interval for publishing the access logs (can be 5 or 60 minutes).
-    service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval: 60
+    service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval: "60"
     service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: my-logs-bucket
     service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: logs/prod
 ```
@@ -180,7 +180,40 @@ metadata:
     service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout: "60"
 ```
 
-#### AWS network load balancer
+#### Other AWS ELB configuration options
+
+There are more annotations for managing Classic ELBs, which are described below.
+
+```yaml
+metadata:
+  name: my-service
+  annotations:
+    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
+    # The time, in seconds, that the connection is allowed to be idle (no data has
+    # been sent over the connection) before it is closed by the load balancer.
+    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
+    # Specifies whether cross-zone load balancing is enabled for the load balancer.
+    service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "environment=prod,owner=devops"
+    # A comma-separated list of key-value pairs which will be recorded as
+    # additional tags in the ELB.
+    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "3"
+    # The number of successive successful health checks required for a backend to
+    # be considered healthy for traffic. Defaults to 2, must be between 2 and 10.
+    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "3"
+    # The number of unsuccessful health checks required for a backend to be
+    # considered unhealthy for traffic. Defaults to 6, must be between 2 and 10.
+    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "20"
+    # The approximate interval, in seconds, between health checks of an
+    # individual instance. Defaults to 10, must be between 5 and 300.
+    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "5"
+    # The amount of time, in seconds, during which no response means a failed
+    # health check. This value must be less than the service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval
+    # value. Defaults to 5, must be between 2 and 60.
+    service.beta.kubernetes.io/aws-load-balancer-extra-security-groups: "sg-53fae93f,sg-42efd82e"
+    # A list of additional security groups to be added to the ELB.
+```
+
+#### AWS Network Load Balancers
 
 AWS is in the process of replacing ELBs with NLBs (Network Load Balancers) and ALBs (Application Load Balancers). NLBs have a number of benefits over "classic" ELBs including scaling to many more requests.
 
@@ -229,41 +262,54 @@ To avoid downtime, we can create an additional Kubernetes `Service` of type `Loa
 Always ensure to closely monitor the system throughout this entire process to minimize any unforeseen disruptions. Additionally, remember to perform these tasks during a maintenance window or a period of low traffic to minimize the impact on end users.
 
----------------------------------------------------
+#### Pitfalls and known limitations of AWS Network Load Balancers
 
-#### Other AWS ELB configuration options
+There are several pitfalls and known limitations of AWS Network Load Balancers that can take a long time to troubleshoot.
 
-There are more annotations to manage Classic ELBs that are described below.
+##### Martian Packets when using internal AWS Network Load Balancers
+
+When creating a service of type `LoadBalancer`, Kubernetes normally allocates node ports for each of the exposed ports. The cloud provider's load balancer then uses all your nodes in conjunction with those node ports in its target group to forward traffic into your cluster.
+
+In this so-called target type `instance`, the AWS Network Load Balancer preserves the client IP by default. Together with `externalTrafficPolicy: Local`, your service will be able to see the untouched source IP address of your client. This is, theoretically, possible because the traffic back to your client passes through the AWS network, which the AWS Network Load Balancer is likely part of, so it can keep track of responses and handle them.
+
+This works perfectly fine for public client IP addresses, but gets difficult for traffic egressing from nodes of the same cluster towards internally addressed AWS Network Load Balancers:
+
+Imagine a pod sending a request to your AWS Network Load Balancer. The packet hits the load balancer with the IP of the node the pod is running on as its source, and this source IP is not changed. If the target pod of the called service is running on that same node, the traffic passes the load balancer and is sent back to the very same node. That node then sees traffic coming from "somewhere else" that carries its own IP address as the source. Suspicious, isn't it? Indeed, and because of this the traffic is dropped. In the end the connection is never established (TCP) and simply times out on the client side (both TCP and UDP).
+
+This whole circumstance is called "Martian Packets". If you rely on accessing an internal AWS Network Load Balancer from inside the same cluster, you sadly need to disable client IP preservation on your AWS Network Load Balancer by adding the following annotation:
 
 ```yaml
 metadata:
   name: my-service
   annotations:
-    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
-    # The time, in seconds, that the connection is allowed to be idle (no data has
-    # been sent over connection) before it is closed by the load balancer.
-    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
-    # Specifies whether cross-zone load balancing is enabled for the load balancer.
-    service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "environment=prod,owner=devops"
-    # A comma-separated list of key-value pairs which will be recorded as
-    # additional tags in the ELB.
-    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: ""
-    # The number of successive successful health checks required for a backend to
-    # be considered healthy for traffic. Defaults to 2, must be between 2 and 10.
-    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "3"
-    # The number of unsuccessful health checks required for a backend to be
-    # considered unhealthy for traffic. Defaults to 6, must be between 2 and 10.
-    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "20"
-    # The approximate interval, in seconds, between health checks of an
-    # individual instance. Defaults to 10, must be between 5 and 300.
-    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "5"
-    # The amount of time, in seconds, during which no response means a failed
-    # health check. This value must be less than the service.beta.kubernetesaws-load-balancer-healthcheck-interval
-    # value. Defaults to 5, must be between 2 and 60.
-    service.beta.kubernetes.io/aws-load-balancer-extra-security-groups: "sg-53fae93f,sg-42efd82e"
-    # A list of additional security groups to be added to the ELB.
+    # Disable AWS Network Load Balancer client IP preservation.
+    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false
 ```
 
+See [Target groups for your Network Load Balancers: Client IP preservation](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#client-ip-preservation) for more information about this feature.
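+
+For reference, a complete internal Network Load Balancer Service combining the annotations above could look like the following sketch. It relies on the standard in-tree AWS cloud provider annotations; the service name, selector and ports are placeholders:
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: my-internal-service
+  annotations:
+    # Provision an AWS Network Load Balancer instead of a Classic ELB.
+    service.beta.kubernetes.io/aws-load-balancer-type: nlb
+    # Keep the load balancer internal to the VPC.
+    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
+    # Disable client IP preservation so the load balancer can also be reached
+    # from pods inside the same cluster (see the explanation above).
+    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false
+spec:
+  type: LoadBalancer
+  selector:
+    app: my-app
+  ports:
+    - name: https
+      port: 443
+      targetPort: 8443
+```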
+
+##### Health Checks failing when using PROXY protocol and `externalTrafficPolicy: Local`
+
+The limitation described above leads directly to the next pitfall: one could think "well, if the integrated client IP preservation does not work, I can still use PROXY protocol". In theory, and at least for the Kubernetes-integrated cloud controller, this should work. In theory.
+
+In reality, we need to step back and look at how health checks are implemented with `externalTrafficPolicy: Local`. By default, and with `externalTrafficPolicy: Cluster`, the AWS Network Load Balancer sends its health check requests to the same ports it sends traffic to: the traffic ports defined in the Kubernetes Service. From there they are answered by the pods backing your service.
+
+When enabling PROXY protocol, AWS assumes your service understands PROXY protocol for both traffic and health checks. This is what happens for services using `externalTrafficPolicy: Cluster` with PROXY protocol enabled: AWS uses it for both the traffic and the health checks because, unless configured otherwise via annotations, they end up on the same port.
+
+Things change when using `externalTrafficPolicy: Local`: Local means that traffic hitting a node on the allocated traffic node port stays on that node. Therefore, only nodes running at least one pod of your service are eligible targets for the AWS Network Load Balancer's target group, as other nodes fail to respond to its health checks.
+
+Since the health check might return a false negative when two pods are running on the same node and one of them is not healthy (anymore), Kubernetes allocates a separate health check node port and configures it in the AWS Network Load Balancer. Requests hitting this health check node port are handled by `kube-proxy` or its replacement (Cilium in our case). Unfortunately, neither of them is able to handle PROXY protocol, so all health checks will fail from the moment you enable PROXY protocol on your AWS Network Load Balancer in conjunction with `externalTrafficPolicy: Local`.
+
+In the end this means there is currently no way of preserving the original client IP with internal AWS Network Load Balancers that are accessed from inside the same cluster, except for using PROXY protocol together with `externalTrafficPolicy: Cluster`, which makes the AWS Network Load Balancer balance traffic across all nodes and adds an extra hop for distribution inside the cluster.
+
+##### Security Group configuration on internal AWS Network Load Balancers
+
+Last but not least, there is one more thing to take care of. If you are not accessing an internal AWS Network Load Balancer from inside your cluster, and can therefore actually use the integrated client IP preservation, you might still want to access this load balancer from other internal sources. That is perfectly fine and works.
+
+But since their source IP addresses are not changed, these clients hit your nodes with their original IP addresses. This can become a problem with the default Security Group configuration for your nodes: by default, an AWS Network Load Balancer adds exceptions for its own IP addresses and, if public, for the internet (0.0.0.0/0). In the case of an internal AWS Network Load Balancer with client IP preservation enabled, your traffic unfortunately matches neither of them. Therefore you might need to manually add the source IP addresses of your other internal services accessing the load balancer to the nodes' Security Group configuration.
+
+---------------------------------------------------
+
 ## Further reading
 
 - [Running Multiple Ingress NGINX Controllers]({{< relref "/content/advanced/connectivity/ingress/multi-nginx-ic/index.md" >}})