Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Optional Metrics Server #176

Closed
ntwkninja opened this issue Feb 13, 2024 · 14 comments · Fixed by #629
Closed

Enable Optional Metrics Server #176

ntwkninja opened this issue Feb 13, 2024 · 14 comments · Fixed by #629
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers
Milestone

Comments

@ntwkninja
Copy link
Member

ntwkninja commented Feb 13, 2024

Is your feature request related to a problem? Please describe.

As an AWS EKS / UDS Core user, I would like to deploy all my EKS add-ons in a single package via zarf prior to installing UDS Core. Some EKS add-ons depend on metrics server to exist.

Describe the solution you'd like

  • Given a metrics server is already deployed
  • When UDS Core is deployed
  • Then use the previously installed metrics server

Describe alternatives you've considered

1.) Split the EKS Add-Ons into two zarf packages and deploy them on either side of UDS Core (in a bundle) to account for race conditions.

  • Zarf Init EKS w/S3 backed registry
  • Zarf package deploy EBS & EFS CSI drivers (+ EKS Add-Ons that don't depend on metrics server) & configure StorageClass
  • Zarf package deploy UDS Core
  • Zarf package deploy EKS Add-Ons that depend on metrics server

2.) Deploy all EKS Add-Ons prior to UDS Core (in a bundle) and let the austoscaler & node termination handler pods that depend on metrics server dangle until the metrics server is deployed with UDS Core

Additional context

OpenShift has a non-optional metrics server but I'm not aware of other distros or deployments that may benefit from this.

@ntwkninja ntwkninja added the enhancement New feature or request label Feb 13, 2024
@zachariahmiller
Copy link
Contributor

Thinking longer term this feels like something we may want to implement an overall process/package for that does these things, not that said package would live IN uds-core.

We are looking at/thinking about ways to be able to enable or disable a component in a package getting included into a bundle, but nothing exists today (optional components doesnt really fit the use case. FSYA Rob created an issue for that here to at least start the discussion on what that could look like. We were primarily thinking of the metrics server scenario when talking about it.

I know it was mentioned in a thread that you figured out a way to get past the limitation of metrics server being available, but would probably be useful to identify which addons/eks things you are using and if they are consistently used.

Feel free to if any of these are wrong or if something's missing (im sure im forgetting something):

  1. custom zarf init with s3 backend for registry
  2. EBS CSI
  3. EFS CSI
  4. Load Balancer Controller
  5. AWS Node Termination Handler
  6. Cluster Autoscaler
  7. Calico?
  8. VPC CNI?

@ntwkninja
Copy link
Member Author

I know it was mentioned in a thread that you figured out a way to get past the limitation of metrics server being available, but would probably be useful to identify which addons/eks things you are using and if they are consistently used.

Feel free to if any of these are wrong or if something's missing (im sure im forgetting something):

Thanks @zachariahmiller

vpc-cni, kube-proxy & coredns deploy by default (part of core EKS).

We dropped calico right after this feature droppped because netpols enforcement can be done natively via a vpc-cni flag.

EKS Add-Ons used today:

  • EBS CSI
  • EFS CSI
  • AWS Node Termination Handler
  • Cluster Autoscaler
  • Metrics Server
  • AWS Loadbalancer Controller

EKS Add-Ons we're exploring:

  • External Secrets
  • Karpenter (not in IB but seems to be the gold standard for EKS / EC2 autoscaling)

cc: @zack-is-cool @bunchmj

@mjnagel
Copy link
Contributor

mjnagel commented Feb 20, 2024

There was a good/long conversation in Slack about this issue and how to resolve. The tentative position I would like to take is that we provide metrics-server always with uds-core. This means that you would have to disable the distro-provided (or addon) one that you may already be installing so that there aren't any conflicts. Ideally this means less for your deployment to maintain (i.e. Ironbank images, updates to the chart, etc), but won't remove any flexibility since you will still have access to all the helm overrides as needed.

The one part that I would want to confirm - will this cause any issues deploying other things due to dependencies? My understanding is that anything using metrics typically has a soft dependency on it - something like an autoscaler/hpa would deploy successfully but not be "functional" until metrics-server is deployed. I would be curious to hear perspective on whether this is acceptable or problematic at all.

Ultimately just hoping to reduce friction and hopefully reduce maintenance burden on your deploy side. By leveraging uds-core's metrics-server you would get:

  • Ironbank + updates handled for you
  • Network policies included with uds package CR (and other things down the line potentially)
  • Istio mTLS baked in

Ideally these are all pros, so looking for any help understanding the cons here.

@zack-is-cool
Copy link
Member

One con I could see is adoption to certain environments with metrics-server already installed which delivery does not have control over the cluster. Say we were deploying uds-core to an existing customer's cluster.

Would there be a way to make it optional but default to enabled, or at that point are we talking about extra zarf package flavors?

@mjnagel
Copy link
Contributor

mjnagel commented Feb 20, 2024

Yeah @zack-is-cool that's definitely a scenario that would be annoying. Hope is that the most likely 2 environments are EKS and RKE2. EKS doesn't provide metrics-server out of the box, and RKE2 makes it easy to disable so hopefully that's a quick solve.

Optional metrics-server is a path we could look at, but I don't think it makes sense to handle entirely at the zarf/uds-core layer and would likely rely on something like defenseunicorns/uds-cli#426 to allow someone to deselect a component. I think where I'm sort of landing is maybe we should switch metrics-server from required: true to default: true on metrics-server which would allow you to deselect it (if using zarf) and then if/when uds-cli supports that it would work the same way on the bundle side.

@zachariahmiller
Copy link
Contributor

Agreed @mjnagel. I spoke with @UncleGedd earlier on implementing the referenced uds issue with this case in mind and I think once that is implemented and metrics has been switched to default: true this issue will effectively be addressed in a clean, minimally invasive way.

mjnagel added a commit that referenced this issue Feb 20, 2024
## Description

Switches metrics-server from required to default to support deselecting
it when you want to use a k8s distro default metrics-server.

## Related Issue

Related to #176

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide
Steps](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md)(https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md#submitting-a-pull-request)
followed
@mjnagel
Copy link
Contributor

mjnagel commented Feb 21, 2024

Going to have to take a step back here temporarily - default: true was causing issues due to an upstream zarf bug so we are reverting that change for now. Still intend to support this use-case and make metrics-server optional, but might wait on the refactor to optionality in zarf.

@mjnagel
Copy link
Contributor

mjnagel commented Mar 21, 2024

Blocked by zarf-dev/zarf#2320 / the refactor to optionality in zarf.

robmcelvenny pushed a commit to owen-grady/uds-core-slim-dev that referenced this issue Jun 3, 2024
## Description

Switches metrics-server from required to default to support deselecting
it when you want to use a k8s distro default metrics-server.

## Related Issue

Related to defenseunicorns/uds-core#176

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide
Steps](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md)(https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md#submitting-a-pull-request)
followed
@mjnagel mjnagel removed the blocked label Jun 24, 2024
@mjnagel
Copy link
Contributor

mjnagel commented Jul 2, 2024

I believe at this point we should be able to switch to default: true and validate this works as expected on the zarf side (we hit a bug deploying from OCI, so may want to publish to a temp location and validate that process).

The linked uds-cli issue would still be necessary to provide this at the bundle layer, but this would complete everything we need to resolve on our side.

@mjnagel mjnagel added the good first issue Good for newcomers label Jul 2, 2024
@drewhagen drewhagen self-assigned this Jul 8, 2024
@mjnagel
Copy link
Contributor

mjnagel commented Jul 12, 2024

@drewhagen just a heads up this might be a bit in the air pending some CLI discussions on how they handle those changes. Might be worth moving this a blocked state pending decision there.

@drewhagen
Copy link
Contributor

drewhagen commented Jul 15, 2024

@mjnagel Sounds good. Switching to blocked.
Are we tracking a CLI issue for these discussions that can be linked up?

@mjnagel
Copy link
Contributor

mjnagel commented Jul 15, 2024

I think defenseunicorns/uds-cli#426 and defenseunicorns/uds-cli#757 are the most relevant here. Per some discussions with @UncleGedd last week the likely path forward is that uds-cli will have more intelligence/logic specifically around handling of uds-core.

@mjnagel mjnagel removed the blocked label Jul 19, 2024
@mjnagel
Copy link
Contributor

mjnagel commented Jul 19, 2024

@drewhagen pivoting a bit here based on thinking this through and chatting with the team. Most k8s distros that we validate testing on provide metrics-server by default (ex: rke2, k3s, etc) or are quite common addons (ex: eks).

New path forward would be making metrics-server an optional component, not deployed by default:

  • Change metrics-server from required: true to required: false, specifically commenting it as an optional component since most distros provide it out of the box
  • Identify this as a breaking change (we use conventional commits)
  • Ensure that metrics-server is still tested in CI and part of the demo-bundle + slim-dev by leveraging uds-cli optionalComponents. These bundles are used for demos + other package CI and uds-k3d does not provide metrics-server so we should keep it as a component.
  • Add to the EKS test bundle using the same optionalComponents

@mjnagel mjnagel added this to the 0.25.0 milestone Jul 22, 2024
mjnagel added a commit that referenced this issue Aug 1, 2024
BREAKING CHANGE: metric server is no longer required and is optional. To
enable it, add it to the optionalComponents key on the core package

Release-As: 0.25.0

## Description
Since many of our supported Kubernetes distros (rke2, k3d, eks) often
come with metrics-server or have an add-on, we make this change to
optionally enable it at the uds-core layer.

## Related Issue

Relates to
[uds-core#176](#176)

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ x ] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Checklist before merging

- [ ] Test, docs, adr added or updated as needed
- [ ] [Contributor
Guide](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md)
followed

Co-authored-by: Micah Nagel <[email protected]>
@mjnagel
Copy link
Contributor

mjnagel commented Aug 1, 2024

Linking a small docs update to close this out fully - although the primary change has already happened in code.

mjnagel added a commit that referenced this issue Aug 1, 2024
## Description

Updates docs to reference override for TLS version + reflect metrics
server being optional now.

## Related Issue

Related to #599

Fixes #176

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor
Guide](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md)
followed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants