Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve readme steps for installation of apps #2364

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Elias-elastisys
Copy link
Contributor

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • kind/adr

What does this PR do / why do we need this PR?

Tries to improve the readme so that it is possible to install apps solely based on it.

Decided in https://github.com/elastisys/ck8s-arch/issues/215

Information to reviewers

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change upgrades CRDs
    • The change updates the config and the schema
  • Documentation checks:
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts are not affected)
    • The metrics names did change (Grafana dashboards and Prometheus alerts were fixed)
  • Logs checks:
    • The logs do not show any errors after the change
  • Pod Security Policy checks:
    • Any changed pod is covered by Pod Security Admission
    • Any changed pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any pods to be blocked by Pod Security Admission or Policies
  • Network Policy checks:
    • Any changed pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

Copy link
Contributor

@simonklb simonklb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found in the README but the change in the update-ips script looks dangerous. :o

Comment on lines 164 to 165
log_error "No IPs for ${domain} was found. Will block all IPs"
echo "0.0.0.0"
Copy link
Contributor

@simonklb simonklb Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this not do the opposite, i.e. allow all IPs? This function is used to resolve a domain name to an IP so that it can be added to the allow list, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This eventually sets the CIDR in the NP to 0.0.0.0/32, i.e. just the 0.0.0.0 ip which would never be a real IP meaning it blocks everything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process_ips_to_cidrs seems to be the one adding the /32

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah of course 🤦

I'm still a bit worried that we might at some point rewrite this and not all IPs go through that same IP to CIDR transformation function.

Also, why would you want it not to fail if the domain can't be resolved? Is it not better to become aware of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, when you use a LoadBalancer Service for ingress-nginx you can't create the domains before installing ingress-nginx. And before you could not run update-ips since it would fail and the schema validation would complain if you tried to apply anything. This seemed like a better approach then having to manually set all NPs to something first. Not perfect but I'm not sure if there is a better solution.

Although I just realized I forgot to mention you would have to re-run update-ips again after setting up the DNS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it not be left emptied instead to disallow everything that way? 🤔
(IIRC the associated rules shouldn't be rendered in that case, because empty netpols allow everything, but we have a split between the old and new netpol still.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe split the script change and the documentation changes into separate PRs, both good for changelog purposes and review purposes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it not be left emptied instead to disallow everything that way? 🤔

If I leave it empty instead the CIDR ends up as just /32 which fails in the apply

Copy link
Contributor

@cristiklein cristiklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like the improvements, but I'm missing a section which discusses "the forest" before discussing "the trees".

@@ -114,6 +114,21 @@ There are two options when managing DNS records, manually or ExternalDNS.

Assuming you already have everything needed to install the apps, this is what you need to do.

> [!NOTE]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing a section giving an overview of Apps. This is stuff that we already know and have in our DNA, which is why it's so hard to write it. Someone completely new to the project should first learn that:

  • Welkin follows the "configuration as code" principle and expects all configuration to be in a so-called configuration folder.
  • Welkin strongly recommends to have this configuration folder in .git. Apps may be a submodule, so that the configuration folder is the source of truth for how an environment is set up.
  • Welkin Apps' functionality is accessed threw the ./bin/ck8s command-line tool.

Makes sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a change. Is that satisfactory? Hard to know where to exactly place it.

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
@simonklb simonklb dismissed their stale review December 19, 2024 10:27

No longer relevant since script change is moved to separate PR

>
> ```bash
> ./bin/ck8s ops helmfile sc apply -lapp=ingress-nginx --include-transitive-needs
> ./bin/ck8s ops helmfile wc apply -lapp=ingress-nginx --include-transitive-needs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if this README somewhere and somehow should state that you might not be able to run commands for wc if you have not installed Dex in sc first, since we usually assume for Welkin that wc is configured to point to Dex in sc 🤔

Copy link
Contributor

@anders-elastisys anders-elastisys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should maybe include that the nodes should be labelled with the elastisys.io/node-group based on what group it belongs to (e.g. worker or control-plane) for our node capacity alerts to work properly

@@ -114,6 +120,21 @@ There are two options when managing DNS records, manually or ExternalDNS.

Assuming you already have everything needed to install the apps, this is what you need to do.

> [!NOTE]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like we need to rethink the structure here.
Because first we tell them to setup domains to something, which in the normal case (with infrastructure providers supporting LB SVC) won't work.
Then second we tell them to setup ingress controller without first setting up the environment directory, which you have to remember to go back to.

I feel like we need to better identify the happy path, and make the structure better at being referable should we need to jump somewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants