Helm chart config for individual hubs as separate files rather than one cluster-level config #857

Closed
1 of 4 tasks
Tracked by #879
consideRatio opened this issue Nov 26, 2021 · 23 comments

@consideRatio
Contributor

consideRatio commented Nov 26, 2021

I think it is a recurring pain point that the helm chart config is embedded in config/hubs/<k8s-cluster-name>.cluster.yaml files, which don't follow the structure of a typical helm chart values file. I have found this troublesome during local development and debugging, and I'm now running into a situation where additional complexity would be required to add helm template validation logic to our CI system to resolve #279.

I'd like to propose that we extract the helm chart specific config from <k8s-cluster-name>.cluster.yaml into, for example, <hub-name>.hub.helm-chart-config.yaml.

I would argue that doing this will reduce the overall complexity of things as well, and that it would move us in the right direction with regards to ease of maintenance and supporting a practical right to replicate.

Tasks to close issue

  • Reach agreement that extracting the helm chart config into a standalone file is a good idea
  • Update the deployer script to work with standalone helm chart config files in one way or another
  • Update config/hubs/schema.yaml to reflect the new structure
  • Extract the helm chart configs into standalone files

Work blocked by this

I'll hold off on investing more time in trying to resolve #279 until we have agreed on what to do about having the helm chart config embedded in a non-helm-chart config yaml file.

If we agree to extract the helm chart specific config, I can keep working on a PR to close #279 that can be merged as soon as we have resolved this.

@sgibson91
Member

sgibson91 commented Nov 26, 2021

I think this issue will also need to be thought about in terms of #818.

The current proposal there is:

instead of having a big config file for the cluster, let's have a directory to represent the cluster plus 1) a cluster-specific yaml file and 2) several hub-specific yaml files for each staging/prod pair.

So a directory tree might look something like:

|-config/
  |-{{ cluster_name }}/
    |-prod.yaml
    |-staging.yaml
    |-cluster.yaml

@consideRatio consideRatio changed the title Separate helm chart config from <name>.cluster.yaml files into <name>.chart-config.yaml files? Separate helm chart config from <k8s-cluster-name>.cluster.yaml files into <hub-name>.hub.chart-config.yaml files? Nov 26, 2021
@consideRatio consideRatio changed the title Separate helm chart config from <k8s-cluster-name>.cluster.yaml files into <hub-name>.hub.chart-config.yaml files? Separate helm chart config from <k8s-cluster-name>.cluster.yaml files into <hub-name>.hub.helm-chart-config.yaml files? Nov 26, 2021
@consideRatio
Contributor Author

Thanks @sgibson91 for chiming in! I'll try to chime in at #818 regarding that as well, but will try to avoid mixing the scopes of these issues, which still represent somewhat different needs.

If we can agree that extracting the helm chart config into a standalone file is a good idea, it's a big step forward for this issue no matter what is done regarding the bigger structure refactoring in #818.

@choldgraf
Member

Just jumping in quickly from vacation to say that I think the proposal in #818 (comment) sounds reasonable to me.

It feels to me that resolving this issue / that comment could be considered a precursor to resolving #818 (or sub-task of resolving it). If we think we are heading in the direction of factoring out hub- and cluster-specific configuration into separate config, I'd be +1 on going for it now.

Anything that can lay a long-term groundwork that we are happy with is a good idea to do now, while there are relatively fewer unique clusters/hubs to maintain.

Can anybody give a strong reason why we want to keep the configuration combined? If not, I think we should just put this on our backlog and do it.

@yuvipanda
Member

Right now, we can use YAML anchors to refer to a prod hub that has the same config as a staging hub, like in the existing cluster config files. This is a common pattern, as we want to have a staging hub for each cluster we have. If we have separated files, how would we achieve this?
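
For reference, a minimal sketch of that anchor pattern inside a single cluster config file (the hub names and keys here are illustrative, not the actual config):

hubs:
  - name: staging
    config: &staging_config
      jupyterhub:
        singleuser:
          memory:
            limit: 1G
  - name: prod
    config: *staging_config  # prod reuses the staging hub's config via a YAML anchor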

@consideRatio
Contributor Author

This is a common pattern, as we want to have a staging hub for each cluster we have. If we have separated files, how would we achieve this?

I suggest that the deployer script is then made able to recognize multiple separate values files: just like helm upgrade --values file1.yaml --values file2.yaml can be used, the deployer script would be able to install a chart with an ordered list of values files.
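
As a rough illustration (the file names and chart path here are hypothetical), helm already accepts an ordered list of values files where later files take precedence on conflicting keys:

helm upgrade --install staging ./helm-charts/basehub \
  --values common.values.yaml \
  --values staging.values.yaml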

@damianavila
Contributor

This is interesting: https://stackoverflow.com/questions/44910886/pyyaml-include-file-and-yaml-aliases-anchors-references, although it feels unstable, not reliable enough...
Going forward with the R2R principle... maybe we should avoid running with this pattern? I know then you would have a new problem because you may want to keep things in sync... 🤔
BTW, @consideRatio, that seems an interesting alternative if we want to keep the pattern...

@consideRatio
Contributor Author

consideRatio commented Nov 29, 2021

@damianavila, what did "this pattern" refer to in "maybe we should avoid running with this pattern?"

If we want to have shared configuration between hubs etc, I'm thinking we can handle it by having shared values files rather than anchors in a single massive file, and to have the shared values files be used like helm upgrade ... --values file1.yaml --values file2.yaml etc.

For me, R2R is improved to a large extent by reduced complexity and ease of comprehension. I'm thinking that splitting apart the k8s cluster config files we currently have into helm chart native values files is an improvement regarding complexity and ease of comprehension.

@damianavila
Contributor

@damianavila, what did "this pattern" refer to in "maybe we should avoid running with this pattern?"

The pattern of relying on anchors to avoid duplication... I am wondering whether it makes sense to be repetitive for the sake of explicitness...

For me, R2R is improved to a large extent by reduced complexity and ease of comprehension. I'm thinking that splitting apart the k8s cluster config files we currently have into helm chart native values files is an improvement regarding complexity and ease of comprehension.

I totally agree with that! And I think your proposal with several "values" files is probably a good compromise between ease of comprehension and not being so repetitive...

@consideRatio
Contributor Author

Ah thanks for clarifying @damianavila!

@choldgraf
Member

Update: let's make a proposal

In our team meeting today, we agreed that we have had enough discussion around this, and that it's time to make a proposal for a configuration structure. @consideRatio and @sgibson91 were interested in working together to come up with something that we can then iterate on.

@choldgraf choldgraf changed the title Separate helm chart config from <k8s-cluster-name>.cluster.yaml files into <hub-name>.hub.helm-chart-config.yaml files? Helm chart config for individual hubs as separate files rather than one cluster-level config Jan 4, 2022
@consideRatio
Contributor Author

A summary of the work I see needed to resolve this and other related issues

This issue was created with one goal in mind, but I hope to see multiple goals addressed by the proposed changes to the configuration that the deployer script reads.

  1. To have pure helm chart values files, allowing us to run helm template --validate etc without trouble (see the example below the list).
  2. To enable GitHub workflows to define jobs that trigger effectively, so we don't redeploy too much or too little.
  3. To enable a dependency on miscellaneous helm charts, for example jupyterhub-ssh.
  4. To enable a dependency on a cluster support chart or similar.
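
For goal 1, for example, a standalone values file would let us or CI run something like the following (the chart path and file name are illustrative):

helm template staging ./helm-charts/basehub \
  --values staging.hub.values.yaml \
  --validate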

I'm not sure what changes to our custom configuration structure are needed, but I'd like us to consider these kinds of challenges when coming up with a config structure proposal.

@sgibson91
Member

RE: swapping YAML anchors for separate values files, I wonder if it's a good idea to replicate the common.yaml structure I've seen a lot where multiple hub configs are used: common.yaml represents config common to a group of hubs and is always loaded, then staging.yaml, prod.yaml, my-other-cool-hub.yaml each represent config specific to a single hub.

@consideRatio
Contributor Author

consideRatio commented Jan 4, 2022

About common.yaml, I'm very open to the idea of having such a file, but I don't want it to be used because of its name - rather, because it is explicitly listed as a values file for the hub.

So, concretely - the deployer script will deploy a helm chart based on explicitly specified values files, not a naming convention. I think that will handle situations where you, for example, want to have multiple common.yaml files that you opt into.

There is a restriction though: multiple values files that are coalesced will merge in the helm way, while an anchor can be used with a finer level of detail, for example to inject elements into a list - but that is a compromise I think we just have to accept no matter what.
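
A minimal sketch of that restriction, assuming two hypothetical values files that both set a list under the same key - when helm coalesces them, the later list replaces the earlier one wholesale, whereas within a single file an anchor could have been combined with additional list items:

# file1.yaml
singleuser:
  profileList:
    - display_name: "Small"

# file2.yaml
singleuser:
  profileList:
    - display_name: "Large"

# helm upgrade ... --values file1.yaml --values file2.yaml
# leaves profileList with only "Large": lists are replaced, not merged.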

@consideRatio
Contributor Author

Some concrete ideas

  1. What we have referred to as a "hub" is practically a helm chart release (an installation of a helm chart).
  2. Each hub / helm chart release explicitly lists its values files
  3. Each hub / helm chart release explicitly lists dependencies on other hub / helm chart releases
  4. We attempt to make the support deployment another hub / helm chart release that the hubs in the cluster explicitly depend on

(I'm taking a computer break for today now!)

@sgibson91
Member

So long as I can still write python deployer deploy ... HUB_NAME on the command line, and the deployer knows where to go for these explicit mappings, I am open to this suggestion. I foresee human error if I have to explicitly remember each values file I need to provide on the command line. We can then also check the mappings into version control, and hence roll back to a given config mapping via git, and CI can read them too.

I also think I need some help understanding/visualising your other suggestions and how they relate to just isolating config for validation or deployment. (But we can iterate on that!)

@consideRatio
Contributor Author

So long as I can still write python deployer deploy ... HUB_NAME on the command line, and the deployer knows where to go for these explicit mappings, I am open to this suggestion. I foresee human error if I have to explicitly remember each values file I need to provide on the command line

Absolutely, I'm thinking the values files are explicit because they are part of the deployer script's configuration.
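
A hypothetical sketch of what that could look like in a cluster.yaml read by the deployer (the key names are made up for illustration, not a final schema):

hubs:
  - name: staging
    helm_chart: basehub
    helm_chart_values_files:  # ordered; later files override earlier ones
      - common.values.yaml
      - staging.hub.values.yaml

A command like python deployer deploy ... staging could then look up this list and pass each file to helm as a --values flag, in order.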

@sgibson91
Member

The current proposal is outlined in https://hackmd.io/@29EjZnuAQx-ME7Olgi2JXw/Sk5S7xP6F and looks something like the following (exact details TBD):

config
└── clusters
    ├── cluster_1
    │   ├── cluster.yaml
    │   ├── prod.hub.values.yaml
    │   ├── staging.hub.values.yaml
    │   └── support.values.yaml
    └── cluster_2
        ├── cluster.yaml
        ├── prod.hub.values.yaml
        ├── staging.hub.values.yaml
        └── support.values.yaml

@choldgraf asked what a version for communities with more than one hub on a cluster might look like; possibly something like this:

config
└── clusters
    ├── cluster_1
    │   ├── community_A
    │   │   ├── staging.hub.values.yaml
    │   │   └── prod.hub.values.yaml
    │   ├── cluster.yaml
    │   ├── prod.hub.values.yaml
    │   ├── staging.hub.values.yaml
    │   └── support.values.yaml
    └── cluster_2
        ├── cluster.yaml
        ├── prod.hub.values.yaml
        ├── staging.hub.values.yaml
        └── support.values.yaml

I think we would have to be careful in terms of name clashing here though...

... Actually maybe a better way to address this is to have the filename convention {{ COMMUNITY_NAME }}.{{ HUB_NAME }}.values.yaml. This way we can:

  • Group hubs together by community based on a simple sort
  • Extract {{ COMMUNITY_NAME }}.{{ HUB_NAME }} into the name of the helm release (replacing the . with a -), as sketched below
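
As a rough sketch of that mapping (the file and release names are hypothetical):

community_A.staging.hub.values.yaml  ->  helm release community_A-staging
community_A.prod.hub.values.yaml     ->  helm release community_A-prod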

@damianavila
Contributor

... Actually maybe a better way to address this is to have the filename convention {{ COMMUNITY_NAME }}.{{ HUB_NAME }}.values.yaml.

In that case, I guess we would not have staging.hub.values.yaml or prod.hub.values.yaml files anymore, because all would be prefixed with a community name, right? Our hubs would be 2i2c ones, I suppose?

@sgibson91
Member

@damianavila The thought still needs fleshing out, but I don't think it's critical right now as we don't currently host any hubs following that pattern.

@choldgraf
Member

choldgraf commented Mar 1, 2022

I think that with the recent refactors, we could now close this one

some relevant PRs:

is that right?

@sgibson91
Member

I believe so, unless it is strongly felt that the support config should also be included under this umbrella?

@choldgraf
Member

Hmm - I'll leave the answer to that question up to the @2i2c-org/tech-team because I don't understand the implications. Is there a strong reason to block closing this issue on grouping the support config as well?

@consideRatio
Contributor Author

Let's close and start fresh for anything remaining!
