Kasten K10 Helm Chart #28

Closed
wants to merge 15 commits into from

Conversation

@ajbergh ajbergh commented Jan 24, 2023

Issue #, if available:

Description of changes:

Initial PR for Kasten K10 Helm Chart

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@elamaran11 elamaran11 left a comment

Please fix this issue on values. Also please submit a functional job as we discussed.

version: 5.5.3
interval: 1m0s
targetNamespace: kasten-io
values:

@ajbergh This will make the deployment fail because `values` is empty. Please either remove the `values` line or add the values needed for your installation under it.
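
For illustration, a minimal sketch of the first option (dropping the empty key). The `apiVersion`, `version`, `interval` and `targetNamespace` come from the diff under review; the `metadata` names, the chart name and the `sourceRef` are assumptions made only to keep the manifest complete, and may not match the actual file in the PR.

```yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: k10                # assumed release name
  namespace: kasten-io     # assumed namespace for the HelmRelease object
spec:
  chart:
    spec:
      chart: k10           # assumed chart name
      version: 5.5.3
      sourceRef:
        kind: HelmRepository
        name: kasten       # assumed HelmRepository name
        namespace: flux-system
  interval: 1m0s
  targetNamespace: kasten-io
  # No `values:` key at all; add it back only together with real entries.
```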

@elamaran11 elamaran11 left a comment

@ajbergh @shuguet The file `primer.yaml` should be moved to https://github.com/aws-samples/eks-anywhere-addons/tree/main/eks-anywhere-common/Testers under a new folder for Kasten/K10.

@shapirov103 shapirov103 left a comment

@ajbergh Is observability tooling critical for the product to work? I see Prometheus, Grafana, etc. Can you supply a config without that tooling?

shuguet commented Jan 27, 2023

Without that tooling, many parts of K10 won't work. If we remove them as part of the installation of K10, we expect the customer/end-user to supply them (so the expectation is that they would already be installed).

Therefore, the easiest way to test K10 functionality here is to simply install them as part of K10.

@elamaran11 elamaran11 left a comment

Please fix these comments and update the PR.

VSC_NAME="csi-aws-vsc";
apk add --no-cache helm curl bash jq;
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl";
install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl;

This job fails; please see the log below. It expects an annotation on the VolumeSnapshotClass.

CSI Snapshot Walkthrough:
  Unable to find a VolumeSnapshotClass for provisioner (ebs.csi.aws.com) with k10 annotation (k10.kasten.io/is-snapshot-class) set to true.  -  Error
Error: {"message":"Unable to find a VolumeSnapshotClass for provisioner (ebs.csi.aws.com) with k10 annotation (k10.kasten.io/is-snapshot-class) set to true.","function":"kasten.io/k10/kio/tools/k10primer.(*TestRetVal).Errors","linenumber":180,"file":"kasten.io/k10/kio/tools/k10primer/k10primer.go:180"}

serviceaccount "k10-primer" deleted
clusterrolebinding.rbac.authorization.k8s.io "k10-primer" deleted
job.batch "k10primer" deleted
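
Going only by the error message above, the primer looks for a VolumeSnapshotClass whose driver is `ebs.csi.aws.com` and which carries the annotation `k10.kasten.io/is-snapshot-class` set to true. A sketch of a class that would match is shown here. The class name is taken from `VSC_NAME` in the job script; the `deletionPolicy` value is an assumption, and whether the class or the test job should change is a separate question.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
  annotations:
    # Annotation named in the k10primer error message
    k10.kasten.io/is-snapshot-class: "true"
driver: ebs.csi.aws.com
deletionPolicy: Delete     # assumed; pick the policy your cluster requires
```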

@@ -0,0 +1,18 @@
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1

Please remove Prometheus and all observability pods that are added by this Helm release.

@shapirov103

@shuguet Is Grafana necessary for Kasten to work? We will supply Prometheus and metrics server in the future as part of the installation. I understand that observability is critical for a Kubernetes cluster as a whole; however, we cannot afford to have each product supply it as part of its own installation. This is an edge device, far from cloud scale.

shuguet commented Jan 27, 2023

Grafana can be disabled.
After checking with our engineering team, Prometheus can also be disabled if resources are really constrained.
I have a branch on our fork of the repo with the necessary Helm values set, for my colleague @ajbergh to review: https://github.com/kastenhq/eks-anywhere-addons/blob/k10-without-prometheus-grafana/eks-anywhere-common/Addons/Partner/Kasten/K10/helm.yaml
I also updated our Tester script that was failing; it is waiting in that same branch: https://github.com/kastenhq/eks-anywhere-addons/blob/k10-without-prometheus-grafana/eks-anywhere-common/Testers/Kasten/K10/primer.yaml
I can merge both into this branch if someone can test on your end. It's end of day for me here (CEST).
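
As a rough sketch of what such values might look like in the HelmRelease: the key names `grafana.enabled` and `prometheus.server.enabled` are assumptions about the K10 chart's options and are not quoted from this thread. The `helm.yaml` in the linked branch is the authoritative version.

```yaml
# Hypothetical `values` fragment for the K10 HelmRelease spec.
# Key names are assumed; verify them against the chart before use.
values:
  grafana:
    enabled: false
  prometheus:
    server:
      enabled: false
```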

@shapirov103

@shuguet Please merge to main in your fork. GitOps will sync it, and we will tell you the outcome.

@elamaran11

@shuguet I'm facing the same error again, even with the new changes running from the k10-without-prometheus-grafana branch of your fork. Please fix the errors, run the complete test job on your end, move it to main in your repo, and let us know.

Basically, I don't know why the Kasten job expects the `VolumeSnapshotClass` to be annotated; that needs to be fixed in the test job.

Using "K10_PRIMER_CONFIG_YAML" env var content as config source
Using "K10_PRIMER_CONFIG_YAML" env var content as config source
I0127 22:43:22.873254      12 request.go:601] Waited for 1.049352568s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/generators.external-secrets.io/v1alpha1
CSI Snapshot Walkthrough:
  Unable to find a VolumeSnapshotClass for provisioner (ebs.csi.aws.com) with k10 annotation (k10.kasten.io/is-snapshot-class) set to true.  -  Error
Error: {"message":"Unable to find a VolumeSnapshotClass for provisioner (ebs.csi.aws.com) with k10 annotation (k10.kasten.io/is-snapshot-class) set to true.","function":"kasten.io/k10/kio/tools/k10primer.(*TestRetVal).Errors","linenumber":180,"file":"kasten.io/k10/kio/tools/k10primer/k10primer.go:180"}
❯ k describe volumesnapshotclasses.snapshot.storage.k8s.io csi-aws-vsc
Name:             csi-aws-vsc
Namespace:
Labels:           kustomize.toolkit.fluxcd.io/name=classes
                  kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations:      <none>
API Version:      snapshot.storage.k8s.io/v1
Deletion Policy:  Delete
Driver:           ebs.csi.aws.com
Kind:             VolumeSnapshotClass
Metadata:
  Creation Timestamp:  2023-01-27T15:54:41Z
  Generation:          1
  Managed Fields:
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  creationTimestamp: "2023-01-26T22:26:21Z"
  finalizers:
  - finalizers.fluxcd.io
  generation: 8
  name: testers
  namespace: flux-system
  resourceVersion: "775929"
  uid: 660db8c4-2f98-4d0f-a44e-7b66d155f763
spec:
  gitImplementation: go-git
  interval: 1m0s
  ref:
    branch: k10-without-prometheus-grafana
  timeout: 60s
  url: https://github.com/kastenhq/eks-anywhere-addons

@elamaran11

I also see metering-svc crashing consistently now (Init:CrashLoopBackOff). Please check that too.

❯ kgp -n kasten-io
NAME                                     READY   STATUS                  RESTARTS        AGE
aggregatedapis-svc-744598f446-6knj7      1/1     Running                 0               15m
auth-svc-6ffb67c7dc-5sqtr                1/1     Running                 0               15m
catalog-svc-5bb5d97cbf-7cj2z             2/2     Running                 0               15m
controllermanager-svc-7cb78d758d-9xmk2   1/1     Running                 1 (9m28s ago)   15m
crypto-svc-6bf7d4bd99-mx9tl              4/4     Running                 0               15m
dashboardbff-svc-54f755c4c8-lv74p        2/2     Running                 2 (7m48s ago)   15m
executor-svc-78bc87dbcc-bn25d            2/2     Running                 0               15m
executor-svc-78bc87dbcc-mmss5            2/2     Running                 0               15m
executor-svc-78bc87dbcc-pn7ct            2/2     Running                 0               15m
frontend-svc-85fd7787cb-bwdz6            1/1     Running                 0               15m
gateway-7c59756b88-86kxp                 1/1     Running                 1 (14m ago)     15m
jobs-svc-7b95ffdb8d-dqlsb                1/1     Running                 0               15m
k10-primer-job-c69mf                     0/1     Completed               0               7m28s
kanister-svc-6f7c58c8c9-7l4k2            1/1     Running                 1 (13m ago)     15m
logging-svc-7d57957cd7-j82dc             1/1     Running                 0               15m
metering-svc-669b9f4d9c-mzf4x            0/1     Init:CrashLoopBackOff   5 (106s ago)    15m
state-svc-66bb5fc4cc-nn4kw               2/2     Running                 0               15m

@elamaran11

More logs

❯ k logs k10-primer-job-c69mf -n kasten-io
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/x86_64/APKINDEX.tar.gz
(1/12) Installing ncurses-terminfo-base (6.3_p20221119-r0)
(2/12) Installing ncurses-libs (6.3_p20221119-r0)
(3/12) Installing readline (8.2.0-r0)
(4/12) Installing bash (5.2.15-r0)
Executing bash-5.2.15-r0.post-install
(5/12) Installing ca-certificates (20220614-r4)
(6/12) Installing brotli-libs (1.0.9-r9)
(7/12) Installing nghttp2-libs (1.51.0-r0)
(8/12) Installing libcurl (7.87.0-r1)
(9/12) Installing curl (7.87.0-r1)
(10/12) Installing helm (3.10.2-r1)
(11/12) Installing oniguruma (6.9.8-r0)
(12/12) Installing jq (1.6-r2)
Executing busybox-1.35.0-r29.trigger
Executing ca-certificates-20220614-r4.trigger
OK: 60 MiB in 27 packages
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0    910      0 --:--:-- --:--:-- --:--:--   913
100 45.7M  100 45.7M    0     0  12.2M      0  0:00:03  0:00:03 --:--:-- 12.9M
"kasten" has been added to your repositories
Error from server: etcdserver: request timed out
Namespace option not provided, using default namespace

github-actions bot commented Aug 2, 2023

This PR has been automatically marked as stale because it has been open 60 days
with no activity. Remove stale label or comment or this PR will be closed in 10 days

@github-actions github-actions bot added the stale label Aug 2, 2023
@elamaran11

@ajbergh @shuguet Please confirm whether you are still working on addressing our comments. The ticket will be closed in 10 days if there is no action.

slotdawg commented Aug 2, 2023

Hi team, I'm picking this up on behalf of @ajbergh as I am his backfill. I'm just getting up to speed on this EKS-A add-on, so will need a bit of time to deploy an environment and start testing.

@elamaran11 elamaran11 removed the stale label Aug 2, 2023
@elamaran11

@slotdawg Sounds good, thank you for responding. I have removed the stale flag for now. Please keep us posted.

@elamaran11

@slotdawg Please confirm if you are still working on this.

@slotdawg

Hi @elamaran11 - I am, yes. In the last test everything worked as expected, but I'm running additional tests now to confirm.

@elamaran11

Sounds good.

@elamaran11 elamaran11 closed this Oct 2, 2023
@elamaran11 elamaran11 reopened this Oct 2, 2023
@elamaran11

Closing the PR since there is no movement on this work. @slotdawg Please reach back when the PR is ready.

@elamaran11 elamaran11 closed this Nov 7, 2023
5 participants