GC old tekton resources periodically #750

Closed
Bobgy opened this issue Aug 25, 2020 · 8 comments · Fixed by #814

Comments

@Bobgy
Contributor

Bobgy commented Aug 25, 2020

Discovered in GoogleCloudPlatform/kubeflow-distribution#128.

Currently, all triggered PipelineRuns and other Tekton resources accumulate forever.
We need to garbage-collect (GC) Tekton resources.

The Tekton community is tracking this in tektoncd/experimental#479,
but there is no ready-to-use solution yet.

Until something comes out of that issue, we should at least have a bash script that cleans up these resources.

@Bobgy
Contributor Author

Bobgy commented Aug 25, 2020

/cc @jlewi

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
area/engprod 0.86
platform/gcp 0.86
kind/bug 0.67


@Bobgy
Contributor Author

Bobgy commented Aug 25, 2020

I wrote a very simple bash script for this purpose:

#!/bin/bash

set -e

# This script deletes k8s resources in batches of 100.
# usage: ./rm_k8s_res.sh <input-file-path>
# for example: ./rm_k8s_res.sh ./input.txt
# get the input file by
# ```
# kubectl get <resource-type> --sort-by=.metadata.creationTimestamp -o name > input.txt
# ```
# The input file is a list of resource names to delete, ordered by creation timestamp.
# To delete only old resources, manually remove the last section of the file so the latest resources are kept.

input="$1"
echo "Deleting resources listed in $input"
names=()
# Only the first field of each line is used; the trailing test handles a
# final line that has no newline.
while IFS=" " read -r name _ || [ -n "$name" ]
do
  # Skip blank lines.
  [ -n "$name" ] || continue
  names+=("$name")
  # Flush once the batch holds 100 names.
  if [ "${#names[@]}" -ge 100 ]; then
    echo "Deleting ${names[*]}"
    kubectl delete "${names[@]}" || echo "already deleted"
    names=()
  fi
done < "$input"
# Delete whatever is left in the final, partial batch.
if [ "${#names[@]}" -gt 0 ]; then
  echo "Deleting ${names[*]}"
  kubectl delete "${names[@]}" || echo "already deleted"
fi
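
The manual step mentioned in the script's comments (trimming the newest entries from the input file so they are kept) could also be automated. The following is only a minimal Python sketch, assuming the names come from the creation-time-ordered output of `kubectl get <resource-type> --sort-by=.metadata.creationTimestamp -o name`. The helper names `select_old` and `batches` and the `keep_latest` parameter are hypothetical and are not part of the script above.

```python
from typing import Iterator, List


def select_old(names: List[str], keep_latest: int) -> List[str]:
    """Return every name except the newest `keep_latest` ones.

    `names` must be ordered oldest first, which is what
    `--sort-by=.metadata.creationTimestamp` produces.
    """
    if keep_latest <= 0:
        return list(names)
    # Slicing past the start of the list yields an empty list, so asking
    # to keep more items than exist simply deletes nothing.
    return list(names[:-keep_latest])


def batches(names: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` names."""
    for start in range(0, len(names), size):
        yield names[start:start + size]
```

Each chunk yielded by `batches(select_old(names, keep_latest))` could then be passed to a single `kubectl delete` call, mirroring the batching in the bash script.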

@Bobgy
Contributor Author

Bobgy commented Aug 25, 2020

After deleting pipelineruns until only ~100 were left in the cluster, https://kf-ci-v1.endpoints.kubeflow-ci.cloud.goog/tekton/#/namespaces/auto-deploy/pipelineruns loads successfully again.

@jlewi
Contributor

jlewi commented Aug 25, 2020

@Bobgy FYI this is the script we use to clean up Argo workflows.

def cleanup_workflows(args):

@jlewi
Contributor

jlewi commented Oct 29, 2020

#767 has a very simple program. The program can be used with arbitrary K8s resources and can delete things based on minimum age.

We need to turn that into a regularly running job with a suitable minimum age (e.g. 72 hours).
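
To illustrate what deleting by minimum age means, here is a rough Python sketch. It is not the program from #767; it only assumes that the input is the JSON list printed by `kubectl get <resource-type> -o json`, where each item carries `metadata.name` and `metadata.creationTimestamp`. The function name `older_than` and its parameters are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def older_than(list_json: str, min_age: timedelta,
               now: Optional[datetime] = None) -> List[str]:
    """Return names of items whose creationTimestamp is at least `min_age` old.

    `list_json` is assumed to be the text printed by
    `kubectl get <resource-type> -o json`.
    """
    now = now or datetime.now(timezone.utc)
    selected = []
    for item in json.loads(list_json).get("items", []):
        meta = item["metadata"]
        # Kubernetes serializes timestamps in UTC, e.g. 2020-08-25T07:00:00Z.
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created >= min_age:
            selected.append(meta["name"])
    return selected
```

With a 72-hour minimum age, the call would look like `older_than(output, timedelta(hours=72))`, and only the returned names would be deleted.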

We can use the same pattern as we do for groups sync:
https://github.com/kubeflow/internal-acls/tree/master/google_groups/manifests

  • Create a deployment that runs it periodically in a loop with some delay
  • Run git-sync in a sidecar to fetch the config from the repo
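
The "loop with some delay" pattern in the first bullet can be sketched in Python as follows. This is an illustration under stated assumptions, not the groups-sync implementation: `run_periodically`, `max_runs`, and the injectable `sleep` are hypothetical names, and the choice to keep looping after a failed pass is an assumption made so that one bad run does not end the container.

```python
import time
from typing import Callable, Optional


def run_periodically(task: Callable[[], None], delay_seconds: float,
                     max_runs: Optional[int] = None,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `task` repeatedly, sleeping `delay_seconds` between calls.

    `max_runs=None` loops forever; a finite value is useful for testing.
    Returns the number of runs attempted.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            task()
        except Exception as err:  # assumption: log the failure and keep looping
            print(f"cleanup pass failed: {err}")
        runs += 1
        # Sleep only if another run will follow.
        if max_runs is None or runs < max_runs:
            sleep(delay_seconds)
    return runs
```

In a deployment, `task` would invoke the cleanup tool once and `delay_seconds` would set how often it runs.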

We could also use a cron job.

@Bobgy
Contributor Author

Bobgy commented Oct 29, 2020

@jlewi I think the tool will be easier to debug if it is called from a periodic prow job. That makes maintenance easier: finding logs, checking health status, changing commands, etc.

@Bobgy
Contributor Author

Bobgy commented Nov 17, 2020

I created #813 to dockerize the tool @jlewi built. The next step is to add a cronjob that runs it in the kf-releasing cluster.

Bobgy added a commit that referenced this issue Nov 19, 2020
…#750 (#814)

* wip

* fix cleanup job

* fix cleanup cronjob permission

* add cleanup job for kf-ci-v1 cluster

* update namespace