Race condition with idpbuilder and argocd on e2e GH workflow #163

Closed
cmoulliard opened this issue Jun 21, 2024 · 0 comments
Labels
bug Something isn't working

cmoulliard commented Jun 21, 2024

Issue

From time to time we hit the following issue with Argo CD 2.10.7 running as part of a GitHub workflow in a Kubernetes v1.29 cluster. It can be summarized as follows:

  • A bootstrap Application has been created in the argocd namespace (where Argo CD lives); its status is Synced and auto-prune is enabled.
  • The parent (bootstrap) Application creates, from a GitHub repository, 3 child Applications in a namespace named dummy. Example: https://github.com/ch007m/my-quarkus-app-job/blob/main/argocd/my-quarkus-app-job-build.yaml
  • An AppProject has been created for the child Applications, and its YAML content reports no errors.
  • We can list the child Applications (and their YAML), but their status is empty: no activity and no errors are reported.
  • The Application controller log reports no errors and no message indicating that it is processing the child Applications.
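
One way to spot this state from a workflow step is to list the Applications whose sync status is still empty. The sketch below is only an illustration and is not part of the original job: the function name `list_unreconciled_apps` is hypothetical, and it assumes that kubectl's jsonpath output prints an empty string when `.status.sync.status` is missing (the default behaviour).

```shell
# Sketch: print the names of Applications in a namespace that have no sync status.
# Hypothetical helper; requires kubectl and the Argo CD Application CRD.
list_unreconciled_apps() {
  # One line per Application: "<name><TAB><sync status or empty>"
  kubectl get applications.argoproj.io -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.status}{"\n"}{end}' |
    # Keep only the rows whose second column (the status) is empty
    awk -F '\t' '$1 != "" && $2 == "" { print $1 }'
}

# Usage: list_unreconciled_apps dummy
```

If the function prints any name long after the parent Application is Synced, the controller is probably not watching that namespace.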

The GitHub workflow sometimes succeeds and sometimes fails:
Job succeeded: https://github.com/ch007m/test-2e2-job/actions/runs/9611362409
Job failed: https://github.com/ch007m/test-2e2-job/actions/runs/9611571457

Investigation

After digging into the logs and discussing this with the idpbuilder folks, it appeared that the Argo CD Application (the one installing Argo CD) was not refreshed. As a consequence, the Application controller started without the patched ConfigMap, which changes some default values and adds new ones such as application.namespaces.

If Argo CD is started without the application.namespaces property, which defines the namespaces where Applications can be created, the Applications in those namespaces are not processed. This is exactly the problem we ran into ;-)
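
To make the effect of application.namespaces concrete, here is a minimal sketch of the matching rule. It assumes the value is a comma-separated list of namespace names or shell-style globs (for example "dummy,team-*"); the function name `namespace_allowed` is hypothetical, and this is not Argo CD's actual implementation.

```shell
# Sketch: succeed if namespace $1 is covered by the application.namespaces value $2.
# Assumption: $2 is a comma-separated list of names or shell-style globs.
# The body runs in a subshell so the IFS and "set -f" changes do not leak out.
namespace_allowed() (
  ns=$1
  IFS=','
  set -f                      # do not expand globs against local files while splitting
  for pattern in $2; do
    # Drop whitespace around each entry, e.g. "argocd, dummy"
    pattern=$(printf '%s' "$pattern" | tr -d '[:space:]')
    [ -n "$pattern" ] || continue
    case "$ns" in
      $pattern) return 0 ;;   # unquoted on purpose: treated as a glob pattern
    esac
  done
  return 1
)

# Usage: namespace_allowed dummy "dummy,team-*" && echo "dummy is watched"
```

With an empty value, as in the failing runs, no namespace other than Argo CD's own is covered, which matches the child Applications in dummy being ignored.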

A temporary workaround has been added to the job until idpbuilder 0.6.0 fixes the problem:

      - name: Wait till IDP ArgoCD application is sync; ConfigMap patched
        run: |
          SCRIPTS=$(pwd)/.github/scripts
          
          echo "Temporary workaround to refresh ArgoCD Application till https://github.com/cnoe-io/idpbuilder/pull/307 is released"
          kubectl annotate --overwrite applications -n argocd argocd argocd.argoproj.io/refresh='normal'
          
          # Quote the path so the step still works if the workspace path contains spaces
          if ! "$SCRIPTS/waitFor.sh" application argocd argocd Healthy; then
            echo "Failed to watch application argocd in namespace argocd"
            exit 1
          fi

          echo "Wait till ConfigMap is patched with data: application.namespaces ..."          
          until kubectl get -n argocd cm/argocd-cmd-params-cm -o json | jq -e '.data | has("application.namespaces")'; do
             echo "Still waiting ..."
             sleep 10s
          done
          
          echo "Rollout Argocd as resources changed ..."
          kubectl rollout restart -n argocd deployment argocd-server
          kubectl rollout restart -n argocd statefulset argocd-application-controller

          kubectl rollout status --watch statefulset/argocd-application-controller -n argocd --timeout=600s
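
The until loop in the workaround above waits forever if the ConfigMap is never patched. A bounded variant could look like the sketch below; the helper name `retry_until` and its arguments are hypothetical and are not part of the original job.

```shell
# Sketch: retry a command until it succeeds or a time budget is used up.
# Usage: retry_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
retry_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "Timed out after ${elapsed}s waiting for: $*" >&2
      return 1
    fi
    echo "Still waiting ..." >&2
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 0
}

# Example (needs kubectl and jq, as in the workaround above):
#   cm_patched() {
#     kubectl get -n argocd cm/argocd-cmd-params-cm -o json |
#       jq -e '.data | has("application.namespaces")' >/dev/null
#   }
#   retry_until 600 10 cm_patched || exit 1
```

Failing the step after a fixed budget makes a stuck run visible in the job log instead of letting it run into the workflow timeout.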

cmoulliard added the bug label Jun 21, 2024
cmoulliard changed the title from "Race condition with e2e GH workflow" to "Race condition with idpbuilder and argocd on e2e GH workflow" Jun 21, 2024
cmoulliard added a commit to ch007m/backstage-playground that referenced this issue Jun 21, 2024
cmoulliard added a commit that referenced this issue Jun 26, 2024
* fix the race condition error and improve the logging. #163
