Skip to content
This repository has been archived by the owner on Jun 19, 2022. It is now read-only.

Brokercell reconciling sometimes happens long after trigger finished reconciling #1437

Closed
yolocs opened this issue Jul 13, 2020 · 2 comments
Closed
Labels
area/broker flaky-test kind/bug Something isn't working priority/1 Blocks current release defined by release/* label or blocks current milestone release/2 storypoint/5
Milestone

Comments

@yolocs
Copy link
Member

yolocs commented Jul 13, 2020

Describe the bug
During the debug of an e2e test, I noticed a long delay (~18s) between a trigger-controller finished reconciling and brokercell reconciled TargetsConfig configmap.

The test flow is 1) create broker/source (wait until ready) 2) create event display and trigger (wait until ready) 3) wait 15s 4) send pubsub message and expect it to appear in event display. If the brokercell fails to reconcile the configmap within 15s after trigger being ready, the test will fail.

From the controller log, the two entries have 18s gap (the order is reversed):

 {
   "insertId": "01gijhkt6lunv1rab",
   "jsonPayload": {
     "caller": "record/event.go:281",
     "ts": "2020-07-13T03:33:34.524Z",
     "msg": "Event(v1.ObjectReference{Kind:\"BrokerCell\", Namespace:\"cloud-run-events\", Name:\"default\", UID:\"cb5ae973-5b6a-43d2-b171-71dd01d6fa7c\", APIVersion:\"internal.events.cloud.google.com/v1alpha1\", ResourceVersion:\"13251\", FieldPath:\"\"}): type: 'Normal' reason: 'ConfigMapUpdated' Updated configmap cloud-run-events/default-brokercell-broker-targets",
     "events.cloud.google.com/controller": "brokercell-controller",
     "logger": "controller.brokercell-controller.event-broadcaster",
     "level": "info"
   },
   "resource": {
     "type": "k8s_container",
     "labels": {
       "namespace_name": "cloud-run-events",
       "container_name": "controller",
       "project_id": "cr-gke-boskos-6",
       "location": "us-west1",
       "cluster_name": "cr-gkeaddon-testing-cluster-17697",
       "pod_name": "controller-55d9d75c79-8gvr4"
     }
   },
   "timestamp": "2020-07-13T03:33:34.524551810Z",
   "severity": "INFO",
   "labels": {
     "k8s-pod/pod-template-hash": "55d9d75c79",
     "k8s-pod/role": "controller",
     "k8s-pod/app": "cloud-run-events"
   },
   "logName": "projects/cr-gke-boskos-6/logs/stdout",
   "receiveTimestamp": "2020-07-13T03:33:37.519008093Z"
 },
 {
   "insertId": "01gijhkt6lunv1qvy",
   "jsonPayload": {
     "caller": "record/event.go:281",
     "ts": "2020-07-13T03:33:16.407Z",
     "msg": "Event(v1.ObjectReference{Kind:\"Trigger\", Namespace:\"test-smoke-cloud-run-events-gnxt6\", Name:\"gke-cr-test-pubsub-tr-ddxsvfsd\", UID:\"144f91a2-7f51-481d-8254-54a5c6e4e437\", APIVersion:\"eventing.knative.dev/v1beta1\", ResourceVersion:\"14114\", FieldPath:\"\"}): type: 'Normal' reason: 'TriggerReconciled' Trigger reconciled: \"test-smoke-cloud-run-events-gnxt6/gke-cr-test-pubsub-tr-ddxsvfsd\"",
     "logger": "controller.event-broadcaster",
     "level": "info"
   },
   "resource": {
     "type": "k8s_container",
     "labels": {
       "pod_name": "controller-55d9d75c79-8gvr4",
       "container_name": "controller",
       "cluster_name": "cr-gkeaddon-testing-cluster-17697",
       "namespace_name": "cloud-run-events",
       "location": "us-west1",
       "project_id": "cr-gke-boskos-6"
     }
   },
   "timestamp": "2020-07-13T03:33:16.407711743Z",
   "severity": "INFO",
   "labels": {
     "k8s-pod/role": "controller",
     "k8s-pod/app": "cloud-run-events",
     "k8s-pod/pod-template-hash": "55d9d75c79"
   },
   "logName": "projects/cr-gke-boskos-6/logs/stdout",
   "receiveTimestamp": "2020-07-13T03:33:19.527781077Z"
 },
 {
   "insertId": "01gijhkt6lunv1qvx",
   "jsonPayload": {
     "msg": "Reconcile succeeded. Time taken: 322.620468ms",
     "ts": "2020-07-13T03:33:16.407Z",
     "knative.dev/key": "test-smoke-cloud-run-events-gnxt6/gke-cr-test-pubsub-tr-ddxsvfsd",
     "level": "info",
     "logger": "controller.github.com-google-knative-gcp-pkg-reconciler-trigger.Reconciler",
     "caller": "controller/controller.go:474",
     "knative.dev/traceid": "01f6ca33-8335-400f-8860-fadfd7ed1a59"
   },
   "resource": {
     "type": "k8s_container",
     "labels": {
       "project_id": "cr-gke-boskos-6",
       "location": "us-west1",
       "container_name": "controller",
       "namespace_name": "cloud-run-events",
       "cluster_name": "cr-gkeaddon-testing-cluster-17697",
       "pod_name": "controller-55d9d75c79-8gvr4"
     }
   },
   "timestamp": "2020-07-13T03:33:16.407631036Z",
   "severity": "INFO",
   "labels": {
     "k8s-pod/app": "cloud-run-events",
     "k8s-pod/pod-template-hash": "55d9d75c79",
     "k8s-pod/role": "controller"
   },
   "logName": "projects/cr-gke-boskos-6/logs/stdout",
   "receiveTimestamp": "2020-07-13T03:33:19.527781077Z"
 },

Expected behavior
The brokercell controller should be able to reconcile the configmap fairly close to the trigger controller.

To Reproduce
This is case by case (which is causing the flakiness). Reach out to @yolocs for more details.

Exit
At least understand what's causing the delay.

@yolocs yolocs added kind/bug Something isn't working release/2 priority/1 Blocks current release defined by release/* label or blocks current milestone area/broker flaky-test labels Jul 13, 2020
@yolocs yolocs added this to the Backlog milestone Jul 13, 2020
@grantr
Copy link
Contributor

grantr commented Sep 16, 2020

This is probably fixed by #1705

@ian-mi
Copy link
Contributor

ian-mi commented Sep 16, 2020

I agree this should be fixed now. I'll go ahead and close this but please reopen if the issue persists.

@ian-mi ian-mi closed this as completed Sep 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/broker flaky-test kind/bug Something isn't working priority/1 Blocks current release defined by release/* label or blocks current milestone release/2 storypoint/5
Projects
None yet
Development

No branches or pull requests

3 participants