
Event listener restarts when under minimal stress #1618

Open
ggallen opened this issue Jul 28, 2023 · 78 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@ggallen

ggallen commented Jul 28, 2023

Expected Behavior

An event listener should be able to handle thousands of messages per second.

Actual Behavior

When under minimal stress, the event listener restarts. I was able to get the EL to restart by sending < 3K messages to it using curl over 20 seconds.

Steps to Reproduce the Problem

  1. Configure and start an event listener
  2. Run the attached script with the attached JSON file ./send-cloudevent-forever.sh cloudevent5.json
  3. Watch for the event listener to restart
    send-cloudevent-forever.sh.txt
    cloudevent5.json.txt

Additional Info

  • Kubernetes version:

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:58:16Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:51:25Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}


- Tekton Pipeline version:

Output of `tkn version` or `kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'`:
tekton-pipelines:  v0.47.2
tekton-triggers: v0.24.1
knative-eventing: v1.10.0


@ggallen ggallen added the kind/bug Categorizes issue or PR as related to a bug. label Jul 28, 2023
@ggallen
Author

ggallen commented Aug 2, 2023

I believe I have found the problem here.

I started off trying to debug by sending a single message to my broker. Surprisingly, this led to 101 events being sent to my EventListener. After turning on trigger debugging, I reran my test and again checked the logs. After analyzing the messages, it broke down like this:

      1 Ce-Type:[com.redhat.agogos.event.component.build.v1alpha1]
    254 Ce-Type:[dev.tekton.event.triggers.accepted.v1]

Further investigation reveals that this message is sent in sink.go (https://github.com/tektoncd/triggers/blob/main/pkg/sink/sink.go#L236-L255). If the message encoding is unknown, a simple HTTP response is sent with an "Accepted" status. However, if the encoding is known, the CloudEvent message "dev.tekton.event.triggers.accepted.v1" is sent. In my case it appears the encoding is known.
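
To make that branch concrete, here is a paraphrased sketch of what the sink.go logic amounts to (not a verbatim copy of the Tekton source; the Response type is trimmed and respond is a hypothetical helper):

package sink // paraphrased sketch, not the actual Tekton source

import (
	"encoding/json"
	nethttp "net/http"

	cloudevents "github.com/cloudevents/sdk-go/v2"
	"github.com/cloudevents/sdk-go/v2/binding"
	cehttp "github.com/cloudevents/sdk-go/v2/protocol/http"
)

// Response mirrors (a trimmed version of) the body the sink sends back.
type Response struct {
	EventListener string `json:"eventListener"`
	EventID       string `json:"eventID"`
}

func respond(w nethttp.ResponseWriter, req *nethttp.Request, body Response) {
	if cehttp.NewMessageFromHttpRequest(req).ReadEncoding() == binding.EncodingUnknown {
		// Request was not a CloudEvent: plain HTTP "Accepted" reply.
		w.WriteHeader(nethttp.StatusAccepted)
		_ = json.NewEncoder(w).Encode(body)
		return
	}
	// Request was a CloudEvent: reply with a new
	// dev.tekton.event.triggers.accepted.v1 event, which is the message
	// the broker then re-delivers as a fresh event.
	ev := cloudevents.NewEvent()
	ev.SetID(body.EventID)
	ev.SetType("dev.tekton.event.triggers.accepted.v1")
	ev.SetSource(body.EventListener)
	_ = ev.SetData(cloudevents.ApplicationJSON, body)
	_ = cehttp.WriteResponseWriter(req.Context(), binding.ToMessage(&ev), nethttp.StatusAccepted, w)
}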

However, this appears to cause a runaway loop. As the accepted event is sent, the broker accepts it and responds by forwarding it back to the EventListener. The EventListener receives the event, processes it, and since it has a known encoding it again responds by sending an accepted event. It looks like an infinite loop, and I'm not sure why it ever ends, but at some point it does. It could be that the EL gets so overwhelmed that eventually it can't respond in time, which breaks the cycle.

To me this appears to be a flaw in the logic of sending an event in response to another event. There's probably a good reason to do so, but it's not one that I am aware of. My setup does not seem unusual, so it seems like this could be a common problem for anyone who is sending events with a known encoding.

To test my theory, I commented out the sending of the accepted event and ran my curl test. The test ran for an hour, and it sent 1.5 million events to the EL without a single restart. A check of the EventListener log reveals no timeout errors and no errors other than the expected FailedPrecondition errors.

I'm not sure what the fix should be here. To me it would make sense for --el-events disable to also disable the sending of this event. Or maybe my configuration needs to change somehow, but it's unclear to me how. It seems to be a flaw in the logic of responding to an event with another event, unless the EventListener can/should be configured to ignore responses to events that it actually sent. That may be a better solution, and maybe it should do that by default.

@khrm
Contributor

khrm commented Aug 2, 2023

@afrittoli What do you think about this request? Is there a flaw in the logic of sending cloud-event responses to cloud-event requests?

@ggallen
Author

ggallen commented Aug 2, 2023

I actually think the EL should contain logic to not respond with another event to an event it generated. I think that's what's missing.

@afrittoli
Member

afrittoli commented Aug 2, 2023

I actually think the EL should contain logic to not respond with another event to an event it generated. I think that's what's missing.

Thanks @ggallen for your analysis of this.
To give some more context, Tekton Triggers can both consume and produce CloudEvents.

Producing CloudEvents

Tekton can produce CloudEvents to notify consumers about things that happen in Tekton, like a trigger was received, a pipeline was executed, etc. This functionality has existed in Tekton Pipelines for some time, and has recently been introduced into Tekton triggers as well.

Consuming CloudEvents

An event producer creates a CloudEvent, which can be sent directly to an event listener sink or sent to a CloudEvent broker and delivered to the event listener sink via a subscription.

In Tekton's own infra cluster, we use a Knative Eventing channel to route CloudEvents. The Knative subscription object sends the events to the event listener sink and expects a response. If the response is missing, or if the response is not a valid CloudEvent, it considers the message delivery to be failed and it attempts again. Tekton triggers added a CloudEvent response to CloudEvents triggers to address this issue.

Tekton must respond to incoming CloudEvents, and those responses must be in valid CloudEvents format according to the CloudEvents spec. Something is not working, either in the CloudEvent response or in the broker.


With that context, I would like to make sure I understand where the loop happens:

sequenceDiagram
    participant broker
    participant event-listener
    participant el-sink
    broker->>event-listener: sends a CloudEvent `com.redhat.agogos`
    event-listener ->>el-sink: produces a new CloudEvent `dev.tekton.event.triggers.started.v1`
    event-listener->>broker: sends a CloudEvent response `dev.tekton.event.triggers.accepted.v1`
    broker ->>event-listener: sends a CloudEvent response to  `dev.tekton.event.triggers.accepted.v1`
    event-listener->>broker: sends a CloudEvent response `dev.tekton.event.triggers.accepted.v1`
    broker ->>event-listener: sends a CloudEvent response to  `dev.tekton.event.triggers.accepted.v1`
    event-listener->>broker: sends a CloudEvent response `dev.tekton.event.triggers.accepted.v1`

If this is correct, it looks like the event broker does not recognise the response produced by the event listener as a response; instead, it interprets it as a new event and replies with a response of its own, which the event listener (which didn't expect a response) in turn treats as a new event, and so on. I wonder how that is possible :S
I guess the configured trigger filters by event type, or else the trigger would fire many times as well!

Do you have a trace of the content of the event listener response to the initial event?

@ggallen
Author

ggallen commented Aug 2, 2023

@afrittoli, I think the loop happens between the broker and the event listener as you show in your diagram. The UIDs are changing, so I also think the broker must be interpreting the response as a new event.

But I don't know how the broker is supposed to determine that the new event is in fact a response and not a new event. Is there something missing in the response that is supposed to indicate that to the broker?

If you tell me how to get what you want for debugging information, I can get it for you. But this should be easy to reproduce.

@ggallen
Author

ggallen commented Aug 2, 2023

@afrittoli, if it makes any difference, we are using the Knative MT broker.

@ggallen
Author

ggallen commented Aug 7, 2023

@afrittoli ?

@khrm
Contributor

khrm commented Aug 8, 2023

@afrittoli I think what @ggallen wants is a response without a cloud event.

Sending a cloud event in response to a cloud event is a valid response, as we can see from this receiver:
https://github.com/cloudevents/sdk-go/blob/7f5ef3992769f96b40a54c4d59291be62acd36da/v2/client/client.go#L48

What @ggallen wants is just a Result response (which consists of a status code and a message string) to a CloudEvent:
https://github.com/cloudevents/sdk-go/blob/7f5ef3992769f96b40a54c4d59291be62acd36da/v2/client/client.go#L43
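
For reference, the two receiver shapes look roughly like this (a sketch against the sdk-go v2 API; both signatures are accepted by client.StartReceiver):

package sample // sketch

import (
	"context"
	nethttp "net/http"

	cloudevents "github.com/cloudevents/sdk-go/v2"
	cehttp "github.com/cloudevents/sdk-go/v2/protocol/http"
)

// What the EventListener does today: reply with a new CloudEvent.
func replyWithEvent(ctx context.Context, e cloudevents.Event) (*cloudevents.Event, cloudevents.Result) {
	resp := cloudevents.NewEvent()
	// ... populate resp (id, type, source, data) ...
	return &resp, cloudevents.ResultACK
}

// What is being asked for: reply with only a status code and message.
func replyWithResult(ctx context.Context, e cloudevents.Event) cloudevents.Result {
	return cehttp.NewResult(nethttp.StatusAccepted, "accepted")
}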

Can't we just filter events by the source on the broker side?

@ggallen
Author

ggallen commented Aug 8, 2023

Well, I don't really care what I get for a response, I just don't want all the messages. But I think there is a problem here. The question is where, and I see a couple of possibilities:

  1. The EL is sending a bad response back to the broker, causing the broker to relay the response back as a new message.
  2. The broker is misinterpreting a valid EL response as a new event, and responding accordingly.

I think the problem, wherever it is, should be fixed. It's possible that it's #2, and the options are to either get a fix to the MT broker or change to a different broker that handles things correctly.

I think the question is where is the actual problem? I don't have the knowledge to answer that question.

@khrm
Contributor

khrm commented Aug 8, 2023

If we can't filter events by the source on the broker side, then we can probably add an annotation to the EventListener to determine what to do for CloudEvents requests: by default, send events; otherwise, send just a status code and message.

@khrm
Contributor

khrm commented Aug 8, 2023

@afrittoli If you are free, please clarify how to proceed with this.

@afrittoli
Member

Hello @khrm and @ggallen - sorry about the late reply.
Unfortunately, I've had no time to look further into this - and I'm now on PTO until September 😓

To summarise the issues as I understand them today:

  1. The original way the event listeners responded to a CloudEvent from the Knative Broker was not recognised by the Knative Broker as a valid CloudEvent response, which triggered retries and thus a lot of duplicated messages. The workaround for that was to set retries to 0 in the subscription. The fix was to change triggers responses to CloudEvents
  2. The current way the event listener responds to a CloudEvent is recognised by the Knative Broker as a new CloudEvent, so the Knative broker sends a reply (type dev.tekton.event.triggers.accepted.v1)
  3. The reply by the Knative broker is interpreted by Tekton triggers as a new event, etc.

In both issues (2) and (3), the response sent is not identified as a response, but rather as a new event, which triggers a response.

In both those cases, the issue can be:

  • either the message by the producer is not formatted according to spec, so the consumer does not read it as a response
  • or the consumer is parsing the response incorrectly, and taking it as a new message

Note that in (2) and (3) producer and consumers are inverted, so we could have two separate issues:

  • in (2) the producer of the response is tekton, consumer is knative
  • in (3) the producer of the response is knative, the consumer is tekton

The next steps would be to have examples of each of the responses produced by both systems, compare them with the specification and figure out whether the issue is the producer or consumer of the response.

I hope this helps move forward towards a solution.

@ggallen
Author

ggallen commented Aug 15, 2023

I have gotten the Broker and EL messages. I chose three message/response pairs as I think that's enough to get a picture of what's going on.

I also looked at the CDEvents documentation at https://github.com/cdevents/spec, but it wasn't helpful to someone with my level of knowledge of this topic. But I am wondering whether it makes any difference that the initial event being sent to the EL from the Broker is not one of the predefined CD events.

Is that possibly the problem?

Here are the messages. Hopefully, this information can shed some light on what might be the problem.

Broker:
{ "level": "debug", "ts": "2023-08-15T17:53:38.743Z", "logger": "eventlistener", "caller": "sink/sink.go:177", "msg": "handling event with path /, payload: { \"name\": \"dummy-component-dep-2\", \"resource\": \"Component\", \"instance\": \"eea42e7e-4543-4309-bc6b-11cf35549575\" } and header: map[Accept-Encoding:[gzip] Ce-Id:[e3b1494d-05f1-45aa-8792-fcc3608fbf24] Ce-Knativearrivaltime:[2023-08-15T17:53:38.719745294Z] Ce-Source:[http://localhost] Ce-Specversion:[1.0] Ce-Type:[com.redhat.agogos.event.component.build.v1alpha1] Content-Length:[123] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-d74ea1c9a53a46643d3eaeaa95e3e442-9f3e62b83c92cacb-00] User-Agent:[Go-http-client/1.1]]", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b" }
Event Listener:
{ "level": "debug", "ts": "2023-08-15T17:53:38.744Z", "logger": "eventlistener", "caller": "sink/sink.go:252", "msg": "Responding with cloud event, payload: &{Context:Context Attributes,\n specversion: 1.0\n type: dev.tekton.event.triggers.accepted.v1\n source: agogos\n subject: gallen.agogos accepted 62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b\n id: 62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b\n datacontenttype: application/json\n DataEncoded:[123 34 101 118 101 110 116 76 105 115 116 101 110 101 114 34 58 34 97 103 111 103 111 115 34 44 34 110 97 109 101 115 112 97 99 101 34 58 34 103 97 108 108 101 110 34 44 34 101 118 101 110 116 76 105 115 116 101 110 101 114 85 73 68 34 58 34 101 54 98 99 55 97 101 98 45 55 49 54 100 45 52 97 51 97 45 98 48 57 52 45 48 54 56 99 97 100 55 50 50 98 52 49 34 44 34 101 118 101 110 116 73 68 34 58 34 54 50 98 54 101 50 98 98 45 97 54 56 102 45 52 100 99 56 45 98 55 56 98 45 57 102 98 57 49 98 54 98 99 98 56 98 34 125] DataBase64:false FieldErrors:map[]}", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b" }
Broker:
{ "level": "debug", "ts": "2023-08-15T17:53:38.751Z", "logger": "eventlistener", "caller": "sink/sink.go:177", "msg": "handling event with path /, payload: {\"eventListener\":\"agogos\",\"namespace\":\"gallen\",\"eventListenerUID\":\"e6bc7aeb-716d-4a3a-b094-068cad722b41\",\"eventID\":\"62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b] Ce-Knativearrivaltime:[2023-08-15T17:53:38.749146357Z] Ce-Source:[agogos] Ce-Specversion:[1.0] Ce-Subject:[gallen.agogos accepted 62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b] Ce-Type:[dev.tekton.event.triggers.accepted.v1] Content-Length:[154] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-d74ea1c9a53a46643d3eaeaa95e3e442-88f37a21ff5b0be1-00] User-Agent:[Go-http-client/1.1]]", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170" }
Event Listener:
{ "level": "debug", "ts": "2023-08-15T17:53:38.751Z", "logger": "eventlistener", "caller": "sink/sink.go:252", "msg": "Responding with cloud event, payload: &{Context:Context Attributes,\n specversion: 1.0\n type: dev.tekton.event.triggers.accepted.v1\n source: agogos\n subject: gallen.agogos accepted ab7642c2-0e65-463b-be45-6ff498034170\n id: ab7642c2-0e65-463b-be45-6ff498034170\n datacontenttype: application/json\n DataEncoded:[123 34 101 118 101 110 116 76 105 115 116 101 110 101 114 34 58 34 97 103 111 103 111 115 34 44 34 110 97 109 101 115 112 97 99 101 34 58 34 103 97 108 108 101 110 34 44 34 101 118 101 110 116 76 105 115 116 101 110 101 114 85 73 68 34 58 34 101 54 98 99 55 97 101 98 45 55 49 54 100 45 52 97 51 97 45 98 48 57 52 45 48 54 56 99 97 100 55 50 50 98 52 49 34 44 34 101 118 101 110 116 73 68 34 58 34 97 98 55 54 52 50 99 50 45 48 101 54 53 45 52 54 51 98 45 98 101 52 53 45 54 102 102 52 57 56 48 51 52 49 55 48 34 125] DataBase64:false FieldErrors:map[]}", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170" }
Broker:
{ "level": "debug", "ts": "2023-08-15T17:53:38.753Z", "logger": "eventlistener", "caller": "sink/sink.go:177", "msg": "handling event with path /, payload: {\"eventListener\":\"agogos\",\"namespace\":\"gallen\",\"eventListenerUID\":\"e6bc7aeb-716d-4a3a-b094-068cad722b41\",\"eventID\":\"ab7642c2-0e65-463b-be45-6ff498034170\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[ab7642c2-0e65-463b-be45-6ff498034170] Ce-Knativearrivaltime:[2023-08-15T17:53:38.752346245Z] Ce-Source:[agogos] Ce-Specversion:[1.0] Ce-Subject:[gallen.agogos accepted ab7642c2-0e65-463b-be45-6ff498034170] Ce-Type:[dev.tekton.event.triggers.accepted.v1] Content-Length:[154] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-d74ea1c9a53a46643d3eaeaa95e3e442-71a8938ac1254cf6-00] User-Agent:[Go-http-client/1.1]]", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21" }
Event Listener:
{ "level": "debug", "ts": "2023-08-15T17:53:38.753Z", "logger": "eventlistener", "caller": "sink/sink.go:252", "msg": "Responding with cloud event, payload: &{Context:Context Attributes,\n specversion: 1.0\n type: dev.tekton.event.triggers.accepted.v1\n source: agogos\n subject: gallen.agogos accepted 919bcc93-8586-4d09-89e8-b6919a2bbc21\n id: 919bcc93-8586-4d09-89e8-b6919a2bbc21\n datacontenttype: application/json\n DataEncoded:[123 34 101 118 101 110 116 76 105 115 116 101 110 101 114 34 58 34 97 103 111 103 111 115 34 44 34 110 97 109 101 115 112 97 99 101 34 58 34 103 97 108 108 101 110 34 44 34 101 118 101 110 116 76 105 115 116 101 110 101 114 85 73 68 34 58 34 101 54 98 99 55 97 101 98 45 55 49 54 100 45 52 97 51 97 45 98 48 57 52 45 48 54 56 99 97 100 55 50 50 98 52 49 34 44 34 101 118 101 110 116 73 68 34 58 34 57 49 57 98 99 99 57 51 45 56 53 56 54 45 52 100 48 57 45 56 57 101 56 45 98 54 57 49 57 97 50 98 98 99 50 49 34 125] DataBase64:false FieldErrors:map[]}", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21" }
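
One thing that stands out in these pairs: the id of each EventListener response comes right back as the Ce-Id of the broker's next request (62b6e2bb-… → ab7642c2-… → 919bcc93-…), and the DataEncoded bytes in each response decode to exactly the Response JSON shown in the following broker entry. So the broker appears to be re-delivering the EL's reply verbatim as a brand-new event.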

@khrm
Contributor

khrm commented Aug 16, 2023

That's fine. The EL can receive any CE events. It will respond with an Accepted event, which shouldn't come back.

@khrm
Contributor

khrm commented Aug 16, 2023

@ggallen Can you test the broker with this sample? https://github.com/cloudevents/sdk-go/blob/7f5ef3992769f96b40a54c4d59291be62acd36da/samples/http/responder/main.go

If your broker, on receiving the CE event, sends the event back again, then it's broken. Otherwise, the EL has a problem.

@ggallen
Author

ggallen commented Aug 16, 2023

I don't understand what you want me to do. I can't run that code. Do you want me to send that message?

You can see from my example that the Broker ends up sending back the accepted message to the EL.

@khrm
Contributor

khrm commented Aug 16, 2023

@ggallen, can you try setting filters like the ones given here?
https://github.com/knative/eventing/blob/main/docs/broker/filtering.md

I wanted you to run something similar and send a message via a broker. Let me create a program.

@khrm
Contributor

khrm commented Aug 16, 2023

@ggallen Please use this deployment file: https://raw.githubusercontent.com/khrm/ce-test/master/deployment.yaml

First, send a direct request after port-forwarding the service:

kubectl port-forward svc/ce-test 8080

Afterward, you can use the below commands to send a request:

cd cmd/client
go run main.go

Afterward, you can port-forward the MT broker endpoint (exposed on port 8080) and send the request using the above command.
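
For reference, the client under cmd/client presumably does something along these lines: a minimal sdk-go requester sketch, not the exact code (the target URL assumes the port-forward above).

package main

import (
	"context"
	"log"

	cloudevents "github.com/cloudevents/sdk-go/v2"
	"github.com/google/uuid"
)

func main() {
	c, err := cloudevents.NewClientHTTP()
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}

	e := cloudevents.NewEvent()
	e.SetID(uuid.New().String())
	e.SetType("com.cloudevents.sample.sent")
	e.SetSource("https://github.com/cloudevents/sdk-go/v2/samples/requester")
	_ = e.SetData(cloudevents.ApplicationJSON, map[string]interface{}{"id": 0, "message": "Hello world!"})

	// Target the port-forwarded service (or the MT broker endpoint).
	ctx := cloudevents.ContextWithTarget(context.Background(), "http://localhost:8080/")
	resp, res := c.Request(ctx, e)
	if cloudevents.IsUndelivered(res) {
		log.Fatalf("failed to send: %v", res)
	}
	if resp != nil {
		log.Printf("got event response:\n%s", resp)
	} else {
		log.Printf("no event in response, result: %v", res)
	}
}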

@ggallen
Author

ggallen commented Aug 18, 2023

@khrm, something is not quite right, but here are the results. Sending to ce-test:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: 5f2b0c22-10ad-4f1e-98ec-0c69a482a913
  time: 2023-08-18T11:49:02.700185606Z
  datacontenttype: application/json
Data,
  {
    "id": 0,
    "message": "test done!"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: 5f2b0c22-10ad-4f1e-98ec-0c69a482a913
  time: 2023-08-18T11:49:02.700185606Z
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:test done!}
----------------------------

Sending to the MT broker:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
2023/08/18 08:05:25 Event sent at 2023-08-18 08:05:25.166677156 -0400 EDT m=+0.005350907
2023/08/18 08:05:25 Response status code 404: 

@khrm
Contributor

khrm commented Aug 18, 2023

@ggallen Was the message received at the CE-Test service port? I don't think so.

@ggallen
Author

ggallen commented Aug 18, 2023

Well, I can't get anything to run now. I see this in the events:

22s         Normal    Created             pod/ce-test-54c5f5f7-2n6k4        Created container ce-test
21s         Normal    Started             pod/ce-test-54c5f5f7-2n6k4        Started container ce-test
23s         Warning   Unhealthy           pod/ce-test-54c5f5f7-2n6k4        Liveness probe failed: HTTP probe failed with statuscode: 405
23s         Normal    Killing             pod/ce-test-54c5f5f7-2n6k4        Container ce-test failed liveness probe, will be restarted

@ggallen
Author

ggallen commented Aug 18, 2023

@khrm, the ce-test pod keeps restarting.

[gallen@localhost ce-test] (master)$ kc get pods
NAME                         READY   STATUS              RESTARTS   AGE
ce-test-54c5f5f7-rl757       0/1     ContainerCreating   0          13s
el-agogos-65596c479b-lmhhc   1/1     Running             0          3m51s
[gallen@localhost ce-test] (master)$ kc get pods
NAME                         READY   STATUS    RESTARTS   AGE
ce-test-54c5f5f7-rl757       0/1     Running   0          19s
el-agogos-65596c479b-lmhhc   1/1     Running   0          3m57s
[gallen@localhost ce-test] (master)$ kc logs ce-test-54c5f5f7-rl757
[gallen@localhost ce-test] (master)$ kc logs ce-test-54c5f5f7-rl757
[gallen@localhost ce-test] (master)$ kc get pods
NAME                         READY   STATUS             RESTARTS       AGE
ce-test-54c5f5f7-rl757       0/1     CrashLoopBackOff   11 (27s ago)   28m
el-agogos-65596c479b-lmhhc   1/1     Running            0              32m
[gallen@localhost ce-test] (master)$ 

In the output above, I haven't done anything but let time pass.

I am seeing the liveness probe fail as in the previous comment.

@ggallen
Author

ggallen commented Aug 18, 2023

@khrm, I was able to finally send a message directly to the broker:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
2023/08/18 10:54:51 Event sent at 2023-08-18 10:54:51.843636122 -0400 EDT m=+0.005916899
2023/08/18 10:54:51 Response status code 202: 

And what the EL shows in the log:

{"level":"debug","ts":"2023-08-18T14:57:25.966Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T14:57:25.967Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T14:57:25.968Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T14:57:25.969Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T14:57:25.969Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T14:57:25.971Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:01:18.944Z","logger":"eventlistener","caller":"sink/sink.go:177","msg":"handling event with path /, payload: {\"id\":0,\"message\":\"Hello world!\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[8a01378a-4f68-4baf-8d10-93240543774d] Ce-Knativearrivaltime:[2023-08-18T15:01:18.939591943Z] Ce-Source:[https://github.com/cloudevents/sdk-go/v2/samples/requester] Ce-Specversion:[1.0] Ce-Time:[2023-08-18T15:01:18.933199105Z] Ce-Type:[com.cloudevents.sample.sent] Content-Length:[33] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-e8655703484c759904c8dd719d58244c-f9e172fdf6b5ca8c-00] User-Agent:[Go-http-client/1.1]]","commit":"2ec8bc6-dirty","eventlistener":"agogos","namespace":"gallen","/triggers-eventid":"09c62c6b-fb93-4865-a1c0-4644100b7689","eventlistenerUID":"bd026c66-0e12-48ee-8304-e8277a1c65fe","/triggers-eventid":"09c62c6b-fb93-4865-a1c0-4644100b7689"}
{"level":"debug","ts":"2023-08-18T15:01:18.944Z","logger":"eventlistener","caller":"sink/metrics.go:115","msg":"event listener request","commit":"2ec8bc6-dirty","status":"succeeded"}
{"level":"debug","ts":"2023-08-18T15:01:18.944Z","logger":"eventlistener","caller":"sink/metrics.go:100","msg":"event listener request completed","commit":"2ec8bc6-dirty","status":202,"duration":0.000231657}

Sending to ce-test:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: a5447e02-4e7e-42d7-87c7-a459c1fe34cd
  time: 2023-08-18T15:03:35.92904009Z
  datacontenttype: application/json
Data,
  {
    "id": 0,
    "message": "test done!"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: a5447e02-4e7e-42d7-87c7-a459c1fe34cd
  time: 2023-08-18T15:03:35.92904009Z
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:test done!}
----------------------------

And the EL log after that:

{"level":"debug","ts":"2023-08-18T15:02:21.767Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T15:02:21.768Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T15:02:21.769Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:02:21.769Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:02:21.769Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:02:21.770Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}

Nothing there, although maybe that is expected.

I think this is what you asked for. If you need anything else please let me know.

@khrm
Contributor

khrm commented Aug 18, 2023

@ggallen Actually, I wanted you to use CE-Test in place of the EventListener. Put CE-Test behind the broker in place of the EventListener and send an event via the broker.

I assume that at the moment, when we send an event to the Broker, it goes to the EL. Now we want to send an event to the Broker and have it go to CE-Test.

@ggallen
Author

ggallen commented Aug 18, 2023

@khrm, I will try that. I'm afraid the problem is going to be that I can't get the log for CE-test, unless I get lucky, since it keeps restarting.

@ggallen
Author

ggallen commented Aug 18, 2023

@khrm, I believe this is what you are asking for.

CE-Test OUTPUT:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  id: b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  datacontenttype: application/json
Data,
  {
    "eventListener": "agogos",
    "namespace": "gallen",
    "eventListenerUID": "bd026c66-0e12-48ee-8304-e8277a1c65fe",
    "eventID": "b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  id: b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:}
----------------------------

CE-Test LOG:

{"level":"debug","ts":"2023-08-18T20:04:53.227Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T20:04:53.227Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T20:04:53.229Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:04:53.230Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:04:53.230Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:04:53.231Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:16:28.882Z","logger":"eventlistener","caller":"sink/sink.go:177","msg":"handling event with path /, payload: {\"id\":0,\"message\":\"Hello world!\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[233b6c63-0ddf-4fc9-b721-806f2911d4e4] Ce-Source:[https://github.com/cloudevents/sdk-go/v2/samples/requester] Ce-Specversion:[1.0] Ce-Time:[2023-08-18T20:16:28.87933523Z] Ce-Type:[com.cloudevents.sample.sent] Content-Length:[33] Content-Type:[application/json] User-Agent:[Go-http-client/1.1]]","commit":"2ec8bc6-dirty","eventlistener":"agogos","namespace":"gallen","/triggers-eventid":"b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e","eventlistenerUID":"bd026c66-0e12-48ee-8304-e8277a1c65fe","/triggers-eventid":"b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e"}
{"level":"debug","ts":"2023-08-18T20:16:28.882Z","logger":"eventlistener","caller":"sink/metrics.go:115","msg":"event listener request","commit":"2ec8bc6-dirty","status":"succeeded"}
{"level":"debug","ts":"2023-08-18T20:16:28.882Z","logger":"eventlistener","caller":"sink/metrics.go:100","msg":"event listener request completed","commit":"2ec8bc6-dirty","status":202,"duration":0.000258366}

BROKER OUTPUT:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  id: 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  datacontenttype: application/json
Data,
  {
    "eventListener": "agogos",
    "namespace": "gallen",
    "eventListenerUID": "bd026c66-0e12-48ee-8304-e8277a1c65fe",
    "eventID": "017b097c-4f3c-4b26-b97a-4ee12d31bfc2"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  id: 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:}
----------------------------

BROKER LOG:

{"level":"debug","ts":"2023-08-18T20:18:50.629Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T20:18:50.629Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T20:18:50.630Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:18:50.632Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:18:50.632Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:18:50.633Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:21:05.845Z","logger":"eventlistener","caller":"sink/sink.go:177","msg":"handling event with path /gallen/agogos, payload: {\"id\":0,\"message\":\"Hello world!\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[9a3ea66f-d5f3-485d-8e9c-eed7e99a446b] Ce-Source:[https://github.com/cloudevents/sdk-go/v2/samples/requester] Ce-Specversion:[1.0] Ce-Time:[2023-08-18T20:21:05.842938993Z] Ce-Type:[com.cloudevents.sample.sent] Content-Length:[33] Content-Type:[application/json] User-Agent:[Go-http-client/1.1]]","commit":"2ec8bc6-dirty","eventlistener":"agogos","namespace":"gallen","/triggers-eventid":"017b097c-4f3c-4b26-b97a-4ee12d31bfc2","eventlistenerUID":"bd026c66-0e12-48ee-8304-e8277a1c65fe","/triggers-eventid":"017b097c-4f3c-4b26-b97a-4ee12d31bfc2"}
{"level":"debug","ts":"2023-08-18T20:21:05.845Z","logger":"eventlistener","caller":"sink/metrics.go:115","msg":"event listener request","commit":"2ec8bc6-dirty","status":"succeeded"}
{"level":"debug","ts":"2023-08-18T20:21:05.845Z","logger":"eventlistener","caller":"sink/metrics.go:100","msg":"event listener request completed","commit":"2ec8bc6-dirty","status":202,"duration":0.000326446}

@khrm
Contributor

khrm commented Aug 18, 2023

No. I meant that instead of the broker sending the request to the EL, it should send the request to the CE-Test service. In the output above, I can see logs from the EL.

@ggallen
Author

ggallen commented Aug 20, 2023

@khrm, yes. I replaced the regular EL with the CE-test image.

@khrm
Contributor

khrm commented Aug 21, 2023

@ggallen But the logs that you shared are from the EL.

Can you use the updated deployment file from here:
https://raw.githubusercontent.com/khrm/ce-test/master/deployment.yaml

I removed the probes and updated the image to one with logs. Now it shouldn't crash.

@khrm
Contributor

khrm commented Sep 6, 2023

After this change, here is the captured traffic for ce-test and the EventListener, respectively:

[::1]:40490: POST http://127.0.0.1:8081/
    Host: 127.0.0.1:8081
    User-Agent: Go-http-client/1.1
    Content-Length: 33
    Ce-Id: be265890-fa18-482b-a79b-bbe2ef5107ed
    Ce-Source: https://github.com/cloudevents/sdk-go/v2/samples/requester
    Ce-Specversion: 1.0
    Ce-Time: 2023-09-06T15:19:01.089905807Z
    Ce-Type: com.cloudevents.sample.sent
    Content-Type: application/json
    Accept-Encoding: gzip

    {
        "id": 0,
        "message": "Hello world!"
    }

 << 200 OK 31b
    Ce-Id: d3a75085-ec79-4c2c-b602-f5dbf99ff956
    Ce-Source: /test
    Ce-Specversion: 1.0
    Ce-Subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
    Ce-Time: 2023-09-06T15:19:01.094062725Z
    Ce-Type: samples.http.test
    Content-Length: 31
    Content-Type: application/json
    Date: Wed, 06 Sep 2023 15:19:01 GMT

    {
        "id": 0,
        "message": "test done!"
    }

[20:49:01.597][[::1]:40490] client disconnect
[20:49:01.598][[::1]:40490] server disconnect 127.0.0.1:8081
[20:49:14.859][[::1]:38932] client connect
[20:49:14.860][[::1]:38932] server connect 127.0.0.1:8081
[::1]:38932: POST http://127.0.0.1:8081/
    Host: 127.0.0.1:8081
    User-Agent: Go-http-client/1.1
    Content-Length: 33
    Ce-Id: d6b2eb54-7335-4f0d-8a4f-a05d70093d47
    Ce-Source: https://github.com/cloudevents/sdk-go/v2/samples/requester
    Ce-Specversion: 1.0
    Ce-Time: 2023-09-06T15:19:14.858389027Z
    Ce-Type: com.cloudevents.sample.sent
    Content-Type: application/json
    Accept-Encoding: gzip

    {
        "id": 0,
        "message": "Hello world!"
    }

 << 200 OK 157b
    Ce-Id: c386e8b7-b485-414b-b559-8fb8b66b86ff
    Ce-Source: listener
    Ce-Specversion: 1.0
    Ce-Subject: default.listener accepted c386e8b7-b485-414b-b559-8fb8b66b86ff
    Ce-Type: dev.tekton.event.triggers.accepted.v1
    Content-Length: 157
    Content-Type: application/json
    Date: Wed, 06 Sep 2023 15:19:14 GMT

    {
        "eventID": "c386e8b7-b485-414b-b559-8fb8b66b86ff",
        "eventListener": "listener",
        "eventListenerUID": "99f270e1-8c7d-4224-b90d-4dba28617399",
        "namespace": "default"
    }

@ggallen
Author

ggallen commented Sep 7, 2023

@khrm, what exactly are you asking me to do? Does this branch have a fix?

@afrittoli
Member

@khrm, what exactly are you asking me to do? Does this branch have a fix?

This branch changes the response code from 202 to 200, which is the only difference in behaviour that we could observe between the test app and the event listener. At this stage, it's still a bit unclear to me whether the issue lies in the event listener or in the broker implementation, so it's difficult to build a reproducer test.
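
In sdk-go terms, the difference is roughly the following: a sketch, assuming the sink builds its HTTP result with the SDK's http protocol package (the ack helper is hypothetical).

package sink // sketch

import (
	nethttp "net/http"

	cloudevents "github.com/cloudevents/sdk-go/v2"
	cehttp "github.com/cloudevents/sdk-go/v2/protocol/http"
)

// ack builds the protocol result the receiver returns with its reply event.
func ack() cloudevents.Result {
	// main (assumed): cehttp.NewResult(nethttp.StatusAccepted, "accepted") -> 202
	// test branch: 200 OK, matching what ce-test returns
	return cehttp.NewResult(nethttp.StatusOK, "accepted")
}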

@ggallen
Author

ggallen commented Sep 7, 2023

But I thought I already provided enough information so that the problem could be reproduced easily. Is that not the case?

@khrm
Contributor

khrm commented Sep 8, 2023

@ggallen Unless we set up the broker, we can't debug this easily. Can you try using the above branch?

@khrm
Contributor

khrm commented Sep 8, 2023

Or you can use the below release yaml:

kubectl apply -f https://gist.githubusercontent.com/khrm/5c38684e501e223ed252d0e56389324a/raw/2f3aa9f8a0fe5ca6c9b95452890a6e2e9ab1c44a/gistfile0.txt

kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/previous/v0.25.0/interceptors.yaml

It's based on the above branch.

@khrm
Contributor

khrm commented Sep 8, 2023

@ggallen If the issue is the status code, that branch's release yaml might fix it. That's the only difference between ce-test and the EventListener. Can you please test it?

@ggallen
Author

ggallen commented Sep 11, 2023

@khrm, I applied your patches to my development environment and ran an Agogos build as I did when I initially found this problem. I see thousands of messages in the log still, notably ones like this:

{
  "level": "debug",
  "ts": "2023-09-11T20:08:22.402Z",
  "logger": "eventlistener",
  "caller": "sink/sink.go:177",
  "msg": "handling event with path /, payload: {\"eventListener\":\"agogos\",\"namespace\":\"gallen\",\"eventListenerUID\":\"a12a5403-ec16-4e45-b75a-dfa2b0f21e29\",\"eventID\":\"5ec30608-68e0-4561-9ef2-99b777bc5810\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[5ec30608-68e0-4561-9ef2-99b777bc5810] Ce-Knativearrivaltime:[2023-09-11T20:08:22.314804512Z] Ce-Source:[agogos] Ce-Specversion:[1.0] Ce-Subject:[gallen.agogos accepted 5ec30608-68e0-4561-9ef2-99b777bc5810] Ce-Type:[dev.tekton.event.triggers.accepted.v1] Content-Length:[154] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-2062e4977b673a86a10eb89c2df2b2ae-35bc5eb289dfdbce-00] User-Agent:[Go-http-client/1.1]]",
  "commit": "668863c",
  "eventlistener": "agogos",
  "namespace": "gallen",
  "/triggers-eventid": "b131f417-2fb1-4cf0-94e1-56dd21ba8160",
  "eventlistenerUID": "a12a5403-ec16-4e45-b75a-dfa2b0f21e29",
  "/triggers-eventid": "b131f417-2fb1-4cf0-94e1-56dd21ba8160"
}

I think the fact that I am still seeing messages of type dev.tekton.event.triggers.accepted.v1 means the problem is still there. I would not expect to see any of those if this problem were fixed.

@khrm
Contributor

khrm commented Sep 12, 2023

I will publish a branch with an updated CloudEvents SDK. Please wait.

@khrm
Contributor

khrm commented Sep 25, 2023

@ggallen This doesn't look like an issue with the EL.

I tested with ce-test and it loops too. The only difference is that ce-test gives 400 responses after a while, and this breaks the loop.

@khrm
Contributor

khrm commented Sep 25, 2023

A workaround for this:

kc edit configmap config-features -n knative-eventing

Enable the new filtering in the configmap.

  new-trigger-filters: enabled

In the Trigger, I put the following:

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  annotations:
    eventing.knative.dev/creator: minikube-user
    eventing.knative.dev/lastModifier: minikube-user
  labels:
    eventing.knative.dev/broker: agogos
  name: agogos
  namespace: tests
  resourceVersion: "26811"
  uid: 408f1b62-876f-42c4-bd5b-19e1afa0e015
spec:
  broker: agogos
  filter: {}
  filters:
  - not:
      exact:
        type: dev.tekton.event.triggers.accepted.v1
  subscriber:
    uri: http://el-agogos.tests.svc.cluster.local:8080
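
The not/exact filter above drops any event whose type is exactly dev.tekton.event.triggers.accepted.v1 before it is delivered to the EL, which is what breaks the loop.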

@ggallen
Author

ggallen commented Sep 25, 2023

@khrm, are you saying there's not a problem here somewhere?

While I have no doubt your workaround will work, it's just masking the problem. Somewhere, something still has to process all those messages to filter out what isn't wanted. And whatever it is is still going to get thousands of messages it shouldn't get.

Actually, you're cutting a lot of that off with the filter. But it's still masking what I think is a real problem, and it will just happen again with the next EL.

@khrm
Contributor

khrm commented Sep 25, 2023

@ggallen There won't be thousands of messages if it's filtered the first time.

Even without the EventListener, if I don't use a filter, it starts looping.

@ggallen
Author

ggallen commented Sep 25, 2023

@khrm,

There won't be thousands of messages if it's filtered the first time.

Agreed, but it will still mean I'm going to get double the number of messages. Every time an accept message is sent, I get a message back that has to be filtered.

And since I'm using CloudEvents, I have to put this workaround on every EL. Any users of our software will also have to put it on any ELs that they may create.

It may not be an EL problem, but it still seems like a problem, and actually a serious one, regardless of where it lies. Don't you agree?

@khrm
Contributor

khrm commented Sep 25, 2023

@khrm,

Agreed, but it will still mean I'm going to get double the number of messages. Every time an accept message is sent, I get a message back that has to be filtered.

Yes. That's true but that's how cloudevents work.

You send an event and you get an event back as a response. Now you filter that response.

@afrittoli What could be the solution for this? Should we not send back the event for CloudEvents requests? Or have a flag or annotation somewhere to disable sending back a cloudevent as a response?

@ggallen
Author

ggallen commented Sep 25, 2023

That's true but that's how cloudevents work.

I can't believe this is how it's supposed to work. At least it's not how any other messaging system I have worked with works. Sending an ack should not result in an ack of the ack.

You send an event and you get an event back as a response. Now you filter that response.

If this is, in fact, true, even for an ack, then the EL should automatically filter all of those messages.

@khrm
Contributor

khrm commented Sep 26, 2023

@ggallen It's not only an ack. It also contains information about the event ID generated during processing.

	body := Response{
		EventListener:    r.EventListenerName,
		EventListenerUID: elUID,
		Namespace:        r.EventListenerNamespace,
		EventID:          eventID,
	}
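
Serialized, this is the {"eventListener": ..., "eventID": ...} JSON payload visible in the broker log entries earlier in this thread.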

@khrm
Contributor

khrm commented Sep 26, 2023

@ggallen Can you use a cloud event extension? A header with something like NO_ACK? Then the EventListener won't send back a CloudEvent response, just a basic accepted response.
https://github.com/cloudevents/spec/blob/main/cloudevents/primer.md#cloudevent-extension-attributes

@afrittoli Is this the correct way to go about this?

@khrm
Contributor

khrm commented Sep 26, 2023

It would be like a cloudevent extension.

@khrm
Contributor

khrm commented Sep 26, 2023

So, after looking into the cloud event spec and SDK: the sender can use SetExtension, and the EventListener can use GetExtension.

Based on the extension value, the EventListener can determine whether to send an event or not. We can name the extension ack. If it's false, we won't send a cloudevent as a response. If it's true or not present, we will send an event.
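
A rough sketch of how that could look with the sdk-go v2 API (nothing is implemented yet; the extension name "ack" and the helper below are hypothetical):

package sink // hypothetical sketch, nothing merged

import (
	cloudevents "github.com/cloudevents/sdk-go/v2"
	"github.com/cloudevents/sdk-go/v2/types"
)

// shouldSendEvent reports whether the EL should reply with a CloudEvent.
// The "ack" extension defaults to true when absent or unparsable.
func shouldSendEvent(e cloudevents.Event) bool {
	if v, err := types.ToBool(e.Extensions()["ack"]); err == nil {
		return v
	}
	return true
}

// Sender side, to opt out of the CloudEvent reply:
//   _ = e.SetExtension("ack", false)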

@gvarsamis

gvarsamis commented Sep 27, 2023

@afrittoli Can you please confirm whether the extension route is the correct pattern to use here, and whether the EL is intentionally sending two ACKs: one for the message and another ACK for the ACK.
We are also a bit concerned about using an extension: according to the primer, extensions can get pulled at any time, which would create technical debt for us and force changes to our codebase. So we'd rather adopt the correct usage pattern than use an extension if extensions are not the correct pattern.

@afrittoli
Member

@khrm @gvarsamis Reading through the thread, my impression is that the event listener is behaving according to specification (perhaps with the exception of the status code 202 instead of 200).

The broker and event listener communicate through the CloudEvents HTTP Binding - which means communication follows a request/response pattern. The CloudEvents binding requires the HTTP client to produce a response that is valid according to the CloudEvents protocol - that means, if a payload is provided in the response, it must be a valid CloudEvent.

In the past, the event listener used to respond with a JSON payload - not in CloudEvents format - which caused issues with the broker - see #1439 for more context. The fix for that was to format the previous JSON payload as a CloudEvent.

At this stage my perception is that the issue is on the broker side - it should not reply to an HTTP response with a new request. Once that HTTP request is sent to the event listener, the event listener replies to it, as it should, with an HTTP response. From the event listener's point of view there is no connection between the first message from the broker and the second one.

What we could do is to reach out to the knative/cloudevents communities to clarify this behaviour with them.
In the meanwhile I see two options:

  • apply the filtering logic that @khrm suggested on the broker side
  • add an option in the event listener to respond to CloudEvents with an empty payload - but we would need to verify if that actually helps

@ggallen
Author

ggallen commented Sep 27, 2023

@afrittoli, thanks for the information. But do you agree that this is not expected behavior, and that there is a bug here somewhere? I agree it must be the broker if the EL is responding as it should. So it seems the issue has been narrowed down to the broker.

I would like to get the bug fixed as it seems like this could be an issue for anyone using CloudEvents.

We are currently using the in-memory channel. Would changing to Apache Kafka potentially solve this problem? Or is it likely more of a problem with the broker and not the channel?

I'm OK with the workaround, but would really like to get a solution. The problem with the workaround is that it will work for our one Trigger, but any user of our system who creates their own Trigger would not know to create this filter.

@khrm , could you package up the small test case, create an issue for the broker, and see what they say? Please include me on it so I can track its progress.

@khrm
Contributor

khrm commented Sep 28, 2023

Sure. I will create a test case to reproduce this without the EL.

@ggallen
Author

ggallen commented Oct 4, 2023

@khrm, did you create an issue for this yet?

@khrm
Contributor

khrm commented Oct 5, 2023

@ggallen No, I didn't get time. We released the Operator this week and are having a f2f.

@khrm
Contributor

khrm commented Oct 12, 2023

@ggallen I have created the issue here: knative/eventing#7356

@khrm
Contributor

khrm commented Oct 12, 2023

@ggallen I think you can contact people from the Red Hat OpenShift Serverless team regarding Knative Eventing.

Or some other upstream maintainers.

@khrm khrm removed this from the Triggers v0.26 milestone Oct 18, 2023