Implement the shipper's gRPC API #34

Closed
9 of 10 tasks
Tracked by #8 ...

cmacknz opened this issue May 11, 2022 · 7 comments

@cmacknz (Member) commented May 11, 2022

Implement the shipper's gRPC API, which is currently a skeleton defined in https://github.com/elastic/elastic-agent-shipper/blob/main/server/server.go

The API specification is currently defined in https://github.com/elastic/elastic-agent-shipper/blob/main/api/shipper.proto

Acceptance Criteria:

@rdner (Member) commented Jul 12, 2022

@faec I have some questions regarding the last shipper API change:

These are statements to double-check that I understood the docs correctly:

  1. accepted_count (in Shipper.Publish) is the number of events from a single request that went through queue.Publish without errors.
  2. accepted_index is the sum of all accepted_count values across all shipper Publish requests.
  3. From the queue Metrics we can calculate persisted_index = accepted_index - Metrics.UnackedConsumedEvents. Currently I run a background goroutine that polls the queue metrics at an interval and updates persisted_index with this formula (see the sketch below).
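
Here is a rough sketch of that background goroutine; the types below are simplified stand-ins, not the actual shipper or queue API:

```go
// Sketch of the polling loop that recomputes persisted_index.
package shipper

import (
	"sync/atomic"
	"time"
)

// queueMetrics is a hypothetical subset of the queue's metrics.
type queueMetrics struct {
	UnackedConsumedEvents uint64
}

// metricsSource stands in for whatever exposes the queue metrics.
type metricsSource interface {
	Metrics() (queueMetrics, error)
}

type indexTracker struct {
	acceptedIndex  atomic.Uint64 // incremented by accepted_count on every Publish
	persistedIndex atomic.Uint64 // updated by the polling goroutine below
}

// pollPersistedIndex periodically recomputes
// persisted_index = accepted_index - UnackedConsumedEvents.
func (t *indexTracker) pollPersistedIndex(q metricsSource, interval time.Duration, done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			m, err := q.Metrics()
			if err != nil {
				continue // log and try again on the next tick
			}
			t.persistedIndex.Store(t.acceptedIndex.Load() - m.UnackedConsumedEvents)
		}
	}
}
```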

Q1: I faced one edge case I'm not sure about: what should the server do when it fails to enqueue a single event? I'm talking about the case when Queue.Publish returns an error other than queue.ErrQueueIsFull. It seems I cannot return this error to the client, since that would lead to data duplication. I think we should just log the error, stop processing the request, and accept all the previously published events from this request. For example, if we have 20 events and the 13th returns an error from Queue.Publish, we accept 12 (AcceptedCount=12) and return the PublishReply normally (see the sketch below).
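
To make the proposal concrete, a minimal sketch of that handling, assuming simplified stand-in event/queue types rather than the real ones:

```go
// Sketch of the Publish handling described above: enqueue events one by one,
// stop at the first unexpected error, and report how many were accepted.
package shipper

import (
	"errors"
	"log"
)

// errQueueIsFull stands in for queue.ErrQueueIsFull.
var errQueueIsFull = errors.New("queue is full")

// event and publisher are simplified stand-ins for the real types.
type event struct{}

type publisher interface {
	Publish(e event) error
}

// publishBatch returns how many events the queue accepted. On an error other
// than queue-full it logs, stops processing the request, and keeps the events
// accepted so far (e.g. AcceptedCount=12 out of 20 if the 13th fails).
func publishBatch(q publisher, events []event) (acceptedCount uint32, err error) {
	for _, e := range events {
		if pubErr := q.Publish(e); pubErr != nil {
			if errors.Is(pubErr, errQueueIsFull) {
				// A full queue is reported back so the client can retry the rest.
				return acceptedCount, pubErr
			}
			// Any other error: log it and return the reply with what we accepted.
			log.Printf("failed to enqueue event: %v", pubErr)
			return acceptedCount, nil
		}
		acceptedCount++
	}
	return acceptedCount, nil
}
```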

Q2: What's the idea behind having the PersistedIndex endpoint as a stream? Should I be pushing persisted_index updates every time this value changes?

@faec (Contributor) commented Jul 12, 2022

Q1: A full queue is the only way an event can fail to enqueue -- the queue API itself (e.g. Publish) doesn't return an error. We could imagine other error conditions, e.g. in the disk queue the input event might fail to serialize, but that doesn't arise until after the client's Publish request has already returned. So, the client calling Publish should never get an error other than queue-full once it's gotten as far as adding something to the queue. Any errors after that stage should be reported internally within the shipper, not to the originating client.

Q2: That's probably an oversight -- I didn't intentionally make those different, and didn't realize what the stream qualifier did, so that was likely left over from the previous version of the call and can be removed.

Statement 3 doesn't look right, though -- UnackedConsumedEvents gives the number of events that have been read by a consumer but not yet acknowledged. This means that if a queue has 1000 events, and none of them have been read by a consumer (output worker) yet, then UnackedConsumedEvents will be 0, but this doesn't mean persisted_index should be 1000 (if it's the memory queue then none of those events have been persisted). I'm not sure I understand the need here, though -- why is there a background goroutine to poll this value?

@rdner (Member) commented Jul 13, 2022

@faec

Regarding Q1 – that's what I thought as well.

Q2: I've already implemented the streaming API there, and after a closer look I think it can be beneficial for the clients. They can subscribe to persistedIndex changes and listen until persistedIndex >= the last acceptedIndex they received, then ack their current batch and advance (see the sketch below). From the consumer's perspective I think it would be a good experience.
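
Roughly, the consumer flow I have in mind looks like this; the stream type below is an illustrative stand-in, not the exact generated gRPC client:

```go
// Sketch of the consumer-side flow: listen to persisted_index updates until
// the last batch is fully persisted, then ack and advance.
package client

// persistedIndexStream stands in for the generated server-streaming
// PersistedIndex client.
type persistedIndexStream interface {
	Recv() (uint64, error) // returns the latest persisted_index
}

// waitForPersisted blocks until the stream reports a persisted_index that has
// caught up with the accepted_index returned for the client's last batch.
// Once it returns nil, the client can ack that batch and move on.
func waitForPersisted(stream persistedIndexStream, acceptedIndex uint64) error {
	for {
		persistedIndex, err := stream.Recv()
		if err != nil {
			return err
		}
		if persistedIndex >= acceptedIndex {
			return nil
		}
		// Not there yet; keep listening for the next update.
	}
}
```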

Regarding 3:

I think I'm missing something here: how does one get the persisted_index value, then? I see no current way in the queue to get this information, so I started polling the queue metrics at intervals and updating the index in the background.

See #76 for more context.

@faec (Contributor) commented Jul 13, 2022

Hmm, you're right -- I could see it making sense for the persisted index call to be a stream, so let's try it that way.

The actual value of persisted_index requires support from the queue itself, which I'm working on -- I'll add you on the PR as soon as it's ready.
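
For illustration, a minimal sketch of what the streaming handler could look like once that queue support exists; the index source and stream types are hypothetical stand-ins, not the generated gRPC API:

```go
// Sketch of a server-streaming PersistedIndex handler that only pushes
// updates when the value actually changes.
package server

import "time"

// indexSource stands in for however the shipper will expose the current
// persisted index once the queue supports it.
type indexSource interface {
	PersistedIndex() uint64
}

// indexStream stands in for the generated gRPC server stream.
type indexStream interface {
	Send(persistedIndex uint64) error
	Done() <-chan struct{} // closed when the client disconnects
}

func streamPersistedIndex(src indexSource, stream indexStream, pollEvery time.Duration) error {
	ticker := time.NewTicker(pollEvery)
	defer ticker.Stop()

	var lastSent uint64
	sentAny := false
	for {
		select {
		case <-stream.Done():
			return nil
		case <-ticker.C:
			current := src.PersistedIndex()
			if sentAny && current == lastSent {
				continue // unchanged, nothing to push
			}
			if err := stream.Send(current); err != nil {
				return err
			}
			lastSent, sentAny = current, true
		}
	}
}
```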

@rdner (Member) commented Aug 9, 2022

As @cmacknz and I discussed face to face, we can remove #63 from the acceptance criteria checklist and leave it for future research.

@cmacknz (Member, Author) commented Aug 24, 2022

Removed #97 as it doesn't block completion of this issue.

@cmacknz (Member, Author) commented Sep 14, 2022

Closing this one; testing is tracked as part of the 8.5 milestone.
