Split large batches of documents if received 413 from Elasticsearch #29778
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
ping @nimarezainia for prioritization and awareness
@rdner this sounds like the right thing to do and makes our product more robust. What would be the level of effort involved in getting this done? Ideally the buffers would match dynamically so we wouldn't hit these issues, but I know that is near impossible. Could you please ensure that each of these actions is logged, in particular:
- when the batch is dropped, please state it in the log for info;
- when the batch is being cut to size.
@nimarezainia Regarding the estimation of effort: I'm quite new to the project, so it's hard for me to give a precise estimate. I would ask @faec for help here since we already touched on this topic once. @cmacknz it might be worth considering introducing this kind of behaviour into the shipper design too.
@cmacknz Should I "close" this one and focus on the shipper, as you have already included it in the V2 implementation?
Let's keep the issue as it is a good description of the work to do. We could remove the release target and labels, though. This will happen as part of the shipper work at some to-be-determined point in the future.
Is this not going to be fixed in the current Beats implementation?
@Foxboron Even if we are talking about fixing it in the shipper here, that doesn't mean it will not be fixed in standalone Beats.
The current plan is to address this in Beats so the fix is available sooner, and then port it into the shipper afterwards so we aren't tied to the date when the shipper is ready to be released.
@nimarezainia why is it near impossible? Is it because Agent/Beats can send to multiple ES clusters with different size limits? I agree with the conclusion, I just want to make sure I understand all the reasons for it.
I believe I was told that the ES buffer size is not known to us; this may have changed. If there were an API for us to read it, perhaps our output could be set to match, minimizing drops. Perhaps things have changed since that comment.
Describe the enhancement:
Currently, after seeing a 413 response from Elasticsearch, the whole batch is dropped and the error is logged (#29368). Some of our customers would like to preserve at least some data from the batch instead of discarding it entirely.
The proposal is: when a 413 is received, instead of dropping the whole batch, split it into smaller batches that fit under the http.max_content_length threshold in Elasticsearch and retry them. Something similar was done in this PR: logstash-plugins/logstash-output-elasticsearch#497
Please ensure that each of these actions is logged, in particular:
- when the batch is dropped, please state it in the log for info;
- when the batch is being cut to size, log the new size relative to bulk_max_size.
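A minimal sketch of how such split-and-retry logic could work, assuming a hypothetical sendBatch function standing in for the bulk request to Elasticsearch (this is an illustration of the idea, not the actual Beats output code):

```go
package main

import (
	"errors"
	"fmt"
)

// errTooLarge stands in for a 413 Request Entity Too Large response from
// Elasticsearch. In a real output this would be derived from the HTTP status.
var errTooLarge = errors.New("413: request entity too large")

// sendBatch is a placeholder for the bulk request to Elasticsearch.
// Here any batch larger than 2 events pretends to exceed http.max_content_length.
func sendBatch(events []string) error {
	if len(events) > 2 {
		return errTooLarge
	}
	fmt.Printf("indexed %d event(s)\n", len(events))
	return nil
}

// publishWithSplit retries a rejected batch by splitting it in half until it
// either fits under the limit or a single event is still too large, in which
// case only that one event is dropped (and the drop is logged).
func publishWithSplit(events []string) error {
	if err := sendBatch(events); err == nil || !errors.Is(err, errTooLarge) {
		return err
	}
	if len(events) == 1 {
		fmt.Println("dropping single event that exceeds http.max_content_length")
		return nil
	}
	mid := len(events) / 2
	fmt.Printf("413 received, splitting batch of %d into %d + %d\n",
		len(events), mid, len(events)-mid)
	if err := publishWithSplit(events[:mid]); err != nil {
		return err
	}
	return publishWithSplit(events[mid:])
}

func main() {
	_ = publishWithSplit([]string{"e1", "e2", "e3", "e4", "e5"})
}
```

Splitting in half keeps the number of retries logarithmic in the batch size and isolates any single oversized event, so only that event is dropped rather than the whole batch.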
Describe a specific use case for the enhancement or feature:
Some of our clients are more sensitive to data loss than others, and this enhancement would allow us to preserve more data in case of a misconfiguration of http.max_content_length in Elasticsearch or bulk_max_size in Beats. This would improve the situation in most cases, but it would not completely solve the data loss problem.