-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk request retry silently drops pipeline when retrying individual response that failed #78792
Comments
Pinging @elastic/es-data-management (Team:Data Management) |
This looks REALLY suspicious:
If the index request is the same object as is in the danged bulk request, no wonder the pipeline goes away, we set it to NOOP. |
@benwtrent - can you retry your test with a default pipeline and/or a final pipeline ? I am curious if that could be a work around. |
I definitely can tomorrow. Silently ignoring something is pretty bad |
Agreed. However, I think the only way an individual bulk item can return a 429 is if the ingest processor does this and I think enrich is the only processor that can do this when ingest rate dramatically outpaces the ablity to execute the underlying search for enrichment. In this case it would have to do this via reindex workflow (I can not think of other workflows that would re-use this an internal client). Normally a 429 would apply to entire bulk, and unsure if this bug presents in that case (if so, that is worse). |
Both |
I would also expect us to return 429 on individual items if (edit: actually not sure if the coordinating node can reject individual items, but definitely the primary can) |
@jakelandis enrich isn't the only processor (at least soon) that throws a 429 on individual requests: #78757 |
Thanks for the additional context and testing. This really helps to evaluate the impact of the bug. |
@jakelandis Any idea if this bug can be prioritised for 8.0? This becomes a pretty big failure for reindexing with inference ingest processors that can sometimes timeout due to resource (CPU) constraints. It's a scenario we'd like to have a good story around for GA with the new NLP/PyTorch model inference feature. |
Elasticsearch version (
bin/elasticsearch --version
):8.0.0 (probably earlier)
Description of the problem including expected versus actual behavior:
When an individual bulk action fails due to 429 it should still use the original indexing pipeline on retry.
Instead, the pipeline (at least during
reindex
) is dropped from the re-created bulk request.Steps to reproduce:
I have a local WIP processor that is throwing a 429 manually to recreate.
Provide logs (if relevant):
I added a bunch of logging lines (and the pipeline attached to the index request in the output) and here is what I got:
You can see from the first line
[V_JiV3wBwot2XBnG8O0d][classification]
pipelineclassification
was used.Then in the second line (this method, I log the
currentBulkRequest
), the pipeline from the request is gone[V_JiV3wBwot2XBnG8O0d][_none]
elasticsearch/server/src/main/java/org/elasticsearch/action/bulk/Retry.java
Line 130 in 68817d7
The text was updated successfully, but these errors were encountered: