Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail put-mapping requests sooner if they will exceed the field number limit #35564

Closed
ppf2 opened this issue Nov 14, 2018 · 7 comments · Fixed by #48817
Closed

Fail put-mapping requests sooner if they will exceed the field number limit #35564

ppf2 opened this issue Nov 14, 2018 · 7 comments · Fixed by #48817
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search.

Comments

@ppf2
Copy link
Member

ppf2 commented Nov 14, 2018

6.4.1

We had a scenario when an index was already hitting the field number limit (index.mapping.total_fields.limit) and subsequent (high volume) indexing requests attempted to add new fields to this index. As a result, a lot of put_mapping tasks got generated. This caused the cluster
state to be held on in memory and became non-GC-able until these mapping updates eventually got rejected (and the coordinating node ran out of memory).

image

This is an enhancement request to handle this situation better. Is this something the real memory circuit breaker in 7.0 will help with?

@ppf2 ppf2 added the :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload label Nov 14, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@dakrone
Copy link
Member

dakrone commented Nov 14, 2018

This is an enhancement request to handle this situation better. Is this something the real memory circuit breaker in 7.0 will help with?

I think we should try to address the root cause, if possible, it'd be nice if we could check the limit for mappings prior to a put-mapping request being sent to the master. For instance, if the local node's cluster state contains over 1000 fields in the mapping (with the default limit being 1000), we know that even if the cluster state is behind the number of fields cannot decrease, so no need to send an update mapping request to the master node. The request can be rejected without overloading any other node.

@Bukhtawar
Copy link
Contributor

Hi @dakrone
Based on my understanding theTransportPutMappingAction is handled by the master which only checks for block and then goes ahead submitting a cluster state update task. Do you think it makes sense to reject it at master but before submitting the cluster state update tasks just as we check for blocks. I believe since the update task is serialized on the master and put-mapping has a priority HIGH, processing gets significantly delayed by the PutMappingExecutor(espl in cases when there are pending tasks with priority URGENT) allowing the heap build up. This would help even in cases where the local cluster state was lagging unaware of field limit breach.

@Bukhtawar
Copy link
Contributor

Hey @dakrone, I'll be more than happy to work on this PR. Please share your thoughts on the same

@Bukhtawar
Copy link
Contributor

@ppf2 @dakrone any thoughts on this?

@DaveCTurner DaveCTurner changed the title Coordinating node can run out of memory due to failed put mappings call Fail put-mapping requests sooner if they will exceed the field number limit May 12, 2019
@DaveCTurner
Copy link
Contributor

I think that the coordinating node no longer runs out of memory due to failed put-mappings calls in versions ≥7.0, so I have updated the title of this issue to reflect the remaining work mentioned in this comment.

@DaveCTurner DaveCTurner added the :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. label May 12, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@danielmitterdorfer danielmitterdorfer removed the :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload label Sep 6, 2019
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Nov 1, 2019
Today if the primary discovers that an indexing request needs a mapping update
then it will send it to the master for validation and processing. If, however,
the put-mapping request is invalid then the master still processes it as a
(no-op) cluster state update. When there are a large number of indexing
operations that result in invalid mapping updates this can overwhelm the
master.

However, the primary already has a reasonably up-to-date mapping against which
it can check the (approximate) validity of the put-mapping request before
sending it to the master. For instance it is not possible to remove fields in a
mapping update, so if the primary detects that a mapping update will exceed the
fields limit then it can reject it itself and avoid bothering the master.

This commit adds a pre-flight check to the mapping update path so that the
primary can discard obviously-invalid put-mapping requests itself.

Fixes elastic#35564
DaveCTurner added a commit that referenced this issue Nov 5, 2019
Today if the primary discovers that an indexing request needs a mapping update
then it will send it to the master for validation and processing. If, however,
the put-mapping request is invalid then the master still processes it as a
(no-op) cluster state update. When there are a large number of indexing
operations that result in invalid mapping updates this can overwhelm the
master.

However, the primary already has a reasonably up-to-date mapping against which
it can check the (approximate) validity of the put-mapping request before
sending it to the master. For instance it is not possible to remove fields in a
mapping update, so if the primary detects that a mapping update will exceed the
fields limit then it can reject it itself and avoid bothering the master.

This commit adds a pre-flight check to the mapping update path so that the
primary can discard obviously-invalid put-mapping requests itself.

Fixes #35564
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Nov 5, 2019
Today if the primary discovers that an indexing request needs a mapping update
then it will send it to the master for validation and processing. If, however,
the put-mapping request is invalid then the master still processes it as a
(no-op) cluster state update. When there are a large number of indexing
operations that result in invalid mapping updates this can overwhelm the
master.

However, the primary already has a reasonably up-to-date mapping against which
it can check the (approximate) validity of the put-mapping request before
sending it to the master. For instance it is not possible to remove fields in a
mapping update, so if the primary detects that a mapping update will exceed the
fields limit then it can reject it itself and avoid bothering the master.

This commit adds a pre-flight check to the mapping update path so that the
primary can discard obviously-invalid put-mapping requests itself.

Fixes elastic#35564
DaveCTurner added a commit that referenced this issue Nov 5, 2019
Today if the primary discovers that an indexing request needs a mapping update
then it will send it to the master for validation and processing. If, however,
the put-mapping request is invalid then the master still processes it as a
(no-op) cluster state update. When there are a large number of indexing
operations that result in invalid mapping updates this can overwhelm the
master.

However, the primary already has a reasonably up-to-date mapping against which
it can check the (approximate) validity of the put-mapping request before
sending it to the master. For instance it is not possible to remove fields in a
mapping update, so if the primary detects that a mapping update will exceed the
fields limit then it can reject it itself and avoid bothering the master.

This commit adds a pre-flight check to the mapping update path so that the
primary can discard obviously-invalid put-mapping requests itself.

Fixes #35564
Backport of #48817
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants