Add _reindex to ILM #42784

Open
kobybr opened this issue Jun 2, 2019 · 8 comments
Labels: :Data Management/ILM+SLM (Index and Snapshot lifecycle management), >enhancement, Team:Data Management (Meta label for data/management team)

Comments

@kobybr

kobybr commented Jun 2, 2019

Add _reindex to ILM policy. It would be beneficial to be able to merge daily indexes into weekly/monthly/yearly indexes in the warm and cold phases.

@martijnvg
Member

I think rollups may be more appropriate here to merge daily indices into weekly/monthly/yearly indices. However, integration between ILM and Rollup doesn't exist today.

martijnvg added the :Data Management/ILM+SLM and team-discuss labels Jun 3, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@kobybr
Author

kobybr commented Jun 3, 2019

Rollups would be appropriate for those who do not care about losing granularity. There is still a need to merge multiple indexes in their entirety into a single index, which I believe can only be done with reindex.

@gwbrown
Contributor

gwbrown commented Jun 3, 2019

If you're already using ILM, why not adjust your ILM policy (and index template if you need to change the number of shards) to roll over at the desired interval/size from the get-go, rather than reindexing into merged indices later? Is there something that's stopping you from doing that?

I hope this doesn't sound dismissive, I'm just trying to better understand the use case for this feature.
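For readers following along, the rollover-based alternative suggested above could look roughly like the sketch below. It only builds the request body for an ILM policy whose hot phase rolls over by age or size. The policy name `weekly-rollover` and the `7d` / `50gb` thresholds are illustrative assumptions, not values anyone in this thread proposed.

```python
import json


def rollover_policy(max_age: str = "7d", max_size: str = "50gb") -> dict:
    """Build an ILM policy body whose hot phase rolls over by age or size.

    The default thresholds are assumptions chosen for illustration only.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Rollover fires when either condition is met.
                        "rollover": {"max_age": max_age, "max_size": max_size}
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Body for: PUT _ilm/policy/weekly-rollover  (policy name is hypothetical)
    print(json.dumps(rollover_policy(), indent=2))
```

This produces weekly indices from the start, so it does not cover the case where daily indices are wanted on the hot tier and larger indices only later.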

@kobybr
Author

kobybr commented Jun 4, 2019

The use case is a hot-warm cluster architecture with time-based indexes. I am only interested in keeping several daily indexes on the hot nodes. When keeping, for example, 60 days' worth of data on a warm node, it is better to have 8 or 9 weekly indexes than 60 individual daily indexes, so rollover is not really an option. What I'm doing is not unique. I'm excited to see ILM incorporated into the Elasticsearch core, but disappointed that it is missing one of the more useful features found in Curator.
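As a rough illustration of the consolidation described here, the sketch below groups daily index names by ISO week and builds one `_reindex` request body per weekly destination. It assumes daily indices are named `logs-YYYY.MM.DD`; that prefix, the weekly naming scheme `logs-YYYY.wWW`, and the helper name are hypothetical. It only constructs request bodies and does not talk to a cluster or delete the source indices.

```python
from collections import defaultdict
from datetime import datetime


def weekly_reindex_bodies(daily_indices, prefix="logs-"):
    """Group daily indices by ISO week and build one _reindex body per week.

    Assumes names of the form <prefix>YYYY.MM.DD (an illustrative convention).
    """
    groups = defaultdict(list)
    for name in daily_indices:
        day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        iso_year, iso_week, _ = day.isocalendar()
        # Hypothetical weekly index name, e.g. logs-2019.w23
        groups[f"{prefix}{iso_year}.w{iso_week:02d}"].append(name)
    # _reindex accepts a list of source indices and a single destination.
    return [
        {"source": {"index": sorted(sources)}, "dest": {"index": dest}}
        for dest, sources in sorted(groups.items())
    ]


if __name__ == "__main__":
    days = ["logs-2019.06.02", "logs-2019.06.03", "logs-2019.06.04"]
    for body in weekly_reindex_bodies(days):
        # Each body would be sent as: POST _reindex
        print(body)
```

Each body would be submitted separately, and the daily indices removed only after the corresponding reindex has been verified.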

@gwbrown
Contributor

gwbrown commented Jun 4, 2019

Thank you! That's very helpful in understanding your use case. We're taking a look at what's next for ILM right now and feedback like this is very useful.

@gwbrown
Contributor

gwbrown commented Jun 27, 2019

We discussed this in the weekly core/features sync.

This feature would be useful for not only the case described here, but would also enable use cases which reduce the number of fields in an index to save storage space, or allow for sampling by only keeping a percentage of the documents in the index.

However, there are some significant concerns as well:

  • If the cluster (or even a single node) is shut down or restarted, this can interrupt the reindex which would make it impossible to continue with the lifecycle policy safely without administrator intervention. There is ongoing work to make reindexing more resilient, however (Reindex resiliency #42612)
  • Reindexing can be a very resource-intensive operation, and currently ILM has no means of restricting when reindexing would occur, which could lead to ILM starting a reindex during a period of high load and causing serious performance issues. There is planned work to improve this and limit ILM actions to certain windows, however (Add the ability to specify when ILM step execution should occur #37325)
  • There are some questions around the destination index - should it be considered an extension of the source index in some way, similar to shrink? Or a completely different index? The first option works better for the field reduction/sampling use cases and would make it easier to use, but the latter would make more sense for the use case described above of consolidating many smaller periodic indices into a single larger one. Both options have significant downsides.
  • This is more philosophical, but this may not be in line with the current philosophy of ILM: it relates more to managing the lifecycle of the data in the index, rather than just managing the lifecycle of the index as a whole. This may or may not be a reason to consider a different way of going about this, but at least warrants further discussion.

Given those complexities, we're not necessarily opposed to implementing this, but there are improvements to ILM which take priority over this which we're going to work on first. We'll re-evaluate this feature once the work on reindexing resiliency and ILM scheduling is further along and we're more certain this is something we could offer safely.
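To make the field-reduction and sampling use cases mentioned in this comment more concrete, here is a minimal sketch of how such a manual `_reindex` request body might be built today. It relies on source filtering via `_source` and on a `function_score` query with `random_score` and `min_score` to keep an approximate fraction of documents. The helper name, index names, and field names are hypothetical, and this is not a proposal for how an ILM action would work.

```python
def reduced_reindex_body(source, dest, keep_fields=None, sample_fraction=None):
    """Build a _reindex body that optionally drops fields and/or samples docs.

    keep_fields: field names to copy (all others are omitted from _source).
    sample_fraction: approximate share of documents to keep, in (0, 1].
    """
    src = {"index": source}
    if keep_fields is not None:
        # Source filtering: only the listed fields are copied to the destination.
        src["_source"] = list(keep_fields)
    if sample_fraction is not None:
        if not 0.0 < sample_fraction <= 1.0:
            raise ValueError("sample_fraction must be in (0, 1]")
        # random_score assigns each document a score in [0, 1); keeping scores
        # at or above (1 - fraction) retains roughly that fraction of documents.
        src["query"] = {
            "function_score": {
                "random_score": {},
                "min_score": 1.0 - sample_fraction,
            }
        }
    return {"source": src, "dest": {"index": dest}}


if __name__ == "__main__":
    # Index and field names below are placeholders.
    print(reduced_reindex_body(
        "logs-2019.06.03", "logs-2019.06.03-reduced",
        keep_fields=["@timestamp", "message"], sample_fraction=0.25,
    ))
```

The sampling is probabilistic, so the number of documents kept will vary from run to run.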

rjernst added the Team:Data Management label May 4, 2020
@dcolazin

Bumping this as it is quite an interesting feature.

6 participants