ILM Phase Execution on Index Count, Aggregate Size, or FIFO #47764
Comments
Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)
Just came here to post this exact issue. Index-count-based phases would be incredibly useful. Index rollover happens at xGB, which is great, and my indices are always the same size, but setting retention based on days is causing all kinds of problems for me.
Hi, has there been any progress on this issue? I'm curious why @jakelandis added the 'high hanging fruit' label; could someone elaborate on why this is difficult? It looks like there have been a couple of other issues posted relating to this as well: #49392 #52308
@hamishforbes It is high-hanging fruit because of the architecture of ILM, which is oriented around managing a single index at a time, whereas the request here is to manage a group of indices (e.g., whose names share a common prefix). Since that requires a fundamental rearchitecture/an investment in new infrastructure in the codebase, there isn't a quick win here.
Ah I see, because an ILM policy can apply to multiple groups of indices. That makes sense, thanks for the insight!
Just wanted to add a +1 to this. Defining an ILM policy solely on size is absolutely critical for inconsistent workloads. There are many examples/scenarios, but one I personally experience is how hard it is to size intake for network-related data. Even if I can roughly estimate one site, additional sites rarely follow the same principles (number of people, type of traffic, site function (datacenter vs. office), and many other factors). The possibility of FIFO would ease things even further. At this point I know of many deployments that still haven't found a balance between the amount of data to keep and availability, so deployments are wasting resources yet still getting rid of data too soon, for fear of a sudden intake causing downtime.
That's interesting and definitely seems like a viable alternative! :) Unfortunately, as a customer of Elastic Cloud, Curator is not an option (unless I have it running somewhere else, which kind of defeats the purpose of EC in the first place).
On Fri, 24 Apr 2020 at 08:50, Hamish Forbes wrote:
FWIW I have since disabled the delete phase in my ILM policy and switched back to elasticsearch-curator, using an index prefix and count for retention. Here's a graph of % free space across my logging cluster; I don't think I need to point out which day I made the switch :)
[image: Screenshot 2020-04-24 at 08 47 32]
<https://user-images.githubusercontent.com/1282135/80187881-48f38280-8608-11ea-849d-45bb9a8f4b5f.png>
+1. ILM needs to handle the overall index lifecycle instead of a single index at a time. Curator is an option, but it is just another workaround for functionality that should exist as a core component.
At the very least, having a condition based on index count would be great, and I don't think it would be too complicated.
hey! what is the state of this issue? curious to know and help if I can |
Apart from a "me too" on this request, let me also add my use case for this functionality. Similar, but slightly different. We have one collection of time-based log index series that is important: there must always be space available in the cluster to ingest new logs for these series. There are also other, less-important log index series in the cluster. I want to set a hard limit on the size of the less-important indices, to make sure that a badly behaving less-important service cannot fill up disk space and cause the important indices to go read-only. The number of indices in a series, or the cumulative size (in bytes or documents) of all indices in a series: any one would do. When this limit has been reached, let ILM execute an action like "delete the oldest index" or "reject writes to the indices".
I'm getting started on ILM with our ECE deployment, and I was surprised to find that I am unable to trigger a phase change based on the number of indices sitting behind a data stream in the hot tier. We have around 40 data streams in our legacy cluster, which uses date-based suffixes on yearly, monthly, weekly, and daily rotation strategies. I managed to classify these into 6 different ILM policies based on index size for rollover and the number of indices to keep in the hot and frozen tiers. However, if I am limited to age, I will need to create 40 different ILM policies to get a similar effect. I am fine not basing retention on age, as the limiting factor of the cluster is storage; using age to define retention seems short-sighted when there are practical limits to RAM-to-storage ratios on licensed capacity. By using size, we can predictably create safe limits for data streams that stay within the storage constraints of our architecture.
Pinging @elastic/es-data-management (Team:Data Management)
There are some interesting use cases here for sure! I can see why it's not always about retention days, because that does not typically answer how much data, from a storage perspective, is being retained. As a security engineer responsible for data resources in the stack, I may have new log sources or ones that are unpredictable in data consumption. Because of this, I would like to set a maximum storage consumption on a data stream that allows the oldest index or indices to be removed once the respective threshold in GB is reached.
ILM phases outside of hot rely exclusively on `min_age` for execution. There is currently no way to execute phases on any other criteria, which leaves Elasticsearch susceptible to out-of-space emergencies when indexes grow slowly over time. Age-based execution may be advantageous for policy reasons (keep abc logs for xyz months), but it is not useful for resource maximization (I want to use 90% of disk space).

Executing phases based on index count or aggregate size promotes better resource usage. I'm more interested in keeping as many indexes as my infrastructure will allow. I see a few ways to achieve that.
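For context, a delete phase today can only be gated on `min_age`, roughly like the policy below. This is a sketch of a standard ILM policy; verify the exact field names (e.g., `max_size` vs. newer rollover conditions) against your Elasticsearch version:

```json
PUT _ilm/policy/logs
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "10gb" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

Nothing in this structure lets the delete phase consider how many indexes exist or how much space they occupy, which is exactly the gap this issue describes.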
Execute phases based on index count. This model would allow you to define fixed index counts within each policy. The advantage is that this is easy. For example: I'd like to roll over hot after 10GB and keep 9 indexes in warm. This policy would never grow past 100GB.
Execute phases based on aggregate size. This model would allow you to define cumulative index sizes within a phase. The advantage is that this is also easy, but covers more corner cases than a simple count. For example: I'd like to roll over hot after 10GB or 2 days, and keep 90GB of indexes in warm. This policy would keep as much data as possible within the defined aggregate bounds. Perhaps the daily indexes grow to 10GB but the weekend indexes grow to only 4GB; this would ensure you keep as much data in the policy as possible.
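The first two approaches boil down to the same selection logic: walk the series oldest-first and delete until both the count limit and the aggregate-size limit are satisfied. A minimal sketch of that logic in plain Python (this is not ILM code; the function name and signature are invented for illustration):

```python
from typing import List, Optional, Tuple

def indices_to_delete(
    indices: List[Tuple[str, int]],        # (name, size_in_bytes), oldest first
    max_count: Optional[int] = None,        # hypothetical index-count limit
    max_total_bytes: Optional[int] = None,  # hypothetical aggregate-size limit
) -> List[str]:
    """Return the oldest indices to delete so that the remaining series
    satisfies both the count limit and the aggregate-size limit."""
    keep = list(indices)
    doomed: List[str] = []
    while keep:
        too_many = max_count is not None and len(keep) > max_count
        total = sum(size for _, size in keep)
        too_big = max_total_bytes is not None and total > max_total_bytes
        if not (too_many or too_big):
            break
        # Remove the oldest index first (FIFO order within the series).
        name, _ = keep.pop(0)
        doomed.append(name)
    return doomed
```

For example, with four 10GB/4GB indexes totalling 34GB and `max_total_bytes=25`, only the oldest index needs to go; with `max_count=2` as well, the two oldest are deleted.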
Execute phases based on FIFO. At a high level, remove the oldest indexes within the cluster on a first-in, first-out basis. You define an operating threshold for the cluster and enforce a delete phase when you reach it. The advantage is that this is truly disaster-proof (i.e., no more `read_only_allow_delete`!!). For example: my 1TB cluster should remove the oldest index when my indexes use more than 90% of disk space, or 900GB.
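Put into policy form, the FIFO idea might look something like the sketch below. To be clear, `delete_when` and `max_cluster_disk_percent` do not exist in ILM today; this is purely hypothetical syntax to illustrate the request:

```json
PUT _ilm/policy/logs-fifo
{
  "policy": {
    "phases": {
      "hot": {
        "actions": { "rollover": { "max_size": "10gb" } }
      },
      "delete": {
        "delete_when": { "max_cluster_disk_percent": 90 },
        "actions": { "delete": {} }
      }
    }
  }
}
```

The key difference from today's policies is that the delete phase would be triggered by a cluster-level storage condition rather than by each index's own `min_age`.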