Add ILM action to add/remove aliases #47881

Open
gwbrown opened this issue Oct 10, 2019 · 19 comments
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >enhancement Team:Data Management Meta label for data/management team

Comments

@gwbrown
Contributor

gwbrown commented Oct 10, 2019

Given that the alias used for a set of indices managed by ILM includes all of those indices (assuming usage of the is_write_index flag), which may include frozen indices or indices on nodes with very slow disks, it may be useful to maintain an alias that queries only the non-frozen indices.

We could add an ILM action to add and remove aliases at a certain point in the policy. For example, a policy with this action might look like the following:

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_size": "50G"
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "aliases": {
            "remove": ["my-ilm-alias"],
            "add": ["my-ilm-alias-cold"]
          },
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

This would effectively just make a call to the Index Aliases API to add the index to, and remove it from, the listed aliases as appropriate.
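As a rough sketch of what such an action might boil down to, the snippet below builds the request body for a single `POST _aliases` call that moves one index from `my-ilm-alias` to `my-ilm-alias-cold`. The helper name `build_alias_actions` and the index name are hypothetical, and nothing here reflects how ILM would actually implement the step.

```python
import json


def build_alias_actions(index, remove=(), add=()):
    """Build a body for POST _aliases that updates aliases for one index.

    Hypothetical helper: it only assembles the JSON body and sends nothing.
    """
    actions = []
    for alias in remove:
        actions.append({"remove": {"index": index, "alias": alias}})
    for alias in add:
        actions.append({"add": {"index": index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    body = build_alias_actions(
        "my-index-000001",  # assumed index name, for illustration only
        remove=["my-ilm-alias"],
        add=["my-ilm-alias-cold"],
    )
    print(json.dumps(body, indent=2))
```

Putting the remove and the add in one request matters because the actions in a single `_aliases` call are applied atomically, so the index is never briefly in both aliases or in neither.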

@gwbrown gwbrown added >enhancement :Data Management/ILM+SLM Index and Snapshot lifecycle management team-discuss labels Oct 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@matt-davis-elastic
Contributor

matt-davis-elastic commented Dec 5, 2019

Currently, we are exploring changes to the way Aliases are handled and that could lead to some difficulty here. We don't currently have any users asking for this feature. I will keep an eye on this to see if anything comes up.

@iorfix

iorfix commented Apr 14, 2020

I think it would be a very useful feature.
I run a lot of queries on very fresh data (alerting and observability), and it would be useful to have aliases pointing only at hot data.

@rjernst rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020
@CamiloSierraH

+1

@swiftmas

swiftmas commented Nov 5, 2020

This would be a hugely useful feature. To cut down on full-cluster searches and reduce response times, we use an alias on our "recent" data. This limits the number of indices being queried and has greatly reduced our response times. Once 90 days pass, we want to be able to remove the alias or set a new one. This could possibly also be solved by being able to query only nodes in a specific ILM phase (hot, warm, cold, etc.), but aliases are the most versatile option.

@raybog

raybog commented Nov 17, 2020

+1

@jugsofbeer

I've requested this functionality before; we would definitely use it on a very large dataset.

I would almost like the ability to run custom actions in a phase when the ILM app functionality doesn't immediately cater to something.

@yobosov

yobosov commented Jan 14, 2021

+1

@trueserjo

+1

@Yan-Son

Yan-Son commented Jan 18, 2021

+1

@lvoff

lvoff commented Jan 18, 2021

This feature is very needed!

@aaminin

aaminin commented Jan 19, 2021

+1

@dakrone
Member

dakrone commented Jan 28, 2021

We discussed this today and came up with a solution that might meet some (most?) of the use cases; see #68135 for more information. It would work for both aliases and data streams (the proposal in this issue would not work for data streams, since we do not want to have aliases pointing directly to data stream backing indices).

If you have a use case that doesn't involve wanting to query data based on a particular lifecycle, let us know. Or if you have comments on the other proposal please let us know on that issue.

@mingyitianxia

We would like to see a similar feature implemented:
First: for the hot phase, the index_hot_alias alias is automatically created when rollover is performed;
Second: for the warm phase, the index_warm_alias alias is automatically created when rollover is performed;
Third: for the cold phase, the index_cold_alias alias is automatically created when rollover is performed;
.......

@Evesy

Evesy commented May 10, 2022

Our use case is similar to what has been described above: we have time-based daily indices, with the current day residing on hot nodes and previous days being moved to warm nodes.

This data is pretty much solely viewed through Kibana. We have an index-* index pattern that allows querying across all days, as well as an index-today alias tied to the current day's index, which we encourage users to use if they only care about recent data.
Historically we have used Curator: part of our Curator jobs removed the alias from the index when it moved to warm nodes, and the new index got the alias added as part of the index template.

The proposed solution in #68135 is great for queries made directly against the Elastic API, but it doesn't nicely accommodate queries made through the Kibana UI.

@jsbarber

The #68135 solution leaves much to be desired (assuming I understand it). Instead of re-assigning an alias in one place, it's "go find all 500 queries you're making and add tier-knowledge metadata to the query criteria for each one".
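To illustrate the per-query change being described, here is a minimal sketch that wraps an existing query body in a tier restriction. It assumes #68135 refers to filtering on the `_tier` metadata field with values such as `data_hot`; the helper name `restrict_to_tiers` is hypothetical, and the thread itself does not spell out the final syntax.

```python
def restrict_to_tiers(query, tiers=("data_hot",)):
    """Wrap an existing query so it only matches documents on the given tiers.

    Assumption: the cluster exposes a `_tier` metadata field whose values
    look like "data_hot", "data_warm" and "data_cold".
    """
    return {
        "bool": {
            "must": [query],
            "filter": [{"terms": {"_tier": list(tiers)}}],
        }
    }


if __name__ == "__main__":
    original = {"match": {"message": "error"}}  # stand-in for an existing query
    print(restrict_to_tiers(original))
```

Every caller would have to apply a wrapper like this, which is the maintenance cost this comment objects to.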

Right now, we've got a ton of different code bits doing queries to a set of indices via an alias. We aren't currently using ILM due to needing to support older ES versions. Instead, we currently have some scripts that remove indices older than N days from the alias (and move them from hot to warm, force-merge, move them from warm to cold, delete them, etc.). We'd like to take advantage of ILM for doing all this, but this issue will be a big stumbling block. If we can't update the alias at a point along the ILM way, then instead of querying three days' worth of data, we'll be querying all 180 days' worth (which we're retaining for potential forensic purposes but don't usually look at).
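A script of the kind described might look roughly like the sketch below. It is not the actual script: it assumes the caller has already collected each index's creation time (for example from index settings), and the names `stale_alias_removals`, `logs-recent` and the index names are made up for illustration. It only builds the `POST _aliases` body and sends nothing.

```python
from datetime import datetime, timedelta, timezone


def stale_alias_removals(creation_dates, alias, max_age_days, now=None):
    """Build a POST _aliases body removing indices older than max_age_days.

    creation_dates maps index name -> timezone-aware creation datetime.
    Hypothetical helper; the caller is expected to skip the request when
    the returned action list is empty.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    actions = [
        {"remove": {"index": index, "alias": alias}}
        for index, created in sorted(creation_dates.items())
        if created < cutoff
    ]
    return {"actions": actions}


if __name__ == "__main__":
    utc = timezone.utc
    dates = {
        "logs-2022.05.01": datetime(2022, 5, 1, tzinfo=utc),
        "logs-2022.05.09": datetime(2022, 5, 9, tzinfo=utc),
    }
    fixed_now = datetime(2022, 5, 10, tzinfo=utc)
    # With a 3-day window, only logs-2022.05.01 falls before the cutoff.
    print(stale_alias_removals(dates, "logs-recent", 3, now=fixed_now))
```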

The main benefit of aliases IMO is that they abstract specific knowledge about the index structure and management from users. Using the #68135 method is going the other way: every querying user needs to know what's in hot and what's not.

An alternate idea: Would it be possible to create a new kind of "query alias" that could be configured to incorporate index level metadata into queries against it? So that, say, whenever I execute a query on the alias, it automatically incorporates a _tier == hot criterion, or index-creation-date >= 'now-3d'. It seems like that could be implemented against indices or data streams equally.

@JackJudge01

JackJudge01 commented Feb 28, 2023

index-creation-date >= 'now-3d'

This would work very nicely.
The problem with #68135 is not enough granularity. Most folks who use ILM will probably find the bulk of their data sitting in either the warm or cold tiers. So you've only narrowed down the search to, say, 95% of their data as opposed to 100%. It doesn't really help that much.

The ability to add aliases at different stages in an ILM policy does seem like pretty fundamental functionality.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@wasserman
Contributor

Could this be handled by using an alias filter on @timestamp?

https://www.elastic.co/guide/en/elasticsearch/reference/current/aliases.html#filter-alias

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "my-index-2099.05.06-000001",
        "alias": "my-alias",
        "filter": {
          "bool": {
            "filter": [
              {
                "range": {
                  "@timestamp": {
                    "gte": "now-1d/d",
                    "lt": "now/d"
                  }
                }
              },
              {
                "term": {
                  "user.id": "kimchy"
                }
              }
            ]
          }
        }
      }
    }
  ]
}
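Adapting that idea to the "last three days" case raised earlier in the thread could look something like the sketch below, which builds a single `add` action with a rolling `@timestamp` range filter. The helper name `recent_data_alias_action`, the alias name and the index pattern are hypothetical, and whether a filtered alias gives the same performance benefit as removing old indices from the alias is not established in this thread.

```python
def recent_data_alias_action(index, alias, field="@timestamp", window="3d"):
    """Build one `add` action for POST _aliases with a rolling range filter.

    Hypothetical helper: the filter uses date math ("now-3d" by default), so
    the alias is meant to expose only documents newer than the window.
    """
    return {
        "add": {
            "index": index,
            "alias": alias,
            "filter": {"range": {field: {"gte": f"now-{window}"}}},
        }
    }


if __name__ == "__main__":
    body = {"actions": [recent_data_alias_action("my-index-*", "my-recent-alias")]}
    print(body)
```

Note that an alias created this way still has to be attached to each new index (for example through the index template), since the index pattern is resolved when the request runs.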
