Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ILM: parse origination date from index name #46755

Merged
merged 20 commits into from
Sep 25, 2019

Conversation

andreidan
Copy link
Contributor

@andreidan andreidan commented Sep 16, 2019

Introduce the index.lifecycle.parse_origination_date setting that
indicates if the origination date should be parsed from the index name.
If set to true an index which doesn't match the expected format (namely
indexName-{dateFormat}-optional_digits will fail before being created.
The origination date will be parsed when initialising a lifecycle for an
index and it will be set as the index.lifecycle.origination_date for
that index.

A user set value for index.lifecycle.origination_date will always
override a possible parsable date from the index name.

Together with #46561 closes #42449

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

@andreidan andreidan added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label Sep 16, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@dakrone
Copy link
Member

dakrone commented Sep 19, 2019

Thanks for opening this for discussion Andrei! I have some ideas for how to implement it.

the user simply enables/disables the parsing, without having the option to configure the date format in the index name. Should we be more configurable and allow the user to set the date format? Or even more to provide the full index name pattern? (personal opinion here is to be more restrictive as this feature is not about legacy indexes but newly created ones which must obey the pattern)

I think to start we should support the yyyy.MM.dd format that you mentioned. We can always expand later to customization of the format if we have to, but it would be harder to go the other way around.

One thing I notice from your PR is parsing ^.*-(.+)$, which would support index-2019.09.18, but I think we will also want to support index-2019.09.18-000231 for indices that are using rollover (so the same, just ^.*-(.+)(-[\d]+)?$ I think (I did not double check the regex))

the way the "parse origination date" and explicit "origination date" interact is that the parse setting has priority if both are configured. I think if we're going this way it should actually be the other way around and have the explicit setting supersede the parse setting. Or maybe having both settings configured should yield an error?

I agree that the origination_date set in the settings should always have priority. If it's not set, and parse_origination_date is set, we should attempt to automatically set the origination date setting to what the parsed value is. We can do this I think in InitializePolicyContextStep#performAction when we update the cluster state with the new index metadata (we can update the settings at the same time)

How lenient should we be to parsing errors (ie. can't find "-" in the index name or the date parsing fails)?

We should be extremely strict about parsing errors, so that we avoid any unintended behavior around the age of the index, especially since this can lead to data being deleted with an unintended retention period.

We can do this by registering an IndexEventListener that implements the beforeIndexAddedToCluster method. In this method, we could validate that the index name correctly matched the regex for parsing the date out of the name. If it doesn't match, we can throw an exception and prevent the index from being created (rather than leniently creating it without parsing the date).

Introduce the `index.lifecycle.parse_origination_date` setting that
indicates if the origination date should be parsed from the index name.
If set to true an index which doesn't match the expected format (namely
`indexName-{dateFormat}-optional_digits` will fail before being created.
The origination date will be parsed when initialising a lifecyle for an
index and it will be set as the `index.lifecycle.origination_date` for
that index.

A user set value for `index.lifecycle.origination_date` will always
override a possible parsable date from the index name.
@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan andreidan marked this pull request as ready for review September 23, 2019 15:33
@andreidan
Copy link
Contributor Author

andreidan commented Sep 23, 2019

Adding the initial draft PR description here for completeness (and to provide context to @dakrone 's comment)

Initial draft for parsing the origination date from the index name.

This is meant solely as support for discussion on how to implement
this feature.

A few assumptions that we should discuss:

  • the user simply enables/disables the parsing, without having the
    option to configure the date format in the index name. Should we
    be more configurable and allow the user to set the date format? Or
    even more to provide the full index name pattern? (personal opinion
    here is to be more restrictive as this feature is not about legacy
    indexes but newly created ones which must obey the pattern)
  • the way the "parse origination date" and explicit "origination date"
    interact is that the parse setting has priority if both are configured.
    I think if we're going this way it should actually be the other way
    around and have the explicit setting supersede the parse setting.
    Or maybe having both settings configured should yield an error?
  • How lenient should we be to parsing errors (ie. can't find "-" in the
    index name or the date parsing fails)?

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan andreidan requested a review from dakrone September 23, 2019 15:51
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank Andrei, I think this will be a great feature to have! I left some comments and questions for you on this.

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan andreidan requested a review from dakrone September 25, 2019 13:52
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for iterating on this Andrei! I played with it locally and it works very well.

@andreidan andreidan merged commit c363d27 into elastic:master Sep 25, 2019
andreidan added a commit to andreidan/elasticsearch that referenced this pull request Sep 25, 2019
* ILM: parse origination date from index name

Introduce the `index.lifecycle.parse_origination_date` setting that
indicates if the origination date should be parsed from the index name.
If set to true an index which doesn't match the expected format (namely
`indexName-{dateFormat}-optional_digits` will fail before being created.
The origination date will be parsed when initialising a lifecycle for an
index and it will be set as the `index.lifecycle.origination_date` for
that index.

A user set value for `index.lifecycle.origination_date` will always
override a possible parsable date from the index name.

(cherry picked from commit c363d27)
Signed-off-by: Andrei Dan <[email protected]>
andreidan added a commit that referenced this pull request Sep 25, 2019
* ILM: parse origination date from index name (#46755)

Introduce the `index.lifecycle.parse_origination_date` setting that
indicates if the origination date should be parsed from the index name.
If set to true an index which doesn't match the expected format (namely
`indexName-{dateFormat}-optional_digits` will fail before being created.
The origination date will be parsed when initialising a lifecycle for an
index and it will be set as the `index.lifecycle.origination_date` for
that index.

A user set value for `index.lifecycle.origination_date` will always
override a possible parsable date from the index name.

(cherry picked from commit c363d27)
Signed-off-by: Andrei Dan <[email protected]>

* Drop usage of Map.of to be java 8 compliant
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support ilm phase timings based on index naming date_math rather than creation_date
4 participants