Use Elasticsearch Rollover API to manage indices #1242
Comments
cc @jaegertracing/elasticsearch |
I just read https://www.elastic.co/blog/managing-time-based-indices-efficiently - while the primitives make sense, the process itself is absolutely horrifying: 7 or more steps, any of which can fail, with undefined wait periods between them. At least our daily indices require almost no maintenance, just the delete job with a single step. I would only consider the rollover pattern if it's fully supported by the curator. If it is, I think it's a good direction, but it sounds like we'd still need to provide a tool to generate the curator yaml files with all the actions. |
What steps exactly do you mean? I think the linked example uses an even more complicated deployment model with hot/cold nodes. Curator already supports rollover https://www.elastic.co/guide/en/elasticsearch/client/curator/current/rollover.html. In addition to that I am also interested in adding elastic/curator#1278 to its API. To make rollover work the only required steps are:
Date-based indices will still be supported. Rollover will be an additional feature for users who can benefit from it. |
I was referring to the steps in the blog post. Rollover is just one step, all others have to do with managing the index aliases, relocating index to warm nodes, compressing it, etc. Calling the rollover API only triggers index rollover once in a while, it's not sufficient for managing the whole thing via aliases. I am not opposed to the approach, as long as curator provides the necessary automation for managing the aliases after the rollover. |
Adding questions from weekly meeting:
|
If it helps: https://sematext.com/blog/field-stats-plugin-elasticsearch/ (Github repo for the plugin linked in there) |
Thanks for the pointer, @otisg. I think we would like to stay with only the official ES distribution if possible. The |
The #1197 introduces My design is that an external component would remove indices from the alias to mimic the behavior of Any more thoughts on this from @jaegertracing/elasticsearch ? |
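A minimal sketch of what "an external component would remove indices from the alias" could look like, assuming the component already knows which indices the read alias contains, ordered oldest first. The function name `lookback_actions` and the `keep` parameter are hypothetical; the returned dictionary is shaped like a body for the Elasticsearch `_aliases` endpoint, and nothing here is taken from #1197 itself.

```python
from typing import Dict, List


def lookback_actions(read_alias: str, indices_oldest_first: List[str], keep: int) -> Dict:
    """Build alias actions that drop all but the newest `keep` indices from an alias.

    The result is meant to be sent as the JSON body of POST /_aliases.
    The indices themselves are not deleted, they only stop being searchable
    through the read alias.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    if keep == 0:
        stale = list(indices_oldest_first)
    else:
        # Everything except the last `keep` entries is considered stale.
        stale = indices_oldest_first[:-keep]
    return {
        "actions": [
            {"remove": {"index": name, "alias": read_alias}} for name in stale
        ]
    }


if __name__ == "__main__":
    body = lookback_actions(
        "jaeger-span-read",
        ["jaeger-span-000001", "jaeger-span-000002", "jaeger-span-000003"],
        keep=2,
    )
    print(body)
```

A caller would probably want to skip the request entirely when the `actions` list comes back empty.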
seems like |
Yes, But we should provide an alternative solution to that... I have added this functionality to |
the @pavolloffay Would it be possible to use a Possible example:
|
@masteinhauser I think using
Can you please explain why is it critical to you? Do you deploy multiple query services with a different There is only one PR related to rollover: #1197, see the first comment and section |
Yep, we already see that exact behavior with our
Unfortunately, we have far too many defects filed from production speakers and customers that get worked on outside the default I'm not sure how Kibana does this, but I do know it handles far more data over a larger timeframe much better than the Jaeger Query searches seem to. (We use Kibana to figure out all of our Traces, and then use those TraceIDs to pull up Jaeger's view of the spans) Oh, apologies, I'll take a look at #1197 once again to re-familiarize myself. Thanks for the reference! |
https://discuss.elastic.co/t/filter-indices-for-range-query-in-time-based-indices/149913 mentions that range query could be used with a large number of indices, that ES does some optimizations to avoid going through all indices. One way or another this will be done separately with some perf tests. |
ES 6.5 and 7.0.0 (I was able to test this with 7 only) support rollover policies https://www.elastic.co/guide/en/elasticsearch/reference/6.x//using-policies-rollover.html. This means that rollover conditions are set in a policy and ES automatically creates a new index - no need to periodically call The following example will create a new index every 5s and delete it if older than 20s. To make this work at a granularity of seconds we have to modify cluster setting
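The original snippet from this comment did not survive on this page. As a hedged reconstruction, the sketch below builds the two JSON bodies such a setup would need: an ILM policy that rolls over after 5s and deletes after 20s, and a cluster settings update for `indices.lifecycle.poll_interval` (the setting named later in this thread). The function names are hypothetical, and the poll interval of 1s is an assumption chosen only so that second-level conditions get checked often enough.

```python
import json
from typing import Dict


def build_ilm_policy(rollover_max_age: str, delete_min_age: str) -> Dict:
    """Body for PUT /_ilm/policy/<policy-name>.

    Hot phase: roll over once the write index is older than `rollover_max_age`.
    Delete phase: delete an index once it reaches `delete_min_age`.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {"rollover": {"max_age": rollover_max_age}},
                },
                "delete": {
                    "min_age": delete_min_age,
                    "actions": {"delete": {}},
                },
            }
        }
    }


def build_poll_interval_settings(interval: str) -> Dict:
    """Body for PUT /_cluster/settings controlling how often ILM evaluates policies."""
    return {"transient": {"indices.lifecycle.poll_interval": interval}}


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy("5s", "20s"), indent=2))
    print(json.dumps(build_poll_interval_settings("1s"), indent=2))
```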
|
Heads up: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/index-lifecycle-management-api.html is an enterprise X-Pack feature, so we cannot use it in OSS. The only improvement we can make is time range queries #1361. Maybe there is an OSS plugin which provides index lifecycle management, then the deployment will not require to run |
Is ILM not a basic feature of Elasticsearch 7+ now? |
Yes, it no longer seems to be listed under X-Pack https://www.elastic.co/guide/en/elasticsearch/reference/7.x/index-lifecycle-management-api.html |
I have been trying to find more info about that comment, were you able to confirm how this |
Actually I wasn't able to find any concrete docs. There is a PR that implements a wildcard index for queries, depending only on the time range: #1969. Our (for now) internal results show that it is slower than providing a compiled list of indices to query. |
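To illustrate what "a compiled list of indices" means here, a minimal sketch that expands a time range into one index name per UTC day. It assumes daily indices named like `jaeger-span-YYYY-MM-DD`; the function name `daily_indices` is hypothetical and this is not the code from #1969.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def daily_indices(prefix: str, start: datetime, end: datetime) -> List[str]:
    """Return one index name per UTC day touched by the range [start, end].

    Both datetimes should be timezone-aware so the conversion to UTC is unambiguous.
    """
    if end < start:
        raise ValueError("end must not be before start")
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(f"{prefix}{day.isoformat()}")
        day += timedelta(days=1)
    return names


if __name__ == "__main__":
    start = datetime(2020, 1, 30, 23, 0, tzinfo=timezone.utc)
    end = datetime(2020, 2, 1, 1, 0, tzinfo=timezone.utc)
    # The names would be joined with commas and used in the search URL
    # in place of a wildcard such as jaeger-span-*.
    print(",".join(daily_indices("jaeger-span-", start, end)))
```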
@pavolloffay - I have been trying to use ILM to manage Jaeger rollovers and deletion instead of having a cron job hit the rollover API to perform the rollover manually, as described in this blog post (https://medium.com/jaegertracing/using-elasticsearch-rollover-to-manage-indices-8b3d0c77915d). To achieve this, I create override index templates (for span and service) before running init. Then I run esrollover.py init to create the span and service templates, the aliases, and the first indices (span-00001 and service-00001)
jaeger-ILM-Policy is created beforehand.
In the override template I add an alias "jaeger-span-read", which makes sure all the indices created by Jaeger have the read alias, and I use "jaeger-span-write" as index_rollover_alias. The initial rollover (rollover from hot) works fine. The challenge is that when ILM performs its checks after the initial rollover (to delete), it fails because the initial (previous) index is no longer part of index_rollover_alias (jaeger-span-write). I wanted to understand the rationale for using two different aliases for reading and writing; we could have used one alias with "is_write_index". I see the same mentioned in one of the above comments for archive-index. |
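The override template itself is not shown on this page. As a rough sketch of what such a template could contain, the code below builds a legacy index template body with a read alias and the two ILM settings. It assumes that "index_rollover_alias" in the comment corresponds to the Elasticsearch setting `index.lifecycle.rollover_alias`, and that the template matches `jaeger-span-*`; the function name is hypothetical.

```python
from typing import Dict


def build_override_template(
    index_pattern: str,
    policy_name: str,
    write_alias: str,
    read_alias: str,
    order: int = 1,
) -> Dict:
    """Body for PUT /_template/<template-name> (legacy index template API).

    A higher `order` lets this template override settings from templates
    that match the same indices with a lower order.
    """
    return {
        "index_patterns": [index_pattern],
        "order": order,
        "settings": {
            # Attach the ILM policy to every index created from this template.
            "index.lifecycle.name": policy_name,
            # Alias that ILM uses when it rolls the index over.
            "index.lifecycle.rollover_alias": write_alias,
        },
        # Every new index is automatically added to the read alias.
        "aliases": {read_alias: {}},
    }


if __name__ == "__main__":
    print(
        build_override_template(
            "jaeger-span-*", "jaeger-ILM-Policy", "jaeger-span-write", "jaeger-span-read"
        )
    )
```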
What component is causing the issue?
IIRC it was done for ES5. The ES5 does not support |
@pavolloffay - Thanks for replying quickly. Since we don't set is_write_index on the initial index, span-0001 is removed from the jaeger-span-write alias (which is ilm_rollover_alias) after the first rollover. When ILM polls the span-0001 index for further lifecycle events it complains:
If we add is_write_index true while creating span-0001 - I suspect this would work. I am going to give it a try and update. |
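A minimal sketch of the idea in this comment, assuming the first index is created with both aliases attached in the create-index request. The function name is hypothetical and this is not the actual change to esRollover.py.

```python
from typing import Dict


def build_initial_index_body(write_alias: str, read_alias: str) -> Dict:
    """Body for PUT /<first-index>, for example an index ending in -000001.

    Marking the index as the write index means a later rollover only flips the
    flag to the new index, so the old index stays a member of the write alias.
    """
    return {
        "aliases": {
            write_alias: {"is_write_index": True},
            read_alias: {},
        }
    }


if __name__ == "__main__":
    print(build_initial_index_body("jaeger-span-write", "jaeger-span-read"))
```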
thanks @bhiravabhatla. It would be great to put a guide/docs or blog post on this topic if you are interested. |
Will do @pavolloffay, thank you! I think we can set is_write_index to true when adding indices to the write alias by passing extra_settings here - jaeger/plugin/storage/es/esRollover.py Line 124 in af985ae
Correct me if I am wrong |
Hi @pavolloffay - I was able to implement this with a few tweaks to esRollover.py. I pushed the updated image here - https://github.com/bhiravabhatla/jaeger-index-rollover-with-ilm. I tested it with the example application and could see that Jaeger is able to read from the read alias and that ILM is able to roll over and delete indices as specified in the config. Note - I have not tested this for archive indices.
Summary:
-- Create an ILM policy for Jaeger in Elasticsearch. In the sample below, for demo purposes, I have kept max_age and the delete age in minutes. Sample :
-- Run init to create the initial set of aliases and templates. I am creating override templates (with a different name and order=1) because Jaeger creates/updates the templates named jaeger-service and jaeger-span when it starts up.
-- Start Jaeger with es.use-aliases=true
Note - By default indices.lifecycle.poll_interval is set to 10m. For testing, we would have to set it to something lower, say 10s.
|
@pavolloffay - Could you please share feedback on the above? One thing I could have done is to parameterise the Jaeger ILM policy names in the templates |
In the Jaeger index templates? We should make ILM work with upstream Jaeger if possible, without requiring users to make changes. I don't have experience with ILM configuration, so I cannot really comment on whether it's good or not. Perhaps somebody from @jaegertracing/elasticsearch can have a look at the approach mentioned above? @bhiravabhatla would you be interested in documenting this on jaegertracing.io or writing a Medium post? |
I agree we should make this work with upstream Jaeger. The above can be seen as a workaround to use ILM with current Jaeger capabilities. In the future, I think we can have a flag --es.use-ILM or something similar and create the index template accordingly from Jaeger itself - open to discussions on this.
Sure. I can, let me know the process. |
Would you also be interested in submitting a PR to do this? The docs are hosted here https://github.com/jaegertracing/documentation/blob/master/content/docs/next-release/deployment.md#elasticsearch and you can create a PR against that. The blog is hosted on Medium https://medium.com/jaegertracing. If you prefer the blog I can add you to the Jaeger org on Medium so that you can submit a publication there - I will need your Medium account. |
@pavolloffay - I actually have drafted a blog in my medium account. Have not published yet. My medium account https://medium.com/@bhiravabhatla |
I have not used Go before. I am interested, but would need some help. :) |
np we can help you with golang :). I have sent you an invite on medium to join jaegertracing. |
Thank you :).
Thank you @pavolloffay - I have submitted the draft. |
Requirement - what kind of business use case are you trying to solve?
Use the ES Rollover API to manage retention. It's an alternative to the date-based indices currently used in Jaeger. We could make it an optional feature.
Before running Jaeger we have to create the write (read) alias:
The command creates index jaeger-span-000001 and alias jaeger-span. Now the collector can write to the jaeger-span alias. Once the index is too large, an external service can roll over to a new index. This API has to be called periodically; once the conditions are met (at the time of the call), ES will create a new index.
The command creates index jaeger-span-000002, which is put into alias jaeger-span. Note that the old index jaeger-span-000001 stays in the alias if "is_write_index": true is used (supported only in ES >= 6.4).
ES < 6.4
When using ES < 6.4, we also have to use a read alias because the main alias jaeger-span can contain only one index.
This command creates read alias jaeger-span-read, which points to the jaeger-span index (the write index).
When calling rollover we have to specify the alias names. A newly created index will be put into the alias.
https://www.elastic.co/guide/en/elasticsearch/reference/6.5/indices-rollover-index.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/indices-rollover-index.html
https://www.elastic.co/blog/managing-time-based-indices-efficiently
Proposal - what do you suggest to solve the problem or improve the existing situation?
Introduce a flag which will use a single index (alias) to read and write.
Any open questions to address