[BUG] index template priority issue causing the server to crash on upgrading from 2.7 to 2.9 or 2.10 #1771
Comments
Hi Team, Thanks! |
@Bogendra-Betapudi Looks bad. Are these repro steps correct, ie. does this happen on a vanilla installation of OpenSearch? |
@dblock - yes Daniel, this happens even on a plain vanilla installation. Here is what I did:
Please let me know if any further info is needed from our end to expedite this. |
This is caused by one type of index/template, right? A specific mapping? Edit your repro above with the exact minimal commands for the index/template that exhibits this behavior (ss4o_metrics_template above)? Just trying to narrow it down. |
Hi @dblock, I will update the reproduction steps to include minimal commands and just the relevant stack trace. Meanwhile, please find below the research and workarounds I have tried so far to get past this:
and this resulted in creating a new template instead of updating the priority of the existing one as shown below via the cat command:
As you can see, it's failing with a plain vanilla installation too. I would appreciate a way to unblock us while a fix is being worked on. |
@dblock - I have updated the steps to reproduce with the minimal steps and a snippet of the error log. I have also provided the list of things tried so far to get past this. |
I see two problems. It looks like you tried to edit a template, but that didn't work and another template was created? I can't reproduce that one, here's what I did.
Reading the error carefully it looks like you have a template called tl;dr, @Bogendra-Betapudi is the problem the typo in trying to update the existing template that causes a new template to be created inadvertently? Assuming it is, one would expect the error to happen when the second template was being created (via PUT), which is what I see happen in 2.9.
In general bugs get triaged and assigned at some point soon, but you'll be faster served by digging through the problem and trying to fix it if you have time. We sincerely appreciate your help. Related, I found https://forum.opensearch.org/t/java-lang-illegalargumentexception-index-template-how-critical/15306/16, opensearch-project/OpenSearch#837 and opensearch-project/OpenSearch#8926. |
Note that this in your example looked suspicious.
It has
So how do we get into the state where you have two templates with the same name? @Bogendra-Betapudi curl repro steps? |
I tried with 2.7.0.
Lo and behold, I got two templates, but they do have a different name.
So looks like the template was renamed in 2.8.0 or 2.9.0? I think upgrade is a red herring, that's expected to fail if you have two templates with the same priority for the same index. So the issue is how we got into that state before the upgrade. @Bogendra-Betapudi help narrow this down, try my repro steps in 2.8.0 (I'm on a plane and docker pull will take forever :)), I'd like to know whether this was renamed in 2.8 or 2.9. Also @bowenlan-amzn maybe you recognize this bug? |
I don't recognize this. I did a quick search, and it seems it's from this repo, so @YANG-DB can probably help here. |
@dblock - Thanks for checking. I tried reproducing the same with the 2.8 as suggested. Please find below the detailed steps:
My understanding is that the names of the default templates changed (an 's' was added), and the old and new templates now conflict because they have the same priority. The workaround would be to create these templates with a different priority, or to update the priority of the existing ones (which didn't work, as I showed above). |
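The conflict described here (two templates with the same index patterns and the same priority) can be checked for before upgrading. The following is a minimal sketch, not part of the original thread: the function name `find_same_priority_templates` is hypothetical, and it assumes each input line has the template name first, the index patterns in the middle, and the priority last. It only flags identical pattern strings, whereas OpenSearch rejects any overlapping patterns, so treat it as a rough pre-check.

```shell
# Hypothetical helper: reads lines of "name [patterns] priority" on stdin
# and reports templates that share the same pattern list and priority.
# Assumption: the name is the first field and the priority is the last field.
find_same_priority_templates() {
  awk '
    NF >= 3 {
      name = $1
      prio = $NF
      # Rejoin the middle fields, since a list like "[a-*, b-*]" contains spaces.
      pat = $2
      for (i = 3; i < NF; i++) pat = pat " " $i
      key = pat " priority=" prio
      if (key in seen) print seen[key] " <-> " name " on " key
      else seen[key] = name
    }'
}
```

One possible way to feed it, assuming the cat API accepts this column selection on your version: `curl -s "https://opensearch-node1:9200/_cat/templates?h=name,index_patterns,order" | find_same_priority_templates`. Any line it prints names a pair of templates worth inspecting before the upgrade.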
Thanks @bowenlan-amzn @dblock @Bogendra-Betapudi |
Likely caused by this PR which renamed the index and got picked up in 2.9, matching the repro in 2.8 -> 2.9. |
In general I'd think we should be in the clear if we just delete these templates because we don't use them and they were superseded by integrations loading them (also released in 2.9), but I didn't write the original classes before refactoring so I'm not sure what their behavior is if the templates are cleared. @YANG-DB and I are looking at deleting this code for 2.12. |
@dblock We can move the issue to observability. Issue seems to affect default config, but it recovers. Steps:
|
Thanks @YANG-DB and @Swiddis for looking into this. Are there any plans to provide a fix for someone using the binaries downloaded from the official download site? We download the zip/archive for Windows to use in our application and don't run it under Docker. How do we go about addressing such usage? |
The best solution I can find without a patch release is to delete both of the ss4o templates before upgrading. This is safe since the templates are otherwise unused, and avoids any
# As step 4.5 for the above process
curl -XDELETE "https://opensearch-node1:9200/_index_template/ss4o_*_template"
I still wonder why it's intermittently starting successfully, though. In a linked issue I see both people who see the exception without startup being hindered, and others where it's blocking startup entirely. I'm not sure what differentiates them. I also have to wonder what tests we need to write to avoid future occurrences, I'd have expected something like this to be revealed by testing. |
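For anyone who prefers to review each request before sending it, the delete step can be expressed as a dry run. This is a minimal sketch under stated assumptions: `OPENSEARCH_URL`, `delete_template_cmd`, and `plan_template_deletes` are hypothetical names introduced here, the default host simply mirrors the one used in the command above, and any authentication or TLS flags your cluster needs are left out.

```shell
# Assumption: OPENSEARCH_URL points at the cluster to clean up before upgrading.
OPENSEARCH_URL="${OPENSEARCH_URL:-https://opensearch-node1:9200}"

# Build (but do not run) the DELETE request for a single index template.
delete_template_cmd() {
  printf 'curl -XDELETE "%s/_index_template/%s"\n' "$OPENSEARCH_URL" "$1"
}

# Print one DELETE command per template name passed as an argument.
plan_template_deletes() {
  for name in "$@"; do
    delete_template_cmd "$name"
  done
}
```

For example, `plan_template_deletes ss4o_metric_template` prints the request for the template named in the error log without contacting the server; once the output looks right, it can be piped to `sh` to execute it.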
Will be resolved in 2.12 onwards following #1770, what's left is verifying a workaround for 2.9-2.11. |
@Swiddis - Thanks for looking into this. For now, we will wait for 2.12 to be available and will try the workaround if needed before the release. |
Marking as completed since there's a workaround available and the bug is resolved in 2.12. Workaround: delete the conflicting templates, they're unused/deprecated system templates from Integrations. Specifically the singular forms ( |
Describe the bug
When upgrading the OpenSearch server from 2.7 to 2.9 or 2.10, the server crashes in a loop while trying to restart, and the logs show an error related to index templates and their priority.
Upon further investigation, I found this thread where another user ran into the same issue. But in my case, the server doesn't start up at all: it tries to restart in a loop and eventually crashes.
To Reproduce
Steps to reproduce the behavior:
[2023-10-09T14:54:01,342][ERROR][o.o.b.Bootstrap ] [localhost] Exception
java.lang.IllegalArgumentException: index template [ss4o_metrics_template] has index patterns [ss4o_metrics--] matching patterns from existing templates [ss4o_metric_template] with patterns (ss4o_metric_template => [ss4o_metrics--]) that have the same priority [1], multiple index templates may not match during index creation, please use a different priority
Note that this happens even with a plain vanilla installation. The steps I tried are here: https://github.com/opensearch-project/observability/issues/1771
Expected behavior
The upgrade should be successful and the server should be up and running without any errors/issues.
Plugins
Please list all plugins currently enabled.
Screenshots
If applicable, add screenshots to help explain your problem.
Host/Environment (please complete the following information):
Additional context
I tried to get past this by updating the priority of the existing template, but that resulted in a new template being created instead. This is a blocker: once we try to upgrade, it fails due to this issue, and we can't go back to the older version either because the index context has already been updated by the newer OpenSearch version.