all shards failed [type=search_phase_execution_exception]\nSearch service in Jaeger Query UI #2976
Comments
Similar to #2718 |
This should not happen since we added creation of index templates to Jaeger startup. Did this happen with clean Jaeger installation or after the upgrade? |
@raz08 if it is resolved please add more details how it was resolved and close the issue. |
I got the same error after jaeger 1.22 -> 1.25, and the bad request is It works after clean old data by |
@meilihao are these two Jaeger versions using the same ES cluster? |
@pavolloffay I'm not sure. I used ES via Docker with Jaeger 1.22 before; today I removed Docker and now use ES installed via apt with Jaeger 1.25. I don't remember that |
I have this issue with version |
This was the only way I was able to get our deployment working again. Sadly costs data but we were lucky it was just our dev environment. |
On each Jaeger start the collector makes a request to the ES to create index templates. |
We are facing the same issue. Is there any suggestion or solution? We tried deleting the indexes and redeploying the Jaeger operator, but it didn't work. |
facing the same issue with 1.28 |
Having the same issue with 1.38.0 after upgrading to a new opensearch instance. |
Facing the same issue on all version from 1.30.0 to 1.38.0 |
I had to roll back to v1.21 to get it to work again. |
Is there any update on this? I get the same error. |
Any updates about this? |
Still happening with jaeger 1.49. Edit: Only option I found that seems to fix it, was mentioned in kubegems/kubegems#413:
|
In my case, I have Jaeger connecting to AWS OpenSearch. I had this issue first: #3571 (comment), so I added the ES_CREATE_INDEX_TEMPLATES = "false" flag. Then I experienced the same issue with Jaeger Query mentioned in this thread. I uploaded both jaeger-span-7.json and jaeger-service-7.json (with a minor modification from this repo to remove the micros) manually to OpenSearch, based on the OpenSearch documentation: https://opensearch.org/docs/latest/im-plugin/index-templates/. Next I applied the workaround mentioned above to finally make it work. I don't think either issue is fixed. |
Just occurred on 1.50 as well, right after we deployed Jaeger to production a month ago. Incredibly embarrassing (for us). I get the feeling that most "fixes" mentioned here and in related issues merely mask the problem, since they tend to involve deleting the previous data. Unfortunately, deleting the traces is not an option for us, since they also serve auditing purposes. I'll look at the source to see what kind of queries Jaeger sends towards ES, because the indexes can be queried from Kibana just fine, both new and old. The new traces also continue to be written successfully. |
Got it. It's the service and operation queries that fail on the UI. The service query is the following:
And the error is:
Index templates created by Jaeger seem to be applied normally to the indexes, the only deviation we have from the default config is an additional index template for adding a custom ILM policy (since we don't use aliases). As for the error message itself, there seems to be little point to adding "fielddata" to the template, since we already have a keyword, and by changing EDIT: This is unbelievable. On our test environment, where the Jaeger UI still works, the same query succeeds without issues. The index mappings seem identical. The data itself seems structurally identical as well. And if I substitute |
I managed to fix the problem. Here's what has been the cause for us; I hope it'll be useful for others as well in the future.
First of all, this is apparently not a bug in Jaeger, but an ES configuration issue. Elasticsearch can apply only a single template to a newly created index. This is based on the templates' priority, and the templates created by Jaeger are put into the "legacy templates" category in ES 7, which unfortunately means that they have the absolute lowest priority among the templates. If you define any other template with an index pattern that overlaps these Jaeger templates, that will be used instead, and you will likely end up with an index with missing or incorrect mapping settings.
In our case, the template that was used instead of Jaeger's had dynamic mapping enabled, which means that ES autocreates mapping definitions from the incoming data. This is how our service definition indexes had a mapping for the serviceName field mentioned above. But unfortunately, the autocreated mapping differs from the format Jaeger expects the serviceName to be present in: it wants the field to be a keyword, while in the autocreated mapping, serviceName is a text field, with a sub-property named (and typed as) keyword. The aggregation ES query Jaeger uses to get the service names requires a keyword field, which is what causes the UI to report an error.
What is truly insidious about this issue is that, if you introduce the index template that overrides Jaeger's while some Jaeger indexes already exist, the problem does not manifest itself immediately. This is because Jaeger usually queries multiple indexes at once, based on the value of the es.max-span-age parameter. As long as even one index in Jaeger's "query window" contains the expected mappings, the UI will seemingly function as normal; in the background, part of the service/operation queries will fail, but as long as at least one index returns meaningful results, Jaeger will not complain.
If there is one thing Jaeger could perhaps do better in such a situation, it's to at least report a warning if some shards return an error during the query, to let users know that something is amiss. This would let them find the issue much more easily than having the Jaeger UI suddenly break multiple days, or even weeks, after the problematic index template was introduced. TL;DR: Make sure Jaeger's index templates are not overridden. If they are, the UI won't fail straight away, but it will eventually. |
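The mapping mismatch described above can be checked for directly. The following is a minimal sketch, not anything Jaeger ships: it assumes you have already fetched the JSON body of a typeless (ES 7 style) `GET jaeger-service-*/_mapping` request, and the helper name `find_bad_service_name_mappings` and the sample index names are hypothetical.

```python
def find_bad_service_name_mappings(mapping_response, field="serviceName"):
    """Return {index_name: actual_type} for every index where `field` is not a keyword.

    `mapping_response` is assumed to be the parsed JSON of a typeless
    `GET <index-pattern>/_mapping` call: {index: {"mappings": {"properties": {...}}}}.
    """
    bad = {}
    for index_name, body in mapping_response.items():
        properties = body.get("mappings", {}).get("properties", {})
        field_type = properties.get(field, {}).get("type", "missing")
        if field_type != "keyword":
            bad[index_name] = field_type
    return bad


# Hypothetical sample: the first index has the mapping Jaeger expects, the
# second has the dynamically created text field with a keyword sub-field.
sample = {
    "jaeger-service-2024-01-01": {
        "mappings": {"properties": {"serviceName": {"type": "keyword", "ignore_above": 256}}}
    },
    "jaeger-service-2024-01-02": {
        "mappings": {
            "properties": {
                "serviceName": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                }
            }
        }
    },
}

print(find_bad_service_name_mappings(sample))
```

Any index this reports was most likely created while another template took precedence over Jaeger's.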
@pip25 thanks for a great analysis and write-up. Note that recently we added an ability to specify which priorities to use when creating Jaeger indices (ESv8 only):
Is there something else we could add to alleviate this specific issue? |
@yurishkuro Thanks, we're currently stuck on ESv7, but that is good to know.
As I wrote in the above wall of text :), it may make this configuration problem easier to spot if Jaeger did not silently swallow query errors in cases when only some shards fail (and thus some meaningful result is in fact returned). In such cases, some kind of warning message would be useful. |
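To illustrate the kind of warning suggested here: an Elasticsearch search response carries a `_shards` section with `total`, `successful`, `failed`, and (when something went wrong) a `failures` list. A rough sketch of surfacing partial failures follows; the function name and the sample response are hypothetical, and this is not how Jaeger currently handles responses.

```python
def shard_failure_warnings(search_response):
    """Return human-readable warnings for a search response with failed shards.

    An empty list means every shard answered successfully.
    """
    shards = search_response.get("_shards", {})
    failed = shards.get("failed", 0)
    if failed == 0:
        return []
    warnings = [f"{failed} of {shards.get('total', '?')} shards failed"]
    for failure in shards.get("failures", []):
        reason = failure.get("reason", {})
        warnings.append(f"{failure.get('index', '?')}: {reason.get('type', 'unknown')}")
    return warnings


# Hypothetical partial failure: the query still returns hits from the healthy
# index, so a caller that only looks at the hits never notices the problem.
partial = {
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 1,
        "failures": [
            {
                "shard": 0,
                "index": "jaeger-service-2024-01-02",
                "reason": {"type": "illegal_argument_exception"},
            }
        ],
    },
    "hits": {"hits": []},
}

for line in shard_failure_warnings(partial):
    print("WARNING:", line)
```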
Hi @pip25 |
@ksai2389 If your issue is what I've described above, you need to delete the problematic ES indexes with the wrong mappings set, then disable/modify the index templates that introduced the wrong mappings in the first place. If only Jaeger's templates can be applied to Jaeger's indexes, from then on your queries should be working. |
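Finding the template that overrides Jaeger's can be sketched as follows. This assumes you have collected each template's index patterns into a dict yourself (for example from the `GET _template` and `GET _index_template` responses); the template names below are made up. Python's `fnmatchcase` is used as a stand-in for Elasticsearch's `*` wildcard matching, which is only an approximation because `fnmatch` also treats `?` and `[...]` specially.

```python
from fnmatch import fnmatchcase


def templates_matching(index_name, templates):
    """Return the sorted names of all templates whose patterns match `index_name`.

    `templates` maps a template name to its list of index patterns.
    More than one result for a Jaeger index means something overlaps
    with Jaeger's own template.
    """
    return sorted(
        name
        for name, patterns in templates.items()
        if any(fnmatchcase(index_name, pattern) for pattern in patterns)
    )


# Hypothetical template inventory, including a custom template whose pattern
# overlaps with the Jaeger service index pattern.
templates = {
    "jaeger-service": ["*jaeger-service-*"],
    "custom-ilm": ["jaeger-*"],
    "logs": ["logs-*"],
}

print(templates_matching("jaeger-service-2024-01-01", templates))
```

Every name in the result other than Jaeger's own template is a candidate for being disabled or narrowed.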
This is a config issue with the backend not a problem with Jaeger, closing out. |
Describe the bug
The Jaeger UI shows the error "all shards failed" and does not load traces.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The Jaeger Query UI should show an appropriate message and recover automatically once Elasticsearch is back in a green state.
Screenshots
{"level":"error","ts":1616554296.4920769,"caller":"app/http_handler.go:410","msg":"HTTP handler, Internal Server Error","error":"Search service failed: elastic: Error 503 (Service Unaviled [type=search_phase_execution_exception]","errorVerbose":"elastic: Error 503 (Service Unavailable): all shards failed [type=search_phase_execution_exception]\nSearch service
Version (please complete the following information):
What troubleshooting steps did you try?
Try to follow https://www.jaegertracing.io/docs/latest/troubleshooting/ and describe how far you were able to progress and/or which steps did not work.
Additional context
Does upgrading Jaeger help fix this issue? If not, how can it be solved?