Watcher: Mysterious rolling upgrade failure #33185
Pinging @elastic/es-core-infra
Pasting from an old issue (in the old x-pack repo) with a similar error. I think that one was a concurrency issue, but it was never resolved.
In that issue, @DaveCTurner followed up with:
Not sure if it's the same issue (manual executions racing), but if there are concurrency issues with this code it might be manifesting in the above test too.
And another one today in master intake: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/2197/consoleText
This fails on old_cluster, but mixed_cluster and upgraded_cluster depend on the watches set up in old_cluster, so old_cluster can't be muted on its own. Relates: elastic#33185
Another instance in 7.0: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.0+artifactory/133/console
Another failure in 7.0: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.0+artifactory/136/testReport/junit/org.elasticsearch.upgrades/UpgradeClusterClientYamlTestSuiteIT/test__p0_old_cluster_60_watcher_CRUD_watch_APIs_/, so I backported the muting to 7.0 in 1948702. There are only two watcher tests in the YAML suite and both are muted, leaving only WatcherRestartIT for
I un-muted this test in PR #42377 to obtain additional logs. If (when?) this test fails again, please obtain the following information before muting it:
This failed in a PR build.
Jenkins build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-bwc/6473/
Logs:
In today's
It looks like GitHub interpreted the phrase
This test is believed to be fixed by elastic#43939.
Closes elastic#33185
Set the watcher logger to debug level. These tests haven't run in a long time, so we first need to get a better picture of how (and whether) they fail today. See elastic#33185
Set the watcher logger to debug level. These tests haven't run in a long time, so we first need to get a better picture of how (and whether) they fail today. Backport of elastic#51478. See elastic#33185
Since merging the PR that enables the watcher rolling upgrade tests, these tests haven't failed yet. I'm going to enable them in the 7.x branch too.
Today the first real failure occurred.
Last response:
Relevant build logs on the node executing the watch:
Build URL: https://gradle-enterprise.elastic.co/s/hbiysvknha5wi/
I think that after the watch is created in the old cluster, we should wait until the .watches index is at least yellow and watcher has started. This way we avoid executing a watch before watcher is ready to execute it.
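The readiness gate proposed here can be sketched as a polling loop with a deadline. This is a minimal Java sketch under stated assumptions, not code taken from the actual test suite: `ClusterProbe`, `Health`, `isReady` and `waitUntil` are hypothetical names, and the probe is simulated. In a real test the probe would presumably be backed by REST calls, for example the cluster health API for `.watches` (with `wait_for_status=yellow`) and the watcher stats API's `watcher_state` field.

```java
import java.util.function.BooleanSupplier;

public class WatcherReadyGate {

    // Index health colours, ordered from worst to best so compareTo() ranks them.
    enum Health { RED, YELLOW, GREEN }

    // Hypothetical view of the cluster; a real test would back these methods with REST calls.
    interface ClusterProbe {
        Health watchesIndexHealth();
        boolean watcherStarted();
    }

    // Ready once the .watches index is at least yellow and watcher reports that it has started.
    static boolean isReady(ClusterProbe probe) {
        return probe.watchesIndexHealth().compareTo(Health.YELLOW) >= 0 && probe.watcherStarted();
    }

    // Polls the condition until it holds or the timeout elapses; returns whether it ever held.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated cluster: .watches turns yellow on the 3rd poll and watcher starts on the 5th.
        int[] polls = {0};
        ClusterProbe probe = new ClusterProbe() {
            @Override
            public Health watchesIndexHealth() {
                return polls[0] >= 3 ? Health.YELLOW : Health.RED;
            }

            @Override
            public boolean watcherStarted() {
                return polls[0] >= 5;
            }
        };
        boolean ready = waitUntil(() -> {
            polls[0]++;
            return isReady(probe);
        }, 5_000, 10);
        // Only after this point would the test execute the watch manually.
        System.out.println("ready=" + ready + " after " + polls[0] + " polls"); // prints "ready=true after 5 polls"
    }
}
```

Polling with a deadline, rather than a single check, suits a failure that only shows up in rare runs: when watcher is already started, the loop returns on the first poll and costs nothing.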
In the rolling upgrade tests, watches are executed manually. In rare cases this happens before watcher has started, which causes the manual execution to fail. Relates to elastic#33185
The latest failure has happened a couple of times in the last week. I've opened #52139 to address it.
I'm closing this issue; these commits (^) seem to have stabilised this test.
This rolling upgrade build failed fairly mysteriously. This is what the failure looks like:
This is in the "old" cluster, so the cluster state is empty, but we still get an error as though the watch were running.