Failure in WatcherRestartIT.testWatcherRestart #69918
Pinging @elastic/es-core-features (Team:Core/Features)
If I recall correctly, the suspicion here was that this failing Watcher test was causing the other upgrade tests to fail as well. It seems we are still seeing failures in 7.12 BWC tests against 7.9 clusters even with the watcher test muted.
Yes, those test failures now appear independent of the watcher test failure.
There are a number of tests failing with this, though, so it seems it might still be something generally watcher-esque.
This seems to be busted in both the 7.x and 7.12 branches and specific to the BWC tests against the 7.9.x series.
This is watcher related. I will mute these assertions. A fix was pushed yesterday around when watcher templates are updated.
* The WatcherIndexTemplateRegistry moved from legacy templates to composable index templates in version 7.10.0, not 7.9.0. * The WatcherIndexTemplateRegistry#validate(...) method should only care whether a template for watcher history indices exists, not whether that template is a composable index template or a legacy template. This shouldn't matter when determining whether watcher can be started on a node. The content of the templates hasn't changed in a breaking manner (since version 6.8.0). Closes elastic#69918
* The WatcherIndexTemplateRegistry moved from legacy templates to composable index templates in version 7.10.0, not 7.9.0. * The WatcherIndexTemplateRegistry#validate(...) method should only care whether a template for watcher history indices exists, not whether that template is a composable index template or a legacy template. This shouldn't matter when determining whether watcher can be started on a node. The content of the templates hasn't changed in a breaking manner (since version 6.8.0). Should resolve #69918
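The validation idea described above can be sketched as follows. This is a hypothetical, self-contained stand-in (class names, method signatures, and the template name are illustrative assumptions, not the actual WatcherIndexTemplateRegistry code): watcher should start as long as *some* history template exists, regardless of whether it is a legacy or a composable index template.

```java
import java.util.Set;

class WatcherTemplateCheck {

    // Assumed template name for illustration only.
    static final String HISTORY_TEMPLATE = ".watch-history-13";

    // Pass the template names found in the cluster state, from both
    // the legacy and the composable template registries. The type of
    // template that matched deliberately does not matter.
    static boolean validate(Set<String> legacyTemplates, Set<String> composableTemplates) {
        return legacyTemplates.contains(HISTORY_TEMPLATE)
            || composableTemplates.contains(HISTORY_TEMPLATE);
    }

    public static void main(String[] args) {
        // A pre-7.10 master installed a legacy template; watcher can still start.
        System.out.println(validate(Set.of(HISTORY_TEMPLATE), Set.of())); // true
        // A 7.10+ master installed a composable template instead.
        System.out.println(validate(Set.of(), Set.of(HISTORY_TEMPLATE))); // true
        // No template at all: watcher must wait.
        System.out.println(validate(Set.of(), Set.of())); // false
    }
}
```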
After merging #69998, it looks like the
Observations from: https://gradle-enterprise.elastic.co/s/ciwpar45g6ev6/
Here is the latest failure around not being able to start the watcher nodes: https://gradle-enterprise.elastic.co/s/h6mmehbdy6yao/console-log?task=:x-pack:qa:rolling-upgrade:v7.9.2%23oneThirdUpgradedTest I will open a separate issue for other more or less related bwc test failures.
When running in a mixed cluster with nodes on versions before and after 7.8, attempting to install composable index templates from any node can cause many error logs indicating that the put composable and component index template APIs don't exist. These APIs are always redirected to the elected master, and as long as that node is on a pre-7.8 version, attempting to install these templates will always fail. Relates to elastic#69918
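One way to picture the mixed-cluster guard described above (this is an illustrative sketch, not the actual change in the referenced commit; the Version stand-in and method names are assumptions): only attempt to install composable templates when the elected master is on a version that has those APIs.

```java
class MasterVersionGate {

    // Minimal stand-in for a semantic version; the real code uses
    // org.elasticsearch.Version, which is not reproduced here.
    record Version(int major, int minor) {
        boolean onOrAfter(Version other) {
            return major != other.major ? major > other.major : minor >= other.minor;
        }
    }

    // Composable/component template APIs exist from 7.8 onwards (per the
    // discussion above).
    static final Version COMPOSABLE_TEMPLATES_INTRODUCED = new Version(7, 8);

    // Since the put-template APIs are redirected to the elected master,
    // it is the master's version that decides whether the call can succeed.
    static boolean canInstallComposableTemplates(Version electedMaster) {
        return electedMaster.onOrAfter(COMPOSABLE_TEMPLATES_INTRODUCED);
    }

    public static void main(String[] args) {
        System.out.println(canInstallComposableTemplates(new Version(7, 7)));  // false: would only log errors
        System.out.println(canInstallComposableTemplates(new Version(7, 10))); // true
    }
}
```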
So please can the fixes or mutes be backported to the 7.12 branch too?
@droberts195 I'm on this test failure issue. I will backport #69998 to the 7.12 and 7.11 branches.
The real next issue we have to fix is that when upgrading from some versions (7.0.1, 7.2.0, 7.4.0, 7.5.1), starting the upgraded node (on the 7.13, 7.12, or 7.11.3 versions) fails because, when loading the cluster state, there is somehow a rollover action in an ILM policy with no conditions set. We check for this in the rollover action's constructor.
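The failure mode above comes down to a constructor-time check: a rollover action with zero conditions is rejected, so a cluster state containing one cannot even be deserialized. The class below is a hypothetical simplification for illustration, not the actual ILM code.

```java
class RolloverActionSketch {

    final Long maxDocs;      // null means "condition not set"
    final Long maxSizeBytes; // null means "condition not set"

    RolloverActionSketch(Long maxDocs, Long maxSizeBytes) {
        // This is the validation the upgraded node trips over when it
        // reads a rollover action that lost all of its conditions.
        if (maxDocs == null && maxSizeBytes == null) {
            throw new IllegalArgumentException("at least one rollover condition must be set");
        }
        this.maxDocs = maxDocs;
        this.maxSizeBytes = maxSizeBytes;
    }

    // Helper for demonstration: does construction without conditions fail?
    static boolean rejectsEmpty() {
        try {
            new RolloverActionSketch(null, null);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        try {
            new RolloverActionSketch(null, null); // what the upgraded node read from disk
        } catch (IllegalArgumentException e) {
            System.out.println("startup would fail: " + e.getMessage());
        }
    }
}
```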
Backporting #69998 to the 7.12 branch. * The WatcherIndexTemplateRegistry moved from legacy templates to composable index templates in version 7.10.0, not 7.9.0. * The WatcherIndexTemplateRegistry#validate(...) method should only care whether a template for watcher history indices exists, not whether that template is a composable index template or a legacy template. This shouldn't matter when determining whether watcher can be started on a node. The content of the templates hasn't changed in a breaking manner (since version 6.8.0). Should resolve #69918
Backporting #69998 to the 7.11 branch. * The WatcherIndexTemplateRegistry moved from legacy templates to composable index templates in version 7.10.0, not 7.9.0. * The WatcherIndexTemplateRegistry#validate(...) method should only care whether a template for watcher history indices exists, not whether that template is a composable index template or a legacy template. This shouldn't matter when determining whether watcher can be started on a node. The content of the templates hasn't changed in a breaking manner (since version 6.8.0). Should resolve #69918
We think that we figured out why the rollover validation error occurs during startup: #69995 added the new maxPrimaryShardSize condition. To address this issue, instead of serializing no conditions, maxPrimaryShardSize is serialized as maxSize when writing to an older node that doesn't know about the new condition.
…upgrade. If a node doesn't support maxPrimaryShardSize, then serialize maxPrimaryShardSize as maxSize. This should fix a problematic situation: if an older node doesn't support maxPrimaryShardSize and this is the only condition specified, the older node ends up with an instance without any conditions. This could lead to upgrade failures, with new nodes unable to start because the local cluster state can't be read. Relates to elastic#69918
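The BWC fallback described above can be sketched like this. Everything here is a stand-in for illustration (the encoded version numbers, field names, and the map-based "wire format" are assumptions; the real implementation uses Elasticsearch's version-gated writeTo serialization): when the target node predates maxPrimaryShardSize, the value is downgraded to max_size so the receiver never ends up with zero conditions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RolloverBwcSerialization {

    // Version encoded as major*10000 + minor*100 + patch; 7.12.0 is when
    // the new condition was introduced, per the discussion above.
    static final int MAX_PRIMARY_SHARD_SIZE_INTRODUCED = 7_12_00;

    // Returns the conditions as the receiving node would see them.
    static Map<String, Long> serializeConditions(int targetNodeVersion,
                                                 Long maxSize,
                                                 Long maxPrimaryShardSize) {
        Map<String, Long> wire = new LinkedHashMap<>();
        if (maxSize != null) {
            wire.put("max_size", maxSize);
        }
        if (maxPrimaryShardSize != null) {
            if (targetNodeVersion >= MAX_PRIMARY_SHARD_SIZE_INTRODUCED) {
                wire.put("max_primary_shard_size", maxPrimaryShardSize);
            } else if (maxSize == null) {
                // Fallback: the older node doesn't know the new condition;
                // send it as max_size so at least one condition survives
                // the round trip and constructor validation still passes.
                wire.put("max_size", maxPrimaryShardSize);
            }
        }
        return wire;
    }

    public static void main(String[] args) {
        // Old node, only the new condition set: downgraded instead of dropped.
        System.out.println(serializeConditions(7_11_00, null, 50L));
        // New node: the condition goes over the wire unchanged.
        System.out.println(serializeConditions(7_12_00, null, 50L));
    }
}
```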
…de. (#70057) When running in a mixed cluster with nodes on versions before and after 7.8, attempting to install composable index templates from any node can cause many error logs indicating that the put composable and component index template APIs don't exist. These APIs are always redirected to the elected master, and as long as that node is on a pre-7.8 version, attempting to install these templates will always fail. Relates to #69918
…de. (#70080) Backport of #70057 to the 7.x branch. When running in a mixed cluster with nodes on versions before and after 7.8, attempting to install composable index templates from any node can cause many error logs indicating that the put composable and component index template APIs don't exist. These APIs are always redirected to the elected master, and as long as that node is on a pre-7.8 version, attempting to install these templates will always fail. Relates to #69918
The 7.12 bwc ci job finally completed successfully. The 7.11 bwc ci job would have too, but unfortunately ran into two network-related errors while downloading artifacts. The 7.x bwc job should also complete successfully once #70076 is merged and backported to the 7.x branch.
…upgrade (#70076) If a node doesn't support maxPrimaryShardSize, then serialize maxPrimaryShardSize as maxSize. This should fix a problematic situation: if an older node doesn't support maxPrimaryShardSize and this is the only condition specified, the older node ends up with an instance without any conditions. This could lead to upgrade failures, with new nodes unable to start because the local cluster state can't be read. Relates to #69918
…upgrade (#70076) (#70128) If a node doesn't support maxPrimaryShardSize, then serialize maxPrimaryShardSize as maxSize. This should fix a problematic situation: if an older node doesn't support maxPrimaryShardSize and this is the only condition specified, the older node ends up with an instance without any conditions. This could lead to upgrade failures, with new nodes unable to start because the local cluster state can't be read. Relates to #69918
The 7.11 bwc ci job is now happy too. It looks like all watcher and ilm fatal upgrade errors have been solved, so I will close this issue. Please open a new issue if any of the failures mentioned here occur again.
Build scan: https://gradle-enterprise.elastic.co/s/pgbuawfavusl2
Repro line:
./gradlew ':x-pack:qa:rolling-upgrade:v7.9.1#oneThirdUpgradedTest' -Dtests.class="org.elasticsearch.upgrades.WatcherRestartIT" -Dtests.method="testWatcherRestart" -Dtests.seed=3DE603A8143C17BA -Dtests.security.manager=true -Dtests.bwc=true -Dtests.locale=fr-CA -Dtests.timezone=Africa/Tripoli -Druntime.java=8
Reproduces locally?: No
Applicable branches: 7.x, 7.11, 7.12
Failure history: Failing pretty frequently as of the morning of 3/3/21
Failure excerpt: