Archive unknown or invalid settings on updates #28888
Conversation
Today we can end up in a situation where the cluster state contains unknown or invalid settings. This can happen easily during a rolling upgrade. For example, consider two nodes that are on a version that considers the setting foo.bar to be known and valid. Assume one of these nodes is restarted on a higher version that considers foo.bar to now be either unknown or invalid, and then the second node is restarted too. Now, both nodes will be on a version that considers foo.bar to be unknown or invalid, yet this setting will still be contained in the cluster state. This means that if a cluster settings update is applied and we validate the settings update with the existing settings, then validation will fail. In such a state, the offending setting cannot even be removed. This commit helps out with this situation by archiving any settings that are unknown or invalid at the time that a settings update is applied. This allows the settings update to go through, and the archived settings can be removed at a later time.
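The archiving idea described above can be sketched roughly as follows. `ArchiveSketch`, `archiveUnknownOrInvalid`, and the validity predicate are illustrative names for this sketch, not the actual Elasticsearch API (the real logic lives in the settings infrastructure and uses full `Setting` validation); the `archived.` prefix itself is from the pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Minimal sketch (not the real Elasticsearch code): settings the current
// version no longer recognizes are kept, but renamed under an "archived."
// prefix, so the rest of the update validates cleanly and the archived
// values remain inspectable and removable later.
public class ArchiveSketch {
    static final String ARCHIVED_SETTINGS_PREFIX = "archived.";

    static Map<String, String> archiveUnknownOrInvalid(
            Map<String, String> settings, Predicate<String> isKnownAndValid) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            if (isKnownAndValid.test(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            } else {
                // keep the value so operators can inspect and delete it later
                result.put(ARCHIVED_SETTINGS_PREFIX + e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("cluster.routing.allocation.enable", "all");
        in.put("foo.bar", "baz"); // unknown on the new version
        Map<String, String> out =
                archiveUnknownOrInvalid(in, k -> !k.equals("foo.bar"));
        System.out.println(out); // foo.bar survives as archived.foo.bar
    }
}
```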
I left some minors. newlines are out of control here.
@@ -48,15 +54,35 @@ synchronized Settings getPersistentUpdate() {
        return persistentUpdates.build();
    }

    synchronized ClusterState updateSettings(final ClusterState currentState, Settings transientToApply, Settings persistentToApply) {
    synchronized ClusterState updateSettings(
get your newlines under control my friend
     * - validate the incoming settings update combined with the existing known and valid settings
     * - merge in the archived unknown or invalid settings
     */
    final Tuple<Settings, Settings> partitionedTransientSettings =
can we have 2 methods Settings getValidSettings(Settings)
and Settings getInvalidSetting(Settings)
that would make it simpler to see what is valid and not instead of a tuple?
I pushed a change. I kept the tuple but clarified its usage. I hope this helps.
            final Map.Entry<String, String> e,
            final IllegalArgumentException ex,
            final Logger logger) {
        logger.warn(
man newlines?!
I thought about painting it yellow. 😛
@s1monw I pushed some changes addressing your feedback; would you take another look?
test this please
* master:
  [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  Decouple XContentType from StreamInput/Output (elastic#28927)
  Remove BytesRef usage from XContentParser and its subclasses (elastic#28792)
  [DOCS] Correct typo in configuration (elastic#28903)
  Fix incorrect datemath example (elastic#28904)
  Add a usage example of the JLH score (elastic#28905)
  Wrap stream passed to createParser in try-with-resources (elastic#28897)
  Rescore collapsed documents (elastic#28521)
  Fix (simple)_query_string to ignore removed terms (elastic#28871)
  [Docs] Fix typo in composite aggregation (elastic#28891)
  Try if tombstone is eligable for pruning before locking on it's key (elastic#28767)
            (e, ex) -> logInvalidSetting(settingsType, e, ex, logger));
    return Tuple.tuple(
            Settings.builder()
                    .put(settingsWithUnknownOrInvalidArchived.filter(k -> k.startsWith(ARCHIVED_SETTINGS_PREFIX) == false))
I find this distinction a bit confusing - why do we keep the "old" invalid settings but remove the new ones? In the next iteration we will just be including the newly archived settings in the previous ones. I'm inclined to say - either remove all archived settings (both new and old) when validating, or keep them all. WDYT?
@bleskes Consider a settings update to remove archived persistent settings:
{
"persistent": {
"archived.*": null
}
}
If there are existing persistent archived settings in the cluster state, these should be removed by this update. The strategy that we use to handle unknown or invalid settings is to add them at the end with the archived. prefix. If we do the same for existing archived settings, the application of the above settings update will not remove them and they will remain in the cluster state. Does that help?
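The wildcard-removal semantics being argued for here can be sketched as below. `ArchivedRemovalSketch` and `applyUpdate` are hypothetical names for illustration; the real code applies updates through the Settings builder, but the observable behavior is the same: `"archived.*": null` removes only keys under the `archived.` prefix, never other invalid settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (not the actual Elasticsearch code): a null value in a
// settings update means "remove keys matching this pattern". Because already
// archived settings are set aside rather than re-archived, a wildcard delete
// like "archived.*" can actually remove them.
public class ArchivedRemovalSketch {

    static Map<String, String> applyUpdate(Map<String, String> existing, String removePattern) {
        Map<String, String> result = new LinkedHashMap<>(existing);
        if (removePattern.endsWith("*")) {
            String keyPrefix = removePattern.substring(0, removePattern.length() - 1);
            result.keySet().removeIf(k -> k.startsWith(keyPrefix));
        } else {
            result.remove(removePattern);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("archived.foo", "1");
        existing.put("archived.bar", "2");
        existing.put("invalid.setting", "3");
        // the update {"persistent": {"archived.*": null}} removes only the archived keys
        Map<String, String> after = applyUpdate(existing, "archived.*");
        System.out.println(after); // invalid.setting remains
    }
}
```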
I think I understand you to say that you want the invalid settings not to be removed by the incoming request you gave. If so, nothing we do here will be "clean" (for example, after you called that command there may be archived settings in the cluster state, which is not what people expect). Not sure it's worth all the complexity. If you prefer it this way (I presume you do), I'm fine with keeping as is but please write an explicit test to demonstrate this behavior.
for example, after you called that command there may be archived settings in the cluster state, which is not what people expect
I disagree with this. If I list my cluster settings and I see archived.foo, archived.bar, and invalid.setting, I do not expect PUT archived.*: null (lazy) to remove invalid.setting. Hence the choice that I made here.
I pushed a test in 3298664.
        final Settings toApply = Settings.builder().put("dynamic.setting", "value").build();
        final boolean applyTransient = randomBoolean();
        final ClusterState clusterStateAfterUpdate;
        if (applyTransient) {
It took me quite a while to understand what's going on - on the one hand the persistent vs transient question is handled by parameters to this method (please rename the function parameter if we keep it ;)) and on the other hand we have a boolean here that deals with it directly, which may be inconsistent. On top of it all, the REST API and the transport layer allow changing both at once - something we don't test. I think the functional programming got a bit out of hand here, probably due to multiple iterations. Can we have a straightforward test that works with both persistent and transient (randomly combining them)?
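The shape of the randomized test being asked for might look like the sketch below. All names here are illustrative (the real test uses the Elasticsearch test framework's randomBoolean() and real Settings builders); the point is only that each setting independently lands in the persistent scope, the transient scope, both, or neither within a single update.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Sketch of the randomized test shape: spread updates across persistent and
// transient scopes in the same request instead of picking one scope per run.
public class CombinedScopesTestSketch {
    public static void main(String[] args) {
        Random random = new Random();
        Map<String, String> persistent = new LinkedHashMap<>();
        Map<String, String> transientSettings = new LinkedHashMap<>();
        for (String key : new String[] {"dynamic.a", "dynamic.b", "dynamic.c"}) {
            // each setting independently goes to either scope, both, or neither
            if (random.nextBoolean()) persistent.put(key, "value");
            if (random.nextBoolean()) transientSettings.put(key, "value");
        }
        System.out.println("persistent=" + persistent + " transient=" + transientSettings);
    }
}
```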
I pushed d2bc7fd.
* master: (28 commits)
  Maybe die before failing engine (elastic#28973)
  Remove special handling for _all in nodes info
  Remove Booleans use from XContent and ToXContent (elastic#28768)
  Update Gradle Testing Docs (elastic#28970)
  Make primary-replica resync failures less lenient (elastic#28534)
  Remove temporary file 10_basic.yml~
  Use different pipeline id in test. (pipelines do not get removed between tests extending from ESIntegTestCase)
  Use fixture to test the repository-gcs plugin (elastic#28788)
  Use String.join() to describe a list of tasks (elastic#28941)
  Fixed incorrect test try-catch statement
  Plugins: Consolidate plugin and module loading code (elastic#28815)
  percolator: Take `matchAllDocs` and `verified` of the sub result into account when analyzing a function_score query.
  Build: Remove rest tests on archive distribution projects (elastic#28952)
  Remove FastStringReader in favor of vanilla StringReader (elastic#28944)
  Remove FastCharArrayReader and FastCharArrayWriter (elastic#28951)
  Continue registering pipelines after one pipeline parse failure. (elastic#28752)
  Build: Fix ability to ignore when no tests are run (elastic#28930)
  [rest-api-spec] update doc link for /_rank_eval
  Switch XContentBuilder from BytesStreamOutput to ByteArrayOutputStream (elastic#28945)
  Factor UnknownNamedObjectException into its own class (elastic#28931)
  ...
Thx @jasontedor for the extra iteration. I left optional comments.
        for (final Setting<String> invalidSetting : invalidSettings) {
            if (existingPersistentSettings.keys().contains(invalidSetting.getKey())) {
                assertThat(
                        clusterStateAfterUpdate.metaData().persistentSettings().keySet(),
check that it doesn't exist in the non archived form? alternatively check the total count of settings?
I pushed e85ad01.
        final Settings.Builder transientToApply = Settings.builder();
        for (final Setting<String> dynamicSetting : dynamicSettings) {
            if (randomBoolean()) {
                persistentToApply.put(dynamicSetting.getKey(), "value");
should we skip some and see they don't change?
I pushed e85ad01.
* master:
  Add search slowlog level to docs (elastic#29040)
  Add docs for error file configuration (elastic#29032)
  Archive unknown or invalid settings on updates (elastic#28888)
Relates #28609