Archived settings prevent updating other settings #28026
Comments
We discussed this during Fix-it-Friday and agreed that we should not archive unknown and broken cluster settings. Instead, we should fail to recover the cluster state. The solution for users in an upgrade case would be to roll back to the previous version, address the settings that would be unknown or broken in the next major version, and then proceed with the upgrade.
The solution does not seem to apply to transient settings. I'm getting an acknowledgement from ES, but the invalid setting stays (in my case ...).
@otrosien how were you able to keep transient settings between versions? Did you do a rolling upgrade from 5.6 to 6.x?
@otrosien's teammate here. @mayya-sharipova Yes, we did a rolling upgrade of Elasticsearch. After the upgrade, the transient settings remained, but trying to either remove the unsupported setting or change any other setting in the transient set throws an error.
For us the problem is not "archival" of bad settings, but the complete inability to edit transient settings now that they contain one unsupported setting. We can update any persistent settings because those were empty before the upgrade, but for the settings that exist in our transient settings, the transient versions take precedence according to the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/cluster-update-settings.html#_precedence_of_settings We would expect a bugfix release of Elasticsearch that allows this cleanup without requiring a full cluster restart. At this point, the only option we have is to create a new cluster in parallel, index to it, and change DNS settings. This is extremely expensive, because our cluster is large(ish), with hundreds of data nodes. Service disruption by way of a full-cluster restart is not an option for us.
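For illustration, a request of roughly this shape is what gets rejected; the setting name below is only a stand-in, not the actual setting from the cluster described above:

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
```

The response is an error complaining that the leftover transient setting is not a valid setting, even though that setting is not the one being changed.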
@mayya-sharipova we tried all variations of removing that setting. Apparently it was not moved to the archived.* namespace.
Having the same issue in #28524. We were unable to roll back, so a force-reset solution would be nice. Since it's a production cluster, we also don't want to shut it down for this...
If the official solution is what @jasontedor said, this should really make it into the documentation on the rolling upgrade procedure.
This should not be the official solution for this. We get a hell of a lot of errors when downgrading / rolling back.
There is a misunderstanding here. The comment that is being referred to as the "official solution" is not a solution. It is a proposal for how we should change Elasticsearch so that users cannot end up in the situation that is causing so many problems here. It requires code changes to implement that solution and a new release carrying that solution.
Thanks @jasontedor for the clarification.
@faxm0dem The current workaround is to remove the archived settings by setting archived.* to null in a cluster settings update.
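For reference, a minimal sketch of that workaround, as confirmed later in this thread and in the cluster update settings documentation; clearing both scopes covers whichever one holds the archived entries:

```
PUT _cluster/settings
{
  "persistent": {
    "archived.*": null
  },
  "transient": {
    "archived.*": null
  }
}
```

As the comments below note, this only helps once the broken settings have actually been archived, and it has to be done before updating any other settings.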
If you have dedicated master nodes, we were able to work around this by downgrading them to a previous version (5.6.1 in our case), then removing the offending settings, then re-upgrading.
Oh, very cool, thanks! @scratchy can you try this?
I work with @waltrinehart. We were able to apply the downgrade-master workaround for ... The only real solution that we see right now is a software patch allowing us to remove this setting and move forward.
We found that shutting down all of our master nodes simultaneously and starting them back up was sufficient to clear the offending setting. The cluster still required initializing all the shards even though the data nodes stayed up. This isn't possible for everyone though, so I think an alternative path without such disruption is still needed. Following cluster recovery we saw the setting was properly archived and could be removed. This confirms that it is an issue that crops up during rolling upgrades.
We integrated a change (#28888) that will automatically archive any unknown or invalid settings on any settings update. This prevents their presence from failing the request, and once archived they can be deleted.
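A sketch of the flow after that change, assuming a cluster whose state still carries an unknown setting; the setting being updated below is just a stand-in:

```
# this update is no longer rejected; the unknown setting is archived as part of it
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}

# the archived copy can then be removed
PUT _cluster/settings
{
  "transient": {
    "archived.*": null
  }
}
```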
@jasontedor do you know when this will be released?
Currently, unknown or invalid cluster settings get archived. For a better user experience, we stop archiving broken cluster settings. Instead, we will fail to recover the cluster state. The solution for users in an upgrade case would be to roll back to the previous version, address the settings that would be unknown or invalid in the next major version, and then proceed with the upgrade. Closes elastic#28026
I'm no expert, but I'm suffering from this bug/situation right now and, if you're looking for QA feedback: this has put our production deployment in a very precarious state.
I am running ES 6.3.0 and I executed: ... and restarted the full cluster. That did it for me.
The situation described in the OP is still true today (e.g. for upgrades from snapshots built from ...). Do we still consider this a bug? We could say that if you upgrade your cluster without addressing all the deprecation warnings first, then there is a risk that some things may not work for you. In this case it's ...
We discussed this today and agreed that we are happy with the behaviour as it stands, so this can be closed.
Hey team, sorry to dig up an old issue, but we just hit this during a cloud-observability upgrade (from 6.8 to 7.8). Some of our clusters have a setting ... which is apparently not supported in 7.x and hence got archived. When the upgrade succeeds, those settings leave the cluster basically unusable (at least on Elastic Cloud).
@chingis-elastic that this was not caught ahead of the upgrade sounds like it might be a bug somewhere in the deprecation or upgrade assistance areas. Would you open a new issue for it to make sure that gets investigated? Closed issues like this don't normally see any further activity.
It looks like if you: ... then you get an error back about the archived setting not being a valid setting. You can clear the archived setting with:
```
PUT _cluster/settings
{
  "persistent": {
    "archived.*": null
  }
}
```
but you must do this before updating any other settings. It feels like you should be able to deal with the archived settings at your leisure. I put together a test that reproduces this by adding a test case to FullClusterRestartIT.
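A console-level sketch of that reproduction (independent of the Java test); the removed setting's name below is a placeholder for any setting that was valid on the old version but no longer exists on the new one:

```
# on the old cluster: persist a setting that the new version will not recognise
PUT _cluster/settings
{
  "persistent": {
    "some.removed.setting": "value"
  }
}

# ... full cluster restart onto the new version, which archives the setting ...

# on the new cluster: updating any other setting now fails with an error
# saying the archived setting is not a valid setting
PUT _cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "50mb"
  }
}

# clearing the archived entries first lets subsequent updates go through
PUT _cluster/settings
{
  "persistent": {
    "archived.*": null
  }
}
```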