
Recover cluster after multiple pod failures #366

Merged
Merge branch 'main' into feature/parallel-recovery
Signed-off-by: Sebastian Woehrl <[email protected]>
swoehrl-mw committed Jan 31, 2023

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
commit e63770db71b98474776f00c40729f98eaa03f3bf
8 changes: 4 additions & 4 deletions docs/userguide/main.md
@@ -682,6 +682,10 @@ Internally you should use self-signed certificates (you can let the operator gen

The operator contains several features that automate management tasks that might be needed during the cluster lifecycle. The different available options are documented here.

### Cluster recovery

The operator automatically handles common failure scenarios and restarts crashed pods. Normally this is done one pod at a time to maintain quorum and cluster stability. If the operator detects several crashed or missing pods for a nodepool at the same time, it switches into a special recovery mode and starts all pods at once, allowing the cluster to form a new quorum.

### Rolling Upgrades

The operator supports automatic rolling version upgrades. To do so simply change the `general.version` in your cluster spec and reapply it:
@@ -879,10 +883,6 @@ The last thing that you have to do is to add that security configuration to your
generate: true
```

## Cluster recovery

The operator automatically handles common failure scenarios and restarts crashed pods. Normally this is done one pod at a time to maintain quorum and cluster stability. If the operator detects several crashed or missing pods for a nodepool at the same time, it switches into a special recovery mode and starts all pods at once, allowing the cluster to form a new quorum.

Changing the admin password after the cluster has been created is possible in the same way: update your securityconfig (in the `securityconfig-secret`) and the content of the `admin-credentials-secret` so that both reflect the new password. Note that the operator currently cannot make changes to the securityconfig itself. You must therefore always update the securityconfig in the secret with the new password and additionally provide it via the credentials secret so that the operator can still access the cluster.

### Custom Dashboards user
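The decision described in the new "Cluster recovery" section can be sketched as follows. This is an illustrative sketch only, not the operator's actual code; the function name and the threshold of more than one missing pod are assumptions based on the documentation text above.

```go
package main

import "fmt"

// needsParallelRecovery sketches the behavior the docs describe:
// pods are normally restarted one by one, but when several pods of a
// nodepool are down at once, one-by-one restarts cannot regain quorum,
// so the operator starts all pods in parallel to let a new quorum form.
// Hypothetical sketch; not taken from the operator's source.
func needsParallelRecovery(missingPods int) bool {
	return missingPods > 1
}

func main() {
	fmt.Println(needsParallelRecovery(1)) // single failure: normal one-by-one restart
	fmt.Println(needsParallelRecovery(3)) // multiple failures: parallel recovery mode
}
```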
11 changes: 7 additions & 4 deletions opensearch-operator/pkg/helpers/constants.go
@@ -3,10 +3,13 @@ package helpers
import "os"

const (
DashboardConfigName = "opensearch_dashboards.yml"
DashboardChecksumName = "checksum/dashboards.yml"
ClusterLabel = "opster.io/opensearch-cluster"
NodePoolLabel = "opster.io/opensearch-nodepool"
DashboardConfigName = "opensearch_dashboards.yml"
DashboardChecksumName = "checksum/dashboards.yml"
ClusterLabel = "opster.io/opensearch-cluster"
NodePoolLabel = "opster.io/opensearch-nodepool"
OsUserNameAnnotation = "opensearchuser/name"
OsUserNamespaceAnnotation = "opensearchuser/namespace"
DnsBaseEnvVariable = "DNS_BASE"
)

func ClusterDnsBase() string {