config-rolling-update-strategy-for-operator #2015

Merged
merged 2 commits into from
Mar 17, 2023
37 changes: 37 additions & 0 deletions docs-2.0/nebula-operator/11.rolling-update-strategy.md
@@ -0,0 +1,37 @@
# NebulaGraph cluster rolling update strategy

NebulaGraph clusters use a distributed architecture that divides data into multiple logical partitions, which are typically distributed evenly across nodes. Each partition usually has multiple replicas. To keep these replicas consistent, NebulaGraph clusters use the Raft protocol to synchronize them. In the Raft protocol, each partition elects a leader replica, which is responsible for handling read and write requests, while follower replicas replicate the leader's log and can take over if the leader becomes unavailable.

When a NebulaGraph cluster created by NebulaGraph Operator performs a rolling update, each storage node temporarily stops serving requests while it is updated. For an overview of rolling updates, see [Performing a Rolling Update](https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/). If the node hosting a partition's leader replica stops serving requests, read and write operations for that partition become unavailable. To avoid this, Operator by default migrates the leader replicas on that node to other, unaffected nodes during the rolling update. This way, while a storage node is being updated, the leader replicas now hosted on other nodes continue processing client requests, preserving the read and write availability of the cluster.

Migrating all leader replicas from one storage node to the other nodes may take a long time. To give you more control over the rolling update duration, Operator provides a field called `enableForceUpdate`. Once you have confirmed that no external traffic is accessing the cluster, you can set this field to `true`. The leader replicas are then not migrated to other nodes, which speeds up the rolling update process.
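
The trade-off between the two strategies can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Operator's implementation: the `StorageNode` class, the `rolling_update` function, and the "move each leader to the node with the fewest leaders" rule are all hypothetical and exist purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical model of a storage node and the partitions it leads."""
    name: str
    leaders: set[int] = field(default_factory=set)  # IDs of partitions led by this node


def rolling_update(nodes: list[StorageNode], enable_force_update: bool) -> list[str]:
    """Return a log of the steps taken while updating the nodes one at a time."""
    log: list[str] = []
    for node in nodes:
        others = [n for n in nodes if n is not node]
        if not enable_force_update and others:
            # Default strategy: move every leader off the node before restarting it.
            # The target choice (fewest leaders) is an assumption of this sketch.
            for part in sorted(node.leaders):
                target = min(others, key=lambda n: len(n.leaders))
                target.leaders.add(part)
                log.append(f"transfer leader of partition {part}: {node.name} -> {target.name}")
            node.leaders.clear()
        # With enable_force_update=True the node restarts while still hosting leaders,
        # so those partitions are unavailable until the node comes back.
        log.append(f"restart {node.name} (leaders still hosted: {len(node.leaders)})")
    return log


if __name__ == "__main__":
    cluster = [StorageNode("storaged-0", {1, 2}), StorageNode("storaged-1", {3}), StorageNode("storaged-2")]
    for line in rolling_update(cluster, enable_force_update=False):
        print(line)
```

In this model, the default strategy adds one transfer step per leader replica before each restart, which is where the extra time goes; the forced strategy skips those steps at the cost of restarting nodes that still host leaders.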

## Rolling update trigger conditions

Operator triggers a rolling update of the NebulaGraph cluster under the following circumstances:

- The version of the NebulaGraph cluster changes.
- The configuration of the NebulaGraph cluster changes.
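
The two trigger conditions above can be expressed as a simple comparison between the current and the desired cluster state. The following is a minimal sketch; the function name and the `version` and `config` dictionary keys are hypothetical and do not correspond to Operator's actual data structures.

```python
def needs_rolling_update(current: dict, desired: dict) -> bool:
    """Return True if the version or the configuration differs (hypothetical keys)."""
    version_changed = current.get("version") != desired.get("version")
    config_changed = current.get("config") != desired.get("config")
    return version_changed or config_changed


if __name__ == "__main__":
    running = {"version": "v1", "config": {"flag_a": "1"}}
    print(needs_rolling_update(running, {"version": "v2", "config": {"flag_a": "1"}}))
    print(needs_rolling_update(running, dict(running)))
```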

## Specify a rolling update strategy

In the YAML file for creating a cluster instance, add the `spec.storaged.enableForceUpdate` field and set it to `true` or `false` to control the rolling update speed.

When `enableForceUpdate` is set to `true`, the partition leader replicas are not migrated, which speeds up the rolling update process. When it is set to `false`, the leader replicas are migrated to other nodes to ensure the read and write availability of the cluster. The default value is `false`.

!!! caution

    When setting `enableForceUpdate` to `true`, make sure there is no traffic entering the cluster for read and write operations. This is because this setting will force the cluster pods to be rebuilt, and during this process, data loss or client request failures may occur.

Configuration example:

```yaml
...
spec:
  ...
  storaged:
    enableForceUpdate: true  # When set to true, it speeds up the rolling update process.
    ...
```
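
For an existing cluster, the same field could also be changed with a JSON merge patch instead of editing the full YAML file. The sketch below only builds the patch body; the helper name `force_update_patch` is hypothetical, and the field path is taken from the `spec.storaged.enableForceUpdate` field described above.

```python
import json


def force_update_patch(enabled: bool) -> str:
    """Build a JSON merge patch body that sets spec.storaged.enableForceUpdate."""
    patch = {"spec": {"storaged": {"enableForceUpdate": enabled}}}
    return json.dumps(patch)


if __name__ == "__main__":
    print(force_update_patch(True))
    # prints {"spec": {"storaged": {"enableForceUpdate": true}}}
```

A string like this could be passed to `kubectl patch` with `--type merge` against the cluster resource, assuming that is how the cluster is managed in your environment.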


4 changes: 4 additions & 0 deletions docs-2.0/nebula-operator/9.upgrade-nebula-cluster.md
@@ -204,3 +204,7 @@ You have created a NebulaGraph cluster with Helm. For details, see [Create a Neb
1 vesoft/nebula-metad:{{nebula.tag}}
3 vesoft/nebula-storaged:{{nebula.tag}}
```

## Accelerate the upgrade process

A cluster upgrade is performed as a rolling update and can be time-consuming because the partition leader replicas in the Storage service are transferred to other nodes before each node is updated. You can configure the `enableForceUpdate` field in the cluster instance's YAML file to skip the leader replica transfer, thereby accelerating the upgrade process. For more information, see [Specify a rolling update strategy](11.rolling-update-strategy.md).
4 changes: 3 additions & 1 deletion mkdocs.yml
@@ -703,13 +703,15 @@ nav:
- Deploy clusters:
- Deploy clusters with Kubectl: nebula-operator/3.deploy-nebula-graph-cluster/3.1create-cluster-with-kubectl.md
- Deploy clusters with Helm: nebula-operator/3.deploy-nebula-graph-cluster/3.2create-cluster-with-helm.md
- Connect to NebulaGraph databases: nebula-operator/4.connect-to-nebula-graph-service.md
- Configure clusters:
- Custom configuration parameters for a NebulaGraph cluster: nebula-operator/8.custom-cluster-configurations/8.1.custom-conf-parameter.md
- Reclaim PVs: nebula-operator/8.custom-cluster-configurations/8.2.pv-reclaim.md
#ent
- Balance storage data after scaling out: nebula-operator/8.custom-cluster-configurations/8.3.balance-data-when-scaling-storage.md
- Upgrade NebulaGraph clusters: nebula-operator/9.upgrade-nebula-cluster.md
- Connect to NebulaGraph databases: nebula-operator/4.connect-to-nebula-graph-service.md
- Specify a rolling update strategy: nebula-operator/11.rolling-update-strategy.md
#ent
- Backup and restore: nebula-operator/10.backup-restore-using-operator.md
- Self-healing: nebula-operator/5.operator-failover.md
- FAQ: nebula-operator/7.operator-faq.md