diff --git a/docs/faqs/index.md b/docs/faqs/index.md
index 6e50cfe360..247adfcaf6 100644
--- a/docs/faqs/index.md
+++ b/docs/faqs/index.md
@@ -41,10 +41,10 @@ Not yet, but that functionality is currently being worked on.
 You can check if your nodes are snapshotting by looking at the `Background tasks` tab in the [M3DB Grafana dashboard](https://grafana.com/dashboards/8126).
 - **How do you list all available API endpoints?**
-See [M3DB openhttps://m3db.io/openapi
+See [M3DB OpenAPI](https://m3db.io/openapi).
 - **What is the recommended way to upgrade my M3 stack?**
-TBA
+See the [Upgrading M3](../operational_guide/upgrading_m3.md) guide.
 - **When graphing my Prometheus data in Grafana, I see gaps. How do I resolve this?**
 This is due to M3 having a concept of `null` datapoints whereas Prometheus does not. To resolve this, change `Stacking & Null value` to `Connected` under the `Visualization` tab of your graph.
diff --git a/docs/operational_guide/bootstrapping_crash_recovery.md b/docs/operational_guide/bootstrapping_crash_recovery.md
index b350622b64..d817dfafc6 100644
--- a/docs/operational_guide/bootstrapping_crash_recovery.md
+++ b/docs/operational_guide/bootstrapping_crash_recovery.md
@@ -97,7 +97,7 @@ In the general case, the node will use only the `filesystem` and `commitlog` boo
 Additionally, if it is a brand new placement where even the `peers` bootstrapper cannot fulfill the bootstrap, this will be detected by the `uninitialized_topology` bootstrapper which will succeed the bootstrap.
-### filesystem,peers,uninitialized_topology (default)
+### filesystem,peers,uninitialized_topology
 Everytime a node is restarted it will attempt to stream in all of the the data for any blocks that it has never flushed, which is generally the currently active block and possibly the previous block as well. This mode can be useful if you want to improve performance or save disk space by operating nodes without a commitlog, or want to force a repair of any unflushed blocks.
 This mode can lead to violations of M3DB's consistency guarantees due to the fact that commit logs are being ignored. In addition, if you lose a replication factors worth or more of hosts at the same time, the node will not be able to bootstrap unless an operator modifies the bootstrap consistency level configuration in etcd (see `peers` bootstrap section above). Finally, this mode adds additional network and resource pressure on other nodes in the cluster while one node is peer bootstrapping from them which can be problematic in catastrophic scenarios where all the nodes are trying to stream data from each other.
diff --git a/docs/operational_guide/etcd.md b/docs/operational_guide/etcd.md
index f348cc7f03..0d6ee98c70 100644
--- a/docs/operational_guide/etcd.md
+++ b/docs/operational_guide/etcd.md
@@ -16,7 +16,7 @@ Both `M3` and `etcd` are complex distributed systems, and trying to operate both
 Instead, we recommend running an external `etcd` cluster that is isolated from the `M3` stack so that performing operations like node adds, removes, and replaces are easier.
-While M3 relies on `etcd` to provide strong consistency, the perations we use it for are all low-throughput so you should be able to operate a very low maintenance `etcd` cluster. [A 3-node setup for high availability](https://github.com/etcd-io/etcd/blob/v3.3.11/Documentation/faq.md#what-is-failure-tolerance) should be more than sufficient for most workloads.
+While M3 relies on `etcd` to provide strong consistency, the operations we use it for are all low-throughput so you should be able to operate a very low maintenance `etcd` cluster. [A 3-node setup for high availability](https://github.com/etcd-io/etcd/blob/v3.3.11/Documentation/faq.md#what-is-failure-tolerance) should be more than sufficient for most workloads.
 ## Configuring an External etcd Cluster
diff --git a/docs/operational_guide/repairs.md b/docs/operational_guide/repairs.md
index 74bc2fa6ff..a191bd1b08 100644
--- a/docs/operational_guide/repairs.md
+++ b/docs/operational_guide/repairs.md
@@ -4,7 +4,7 @@
 ## Overview
-Background repairs enable M3DB to eventually reach a consistent state such that all nodes have identical view
+Background repairs enable M3DB to eventually reach a consistent state such that all nodes have identical view.
 An M3DB cluster can be configured to repair itself in the background. If background repairs are enabled, M3DB nodes will continuously scan the metadata of other nodes. If a mismatch is detected, affected nodes will perform a repair such that each node in the cluster eventually settles on a consistent view of the data. A repair is performed individually by each node when it detects a mismatch between its metadata and the metadata of its peers. Each node will stream the data for the relevant series, merge the data from its peers with its own, and then write out the resulting merged dataset to disk to make the repair durable. In other words, there is no coordination between individual nodes during the repair process, each node is detecting mismatches on its own and performing a "best effort" repair by merging all available data from all peers into a new stream.