Update source-monitoring-metrics.md #1395

Merged
merged 5 commits into from
Jun 6, 2022
@@ -6,7 +6,7 @@

After the Storage service is scaled out, you can decide whether to balance the data in the Storage service.

The scaling out of the Nebula Graph's Storage service is divided into two stages. In the first stage, the status of all pods is changed to `Ready`. In the second stage, the commands of `BALANCE DATA``BALANCE LEADER` are executed to balance data. These two stages decouple the scaling out process of the controller replica from the balancing data process, so that you can choose to perform the data balancing operation during low traffic period. The decoupling of the scaling out process from the balancing process can effectively reduce the impact on online services during data migration.
The scaling out of the Nebula Graph's Storage service is divided into two stages. In the first stage, the status of all pods is changed to `Ready`. In the second stage, the `BALANCE DATA` and `BALANCE LEADER` commands are executed to balance data. These two stages decouple the scaling-out process of the controller replica from the data balancing process, so that you can perform the data balancing operation during a low-traffic period. This decoupling can effectively reduce the impact of data migration on online services.

You can define whether to balance data automatically or not with the parameter `enableAutoBalance` in the configuration file of the CR instance of the cluster you created.

52 changes: 48 additions & 4 deletions docs-2.0/reuse/source-monitoring-metrics.md
@@ -15,7 +15,7 @@
| `num_query_errors_leader_changes` | The number of the raft leader changes due to query errors. |
| `num_query_errors` | The number of query errors. |
| `num_reclaimed_expired_sessions` | The number of expired sessions actively reclaimed by the server. |
| `num_rpc_sent_to_metad_failed` | The number of failed RPC requests that the Graphd service sends to the Metad service. |
| `num_rpc_sent_to_metad_failed` | The number of failed RPC requests that the Graphd service sent to the Metad service. |
| `num_rpc_sent_to_metad` | The number of RPC requests that the Graphd service sent to the Metad service. |
| `num_rpc_sent_to_storaged_failed` | The number of failed RPC requests that the Graphd service sent to the Storaged service. |
| `num_rpc_sent_to_storaged` | The number of RPC requests that the Graphd service sent to the Storaged service. |
@@ -25,7 +25,7 @@
| `optimizer_latency_us` | The latency of executing optimizer statements. |
| `query_latency_us` | The average latency of queries. |
| `slow_query_latency_us` | The average latency of slow queries. |
| `num_queries_hit_memory_watermark` | The number of queries that reached the memory watermark. |
| `num_queries_hit_memory_watermark` | The number of queries that reached the memory watermark. |

### Meta

@@ -39,6 +39,12 @@
| `transfer_leader_latency_us` | The latency of transferring the raft leader. |
| `num_agent_heartbeats` | The number of heartbeats for the AgentHBProcessor.|
| `agent_heartbeat_latency_us` | The average latency of the AgentHBProcessor.|
| `replicate_log_latency_us` | The latency of replicating the log record to most nodes by Raft. |
| `num_send_snapshot` | The number of times that Raft sends snapshots to other nodes. |
| `append_log_latency_us` | The latency of replicating the log record to a single node by Raft. |
| `append_wal_latency_us` | The Raft write latency for a single WAL. |
| `num_grant_votes` | The number of times that Raft votes for other nodes. |
| `num_start_elect` | The number of times that Raft starts an election. |

### Storage

@@ -81,8 +87,36 @@
| `num_kv_remove_errors` | The number of execution errors for the RemoveProcessor.|
| `num_kv_remove` | The number of executions for the RemoveProcessor.|
| `forward_tranx_latency_us` | The average latency of transmission.|
| `scan_edge_latency_us` | The average latency of executions for the ScanEdgeProcessor. |
| `num_scan_edge_errors` | The number of execution errors for the ScanEdgeProcessor. |
| `num_scan_edge` | The number of executions for the ScanEdgeProcessor. |
| `scan_vertex_latency_us` | The latency of executions for the ScanVertexProcessor. |
| `num_add_edges` | The number of times that edges are added. |
| `num_add_edges_errors` | The number of errors when adding edges. |
| `num_add_vertices` | The number of times that vertices are added. |
| `num_start_elect` | The number of times that Raft starts an election. |
| `num_add_vertices_errors` | The number of errors when adding vertices. |
| `num_delete_vertices_errors` | The number of errors when deleting vertices. |
| `append_log_latency_us` | The latency of replicating the log record to a single node by Raft. |
| `num_grant_votes` | The number of times that Raft votes for other nodes. |
| `replicate_log_latency_us` | The latency of replicating the log record to most nodes by Raft. |
| `num_delete_tags` | The number of times that tags are deleted. |
| `num_delete_tags_errors` | The number of errors when deleting tags. |
| `num_delete_edges` | The number of edge deletions. |
| `num_delete_edges_errors` | The number of errors when deleting edges. |
| `num_send_snapshot` | The number of times that snapshots are sent. |
| `update_vertex_latency_us` | The latency of executions for the UpdateVertexProcessor. |
| `append_wal_latency_us` | The Raft write latency for a single WAL. |
| `num_update_edge` | The number of executions for the UpdateEdgeProcessor. |
| `delete_tags_latency_us` | The average latency of deleting tags. |
| `num_update_edge_errors` | The number of execution errors for the UpdateEdgeProcessor. |
| `num_get_neighbors` | The number of executions for the GetNeighborsProcessor. |
| `num_get_prop_errors` | The number of execution errors for the GetPropProcessor. |
| `num_delete_vertices` | The number of times that vertices are deleted. |
| `num_lookup` | The number of executions for the LookupProcessor. |

### Space-level

### Graph space

| Parameter | Description |
| ---------------------------------------------- | ----------------------------------------- |
@@ -98,4 +132,14 @@
| `num_aggregate_executors` | The number of executions for the Aggregation operator. |
| `num_sort_executors` | The number of executions for the Sort operator. |
| `num_indexscan_executors` | The number of executions for index scan operators. |
| `num_oom_queries` | The number of queries that caused memory to run out. |
| `num_oom_queries` | The number of queries that caused memory to run out. |
| `num_auth_failed_sessions_bad_username_password` | The number of sessions where authentication failed due to an incorrect username or password. |
| `num_auth_failed_sessions` | The number of sessions in which login authentication failed. |
| `num_opened_sessions` | The number of sessions connected to the server. |
| `num_queries_hit_memory_watermark` | The number of queries that reached the memory watermark. |
| `num_reclaimed_expired_sessions` | The number of expired sessions actively reclaimed by the server. |
| `num_rpc_sent_to_metad_failed` | The number of failed RPC requests that the Graphd service sent to the Metad service. |
| `num_rpc_sent_to_metad` | The number of RPC requests that the Graphd service sent to the Metad service. |
| `num_rpc_sent_to_storaged_failed` | The number of failed RPC requests that the Graphd service sent to the Storaged service. |
| `num_rpc_sent_to_storaged` | The number of RPC requests that the Graphd service sent to the Storaged service. |
| `slow_query_latency_us` | The average latency of slow queries. |
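
The metric names in the tables above appear to follow a loose naming pattern: names starting with `num_` are counts, names containing `_errors` or `_failed` count failures, and names ending in `_latency_us` are latencies. A minimal Python sketch of that reading follows. The helpers `classify_metric` and `us_to_ms` are hypothetical and not part of Nebula Graph, and the sketch assumes the `_us` suffix denotes microseconds.

```python
def classify_metric(name: str) -> str:
    """Guess a metric's kind from its name alone (heuristic, not an official rule)."""
    if name.endswith("_latency_us"):
        return "latency"
    if name.startswith("num_"):
        # Names such as num_scan_edge_errors or num_rpc_sent_to_metad_failed
        # count failures rather than successful executions.
        if "_errors" in name or "_failed" in name:
            return "error_count"
        return "count"
    return "other"


def us_to_ms(value_us: float) -> float:
    """Convert a latency value to milliseconds, assuming `_us` means microseconds."""
    return value_us / 1000.0


if __name__ == "__main__":
    for metric in (
        "slow_query_latency_us",
        "num_scan_edge_errors",
        "num_rpc_sent_to_metad_failed",
        "num_add_vertices",
    ):
        print(metric, "->", classify_metric(metric))
```

A grouping like this can help when deciding which metrics to alert on (error counts) and which to chart over time (latencies), but the descriptions in the tables remain the authoritative definitions.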