-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ui: timeseries: add per-node CPU and memory usage charts #23733
Conversation
Why did you remove the aggregate CPU time? Both are useful but at different times. We either need to keep both or have an option to switch between them. Ideally we'd show the aggregated view by default with some sort of alert to call attention to the per-node view if there is a significant discrepancy. |
Release note (admin ui change): add per-node CPU time graph
Release note (admin ui change): add per-node resident set size graph
Removed it because I didn't really see the utility of aggregated CPU (it doesn't help you debug which nodes are overloaded), but just added it back in and updated the screenshots — I guess it's helpful to see how much CPU you're using overall. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but had a suggestion for tooltips.
<LineGraph | ||
title="Memory Usage (per-node RSS)" | ||
sources={nodeSources} | ||
tooltip={`Per-node Resident Set Size.`} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we've explained what RSS means anywhere else, including the aggregated Memory Usage chart where RSS is just referred to as "Total Memory".
I think (per-node Total Memory) is fine for the title, while the tooltip could be Total Memory used by the CockroachDB process on each node."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that something fancier would be nice, but is definitely out-of-scope for now.
One suggestion about the titles: "per node" and "per type" or something, since they're both (potentially) aggregated along different dimensions.
tooltip={`CPU usage per node, in CPU seconds per second. E.g. if a node has four | ||
cores all at 100%, the value will be 4.`} | ||
> | ||
<Axis units={AxisUnits.Count} label="cpu seconds / second"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend that we keep the units consistent between the two CPU charts.
@@ -17,7 +17,62 @@ export default function (props: GraphDashboardProps) { | |||
</LineGraph>, | |||
|
|||
<LineGraph | |||
title="Memory Usage" | |||
title="CPU Time (per-node)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why'd you swap the charts?
title="Memory Usage" | ||
title="CPU Time (per-node)" | ||
sources={nodeSources} | ||
tooltip={`CPU usage per node, in CPU seconds per second. E.g. if a node has four |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should either sum the user and system time or specifically call out that this is just user time. I think I prefer that we sum them, since system time is nontrivial.
Superseded by work under #27085 |
In #23379, @petermattis asked for the ability to switch from aggregated to per-node charts, to see what nodes are overloaded. That would be a fairly general change to our TS code; in the meantime it seems useful to at least be able to see per-node CPU and memory to get a basic sense of what nodes are overloaded.
Before:
CPU time aggregated across the cluster:
Memory usage by type, but not by node:
After: