-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/ui: add first stage of redesigned tracez page #88181
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @benbardin)
-- commits
line 8 at r1:
Can you please list the differences between this new page and the old one?
-- commits
line 8 at r1:
Why add a new page instead of modifying the existing one directly?
-- commits
line 14 at r1:
In the screenshot the Start Time is listed as "Today at 9:30 PM". Let's get at least second granularity in there. Also, if possible, I'd get rid of the "Today at" if the date is the current one.
-- commits
line 14 at r1:
How come the duration says 0 in the screenshot? For spans that are currently open (which normally are all the spans that are listed on this page), let's make the duration be the time when the snapshot was captured minus the start time. And let's give this one millisecond granularity.
-- commits
line 14 at r1:
How come you split the "Expandable Tags" in a different column from "Tags"? I think it'd probably be better to dump them in the same column, like currently. We need to get this table to be a lot denser.
-- commits
line 14 at r1:
Does each tag render on its own row? Let's keep them on one row, like currently.
-- commits
line 14 at r1:
Any chance we can make the horizontal space between rows in the table a lot smaller? And anything else we can do to generally make things denser.
-- commits
line 14 at r1:
Did the stack trace go away, and the ability to toggle recording? Are these going elsewhere?
The recording toggle had the byproduct that it told you whether a particular span in a snapshot still exists when the page was generated - which is a useful feature, albeit not exposed in the right way. Can we get some sort of a "live" marker next to the span's operation name?
pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts
line 38 at r1 (raw file):
const API_PREFIX = "_admin/v1"; export function listTracingSnapshots(): Promise<ListTracingSnapshotsResponseMessage> {
I also see these functions in ui/workspaces/db-console/src/util/api.ts
. What's the story?
pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts
line 44 at r1 (raw file):
null, null, "30M",
this is a 30 minute timeout? Is this better than no timeout?
If we keep it, can we type it as "30m"? Otherwise I thought it's megabytes or something, definitely not minutes.
e4ff424
to
f5cb487
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @andreimatei)
Previously, andreimatei (Andrei Matei) wrote…
Why add a new page instead of modifying the existing one directly?
So I can build it incrementally over a few PRs. I'll swap it in to the old route when it's done.
Previously, andreimatei (Andrei Matei) wrote…
Can you please list the differences between this new page and the old one?
done
Previously, andreimatei (Andrei Matei) wrote…
How come you split the "Expandable Tags" in a different column from "Tags"? I think it'd probably be better to dump them in the same column, like currently. We need to get this table to be a lot denser.
We could put them in the same column, but I don't think that's great. We'd still need to have one-tag-per-row, because we can't expand multiple tags if they're all in one row...in which case, we're worse off, since we have one column that's longer instead of two columns that are shorter.
I tried taking out the blue badge background, which saves a bunch of space. Let me know what you think!
Previously, andreimatei (Andrei Matei) wrote…
Does each tag render on its own row? Let's keep them on one row, like currently.
The figma calls for giving them their own row, and I think it's easier to work with this way. "all on one row" works well for a a small number of short tags, but I think is harder to manage when there's a large number of tags, or larger tags.
Previously, andreimatei (Andrei Matei) wrote…
Any chance we can make the horizontal space between rows in the table a lot smaller? And anything else we can do to generally make things denser.
sure! done
Previously, andreimatei (Andrei Matei) wrote…
How come the duration says 0 in the screenshot? For spans that are currently open (which normally are all the spans that are listed on this page), let's make the duration be the time when the snapshot was captured minus the start time. And let's give this one millisecond granularity.
Unexpected behavior in the format function I was using, thanks for catching! Rolled my own now. Duration is always captured_at - start_time, spans don't have a native duration field (I think?)
Previously, andreimatei (Andrei Matei) wrote…
In the screenshot the Start Time is listed as "Today at 9:30 PM". Let's get at least second granularity in there. Also, if possible, I'd get rid of the "Today at" if the date is the current one.
I'll just use standard formatting then!
Previously, andreimatei (Andrei Matei) wrote…
Did the stack trace go away, and the ability to toggle recording? Are these going elsewhere?
The recording toggle had the byproduct that it told you whether a particular span in a snapshot still exists when the page was generated - which is a useful feature, albeit not exposed in the right way. Can we get some sort of a "live" marker next to the span's operation name?
The stack trace will be added in a later diff. (See commit description)
You sure about the recording feature? I thought we looked at it and realized it wasn't actually fully implemented right now. Am I misremembering? Send me a screenshot?
pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts
line 38 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I also see these functions in
ui/workspaces/db-console/src/util/api.ts
. What's the story?
I think new components are generally built in cluster-ui, in case we want to use them in cloud console one day. Once we swap this component in for the old one, I'll delete the old components and API calls.
pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts
line 44 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
this is a 30 minute timeout? Is this better than no timeout?
If we keep it, can we type it as "30m"? Otherwise I thought it's megabytes or something, definitely not minutes.
I copied this bit. Seems weird, but maybe somebody built a trace once that took > 30s?
Regardless, we're stuck with the capital-M (grpc library insists) but I added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @benbardin)
We'd still need to have one-tag-per-row, because we can't expand multiple tags if they're all in one row...
Why not? I would expand in-line. We can invent some fancy visual cues like making all the tags in expanded group have a slightly-shaded background color or something, if need be. I think tags can be quite dense on this snapshot page. On the span details page they can be a lot more aired out, I'd say.
The way it looks now, the tags that were supposed to be "hidden" are the thing that stands out most on the page.
I tried taking out the blue badge background, which saves a bunch of space. Let me know what you think!
A big improvement.
Previously, benbardin (Ben Bardin) wrote…
The figma calls for giving them their own row, and I think it's easier to work with this way. "all on one row" works well for a a small number of short tags, but I think is harder to manage when there's a large number of tags, or larger tags.
Don't put too much stock in the Figma mock.
Particularly with "a large number of tags" you definitely don't want them one per row. I think high information density should be the number one feature of this page, trumping most other things. We want to encourage the use of tags, but conn_type: hostnossl
does not deserve to take up vertical space.
Previously, benbardin (Ben Bardin) wrote…
sure! done
Nice!
The stack trace will be added in a later diff. (See commit description)
If the stack trace will go on this snapshot page, consider adding it now if it's not too much trouble. I think the right time to place that is while we're thinking about these columns.
If we get a span snapshot page, we could perhaps delegate the stack trace exclusively there. But a great thing about the stack trace is that its searchable - so you can filter the spans by its contents. I don't think I want to let go of that. I'm thinking maybe the stack trace expands under the operation name, with a tiny button.
You sure about the recording feature? I thought we looked at it and realized it wasn't actually fully implemented right now. Am I misremembering? Send me a screenshot?
Am I sure about what specifically? The toggle is disabled for a span that no longer exists at the moment when the snapshot is loaded. What I'm interested in is having some marker for spans that still exist; I think that'll turn out pretty useful. And also some marker for spans that are recording would be very good, I think. I'm thinking some tiny icons around the operation name.
Actually toggling recording on/off doesn't need to happen from the snapshot page; I think we can delegate that to the span details page if indeed we're getting such a page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @benbardin)
I think high information density should be the number one feature of this page, trumping most other things. We want to encourage the use of tags, but conn_type: hostnossl does not deserve to take up vertical space.
BUT HOW WILL I KNOW IF ITS SSL 😂
Yeah, you're not wrong about that. But I think I come at this a little differently - to me usability is the number one "feature". Information density obviously plays a huge role here, but so does clarity of presentation. If I were to say that both values have value, and that they're in tension here - would we be aligned on that framing of the question?
What bothers me about a multiple-tags-per-column design is it presents three dimensions of data - rows, columns, and group-colors. I think that's pretty hard to take in at a glance, and it's also easy to miss things even when studying the cell closely.
Single-column rows, on the other hand, are very easy to take in, very easy to observe specific information. While they can take up more space, agreed, we can mitigate verbosity by leaning more on tag groups (now that support is better). hostnossl could for sure be hidden, for instance.
Basically, I look at these screenshots and think it's already much denser than the current view - and I'd rather spend that "extra" oomph on clarity rather than further density :) What do you think? Have I convinced? @adityamaru , any thoughts? I can set up some time next week to chat over this
a great thing about the stack trace is that its searchable - so you can filter the spans by its contents
I don't think it works like that now? At least, when I try to "filter" on the main page by a token that appears only in a stack trace, nothing comes up. Given that - the mocks do call for adding stack trace in a special column on the span table (you can take a peek at the figma). But looking at it now, I think it's weird to shove that column so far off the right side of the screen. I'd rather have a "get stack trace" on the span details page. What do you think? I'm glad to put the snapshot here if it belongs. But if search is indeed not including stack traces now, I don't want to add search to MVP (clearly we haven't missed it!)
The toggle is disabled for a span that no longer exists at the moment when the snapshot is loaded. What I'm interested in is having some marker for spans that still exist; I think that'll turn out pretty useful
Ah, I was confused. I think we missed that functionality - would have been a nice thing to catch in design review, but we'll figure it out now. Pushing toggle recording to the span details page makes a lot of sense to me. I've added some icons to the main table - pencil for "recording" and eye for "is active"! (See additional screenshot) What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @benbardin)
Yeah, you're not wrong about that. But I think I come at this a little differently - to me usability is the number one "feature". Information density obviously plays a huge role here, but so does clarity of presentation. If I were to say that both values have value, and that they're in tension here - would we be aligned on that framing of the question?
I would be aligned, except I don't buy that giving each tag a separate row adds any clarity of presentation. I can read horizontally, not only vertically.
What bothers me about a multiple-tags-per-column design is it presents three dimensions of data - rows, columns, and group-colors. I think that's pretty hard to take in at a glance, and it's also easy to miss things even when studying the cell closely.
How about we display all tags that are not part of a group on one line, and give each tag group a separate row? That would add some separation between groups, where it can be valuable. HTML skills permitting, each tag group could start expanded to the point where it takes up one line of the cell without wrapping, and when the expand button is clicked, it'd be allowed to wrap to more lines.
a great thing about the stack trace is that its searchable - so you can filter the spans by its contents
I don't think it works like that now?
This is how it used to be before the page was reactified; we must have lost it. It was quite useful for finding the spans you were interested in. Although it wasn't all-around great - you don't always want to include the trace in a search, and, with the trace being generally hidden, it was sometimes confusing why you were getting certain results when you didn't want to search in the stack. So, ideally, the search in stack traces would be separate from the search in op name and tags.
Anyway, I'm fine with showing the trace in the span details page for now and thinking about the search experience later.
I've added some icons to the main table - pencil for "recording" and eye for "is active"! (See additional screenshot) What do you think?
Thanks!
What are the semantics of the pencil? It shows whether the span was recording at the time of the snapshot, or currently? If it's the latter, then it applies exclusively to live spans, so I'm thinking we should combine the two icons into a single one: I'm thinking for example a green circle for live, and a red circle for recording.
Nit: If they stay as separate icons, can we vertically align the pencils with one another and the eyes with one another? So, reserve space for the icons on every row even when they're not present. FWIW, I don't think either the eye or the pencil are the most suggestive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @benbardin)
Previously, andreimatei (Andrei Matei) wrote…
a great thing about the stack trace is that its searchable - so you can filter the spans by its contents
I don't think it works like that now?
This is how it used to be before the page was reactified; we must have lost it. It was quite useful for finding the spans you were interested in. Although it wasn't all-around great - you don't always want to include the trace in a search, and, with the trace being generally hidden, it was sometimes confusing why you were getting certain results when you didn't want to search in the stack. So, ideally, the search in stack traces would be separate from the search in op name and tags.
Anyway, I'm fine with showing the trace in the span details page for now and thinking about the search experience later.I've added some icons to the main table - pencil for "recording" and eye for "is active"! (See additional screenshot) What do you think?
Thanks!
What are the semantics of the pencil? It shows whether the span was recording at the time of the snapshot, or currently? If it's the latter, then it applies exclusively to live spans, so I'm thinking we should combine the two icons into a single one: I'm thinking for example a green circle for live, and a red circle for recording.
Nit: If they stay as separate icons, can we vertically align the pencils with one another and the eyes with one another? So, reserve space for the icons on every row even when they're not present. FWIW, I don't think either the eye or the pencil are the most suggestive.
Great - let's revisit search later then. Should be easy enough to add a checkbox for "Search Stack"
And I really like the colored circles idea - I made the change and updated the (first only) screenshot in the PR description.
Previously, andreimatei (Andrei Matei) wrote…
Yeah, you're not wrong about that. But I think I come at this a little differently - to me usability is the number one "feature". Information density obviously plays a huge role here, but so does clarity of presentation. If I were to say that both values have value, and that they're in tension here - would we be aligned on that framing of the question?
I would be aligned, except I don't buy that giving each tag a separate row adds any clarity of presentation. I can read horizontally, not only vertically.
What bothers me about a multiple-tags-per-column design is it presents three dimensions of data - rows, columns, and group-colors. I think that's pretty hard to take in at a glance, and it's also easy to miss things even when studying the cell closely.
How about we display all tags that are not part of a group on one line, and give each tag group a separate row? That would add some separation between groups, where it can be valuable. HTML skills permitting, each tag group could start expanded to the point where it takes up one line of the cell without wrapping, and when the expand button is clicked, it'd be allowed to wrap to more lines.
I definitely like moving tag groups to their own column, but I'm not sure I love this approach on the whole, either. I think we may just disagree on what this should look like :) I'll loop in Tommy to decide.
302af4a
to
226f19d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made substantial revisions, as discussed offline. Take another look? Thank you!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @benbardin)
Hey David! Could you help specifically with typescript review? (If there's other stuff you want to talk too about I'm game!) Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good, thanks!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @benbardin, @dhartunian, and @sjbarag)
-- commits
line 22 at r5:
One thing that I think would be great would be if this page let you select the node you want to interact with.
One way to do it is through the general proxy dropdown under Advanced Debug, but that's not very ergonomic. About that, I wonder how the caching that the page does interacts with that proxying functionality; do the caches end up invalidated somehow when a different node is selected?
pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts
line 38 at r1 (raw file):
Previously, benbardin (Ben Bardin) wrote…
I think new components are generally built in cluster-ui, in case we want to use them in cloud console one day. Once we swap this component in for the old one, I'll delete the old components and API calls.
let's perhaps ask @sjbarag if this is reasonable. I dream of a day when there's no db-console
vs cluster-ui
distinction. But also I don't know what a "workspace" is, so I don't actually understand what we're talking about.
pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts
line 44 at r1 (raw file):
Previously, benbardin (Ben Bardin) wrote…
I copied this bit. Seems weird, but maybe somebody built a trace once that took > 30s?
Regardless, we're stuck with the capital-M (grpc library insists) but I added a comment.
I'm thinking no timeout would be better than a 30 minute timeout, just because 30 minutes seems too arbitrary to me.
Otherwise this "list snapshots" should be a fast RPC, so I would also think that 30s should be sufficient. I think there are arguments for a timeout of this smaller magnitude. But anyway, my vote would stay for no timeout for all these RPCs.
Btw, do you know what happens to an inflight RPC when the browser tab is closed or refreshed? Is the server-side ctx canceled? I feel like I've tried to answer this once myself, but I forgot the conclusion.
pkg/ui/workspaces/cluster-ui/src/icon/circleFilled.tsx
line 19 at r5 (raw file):
export function CircleFilled(props: IconProps): React.ReactElement { const { className } = props;
does this line still make sense now that IconProps
has multiple fields?
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 107 at r5 (raw file):
const onSnapshotSelected = (item: string) => { history.location.pathname = "/debug/tracez_v2/snapshot/" + item; history.push(history.location);
I can't wait for the back button to do reasonable things
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 129 at r5 (raw file):
); return { name: "Snapshot " + id + ": " + time,
would it be easy to add the node id in here? Or perhaps not in this list, but next to the page's "Snapshots" title. I think it'd be useful if this page highlighted the fact that it's displaying data from one specific node.
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 154 at r5 (raw file):
</div> <section className={cx("section")}> <Loading
I don't know what I'm talking about, but does having a single Loading
element mean that the whole page will disappear for a second when a new snapshot is selected from the dropdown, instead of just the table disappearing?
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/spanTable.tsx
line 246 at r5 (raw file):
title="No spans to show" icon={<Nodes />} message="Spans provide debug information."
nit: consider dropping this message
pkg/ui/workspaces/db-console/src/app.tsx
line 333 at r5 (raw file):
<Route exact path="/debug/tracez_v2/snapshot/:snapshotID"
nice
pkg/ui/workspaces/db-console/src/redux/apiReducers.ts
line 464 at r5 (raw file):
"snapshots", snapshotsKey, moment.duration(10, "s"),
What's the thinking around this expiration time?
I'm also curious if/how we invalidate this cache when a new snapshot is taken. Or perhaps that functionality doesn't exist yet?
pkg/ui/workspaces/db-console/src/redux/apiReducers.ts
line 465 at r5 (raw file):
snapshotsKey, moment.duration(10, "s"), moment.duration(1, "minute"),
how does this timeout interact with the longer RPC timeout? Would it be better to not specify it?
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 2 at r5 (raw file):
// Copyright 2022 The Cockroach Authors. //
for my edification, what does it mean that this file is in db-console
, and there's also a spanshotPage.tsx
in cluster-ui
?
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 38 at r5 (raw file):
[ (state: AdminUIState) => state.cachedData.snapshots, (_state: AdminUIState, key: string) => key,
I'm curious why there's an underscore in _state
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 47 at r5 (raw file):
return null; } return snapshots[key];
this key is always "list"
IIUC. Can we get rid of it to make it clear that there's only one set of snapshots (unless we make the page talk to all the nodes)?
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 84 at r5 (raw file):
const snapshotError = snapshotState ? snapshotState.lastError : null; return {
can we put keys on this literal's fields? Isn't it dangerous otherwise, if a new field is added for example?
Also consider not introducing the variables above, and doing everything inline in this return
.
49880a4
to
f011608
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @andreimatei, @dhartunian, and @sjbarag)
Previously, andreimatei (Andrei Matei) wrote…
We'd still need to have one-tag-per-row, because we can't expand multiple tags if they're all in one row...
Why not? I would expand in-line. We can invent some fancy visual cues like making all the tags in expanded group have a slightly-shaded background color or something, if need be. I think tags can be quite dense on this snapshot page. On the span details page they can be a lot more aired out, I'd say.
The way it looks now, the tags that were supposed to be "hidden" are the thing that stands out most on the page.I tried taking out the blue badge background, which saves a bunch of space. Let me know what you think!
A big improvement.
Done.
Previously, andreimatei (Andrei Matei) wrote…
Nice!
Done.
Previously, andreimatei (Andrei Matei) wrote…
One thing that I think would be great would be if this page let you select the node you want to interact with.
One way to do it is through the general proxy dropdown under Advanced Debug, but that's not very ergonomic. About that, I wonder how the caching that the page does interacts with that proxying functionality; do the caches end up invalidated somehow when a different node is selected?
Ooh, good catch. Nothing clever happens, and the page breaks exactly the way you'd expect.
The good news is I think I can fix this, and build a more robust node selector for this page generally. But I think that will be a lot easier to review in a dedicated diff. Would you be comfortable checking this code in provisionally, with the shared understanding that (a) not breaking over node proxying is a hard requirement for "launch", and (b) ergonomic node-switching is a fast-follow, barring unexpected technical complexities?
pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts
line 44 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I'm thinking no timeout would be better than a 30 minute timeout, just because 30 minutes seems too arbitrary to me.
Otherwise this "list snapshots" should be a fast RPC, so I would also think that 30s should be sufficient. I think there are arguments for a timeout of this smaller magnitude. But anyway, my vote would stay for no timeout for all these RPCs.
Btw, do you know what happens to an inflight RPC when the browser tab is closed or refreshed? Is the server-side ctx canceled? I feel like I've tried to answer this once myself, but I forgot the conclusion.
Sure, removing timeouts!
I can't answer authoritatively what happens to the RPC. But I'd imagine that we can't do anything too clever for a non-streamed request. That is, if a user closes the tab, I don't think the server can learn about it until it tries to write the response to the connection and fails. I'd further imagine that it just swallows the error silently, but again, not really sure!
pkg/ui/workspaces/cluster-ui/src/icon/circleFilled.tsx
line 19 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
does this line still make sense now that
IconProps
has multiple fields?
Hm, I think not. Removed.
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 107 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I can't wait for the back button to do reasonable things
:)
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 129 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
would it be easy to add the node id in here? Or perhaps not in this list, but next to the page's "Snapshots" title. I think it'd be useful if this page highlighted the fact that it's displaying data from one specific node.
easy, no, but doable yes. Discussed in the other comment.
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 154 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I don't know what I'm talking about, but does having a single
Loading
element mean that the whole page will disappear for a second when a new snapshot is selected from the dropdown, instead of just the table disappearing?
I think the Loading
just wraps whatever's in the render function? The SpanTable, in this case.
pkg/ui/workspaces/db-console/src/app.tsx
line 333 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nice
thank you!
pkg/ui/workspaces/db-console/src/redux/apiReducers.ts
line 464 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
What's the thinking around this expiration time?
I'm also curious if/how we invalidate this cache when a new snapshot is taken. Or perhaps that functionality doesn't exist yet?
The cache will be manually invalidated when a user hits the "take snapshot" button! (It exists in the current flow, but not yet in this one)
I hadn't thought about the expiration times, that's a good callout. I removed the timeout below entirely, as you suggested elsewhere.
Thinking through the invalidation timeouts:
- a GetSnapshot call is expensive and will never change (unless the node reboots, I suppose?). So I just changed this to 5m. A snapshot will not stick around forever, but switching back and forth should be easy.
- a GetSnapshots call is cheap, and will rarely change outside the context of hitting the take snapshot button...unless the user hits it in another tab. So I feel like 10s is fine here, though I'm happy to change if you have a different intuition, or later down the line.
pkg/ui/workspaces/db-console/src/redux/apiReducers.ts
line 465 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
how does this timeout interact with the longer RPC timeout? Would it be better to not specify it?
addressed above
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 2 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
for my edification, what does it mean that this file is in
db-console
, and there's also aspanshotPage.tsx
incluster-ui
?
I think this is true, but I'm not certain either:
- cluster-ui is the open source library defining shared components across cloud and core DB.
- the db-console/src/views directory defines (naturally) the views in the db console.
- Some of these views are written from scratch here; others are basically a wrapper around a component from cluster-ui, with boilerplate to hook up the API/redux calls.
Does that make sense?
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 38 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I'm curious why there's an underscore in
_state
In general, I think by JS convention, a prefixing underscore means the parameter is unused.
That said, I don't really have a strong feel for how redux fits together like this, or why state is used in one function but not the other.
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 47 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
this key is always
"list"
IIUC. Can we get rid of it to make it clear that there's only one set of snapshots (unless we make the page talk to all the nodes)?
yup, great idea. done!
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 84 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
can we put keys on this literal's fields? Isn't it dangerous otherwise, if a new field is added for example?
Also consider not introducing the variables above, and doing everything inline in this
return
.
Eh? These are wrapped in {}
, not []
, which I think does what you mean? This is the advantage of not defining inline. Unless I misunderstand! But perhaps David can weigh in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sprinkled comments, but feel free to merge as far as I'm concerned, cause I'm speaking out of turn.
But I'd really like David and Sean to take a look, and perhaps opine on db-console vs cc-console things. I'll ping them offline.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @benbardin, @dhartunian, and @sjbarag)
Previously, benbardin (Ben Bardin) wrote…
Ooh, good catch. Nothing clever happens, and the page breaks exactly the way you'd expect.
The good news is I think I can fix this, and build a more robust node selector for this page generally. But I think that will be a lot easier to review in a dedicated diff. Would you be comfortable checking this code in provisionally, with the shared understanding that (a) not breaking over node proxying is a hard requirement for "launch", and (b) ergonomic node-switching is a fast-follow, barring unexpected technical complexities?
OK.
Is this kind of bad interaction between proxying and react caching a general problem for all our pages?
pkg/ui/workspaces/db-console/src/redux/apiReducers.ts
line 464 at r5 (raw file):
a GetSnapshot call is expensive and will never change (unless the node reboots, I suppose?)
Well the span liveness data (the red vs green bubbles) changes, doesn't it? So if there's caching involved and a page refresh doesn't do what I expect it to do, I'd consider an explicit refresh button.
Are you sure the caching is worth it? It definitely makes the code more complex and adds things you need to think about.
pkg/ui/workspaces/db-console/src/redux/apiReducers.ts
line 465 at r5 (raw file):
Previously, benbardin (Ben Bardin) wrote…
addressed above
ah i had actually misunderstood this to be the timeout for the RPC performed when the cache is missed, not a cache TTL
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 2 at r5 (raw file):
Previously, benbardin (Ben Bardin) wrote…
I think this is true, but I'm not certain either:
- cluster-ui is the open source library defining shared components across cloud and core DB.
- the db-console/src/views directory defines (naturally) the views in the db console.
- Some of these views are written from scratch here; others are basically a wrapper around a component from cluster-ui, with boilerplate to hook up the API/redux calls.
Does that make sense?
I can't say, but I've asked @sjbarag to take a look.
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 84 at r5 (raw file):
Previously, benbardin (Ben Bardin) wrote…
Eh? These are wrapped in
{}
, not[]
, which I think does what you mean? This is the advantage of not defining inline. Unless I misunderstand! But perhaps David can weigh in here.
I was asking if this can be written with a syntax more similar to go's keyed literals:
return SnapshotPageStateProps {
sort: sortSetting.selector(state),
...
}
Is that a thing?
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 37 at r6 (raw file):
type SnapshotState = Pick<AdminUIState, "cachedData", "snapshots">; const selectSnapshotsState = (state: SnapshotState) => {
I think this function would benefit from spelling out its return type.
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 38 at r6 (raw file):
const selectSnapshotsState = (state: SnapshotState) => { if (!state.cachedData.snapshots) {
how is this different from return state.cachedData.snapshots
?
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 45 at r6 (raw file):
}; const selectSnapshotState = createSelector(
I don't know what I'm talking about, but does a selector like this provide any value? Would we get any more re-renderings if we got rid of it and simply did
const snapshotID = getMatchParamByName(props.match, "snapshotID");
snapshot = state.cachedData.snapshot[snapshotID]
in mapsStatToProps
? My understanding is that, if state.cachedData.snapshot[snapshotID]
repeatedly returns an object with the same identity (which one would assume is true), then that would be just as good.
Feel free to ignore if what you've done here is "the react way", but boy is this a mindfuck.
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 47 at r6 (raw file):
const selectSnapshotState = createSelector( [ (state: AdminUIState) => state.cachedData.snapshot,
I think spelling out the type on this would be good, unless it's some magic non-spellable one.
It took me a good amount of time to (think that I) understand what's going on here. The snapshot[snapshotID]
below confused me greatly - how can you index into a "snapshot"? I think maybe the snapshot
param below should be renamed to snapshotsCache
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx for implementing this using some better structure @benbardin. I just have a few small nits.
Adding this page in cluster-ui
could enable us to add it to the superuser dashboard in CC, for instance. could be cool for TSE/SRE to use this directly in cloud when debugging issues. Might help with phasing out debug zip reliance 🤔
I'd rather move over to this new page sooner rather than later and then we can improve performance if that becomes an issue.
Reviewed 3 of 13 files at r1, 1 of 8 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @andreimatei, @benbardin, and @sjbarag)
pkg/ui/workspaces/cluster-ui/src/api/tracezApi.ts
line 38 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
let's perhaps ask @sjbarag if this is reasonable. I dream of a day when there's no
db-console
vscluster-ui
distinction. But also I don't know what a "workspace" is, so I don't actually understand what we're talking about.
the workspace
thing is a bit of red herring, it's a remnant of our use of yarn
workspaces but now it's just directories with no meaning.
As for adding components to cluster-ui
, the advantage there is that we can then embed them into CC Console directly. I am a bit skeptical that we would expose this directly to customers in cloud anytime soon, but there's no harm in building it this way from the start.
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 10 at r6 (raw file):
// by the Apache License, Version 2.0, included in the file // licenses/APL.txt. import { InlineAlert } from "@cockroachlabs/ui-components";
nit: line break between license and code
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/spanTable.tsx
line 132 at r6 (raw file):
}; const formatDuration = (d: moment.Duration): string => {
we don't have an existing time formatter to use? or could we just something built into moment?
Begins to implement https://www.figma.com/file/LUptGAxt5x1idq6eZLA50o/BulkIO-Traces?node-id=59%3A7958 When complete, the new pages will offer: - A robust route structure and full redux integration for better navigation. - Denser information display. - More information display, i.e. start time/duration, and expanding mutliple tag groups. - A new span details page, which will include child span information. For now, this page is behind a new route with no entrypoint. It implements the core route structure, redux selectors and API calls. Left for later PRs are Take Snapshot, the Span Details page, pagination, search filtering, and View Raw JSON. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @adityamaru, @andreimatei, @dhartunian, and @sjbarag)
Previously, andreimatei (Andrei Matei) wrote…
OK.
Is this kind of bad interaction between proxying and react caching a general problem for all our pages?
I think it's likely an issue for every page that depends on node proxying, but I don't know if they all use react caching. I'll poke around as I work and see if I can get a better handle on the scale of the problem.
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/spanTable.tsx
line 132 at r6 (raw file):
Previously, dhartunian (David Hartunian) wrote…
we don't have an existing time formatter to use? or could we just something built into moment?
We have an existing one, but it doesn't do millisecond precision; moment turns out to be surprisingly bad at durations. As far as I can tell it lacks support for anything >24h, which seems risky here.
pkg/ui/workspaces/db-console/src/redux/apiReducers.ts
line 464 at r5 (raw file):
Well the span liveness data (the red vs green bubbles) changes
i am convinced the caching is not worth it.
Strictly speaking, this is an "invalidation period", not a "caching timeout" - the default is to save indefinitely. I'll set both invalidation periods to 1s, which I think will be functionally 👌 for our purposes.
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 84 at r5 (raw file):
const snapshotID = getMatchParamByName(props.match, "snapshotID");
snapshot = state.cachedData.snapshot[snapshotID]
oh, I see! Yes - Looking through the directory, we use both ways - I'll go with your suggestion.
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 37 at r6 (raw file):
snapshot = state.cachedData.snapshot[snapshotID]
I checked with david, he thinks you're probably right and we should just delete the selectors. Done.
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 38 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
how is this different from
return state.cachedData.snapshots
?
Done.
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 45 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I don't know what I'm talking about, but does a selector like this provide any value? Would we get any more re-renderings if we got rid of it and simply did
const snapshotID = getMatchParamByName(props.match, "snapshotID"); snapshot = state.cachedData.snapshot[snapshotID]
in
mapsStatToProps
? My understanding is that, ifstate.cachedData.snapshot[snapshotID]
repeatedly returns an object with the same identity (which one would assume is true), then that would be just as good.Feel free to ignore if what you've done here is "the react way", but boy is this a mindfuck.
Done.
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 47 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I think spelling out the type on this would be good, unless it's some magic non-spellable one.
It took me a good amount of time to (think that I) understand what's going on here. Thesnapshot[snapshotID]
below confused me greatly - how can you index into a "snapshot"? I think maybe thesnapshot
param below should be renamed tosnapshotsCache
.
Done.
bors r+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @adityamaru, @benbardin, @dhartunian, and @sjbarag)
pkg/ui/workspaces/db-console/src/redux/apiReducers.ts
line 464 at r5 (raw file):
Previously, benbardin (Ben Bardin) wrote…
Well the span liveness data (the red vs green bubbles) changes
i am convinced the caching is not worth it.
Strictly speaking, this is an "invalidation period", not a "caching timeout" - the default is to save indefinitely. I'll set both invalidation periods to 1s, which I think will be functionally 👌 for our purposes.
Is caching for a second better than not caching at all? Cause it certainly is a lot more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @adityamaru, @benbardin, @dhartunian, and @sjbarag)
pkg/ui/workspaces/db-console/src/redux/apiReducers.ts
line 464 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Is caching for a second better than not caching at all? Cause it certainly is a lot more complicated.
btw, does refreshing the page blow away these caches? I assume that it didn't, but now I'm not so sure.
Build succeeded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took me so long to get to this! There's a few concerning things in the SnapshotsPage
component from a performance perspective that won't be apparent until there's at least 100 snapshots in the redux store, so we should take care of those before we get too much further with the implementation.
Great start @benbardin — React, Redux, and TypeScript have a pretty big learning curve, and adding in protobuf-generated types in a large project like CRDB makes it especially tricky 😃
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 41 at r7 (raw file):
snapshotError: Error | null; snapshotLoading: boolean; }
nit: consider using optional fields instead of a mandatory but nullable fields — it'll let you use something like fooSnapshotDadta?.snapshots
instead of having to &&
or use || null
when assigning props for this component.
export interface SnapshotPageStateProps {
sort: SortSetting;
snapshotsError?: Error;
snapshotsLoading: boolean;
snapshots: ListTracingSnapshotsResponseMessage;
snapshot: GetTracingSnapshotResponseMessage;
snapshotError?: Error;
snapshotLoading: boolean;
}
Code quote:
export interface SnapshotPageStateProps {
sort: SortSetting;
snapshotsError: Error | null;
snapshotsLoading: boolean;
snapshots: ListTracingSnapshotsResponseMessage;
snapshot: GetTracingSnapshotResponseMessage;
snapshotError: Error | null;
snapshotLoading: boolean;
}
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 69 at r7 (raw file):
// Sort Settings. const ascending = (searchParams.get("ascending") || undefined) === "true"; const columnTitle = searchParams.get("columnTitle") || "";
You can skip all of this by providing types in RouteComponentProps
above (or by using the useParams()
hook from react-router-dom
):
type UrlParams = Partial<Record<"ascending" | "columnTitle", string>>;
export type SnapshotPageProps = SnapshotPageStateProps &
SnapshotPageDispatchProps &
RouteComponentProps<UrlParams>;
export const SnapshotPage: React.FC<SnapshotPageProps> = props => {
const { history, match, /* ... */ } = props;
const ascending = match.props.ascending === "true";
const columnTitle = match.props.columnTitle || "";
This'll avoid having to re-parse history.location.search
for every render of this component, especially since react-router is going to do that for us either way :).
Code quote:
const searchParams = new URLSearchParams(history.location.search);
// Sort Settings.
const ascending = (searchParams.get("ascending") || undefined) === "true";
const columnTitle = searchParams.get("columnTitle") || "";
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 75 at r7 (raw file):
useEffect(() => { refreshSnapshots(); });
How often do we want this function to run? A useEffect()
with no dependency array will be re-run after every render, which is probably far more often than we need:
https://reactjs.org/docs/hooks-effect.html#example-using-hooks
Code quote:
useEffect(() => {
refreshSnapshots();
});
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 78 at r7 (raw file):
useEffect(() => { if (snapshotIDStr || !snapshots) {
Where would props.snapshots
ever be null or undefined? You declared it as a required property in the SnapshotPageProps
type above, so the || !snapshots
clause is either unnecessary or the SnapshotPageProps
type needs to change 😃
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 117 at r7 (raw file):
}; const snapshotItems = !snapshots
Watch out! This'll happen for every render cycle, which is probably not what you want to do if there's going to be lots of snapshots. That's a lot of repeated calculations! Consider using useMemo()
to have this only recompute when snapshots
or snapshots.snapshots
change (assumes snapshots
is indeed not-nullable; see above):
const snapshotItems = useMemo(() => snapshots.snapshots.map(snapshotInfo => {
const id = snapshotInfo.snapshot_id.toString();
const time = TimestampToMoment(snapshotInfo.captured_at).format(
"MMM D, YYYY [at] HH:mm:ss",
);
return {
name: "Snapshot " + id + ": " + time,
value: id,
};
}, [ snapshots, snapshots.snapshots ]);
Docs: https://reactjs.org/docs/hooks-reference.html#usememo
Code quote:
const snapshotItems = !snapshots
? []
: snapshots.snapshots.map(snapshotInfo => {
const id = snapshotInfo.snapshot_id.toString();
const time = TimestampToMoment(snapshotInfo.captured_at).format(
"MMM D, YYYY [at] HH:mm:ss",
);
return {
name: "Snapshot " + id + ": " + time,
value: id,
};
});
pkg/ui/workspaces/cluster-ui/src/tracez/snapshot/snapshotPage.tsx
line 144 at r7 (raw file):
snapshotItems.find(option => option["value"] === snapshotIDStr)[ "name" ]}
Watch out! This is an O(n) lookup through the array of snapshots that will happen on every re-render of this component (or its children), which will be quite expensive when there's lots of snapshots, just to find the name of the currently-selected one. If you use the useMemo()
approach above, we can use the existing iteration across those snapshots to pluck out the selected name, which would turn that useMemo
into:
const [snapshotItems, snapshotName] = useMemo(() => {
let selectedName = "";
const items = snapshots.snapshots.map(snapshotInfo => {
const id = snapshotInfo.snapshot_id.toString();
const time = TimestampToMoment(snapshotInfo.captured_at).format(
"MMM D, YYYY [at] HH:mm:ss",
);
const out = {
name: "Snapshot " + id + ": " + time,
value: id,
};
if (id === snapshotIDStr) {
selectedName = out.name;
}
return out;
});
return [items, selectedName];
}, [ snapshots, snapshots.snapshots, snapshotIDStr ]);
and the usage here would be as simple as {snapshotName}
Code quote:
{snapshotIDStr &&
snapshotItems.length &&
snapshotItems.find(option => option["value"] === snapshotIDStr)[
"name"
]}
pkg/ui/workspaces/db-console/src/redux/apiReducers.ts
line 464 at r5 (raw file):
Is caching for a second better than not caching at all? Cause it certainly is a lot more complicated.
Yep, it's an in-memory per-session cache from what I can tell: https://github.com/cockroachdb/cockroach/blob/e5930cf5c3e4f315930f771f88a17558af4ad694/pkg/ui/workspaces/db-console/src/redux/cachedDataReducer.ts
pkg/ui/workspaces/db-console/src/views/tracez_v2/snapshotPage.tsx
line 2 at r5 (raw file):
cluster-ui is the open source library defining shared components across cloud and core DB.
Correct!
the db-console/src/views directory defines (naturally) the views in the db console.
Also correct!
Some of these views are written from scratch here; others are basically a wrapper around a component from cluster-ui, with boilerplate to hook up the API/redux calls.
Also also correct. Eventually it'd be nice to not have to duplicate our selector logic in both places. @dhartunian I imagine this means eventually using redux slices per "vertical slice of functionality" and probably naming them after cluster IDs (so that there's minimal work to do on the CC Console side), but that's a discussion for later :)
89929: storage: delete deprecated multiIterator r=erikgrinaker a=msbutler Use the new NewPebbleSSTIterator() instead. Fixes #87943 Release note: None 89979: pkg/ui: performance fixes on redesigned tracez page r=benbardin a=benbardin Followup to #88181 Part of #83679 Release note: None Co-authored-by: Michael Butler <[email protected]> Co-authored-by: Ben Bardin <[email protected]>
Begins to implement
https://www.figma.com/file/LUptGAxt5x1idq6eZLA50o/BulkIO-Traces?node-id=59%3A7958
When complete, the new pages will offer:
navigation.
tag groups.
For now, this page is behind a new route with no entrypoint.
It implements the core route structure, redux selectors and
API calls.
Left for later PRs are Take Snapshot, the Span Details page, pagination,
search filtering, and View Raw JSON.
Part of #83679
Release note: None