Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: capture info about hardware and OS in diagnostics #28676

Merged
merged 1 commit into from
Aug 16, 2018

Conversation

dt
Copy link
Member

@dt dt commented Aug 15, 2018

Information about cores and memory commonly used to run Cockroach can help us size caches, tune recommendations, etc.

First draft of what we discussed. Will need to think about testing a bit though.

@dt dt added the do-not-merge bors won't merge a PR with this label. label Aug 15, 2018
@dt dt requested review from vilterp, BramGruneir and a team August 15, 2018 22:16
@cockroach-teamcity
Copy link
Member

This change is Reviewable

message HardwareInfo {
CPUInfo cpu = 1 [(gogoproto.nullable) = false];
MemInfo mem = 2 [(gogoproto.nullable) = false];
float loadavg = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that it's the 15 minute average

@@ -32,6 +32,7 @@ message DiagnosticReport {
map<string, int64> error_counts = 7;
map<int64, config.ZoneConfig> zone_configs = 8 [(gogoproto.nullable) = false];
map<string, int32> feature_usage = 9 [(gogoproto.nullable) = false];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lint: remove empty line

Copy link
Member

@BramGruneir BramGruneir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing, we'll just have to make sure that there are values present and not default ones for everything except for the virtualization stuff I think. Just make sure it runs properly on TC and we can get a linux user to test on their machine too.

The proto needs a wee bit of cleanup, and we should take a look at the cloud stuff in another PR.

Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@dt dt force-pushed the hardware branch 3 times, most recently from 95fcd04 to d223a16 Compare August 16, 2018 18:05
Information about the machines we’re running on, such as number of cores
memory or OS version will help us identify commonly used deployment
Configurations that we can then consider when testing and tuning, e.g.
picking size caches, writing deployment recommendations, etc.

Release note(general change): Include hardware and OS information in the anonymous diagnostics reporting sent to Cockroach Labs if enabled.
@dt dt removed the do-not-merge bors won't merge a PR with this label. label Aug 16, 2018
@dt
Copy link
Member Author

dt commented Aug 16, 2018

as discussed, moved virtualization to Hardware since it can mess with the detected hardware numbers (e.g. seeing 4 x 1core cpus inside my docker builder session), added a few sanity-check level tests, and think this is RFAL.

Copy link
Member

@BramGruneir BramGruneir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@dt
Copy link
Member Author

dt commented Aug 16, 2018

bors r+

craig bot pushed a commit that referenced this pull request Aug 16, 2018
28676: server: capture info about hardware and OS in diagnostics r=dt a=dt

Information about cores and memory commonly used to run Cockroach can help us size caches, tune recommendations, etc.

First draft of what we discussed. Will need to think about testing a bit though.

Co-authored-by: David Taylor <[email protected]>
@craig
Copy link
Contributor

craig bot commented Aug 16, 2018

Build succeeded

@craig craig bot merged commit b815cc4 into cockroachdb:master Aug 16, 2018
@vilterp
Copy link
Contributor

vilterp commented Aug 16, 2018

Ben noted on a different PR that the CPU info returned by gopsutil isn't consistent across platforms in how it reports sockets vs cores vs threads: #27662 (review)

So we should use caution while interpreting those numbers, or find another library for CPU info (I don't know of one offhand). However, I think the runtime.NumCPU() field we're capturing here is an accurate measure of what we really care about, which is how many goroutines can we run at once.

@dt
Copy link
Member Author

dt commented Aug 16, 2018

@vilterp yep, we have numcpu in there first. the other stuff may or may not be interesting, but I'm inclined to trust numcpu over the sockets/cores numbers given my observations and what ben said there.

@vilterp
Copy link
Contributor

vilterp commented Aug 16, 2018

Yes, we should primarily look at numcpu.

@dt dt deleted the hardware branch August 16, 2018 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants