Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: create perf artifacts concept and collect them #38889

Merged

Conversation

ajwerner
Copy link
Contributor

Before the roachtest rewrite in #30977 we used to collect the logs directory
from remote hosts unconditionally. We also used to write stats.json histogram
files to the logs directory. This combination enabled the collection of
histogram files for roachperf. Since the new roachtest does not unconditionally
copy logs to the test runner, we have not gotten any perf artifacts since the
change.

This PR introduces a new concept, perf artifacts, which live in a different
directory, the perfArtifactsDir. If a testSpec indicates that it
HasPerfArtifacts then that directory is copied to the test runner's
artifactsDir for that test from each node in the cluster upon success.

This change also necessitates a small change to roachperf to look in this new
"perf" directory.

Fixes #38871.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner
Copy link
Contributor Author

Needs https://github.com/cockroachdb/roachperf/pull/38 to land in order to properly parse the new artifacts.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 11 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @andreimatei)


pkg/cmd/roachtest/kv.go, line 67 at r1 (raw file):

				sequential = " --sequential"
			}
			t.ArtifactsDir()

Was this intentional?


pkg/cmd/roachtest/test.go, line 74 at r1 (raw file):

	// directory from all of the nodes if it is non-empty.
	// It will not cause an error if the perf artifacts directory does not exist.
	HasPerfArtifacts bool

I'm a little surprised that we don't just try to pull from this directory unconditionally. It seems like an easy thing to miss and have to debug every few months. Are you worried that doing so would be too expensive? Is it because of the logging you discuss below?

Before the roachtest rewrite in cockroachdb#30977 we used to collect the logs directory
from remote hosts unconditionally. We also used to write stats.json histogram
files to the logs directory. This combination enabled the collection of
histogram files for roachperf. Since the new roachtest only copies logs
to the test runner in the case of failure, we have not gotten any perf
artifacts since the change.

This log policy is sane for logs and other debug artifacts like heap profiles
that exist there but has been a bit of a bummer as we've missed out on the
dopamine rush due to seeing charts go up and to the right in bespoke D3 glory.
The truth is that we weren't all that excited about the previous behavior with
regards to perf data either. Sometimes tests would fail and their lousy results
would mess with our pretty pictures
(https://github.com/cockroachdb/roachperf/issues/37).

This PR introduces a new concept, perf artifacts, which live in a different
directory, the perfArtifactsDir ("perf") and has the opposite fetching policy
from the logs; the "perf" directory is copied from each remove host into the
test's artifactsDir only upon successful completion of the test.

One downside of this as implemented is that it can spam the test's log (but
fortunately not roachtest's logs anymore, <3 @andreimatei). cockroachdb#38890 attempts to
address this somewhat. It shouldn't been too big of a problem because it only
happens on tests that already passed.

This change also necessitates a small change to roachperf to look in this new
"perf" directory (and respect the rewrite's test artifact directory structure).

Fixes cockroachdb#38871.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/add-perf-artifacts-to-roachtest branch from 0492bb9 to 3cf2306 Compare July 16, 2019 02:28
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/cmd/roachtest/kv.go, line 67 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Was this intentional?

No. Removed.


pkg/cmd/roachtest/test.go, line 74 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm a little surprised that we don't just try to pull from this directory unconditionally. It seems like an easy thing to miss and have to debug every few months. Are you worried that doing so would be too expensive? Is it because of the logging you discuss below?

Done. The error was a big part of the reason. I thought about typing code to have the testSpec tell the runner which nodes should have perf artifacts but it didn't seem super clean to generalize as tests like in tpcc determine their load nodes when they are run based on the shape of the cluster. I made #38890 to make it somewhat less egregious, plus it's only after tests pass.

A more drastic thing that could work is to change roachprod get to use rsync instead of scp and would enable fancier features like --ignore-missing-args. Not for this PR.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 11 of 11 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)

@ajwerner
Copy link
Contributor Author

TFTR!

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Jul 16, 2019
38848: roachpb: further clarify correctness of observed timestamps r=nvanbenschoten a=nvanbenschoten

This commit expands on the explanation added in b7cc205. Specifically,
it focuses on the inability to bound a transaction's uncertainty interval
below the starting timestamp of the lease that it is executing a request
on. This limitation was added in 85d4627. I tend to forget about it once
every few months and get very confused how this all works.

Release note: None

38889: roachtest: create perf artifacts concept and collect them r=nvanbenschoten a=ajwerner

Before the roachtest rewrite in #30977 we used to collect the logs directory
from remote hosts unconditionally. We also used to write stats.json histogram
files to the logs directory. This combination enabled the collection of
histogram files for roachperf. Since the new roachtest does not unconditionally
copy logs to the test runner, we have not gotten any perf artifacts since the
change.

This PR introduces a new concept, perf artifacts, which live in a different
directory, the perfArtifactsDir. If a testSpec indicates that it
HasPerfArtifacts then that directory is copied to the test runner's
artifactsDir for that test from each node in the cluster upon success.

This change also necessitates a small change to roachperf to look in this new
"perf" directory.

Fixes #38871.

Release note: None

38895: roachprod: disable ssh logging r=ajwerner a=tbg

This was never fully introduced and there is currently no impetus to
push it over the finish line. It would be useful to diagnose issues
that are readily reproducible, but we haven't seen any of those.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jul 16, 2019

Build succeeded

@craig craig bot merged commit 3cf2306 into cockroachdb:master Jul 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachperf: tpc-c data not present for July
3 participants