release-22.2: roachtest: metamorphic ARM64 and FIPS clusters #104691
Merged: srosenberg merged 2 commits into cockroachdb:release-22.2 from srosenberg:backport22.2-103710 on Jun 14, 2023
Previously, roachtests which benchmark performance (cf. correctness) were indistinguishable from correctness tests. That is, a performance test is like any other test with the exception of _optionally_ writing stats.json under 'Test.PerfArtifactsDir'; these artifacts are automatically exported to a gcs bucket, used in conjunction with the roachperf dashboard.
Having no direct way to distinguish a performance test from a correctness test has several challenges. E.g., performance tests may require a specific machine type or architecture; background workloads like incremental backup may cause a performance regression; new metamorphic configurations like arm64 and fips may require a "bake-in" time before performance tests can be enabled. In future, the test runner may make specialized decisions (e.g., don't reuse a cluster) when executing a performance test.
Thus, we need a (standard) mechanism to enumerate all performance tests. Given their specific requirements, the test author must explicitly opt in, by setting TestSpec.Benchmark to 'true'.
This PR applies the above change retroactively, i.e., setting 'TestSpec.Benchmark' for all _known_ performance tests, including those which _assert_ on performance instead of exporting stats.json. It also fixes `roachtest list --bench` and `roachtest bench`, which were out-of-date, albeit not actively used.
Epic: none
Release note: None
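The opt-in described above makes it possible to enumerate performance tests with a simple filter. The following is a minimal sketch of that idea, not the actual roachtest code: the `TestSpec` struct here is a simplified, hypothetical stand-in that keeps only a `Name` and the `Benchmark` field, and the `benchmarks` helper and test names are invented for illustration.

```go
package main

import "fmt"

// TestSpec is a simplified, hypothetical stand-in for roachtest's TestSpec;
// only the fields needed for this illustration are included.
type TestSpec struct {
	Name      string
	Benchmark bool // explicit opt-in by the test author
}

// benchmarks returns the specs whose authors opted in as performance tests.
// (Hypothetical helper; not part of roachtest.)
func benchmarks(specs []TestSpec) []TestSpec {
	var out []TestSpec
	for _, s := range specs {
		if s.Benchmark {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// Illustrative test names only.
	specs := []TestSpec{
		{Name: "example/perf-a", Benchmark: true},
		{Name: "example/correctness-b"},
		{Name: "example/perf-c", Benchmark: true},
	}
	for _, s := range benchmarks(specs) {
		fmt.Println(s.Name)
	}
}
```

Because the flag defaults to false, a test that never sets it is treated as a correctness test, which matches the explicit opt-in requirement.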
srosenberg requested review from herkolategan and renatolabs and removed request for a team (June 10, 2023 01:14)
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
srosenberg force-pushed the backport22.2-103710 branch 2 times, most recently from 5abaa81 to c9b482a (June 10, 2023 16:52)
herkolategan approved these changes on Jun 12, 2023
Previously, all roachtests used (cloud) machine types with the AMD64 (cpu) architecture. Recently [1], new CI infrastructure was added to run a clone of all the nightly roachtests, configured with FIPS; i.e., same AMD64 machine types, different AMI and crdb binary, patched with FIPS-certified openssl native code.
As of this PR, we add the capability to execute any roachtest in a cluster, configured with either ARM64, FIPS, or AMD64 (default). This is controlled via the two CLI args: `metamorphic-arm64-probability` and `metamorphic-fips-probability`. The former denotes the probability (over the uniform distribution) of a new cluster provisioned using ARM64 VMs. The latter denotes the probability of a new AMD64 cluster provisioned with the FIPS-compliant (kernel) configuration.
In case a test is compatible only with AMD64, it's effectively excluded from the set; i.e., both probabilities apply to compatible tests only.
Note, the two probabilities don't have to add up to 1. E.g., `metamorphic-arm64-probability==0.4`, `metamorphic-fips-probability==0.2` denotes that ARM64 clusters are chosen ~40% of the time, whereas of the remaining ~60% AMD64 clusters, FIPS is chosen ~20% of the time; i.e., ~12% of all clusters will use FIPS.
Note, the values '0' and '1' are absolute. Setting both to '0' is tantamount to the behavior before this PR. Setting either to '1' enforces that _all_ clusters are provisioned with either ARM64 or FIPS. A test can specify its required architecture, in which case, it takes precedence over metamorphic settings.
This PR builds on [1], which enabled ARM64 provisioning for AWS in roachprod. We add ARM64 provisioning for GCE, i.e., T2A, as well as refactor the 'arch' argument to denote one of: AMD64, ARM64, FIPS, where the latter isn't formally a CPU architecture; however, it simplifies provisioning and binary staging.
We also modify roachprod.List to display CPU architecture, other than AMD64, with the machine type; this should make it easier to see which clusters are running ARM64 and FIPS configurations, as we ramp up their testing.
The PR also adds validation to cockroach binaries and libs to ensure we can execute tests under ARM64 and FIPS. Furthermore, we add an 'Enabled Assertions' header, generated at build time, to the cockroach binary; the header is used to validate whether or not the binary has runtime assertions enabled.
Epic: none
Release note: None
Resolves: cockroachdb#94957
Resolves: cockroachdb#89268
Informs: cockroachdb#94986
[1] cockroachdb#99224
[2] cockroachdb#103243
srosenberg force-pushed the backport22.2-103710 branch from c9b482a to 1fa4fac (June 13, 2023 23:32)
Backport 2/2 commits from #103710.
/cc @cockroachdb/release
Previously, all roachtests used (cloud) machine types with the AMD64 (cpu) architecture. Recently [1], new CI infrastructure was added to run a clone of all the nightly roachtests, configured with FIPS; i.e., same AMD64 machine types, different AMI and crdb binary, patched with FIPS-certified openssl native code.
As of this PR, we add the capability to execute any roachtest in a cluster, configured with either ARM64, FIPS, or AMD64 (default). This is controlled via the two CLI args: `metamorphic-arm64-probability` and `metamorphic-fips-probability`. The former denotes the probability (over the uniform distribution) of a new cluster provisioned using ARM64 VMs. The latter denotes the probability of a new AMD64 cluster provisioned with the FIPS-compliant (kernel) configuration.
In case a test is compatible only with AMD64, it's effectively excluded from the set; i.e., both probabilities apply to compatible tests only.
Note, the two probabilities don't have to add up to 1. E.g., `metamorphic-arm64-probability==0.4`, `metamorphic-fips-probability==0.2` denotes that ARM64 clusters are chosen ~40% of the time, whereas of the remaining ~60% AMD64 clusters, FIPS is chosen ~20% of the time; i.e., ~12% of all clusters will use FIPS.
Note, the values '0' and '1' are absolute. Setting both
to '0' is tantamount to the behavior before this PR.
Setting either to '1' enforces all clusters
are provisioned with either ARM64 or FIPS.
A test can specify its required architecture, in which
case, it takes precedence over metamorphic settings.
This PR builds on [1], which enabled ARM64 provisioning for AWS in roachprod. We add ARM64 provisioning for GCE, i.e., T2A, as well as refactor the 'arch' argument to denote one of: AMD64, ARM64, FIPS, where the latter isn't formally a CPU architecture; however, it simplifies provisioning and binary staging.
We also modify roachprod.List to display CPU architecture, other than AMD64, with the machine type; this should make it easier to see which clusters are running ARM64 and FIPS configurations, as we ramp up their testing.
Epic: none
Release note: None
Release justification: ci/test only change
Resolves: #94957
Informs: #94986
[1] #99224
[2] #103243