Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: run fuzz tests in parallel and generate coverage report #4960

Merged
merged 12 commits into from
Dec 23, 2024

Conversation

jouho
Copy link
Contributor

@jouho jouho commented Dec 6, 2024

Release Summary:

Resolved issues:

N/A

Description of changes:

There are two main changes in this PR:

  1. Generate coverage report from scheduled fuzz test.

Use LLVM's coverage tool to generate a coverage report for the scheduled fuzz test. The report is in HTML format, allowing viewers to see line coverage, function coverage, and branch coverage for the s2n-tls library. This is enabled only for the scheduled fuzz job, which uses improved corpus files. Fuzz tests that run against each PR do not use the improved corpus files, so they do not accurately represent the coverage we achieve.

  1. Changes fuzz tests to run in parallel

Currently we are running each fuzz test with multi-threads. We have data to indicate it does not effectively increase code coverage. Therefore we should run them in parallel with single core, which achieves as much code coverage as running them with multi-threads.

This is what a sample coverage looks like (generated locally):
image

  1. Refactor runFuzzTest.sh into smaller components

Currently runFuzzTest.sh script has too many logics living inside of it. This makes it very hard to understand what the script is actually doing, and is not very maintainable. I have broken it into smaller components which are:

  1. runFuzzTest.sh: responsible for running the actual fuzz test
  2. fuzz_corpus_download.sh: responsible for downloading corpus files from S3
  3. fuzz_corpus_upload.sh: responsible for uploading improved corpus files to S3
  4. fuzz_coverage_report.sh: responsible for generating coverage information

Call-outs:

We need to make few changes to s2nFuzzBatchScheduled job on CodeBuild after this PR merge.

  1. Change the job from "batch build" to "single build" as it is no longer a batch job (we run all tests in parallel). Without this change, the job will fail because updated buildspec no longer specifies batch build definition, and build project is still configured as a batch build. Edit: the Batch status are apparently not editable. We need to delete the old one and create a new one to make this change. This will be a good time to also change its name from s2nFuzzBatchScheduled to s2nFuzzScheduled or similar
  2. Configure report upload. At the end of scheduled run, we generate a folder containing all the HTML files for coverage report (defined as artifacts in buildspec_fuzz_scheduled.yml). We need to set the upload location on build project configuration.
  3. Change the compute to have at least 36 vCPUs. In order to run all the fuzz tests in parallel for 8 hours, we need one vCPU for each fuzz test we have. Currently there are 33 fuzz tests, therefore we need to upgrade the compute to use at lease 36 vCPUs, preferably 72.

The file diff is a bit large, but most of it is moving existing logics into different places so hopefully it's not too difficult to read.

Testing:

Running the new scheduled fuzz test with shorter runtime using this PR: Link to CodeBuild job
Overriding regular "per PR" fuzz test using this PR: Link to CodeBuild job

Finished setting up a new scheduled job for uploading coverage report. Coverage report is now available at: https://dx1inn44oyl7n.cloudfront.net/fuzz-coverage-report/index.html

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions github-actions bot added the s2n-core team label Dec 6, 2024
@jouho jouho changed the title make fuzz tests run in parallel and generate coverage report run fuzz tests in parallel and generate coverage report Dec 6, 2024
@jouho jouho changed the title run fuzz tests in parallel and generate coverage report CI: run fuzz tests in parallel and generate coverage report Dec 6, 2024
run all fuzz tests

fix syntax error

run tests in parallel

modify coverage summary

modify coverage output

update comment

simplify output

test shorter runtime

remove comments
@jouho jouho force-pushed the fuzz-tests-in-parellel-and-generate-coverage branch from 25b9682 to 224a017 Compare December 6, 2024 23:20
@jouho jouho marked this pull request as ready for review December 6, 2024 23:23
@jouho jouho requested a review from dougch as a code owner December 6, 2024 23:23
@jouho jouho requested a review from jmayclin December 6, 2024 23:23
Copy link
Contributor

@dougch dougch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor callout; CodeBuild job names and batch status, to date, have not been update-able; (the job will need to be deleted and recreated)

@@ -77,12 +33,20 @@ phases:
cmake . -Bbuild \
-DCMAKE_PREFIX_PATH=/usr/local/$S2N_LIBCRYPTO \
-DS2N_FUZZ_TEST=on \
-DFUZZ_TIMEOUT_SEC=27000
-DFUZZ_TIMEOUT_SEC=27000 \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this an environment variable makes it easy to over-ride later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I had to make changes to this line to test for shorter duration. Will change this to an env var

# --timeout: override ctest's default timeout of 1500
- cmake --build build/ --target test -- ARGS="-L fuzz -R ${FUZZ_TESTS} --output-on-failure --timeout 28800"
- cmake --build build/ --target test -- ARGS="-L fuzz --output-on-failure -j $(nproc) --timeout 28800"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above; if timeout is an env. var we can over-ride it later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I suspect we won't be running fuzz tests for more than 28800 secs though. I'm also not exactly sure how I make this into env, probably something like this?

cmake --build build/ --target test -- ARGS="-L fuzz --output-on-failure -j $(nproc) --timeout $TIMEOUT_VALUE"

artifacts:
# upload all files in the fuzz_coverage_report directory
files:
- '**/*'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The broad file mask should keep this from failing, but have you tested the use of artifact uploads? (they have been flaky).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wildcard match looks a little bit suspicious to me. Won't this upload everything? I'd expect the pattern to be something more like fuzz_coverage_report/*.

Copy link
Contributor Author

@jouho jouho Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

**/* means all files recursively, from the base-directory which is currently set to fuzz_coverage_report.
The generated report contains many nested folders so we want to make sure to include those.

documentation: https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html#:~:text=artifacts/base%2Ddirectory

Alternatively I could probably just specify fuzz_coverage_report/**/* without the base-directory option, but wanted to keep consistency with how mainline coverage specifies it:

artifacts:
# upload all files in the coverage_report directory
files:
- '**/*'
base-directory: coverage_report

Either way, I will test this to confirm the behavior, and paste the result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I just missed the "base-directory" field 🤦

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested uploading coverage files to S3 bucket and is working. I uploaded a folder test-fuzz-coverage to s2n-build-artifacts bucket and you should be able to see the report there

@@ -31,6 +31,9 @@ FUZZ_TIMEOUT_SEC=$2
BUILD_DIR_PATH=$3
CORPUS_UPLOAD_LOC=$4
ARTIFACT_UPLOAD_LOC=$5
FUZZ_COVERAGE=$6
S2N_ROOT=$7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using this variable anywhere? It's sort of a historical holddover from the Makefiles... and between CodeBuild and CMake, there should be existing equivalents defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, 7 CLI arguments is a lot, and I worry about the maintainability of this.

How can we make things simpler?

1 - Remove Unnecessary Options

My understanding is that right now BUILDER_DIR_PATH, CORPUS_UPLOAD_LOC, and ARTIFACT_UPLOAD_LOC are never actually set to anything else. And I'd imagine that if they were set to something else, a whole host of things would break. That's a good sign that these aren't actually configurable, and are probably better off being represented as constant values.

2 - Separation of Concerns

Generally it's best to "separate concerns". Don't unnecessarily couple things that don't need to be coupled. This applies to shell scripts that same way that it applies to actual code

// tricky and complex, probably not the best option
fn run_fuzz_tests_and_generate_coverage_report(arg1, arg2, arg3, arg4, arg5, arg6, arg6)

// a bit more manageable
fn run_fuzz_tests(arg1, arg2, arg3, arg4)
fn generate_coverage_report(arg5, arg6, arg7)

I'd imagine an ideal "coverage pipeline" to look something like this

  1. build the fuzz test binaries with instrumentation enabled
  2. download the corpus files
  3. run the fuzz tests (which is mostly agnostic as to whether or not coverage is enabled)
  4. upload the improved corpus files
  5. merge the coverage information/generate the report
  6. upload the report

It seems odd that all of the coverage reporting is living inside a script called runFuzzTests. Perhaps we could refactor coverage_report.sh to be slightly more generic, and then everything could use the same script?

3 - Scrutinize Coverage Assertions

I like the idea of asserting on some sort of coverage information. "feature coverage" from the fuzzer seems very nice and easy to assert on.

Asserting on coverage from the actual report seems to requires a big blob of sed/grep/regex all squished together, and that is not sparking joy.

What benefit does this complicated stuff provide that feature coverage doesn't? And are our thresholds/assertions well-researched enough to actually be useful? I know we found that session ticket coverage issue when manually reviewing the code, so I question the value of these more complex assertions.

Copy link
Contributor Author

@jouho jouho Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I should be able to remove BUILD_DIR_PATH and just give a hardcoded path to the build directory.

But CORPUS_UPLOAD_LOC, and ARTIFACT_UPLOAD_LOC, are s3 bucket locations (For example, "s3:://bucketName"). There is probably no security concerns to make it public, as it requires correct IAM roles to be able to access the bucket. Is it fine making such resource info into hard-coded const?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some refactoring so it only takes 3 CLI arguments instead of 7.

Since runFuzzTest.sh is very large and hard to read, I have broken it into smaller components which are:

runFuzzTest.sh: responsible for running the actual fuzz test
fuzz_corpus_download.sh: responsible for downloading corpus files from S3
fuzz_corpus_upload.sh: responsible for uploading improved corpus files to S3
fuzz_coverage_report.sh: responsible for generating coverage information

On top of that, I removed couple of print statements that were hidden by CTest. I left the outputs that are useful for debugging in case fuzzing fails.

LINE_COV=$(echo $S2N_COV | cut -d' ' -f1)
FUNC_COV=$(echo $S2N_COV | cut -d' ' -f2)
BRANCH_COV=$(echo $S2N_COV | cut -d' ' -f3)
printf "Coverage Report:\nLine coverage: %s\nFunction coverage: %s\nBranch coverage: %s\n" "$LINE_COV" "$FUNC_COV" "$BRANCH_COV"
Copy link
Contributor

@maddeleine maddeleine Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These numbers are fuzz coverage for the entire repo(I assume). However, we don't fuzz test the entire repo. I would look for a more meaningful metric, like coverage of a single fuzz tested function. Otherwise it's probably better to leave this out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is the coverage stats for the entire repo.

Hmm yeah the numbers themselves are probably not very valuable without being able to see the details. I can remove this from the output. We will have access to these numbers from the report anyways.

For getting coverage of a single fuzz tested function, I'm not too sure how it will look like, but my guess is that each fuzz test has its own targeted functions (for example, s2n_deserialize_resumption_state_test is targeting s2n_deserialize_resumption_state) and we want to see how much of the targeted functions are covered by that test.

I'm afraid adding that might complicate this PR, so I will have a follow-up PR after this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue for this: #4966

@@ -46,7 +49,7 @@ ASAN_OPTIONS+="symbolize=1"
LSAN_OPTIONS+="log_threads=1"
UBSAN_OPTIONS+="print_stacktrace=1"
NUM_CPU_THREADS=$(nproc)
LIBFUZZER_ARGS+="-timeout=5 -max_len=4096 -print_final_stats=1 -jobs=${NUM_CPU_THREADS} -workers=${NUM_CPU_THREADS} -max_total_time=${FUZZ_TIMEOUT_SEC}"
LIBFUZZER_ARGS+="-timeout=5 -max_len=4096 -print_final_stats=1 -workers=${NUM_CPU_THREADS} -max_total_time=${FUZZ_TIMEOUT_SEC}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LIBFUZZER_ARGS+="-timeout=5 -max_len=4096 -print_final_stats=1 -workers=${NUM_CPU_THREADS} -max_total_time=${FUZZ_TIMEOUT_SEC}"
LIBFUZZER_ARGS+="-timeout=5 -max_len=4096 -print_final_stats=1 -max_total_time=${FUZZ_TIMEOUT_SEC}"

The workers arg should not longer be needed (because everything is single threaded/run on a single core).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this in the latest commit

CMakeLists.txt Outdated
@@ -696,6 +696,12 @@ if (BUILD_TESTING)
set(ARTIFACT_UPLOAD_LOC "none")
endif()

if(COVERAGE)
set(FUZZ_COVERAGE "true")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that we can't just use COVERAGE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I can just use COVERAGE 😓. Thank you for pointing it out

artifacts:
# upload all files in the fuzz_coverage_report directory
files:
- '**/*'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wildcard match looks a little bit suspicious to me. Won't this upload everything? I'd expect the pattern to be something more like fuzz_coverage_report/*.

@@ -31,6 +31,9 @@ FUZZ_TIMEOUT_SEC=$2
BUILD_DIR_PATH=$3
CORPUS_UPLOAD_LOC=$4
ARTIFACT_UPLOAD_LOC=$5
FUZZ_COVERAGE=$6
S2N_ROOT=$7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, 7 CLI arguments is a lot, and I worry about the maintainability of this.

How can we make things simpler?

1 - Remove Unnecessary Options

My understanding is that right now BUILDER_DIR_PATH, CORPUS_UPLOAD_LOC, and ARTIFACT_UPLOAD_LOC are never actually set to anything else. And I'd imagine that if they were set to something else, a whole host of things would break. That's a good sign that these aren't actually configurable, and are probably better off being represented as constant values.

2 - Separation of Concerns

Generally it's best to "separate concerns". Don't unnecessarily couple things that don't need to be coupled. This applies to shell scripts that same way that it applies to actual code

// tricky and complex, probably not the best option
fn run_fuzz_tests_and_generate_coverage_report(arg1, arg2, arg3, arg4, arg5, arg6, arg6)

// a bit more manageable
fn run_fuzz_tests(arg1, arg2, arg3, arg4)
fn generate_coverage_report(arg5, arg6, arg7)

I'd imagine an ideal "coverage pipeline" to look something like this

  1. build the fuzz test binaries with instrumentation enabled
  2. download the corpus files
  3. run the fuzz tests (which is mostly agnostic as to whether or not coverage is enabled)
  4. upload the improved corpus files
  5. merge the coverage information/generate the report
  6. upload the report

It seems odd that all of the coverage reporting is living inside a script called runFuzzTests. Perhaps we could refactor coverage_report.sh to be slightly more generic, and then everything could use the same script?

3 - Scrutinize Coverage Assertions

I like the idea of asserting on some sort of coverage information. "feature coverage" from the fuzzer seems very nice and easy to assert on.

Asserting on coverage from the actual report seems to requires a big blob of sed/grep/regex all squished together, and that is not sparking joy.

What benefit does this complicated stuff provide that feature coverage doesn't? And are our thresholds/assertions well-researched enough to actually be useful? I know we found that session ticket coverage issue when manually reviewing the code, so I question the value of these more complex assertions.

- split report generation logic from fuzz runner
- remove unused output
- use less arguments

modify script name

use correct file name

refactor into smaller components

add execute permissions

suppress noisy output

fix path to profraw

fix profraw path

fix profraw gen
@jouho jouho force-pushed the fuzz-tests-in-parellel-and-generate-coverage branch from 2fc5827 to ce73ffa Compare December 12, 2024 01:26
@jouho jouho changed the title CI: run fuzz tests in parallel and generate coverage report ci: run fuzz tests in parallel and generate coverage report Dec 13, 2024
Copy link
Contributor

@jmayclin jmayclin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, but overall a big improvement from the mega script 🥳

codebuild/spec/buildspec_fuzz_scheduled.yml Outdated Show resolved Hide resolved
tests/fuzz/runFuzzTest.sh Outdated Show resolved Hide resolved
codebuild/bin/fuzz_corpus_download.sh Outdated Show resolved Hide resolved
codebuild/bin/fuzz_coverage_report.sh Outdated Show resolved Hide resolved
Copy link
Contributor

@dougch dougch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With @jmayclin's changes...

- remove unset LD_PRELOADS
- fix script name in the usage
- use cmake to configure compiler
- remove unused options
@jouho jouho requested a review from jmayclin December 13, 2024 23:59
@jouho jouho enabled auto-merge December 23, 2024 17:53
@jouho jouho added this pull request to the merge queue Dec 23, 2024
Merged via the queue into aws:main with commit 23209c4 Dec 23, 2024
40 checks passed
@jouho jouho deleted the fuzz-tests-in-parellel-and-generate-coverage branch December 23, 2024 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants