infra: base-runner: coverage: set max parallel jobs to be half of CPU count (google#10277)

The current number of parallel fuzzers running is set to the number of
available CPUs. This is causing issues in Tensorflow:

```
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75:  4501 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
...
Step #5: error: decode_compressed_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75:  4873 Killed                  llvm-cov show -instr-profile=$profdata_file -object=$target -line-coverage-gt=0 $shared_libraries $BRANCH_COV_ARGS $LLVM_COV_COMMON_ARGS > ${TEXTCOV_REPORT_DIR}/$target.covreport
Step #5: /usr/local/bin/coverage: line 75:  4897 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
...
Step #5: error: saved_model_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
Step #5: /usr/local/bin/coverage: line 75:  4638 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
Step #5: [2023-05-08 11:57:05,246 INFO] Finding shared libraries for targets (if any).
...
Step #5: [2023-05-08 11:57:09,911 INFO] Finished finding shared libraries for targets.
Step #5: /usr/local/bin/coverage: line 75:  4276 Killed                  llvm-cov export -summary-only -instr-profile=$profdata_file -object=$target $shared_libraries $LLVM_COV_COMMON_ARGS > $FUZZER_STATS_DIR/$target.json
Step #5: /usr/local/bin/coverage: line 75:  5450 Killed                  llvm-profdata merge -j=1 -sparse $profraw_file_mask -o $profdata_file
Step #5: [2023-05-08 11:57:40,282 INFO] Finding shared libraries for targets (if any).
Step #5: [2023-05-08 11:57:40,323 INFO] Finished finding shared libraries for targets.
Step #5: error: end_to_end_fuzz: Failed to load coverage: No such file or directory
Step #5: error: Could not load coverage information
Step #5: error: No such file or directory: Could not read profile data!
```

[log](https://oss-fuzz-build-logs.storage.googleapis.com/log-050f4040-5009-4a23-81c4-9093922b4ffb.txt)
(don't open it in a browser; `wget`/`curl` it instead, as it's quite a
large file and will likely hang the browser).
I assume this is because the fuzzers consume a lot of memory. A
Tensorflow fuzzer can be ~3GB and there are ~50 fuzzers in Tensorflow,
so I think the artifacts read by `llvm-profdata merge` eat up the
available memory, which eventually causes processes on the system to be
killed.
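
As a rough back-of-the-envelope check: ~50 fuzzers at ~3GB each is on
the order of 150GB, so even a fraction of them processed concurrently
can exhaust a typical build machine's RAM. Below is a minimal sketch
(my own illustration, not part of this commit) of how a memory-aware
job cap could be derived; `PER_JOB_GB` is a hypothetical per-job
footprint, not a value from the coverage script:

```bash
#!/bin/bash
# Hypothetical sketch: derive a parallel-job cap from available memory
# instead of (or in addition to) the CPU count.
PER_JOB_GB=3  # assumed footprint of one fuzzer + llvm-profdata job

# /proc/meminfo reports MemAvailable in kB; convert to GB.
mem_gb=$(awk '/^MemAvailable:/ {print int($2 / 1048576)}' /proc/meminfo)

max_jobs=$(( mem_gb / PER_JOB_GB ))
(( max_jobs < 1 )) && max_jobs=1                 # always allow at least one job
(( max_jobs > $(nproc) )) && max_jobs=$(nproc)   # never exceed the CPU count
echo "memory-aware parallel job cap: $max_jobs"
```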

I imagine the same could happen for other projects with many large
fuzzers.

Signed-off-by: David Korczynski <[email protected]>
Co-authored-by: Oliver Chang <[email protected]>
DavidKorczynski and oliverchang authored Nov 23, 2023
1 parent 4168a98 commit f716590
Showing 1 changed file with 4 additions and 2 deletions.
infra/base-images/base-runner/coverage:

```diff
@@ -69,7 +69,9 @@ TIMEOUT=1h
 objects=""
 
 # Number of CPUs available, this is needed for running tests in parallel.
+# Set the max number of parallel jobs to be the CPU count and a max of 10.
 NPROC=$(nproc)
+MAX_PARALLEL_COUNT=10
 
 CORPUS_DIR=${CORPUS_DIR:-"/corpus"}
 
@@ -364,9 +366,9 @@ for fuzz_target in $FUZZ_TARGETS; do
   fi
 
 
-  # Do not spawn more processes than the number of CPUs available.
+  # Limit the number of processes to be spawned.
   n_child_proc=$(jobs -rp | wc -l)
-  while [ "$n_child_proc" -eq "$NPROC" ]; do
+  while [[ "$n_child_proc" -eq "$NPROC" || "$n_child_proc" -gt "$MAX_PARALLEL_COUNT" ]]; do
     sleep 4
     n_child_proc=$(jobs -rp | wc -l)
   done
```
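
For context, the throttling pattern the patched script uses looks
roughly like the following standalone sketch (my own minimal
reconstruction; `run_coverage_for` is a stand-in for the real
per-target llvm-profdata/llvm-cov work):

```bash
#!/bin/bash
NPROC=$(nproc)
MAX_PARALLEL_COUNT=10

run_coverage_for() {
  sleep 2                    # simulate coverage processing for one target
  echo "processed $1"
}

for fuzz_target in target_a target_b target_c target_d; do
  # Limit the number of processes to be spawned.
  n_child_proc=$(jobs -rp | wc -l)
  while [[ "$n_child_proc" -eq "$NPROC" || "$n_child_proc" -gt "$MAX_PARALLEL_COUNT" ]]; do
    sleep 4
    n_child_proc=$(jobs -rp | wc -l)
  done
  run_coverage_for "$fuzz_target" &
done
wait  # block until all remaining background jobs finish
```

With this condition the effective ceiling is roughly
min(NPROC, MAX_PARALLEL_COUNT + 1): the loop blocks once the running
job count hits NPROC, or once it strictly exceeds MAX_PARALLEL_COUNT.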
