Themis is a collection of real-world, reproducible crash bugs (collected from open-source Android apps) and a unified, extensible infrastructure for benchmarking automated GUI testing for Android and beyond.
Publication (Presentation Video)
[1] "Benchmarking Automated GUI Testing for Android against Real-World Bugs". Ting Su, Jue Wang, Zhendong Su. 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021)
@inproceedings{themis,
author = {Ting Su and
Jue Wang and
Zhendong Su},
title = {Benchmarking Automated GUI Testing for Android against Real-World Bugs},
booktitle = {Proceedings of 29th ACM Joint European Software Engineering Conference and Symposium
on the Foundations of Software Engineering (ESEC/FSE)},
pages = {119--130},
year = {2021},
doi = {10.1145/3468264.3468620}
}
- Themis is used by ByteDance's FastBot to evaluate and improve its bug-finding abilities. See the release note of FastBot: "add some new GUI fuzzing & mutation features (inspired/supported by Themis)"
Themis now contains 52 reproducible crash bugs. All of these bugs were labeled by the app developers as "critical bugs" (i.e., important bugs) that affected major app functionality and a large percentage of app users.
For each bug, we provide:
- The original bug report
- An executable APK (JaCoCo-instrumented for coverage collection)
- A bug-reproducing script and a video
- The stack trace of the bug
- Metadata for supporting evaluation (e.g., app login scripts and configuration files used by Themis for code coverage computation)
- The app source code w.r.t. each bug
Issue Id | App | Bug report, Bug data | Code Version | App Category | GitHub Stars | Reproducible (Android SDK)? | Network? | Login? | System Setting? |
---|---|---|---|---|---|---|---|---|---|
1 | AmazeFileManager | #1837, data | 3.4.2 | File Manager | 3.0K | 6.0/7.1 | no | no | no |
2 | AmazeFileManager | #1796, data | 3.3.2 | File Manager | 3.0K | 4.4/6.0/7.1 | no | no | no |
3 | AmazeFileManager | #1558, data | 3.3.2 | File Manager | 3.0K | 6.0/7.1 | no | no | no |
4 | AmazeFileManager | #1232, data | 3.2.1 | File Manager | 3.0K | 6.0/7.1 | no | no | no |
5 | ActivityDiary | #285, data | 1.4.0 | Personal Diary | 55 | 6.0/7.1 | no | no | no |
6 | ActivityDiary | #118, data | 1.1.8 | Personal Diary | 55 | 6.0/7.1 | no | no | no |
7 | and-bible | #375, data | 3.1.309 | Bible Reader | 231 | 4.4/6.0/7.1 | yes | no | no |
8 | and-bible | #697, data | 3.2.369 | Bible Reader | 231 | 6.0/7.1 | yes | no | no |
9 | and-bible | #261, data | 3.0.286 | Bible Reader | 231 | 6.0/7.1 | yes | no | no |
10 | and-bible | #703, data | 3.3.377 | Bible Reader | 231 | 6.0/7.1 | yes | no | no |
11 | and-bible | #480, data | 3.2.327 | Bible Reader | 231 | 4.4/6.0/7.1 | yes | no | no |
12 | Anki-Android | #4707, data | 2.9alpha4 | Card Learning | 2.8K | 7.1 | no | no | no |
13 | Anki-Android | #5638, data | 2.9.1 | Card Learning | 2.8K | 6.0/7.1 | no | no | no |
14 | Anki-Android | #4200, data | 2.6 | Card Learning | 2.8K | 4.4/6.0/7.1 | no | no | no |
15 | Anki-Android | #4451, data | 2.7 | Card Learning | 2.8K | 4.4/6.0/7.1 | no | no | no |
16 | Anki-Android | #6145, data | 2.10 | Card Learning | 2.8K | 4.4/6.0/7.1 | no | no | yes |
17 | Anki-Android | #5756, data | 2.9.4 | Card Learning | 2.8K | 4.4/6.0/7.1 | no | no | no |
18 | Anki-Android | #4977, data | 2.9 | Card Learning | 2.8K | 4.4/6.0/7.1 | no | no | no |
19 | APhotoManager | #116, data | 0.6.4 | Photo Manager | 148 | 4.4/6.0/7.1 | no | no | no |
20 | collect | #3222, data | 1.23.0 | Form Data Collector | 528 | 4.4/6.0/7.1 | yes | no | no |
21 | commons | #3244, data | 2.11.0 | Wikimedia | 601 | 6.0/7.1 | yes | yes | no |
22 | commons | #2123, data | 2.9.0 | Wikimedia | 601 | 6.0/7.1 | yes | yes | no |
23 | commons | #1391(Ref:#1329), data | 2.6.7 | Wikimedia | 601 | 6.0/7.1 | yes | yes | no |
24 | commons | #1385(Ref:#1329), data | 2.6.7 | Wikimedia | 601 | 6.0/7.1 | yes | yes | no |
25 | commons | #1581, data | 2.7.1 | Wikimedia | 601 | 6.0/7.1 | yes | yes | yes |
26 | FirefoxLite | #4881, data | 2.1.12 | Browser | 231 | 6.0/7.1 | yes | no | no |
27 | FirefoxLite | #5085, data | 2.1.20 | Browser | 231 | 6.0/7.1 | yes | no | no |
28 | FirefoxLite | #4942, data | 2.1.16 | Browser | 231 | 6.0/7.1 | yes | no | no |
29 | Frost-for-Facebook | #1323, data | 2.2.1 | Facebook Client | 377 | 6.0/7.1 | yes | yes | yes |
30 | geohashdroid | #73, data | 0.9.4 | Geohash | 13 | 4.4/6.0/7.1 | no | no | no |
31 | MaterialFBook | #224, data | 4.0.2 | Social | 122 | 6.0/7.1 | yes | yes | no |
32 | nextcloud | #5173, data | 3.10.0 | Productivity | 1.9K | 6.0/7.1 | yes | yes | no |
33 | nextcloud | #4026, data | 3.6.1 | Productivity | 1.9K | 6.0/7.1 | yes | yes | no |
34 | nextcloud | #4792, data | 3.9.2 | Productivity | 1.9K | 4.4/6.0/7.1 | yes | yes | no |
35 | nextcloud | #1918, data | 2.0.0 | Productivity | 1.9K | 6.0/7.1 | yes | yes | no |
36 | Omni-Notes | #745, data | 6.1.0 | Notebook | 2.1K | 4.4/6.0/7.1 | no | no | no |
37 | open-event-attendee | #2198, data | 0.5 | Social Events | 1.5K | 6.0/7.1 | yes | no | no |
38 | openlauncher | #67(Ref:#250), data | 0.3.1 | App Launcher | 256 | 6.0/7.1 | no | no | yes |
39 | osmeditor4android | #729, data | 11.0.0 | Map | 171 | 4.4/6.0/7.1 | no | no | no |
40 | osmeditor4android | #637, data | 0.9.10 | Map | 171 | 6.0/7.1 | no | no | no |
41 | Phonograph | #112, data | 0.15.0 | Music Player | 2.4K | 4.4/6.0/7.1 | no | no | no |
42 | Scarlet-Notes | #114, data | 6.9.5 | Notebook | 272 | 4.4/6.0/7.1 | no | no | no |
43 | sunflower | #239, data | 0.1.6 | Utility Tool | 12K | 4.4/6.0/7.1 | no | no | no |
44 | wordpress | #8659, data | 11.3 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
45 | wordpress | #7182, data | 9.2 | Social | 2.3K | 4.4/6.0/7.1 | yes | yes | no |
46 | wordpress | #6530, data | 8.1 | Social | 2.3K | 4.4/6.0/7.1 | yes | yes | no |
47 | wordpress | #11992, data | 14.9 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
48 | wordpress | #11135, data | 13.6 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
49 | wordpress | #10876, data | 13.7 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
50 | wordpress | #10547, data | 13.3 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
51 | wordpress | #10363, data | 13.1 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
52 | wordpress | #10302, data | 12.9 | Social | 2.3K | 6.0/7.1 | yes | yes | no |
Themis contains a unified, extensible infrastructure for benchmarking automated GUI testing for Android. Any testing tool can be easily integrated into this infrastructure and deployed on a given machine with a single command.
Tool Name | Venue | Open-source | Main Technique | Need App Code? | Need App Instrumentation? | Supported SDKs | Implementation Basis |
---|---|---|---|---|---|---|---|
Monkey | - | yes | Random Testing | no | no | Any | - |
Sapienz | ISSTA'16 | no | Search-based | no | no | 4.4 | Monkey-based |
Stoat | FSE'17 | yes | Model-based | no | no | Any | A3E-based |
Ape | ICSE'19 | yes | Model-based | no | no | 6.0/7.1 | Monkey-based |
Humanoid | ASE'19 | yes | Deep learning-based | no | no | Any | DroidBot-based |
ComboDroid | ICSE'20 | yes | Model-based | no | yes | 6.0/7.1 | Monkey-based |
TimeMachine | ICSE'20 | yes | State-based | no | yes | 4.4/7.1 | Monkey-based |
Q-testing | ISSTA'20 | no | reinforcement learning-based | no | no | 4.4/7.1/9.0 | - |
usage: themis.py [-h] [--avd AVD_NAME] [--apk APK] [-n NUMBER_OF_DEVICES] [--apk-list APK_LIST] -o O [--time TIME] [--repeat REPEAT] [--max-emu MAX_EMU] [--no-headless] [--login LOGIN_SCRIPT]
[--wait IDLE_TIME] [--monkey] [--ape] [--timemachine] [--combo] [--combo-login] [--humanoid] [--stoat] [--sapienz] [--qtesting] [--weighted] [--offset OFFSET]
optional arguments:
-h, --help show this help message and exit
--avd AVD_NAME the device name
--apk APK
-n NUMBER_OF_DEVICES number of emulators created for testing, default: 1
--apk-list APK_LIST list of apks under test
-o O output dir
--time TIME the fuzzing time in hours (e.g., 6h), minutes (e.g., 6m), or seconds (e.g., 6s), default: 6h
--repeat REPEAT the repeated number of runs, default: 1
--max-emu MAX_EMU the maximum allowed number of emulators
--no-headless show gui
--login LOGIN_SCRIPT the script for app login
--wait IDLE_TIME the idle time to wait before starting the fuzzing
--monkey
--ape
--timemachine
--combo
--combo-login
--humanoid
--stoat
--sapienz
--qtesting
--offset OFFSET device offset number w.r.t emulator-5554
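The `--time` value in the usage text combines a number with a unit suffix (`h`, `m`, or `s`). As a rough illustration only, such a value could be converted to seconds as sketched below. `parse_time_budget` is a hypothetical helper name, and themis.py's own parsing may differ.

```python
import re

# Hypothetical helper: not part of themis.py, which may parse --time differently.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_time_budget(value: str) -> int:
    """Convert a string such as '6h', '10m' or '30s' into seconds."""
    match = re.fullmatch(r"(\d+)([hms])", value.strip())
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    for example in ("6h", "10m", "30s"):
        print(example, "->", parse_time_budget(example), "seconds")
```

Rejecting anything that is not a whole number followed by one of the three suffixes keeps a typo such as `6hr` from silently becoming a different budget.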
The directory structure of Themis is as follows:
Themis
|
|--- esecfse2021-paper1009.pdf the accepted paper of Themis
|
|--- scripts: scripts for running testing tools and analyzing testing results.
|
|--- themis.py: the main script for deploying themis.
|
|--- check_crash.py: the script to check whether a tool finds the bugs.
|
|--- compute_coverage.py: the script to compute the code coverage achieved by a tool.
|
|--- compare_bug_triggering_time.py: the script to compare bug-triggering times pairwise between different tools.
|
|--- run_monkey.sh the internal shell script to invoke Monkey, Ape, Humanoid, ComboDroid, TimeMachine and Q-testing
|--- run_ape.sh
|--- run_humanoid.sh
|--- run_qtesting.sh
|--- run_timemachine.sh
|--- run_combodroid.sh
|
|--- tools: the supported automated testing tools.
|
|--- Humanoid the tool Humanoid
|
|--- TimeMachine the tool TimeMachine
|
|--- Q-testing the tool Q-testing
|
|--- Ape the tool Ape
|
|--- ComboDroid the tool ComboDroid
|
|--- Monkey the tool Monkey
|
|--- app_1: The bugs collected from app_1.
|
|--- app_2: The bugs collected from app_2.
|
|--- ...
|
|--- app_N The bugs collected from app_n.
The instructions in this section were used for artifact evaluation.
In the artifact evaluation, we ran Themis in a virtual machine with everything required already installed and prepared.
You can follow the instructions in this section to get familiar with Themis.
You can download the VM image Themis.ova (15GB) from this link on Google Drive.
To use Themis for your own research, we recommend building and running Themis on a native machine (see 3. Instructions for Reusing Themis).
- You need to enable the virtualization technology in your computer's BIOS (see this link for how to enable the virtualization technology). Most computers by default already have this virtualization option turned on.
- Your computer needs at least 8 CPU cores (4 cores may also work), 16GB of memory, and at least 40GB of storage.
- We built our artifact by using VirtualBox v6.1.20. Please install VirtualBox based on your OS type. After installing VirtualBox, you may need to reboot the computer.
- Open VirtualBox, click "File", click "Import Appliance", then import the file named Themis.ova (this step may take about five to ten minutes to complete).
- After the import is completed, you should see "vm" as one of the listed VMs in your VirtualBox.
- Click "Settings", click "System", click "Processor", allocate 4-8 CPU cores (8 cores are preferred), and check "Enable Nested VT-x/AMD-V". Click "Memory", and set the memory size to at least 8GB (16GB is preferred). Overall, you can allocate more memory and CPU cores if your system permits, to ensure a smooth evaluation.
- Run the virtual machine. The username is themis and the password is themis-benchmark.
- If you cannot run the VM with the "Nested VT-x/AMD-V" option enabled in VirtualBox, check whether the Hyper-V option is enabled. You can disable the Hyper-V option (see this link for more information about this).
Take the quick test to get familiar with Themis and validate whether it is ready.
Step 1. open a terminal and switch to Themis's scripts directory
cd the-themis-benchmark/scripts
Step 2. run Monkey on one target bug for 10 minutes
python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk --time 10m -o ../monkey-results/ --monkey
Here,
- --no-headless shows the emulator GUI.
- --avd Android7.1 specifies the name of the emulator (which has already been created in the VM).
- --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk specifies the target bug, which is ActivityDiary's bug #118 in v1.1.8.
- --time 10m allocates 10 minutes for the testing tool to find the bug.
- -o ../monkey-results/ specifies the output directory of the testing results.
- --monkey specifies the testing tool.
Expected results: you should see that (1) an Android emulator is started, (2) the app ActivityDiary is installed and started, (3) Monkey starts testing the app, (4) sample text like the following is printed on the terminal during testing, and (5) the emulator is automatically closed at the end.
**click to see the sample output on the terminal of a successful run.**
allocate emulators: emulator-5554
the apk list to fuzz: ['../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk']
True
Now allocate the apk: ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk on emulator-5554
its login script: ""
wait the allocated devices to finish...
execute monkey: bash -x run_monkey.sh ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk emulator-5554 Android7.1 ../monkey-results/ 10m "" ""
+ APK_FILE=../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk
+ AVD_SERIAL=emulator-5554
+ AVD_NAME=Android7.1
+ OUTPUT_DIR=../monkey-results/
+ TEST_TIME=10m
+ HEADLESS=
+ LOGIN_SCRIPT=
+ RETRY_TIMES=5
++ seq 1 5
+ for i in $(seq 1 $RETRY_TIMES)
+ echo 'try to start the emulator (emulator-5554)...'
try to start the emulator (emulator-5554)...
+ sleep 5
+ avd_port=5554
+ sleep 5
+ emulator -port 5554 -avd Android7.1 -read-only
emulator: Requested console port 5554: Inferring adb port 5555.
qemu-system-i386: warning: TSC frequency mismatch between VM (2903985 kHz) and host (2903991 kHz), and TSC scaling unavailable
qemu-system-i386: warning: TSC frequency mismatch between VM (2903985 kHz) and host (2903991 kHz), and TSC scaling unavailable
+ wait_for_device emulator-5554
+ avd_serial=emulator-5554
+ timeout 5s adb -s emulator-5554 wait-for-device
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ i=0
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo ' Waiting for emulator (emulator-5554) to fully boot (#0 times) ...'
Waiting for emulator (emulator-5554) to fully boot (#0 times) ...
+ sleep 5
++ expr 0 + 1
+ i=1
+ [[ 1 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo ' Waiting for emulator (emulator-5554) to fully boot (#1 times) ...'
Waiting for emulator (emulator-5554) to fully boot (#1 times) ...
+ sleep 5
++ expr 1 + 1
+ i=2
+ [[ 2 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo ' Waiting for emulator (emulator-5554) to fully boot (#2 times) ...'
Waiting for emulator (emulator-5554) to fully boot (#2 times) ...
+ sleep 5
++ expr 2 + 1
+ i=3
+ [[ 3 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo ' Waiting for emulator (emulator-5554) to fully boot (#3 times) ...'
Waiting for emulator (emulator-5554) to fully boot (#3 times) ...
+ sleep 5
++ expr 3 + 1
+ i=4
+ [[ 4 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo ' Waiting for emulator (emulator-5554) to fully boot (#4 times) ...'
Waiting for emulator (emulator-5554) to fully boot (#4 times) ...
+ sleep 5
++ expr 4 + 1
+ i=5
+ [[ 5 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo ' Waiting for emulator (emulator-5554) to fully boot (#5 times) ...'
Waiting for emulator (emulator-5554) to fully boot (#5 times) ...
+ sleep 5
++ expr 5 + 1
+ i=6
+ [[ 6 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo ' Waiting for emulator (emulator-5554) to fully boot (#6 times) ...'
Waiting for emulator (emulator-5554) to fully boot (#6 times) ...
+ sleep 5
++ expr 6 + 1
+ i=7
+ [[ 7 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo ' Waiting for emulator (emulator-5554) to fully boot (#7 times) ...'
Waiting for emulator (emulator-5554) to fully boot (#7 times) ...
+ sleep 5
++ expr 7 + 1
+ i=8
+ [[ 8 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo ' Waiting for emulator (emulator-5554) to fully boot (#8 times) ...'
Waiting for emulator (emulator-5554) to fully boot (#8 times) ...
+ sleep 5
++ expr 8 + 1
+ i=9
+ [[ 9 == 10 ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ echo ' Waiting for emulator (emulator-5554) to fully boot (#9 times) ...'
Waiting for emulator (emulator-5554) to fully boot (#9 times) ...
+ sleep 5
++ expr 9 + 1
+ i=10
+ [[ 10 == 10 ]]
+ echo 'Cannot connect to the device: (emulator-5554) after (#10 times)...'
Cannot connect to the device: (emulator-5554) after (#10 times)...
+ break
++ adb -s emulator-5554 shell getprop init.svc.bootanim
adb: device offline
+ OUT=
+ [[ '' != \s\t\o\p\p\e\d ]]
+ adb -s emulator-5554 emu kill
OK: killing emulator, bye bye
OK
+ echo 'try to restart the emulator (emulator-5554)...'
try to restart the emulator (emulator-5554)...
+ [[ 10 == RETRY_TIMES ]]
+ for i in $(seq 1 $RETRY_TIMES)
+ echo 'try to start the emulator (emulator-5554)...'
try to start the emulator (emulator-5554)...
+ sleep 5
emulator: WARNING: Not saving state: RAM not mapped as shared
+ avd_port=5554
+ sleep 5
+ emulator -port 5554 -avd Android7.1 -read-only
emulator: Requested console port 5554: Inferring adb port 5555.
qemu-system-i386: warning: TSC frequency mismatch between VM (2903985 kHz) and host (2903991 kHz), and TSC scaling unavailable
qemu-system-i386: warning: TSC frequency mismatch between VM (2903985 kHz) and host (2903991 kHz), and TSC scaling unavailable
+ wait_for_device emulator-5554
+ avd_serial=emulator-5554
+ timeout 5s adb -s emulator-5554 wait-for-device
++ adb -s emulator-5554 shell getprop init.svc.bootanim
+ OUT=stopped
+ i=0
+ [[ stopped != \s\t\o\p\p\e\d ]]
++ adb -s emulator-5554 shell getprop init.svc.bootanim
+ OUT=stopped
+ [[ stopped != \s\t\o\p\p\e\d ]]
+ break
+ echo ' emulator (emulator-5554) is booted!'
emulator (emulator-5554) is booted!
+ adb -s emulator-5554 root
restarting adbd as root
++ date +%Y-%m-%d-%H-%M-%S
+ current_date_time=2021-05-26-16-58-44
++ basename ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk
+ apk_file_name=ActivityDiary-1.1.8-debug-#118.apk
+ result_dir=../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44
+ mkdir -p ../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44
+ echo '** CREATING RESULT DIR (emulator-5554): ' ../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44
** CREATING RESULT DIR (emulator-5554): ../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44
+ [[ '' != '' ]]
+ adb -s emulator-5554 install -g ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk
+ echo '** INSTALL APP (emulator-5554)'
** INSTALL APP (emulator-5554)
+ sleep 10
++ aapt dump badging ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk
++ grep package
++ awk '{print $2}'
++ sed s/name=//g
++ sed 's/'\''//g'
+ app_package_name=de.rampro.activitydiary.debug
+ echo '** PROCESSING APP (emulator-5554): ' de.rampro.activitydiary.debug
** PROCESSING APP (emulator-5554): de.rampro.activitydiary.debug
+ echo '** START LOGCAT (emulator-5554) '
** START LOGCAT (emulator-5554)
+ adb -s emulator-5554 logcat -c
+ echo '** START COVERAGE (emulator-5554) '
** START COVERAGE (emulator-5554)
+ adb -s emulator-5554 logcat AndroidRuntime:E CrashAnrDetector:D System.err:W CustomActivityOnCrash:E ACRA:E WordPress-EDITOR:E '*:F' '*:S'
+ echo '** RUN MONKEY (emulator-5554)'
** RUN MONKEY (emulator-5554)
+ adb -s emulator-5554 shell date +%Y-%m-%d-%H:%M:%S
+ bash dump_coverage.sh emulator-5554 de.rampro.activitydiary.debug ../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44
+ timeout 10m adb -s emulator-5554 shell monkey -p de.rampro.activitydiary.debug -v --throttle 200 --ignore-crashes --ignore-timeouts --ignore-security-exceptions --bugreport 1000000
+ tee ../monkey-results//ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2021-05-26-16-58-44/monkey.log
:Monkey: seed=1622177363949 count=1000000
:AllowPackage: de.rampro.activitydiary.debug
:IncludeCategory: android.intent.category.LAUNCHER
:IncludeCategory: android.intent.category.MONKEY
// Event percentages:
// 0: 15.0%
// 1: 10.0%
// 2: 2.0%
// 3: 15.0%
// 4: -0.0%
// 5: -0.0%
// 6: 25.0%
// 7: 15.0%
// 8: 2.0%
// 9: 2.0%
// 10: 1.0%
// 11: 13.0%
:Switch: #Intent;action=android.intent.action.MAIN;category=android.intent.category.LAUNCHER;launchFlags=0x10200000;component=de.rampro.activitydiary.debug/de.rampro.activitydiary.ui.main.MainActivity;end
// Allowing start of Intent { act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] cmp=de.rampro.activitydiary.debug/de.rampro.activitydiary.ui.main.MainActivity } in package de.rampro.activitydiary.debug
:Sending Trackball (ACTION_MOVE): 0:(3.0,0.0)
:Sending Touch (ACTION_DOWN): 0:(6.0,111.0)
:Sending Touch (ACTION_UP): 0:(75.08969,0.25681877)
:Sending Touch (ACTION_DOWN): 0:(461.0,414.0)
:Sending Touch (ACTION_UP): 0:(463.19458,413.57645)
:Sending Touch (ACTION_DOWN): 0:(443.0,1154.0)
:Sending Touch (ACTION_UP): 0:(337.0955,1190.089)
:Sending Touch (ACTION_DOWN): 0:(392.0,379.0)
:Sending Touch (ACTION_UP): 0:(404.1744,297.83728)
:Sending Trackball (ACTION_MOVE): 0:(-1.0,-4.0)
:Sending Touch (ACTION_DOWN): 0:(739.0,89.0)
:Sending Touch (ACTION_UP): 0:(800.0,146.10786)
:Sending Trackball (ACTION_MOVE): 0:(-3.0,4.0)
:Sending Touch (ACTION_DOWN): 0:(749.0,443.0)
:Sending Touch (ACTION_UP): 0:(747.92065,456.49356)
:Sending Touch (ACTION_DOWN): 0:(485.0,814.0)
:Sending Touch (ACTION_UP): 0:(492.87933,804.6225)
:Sending Trackball (ACTION_MOVE): 0:(-4.0,-4.0)
:Sending Trackball (ACTION_MOVE): 0:(0.0,2.0)
:Sending Touch (ACTION_DOWN): 0:(454.0,119.0)
:Sending Touch (ACTION_UP): 0:(465.00302,121.41099)
:Sending Touch (ACTION_DOWN): 0:(476.0,305.0)
:Sending Touch (ACTION_UP): 0:(476.1956,313.51404)
+ adb -s emulator-5554 shell date +%Y-%m-%d-%H:%M:%S
+ echo '** STOP MONKEY (emulator-5554)'
** STOP MONKEY (emulator-5554)
++ adb -s emulator-5554 shell ps
++ grep monkey
++ awk '{print $2}'
+ adb -s emulator-5554 shell kill 17521
+ echo '** STOP COVERAGE (emulator-5554)'
** STOP COVERAGE (emulator-5554)
++ ps aux
++ grep 'dump_coverage.sh emulator-5554'
++ grep -v grep
++ awk '{print $2}'
+ kill 1248836
+ echo '** STOP LOGCAT (emulator-5554)'
** STOP LOGCAT (emulator-5554)
++ ps aux
++ grep 'emulator-5554 logcat'
++ grep -v grep
++ awk '{print $2}'
+ kill 1248835
+ sleep 5
+ adb -s emulator-5554 emu kill
OK: killing emulator, bye bye
OK
+ echo '@@@@@@ Finish (emulator-5554): ' de.rampro.activitydiary.debug @@@@@@@
@@@@@@ Finish (emulator-5554): de.rampro.activitydiary.debug @@@@@@@
Step 3. inspect the output files
If Step 2 succeeds, you can see the outputs under ../monkey-results/ (i.e., /home/themis/the-themis-benchmark/monkey-results).
$ cd ../monkey-results/
$ ls
ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2020-06-24-20:39:27/ # the output directory
$ cd ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2020-06-24-20:39:27/
$ ls
coverage_1.ec # the coverage data file (used for computing coverage)
coverage_2.ec
install.log # the log of app installation
logcat.log # the system log of emulator (this file contains the crash stack traces if the target bug was triggered)
monkey.log # the log of Monkey (including the events that Monkey generates)
monkey_testing_time_on_emulator.txt # the first line is the starting testing time, and the second line is the ending testing time
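Judging only from the directory name in the listing above, each result directory appears to encode the APK, the tool, the emulator serial, the AVD name, and a timestamp. The sketch below splits such a name into those fields. The pattern is inferred from this one example, and `parse_result_dir_name` is a hypothetical helper rather than part of Themis.

```python
import re
from typing import Dict

# Pattern inferred from the sample directory name; Themis may use other layouts.
_RESULT_DIR = re.compile(
    r"^(?P<apk>.+\.apk)\.(?P<tool>[^.]+)\.result\."
    r"(?P<serial>emulator-\d+)\.(?P<avd>.+)#(?P<timestamp>[^#]+)$"
)


def parse_result_dir_name(name: str) -> Dict[str, str]:
    """Split a result directory name into its apparent components."""
    match = _RESULT_DIR.match(name.rstrip("/"))
    if match is None:
        raise ValueError(f"unrecognized result directory name: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    sample = ("ActivityDiary-1.1.8-debug-#118.apk.monkey.result."
              "emulator-5554.Android7.1#2020-06-24-20:39:27/")
    for key, value in parse_result_dir_name(sample).items():
        print(f"{key}: {value}")
```

The timestamp is taken as everything after the last `#`, so the `#118` inside the APK name does not confuse the split.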
How to validate: If you can see all these files and they are non-empty (use ls -l to check), the quick test succeeds. Note that the number of coverage data files (e.g., coverage_1.ec) varies with the testing time. In practice, Themis notifies an app to dump coverage data every five minutes.
Please note that the output files of different testing tools may vary (but all the other tools produce output files of similar types to those of Monkey).
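The manual check with ls -l can also be scripted. The following is a minimal sketch, assuming the Monkey file names from the listing above. `check_result_dir` and `EXPECTED_FILES` are hypothetical names, and other tools may need a different file list.

```python
import tempfile
from pathlib import Path
from typing import List

# File names taken from the Monkey listing; other tools may produce different names.
EXPECTED_FILES = [
    "install.log",
    "logcat.log",
    "monkey.log",
    "monkey_testing_time_on_emulator.txt",
]


def check_result_dir(result_dir: Path) -> List[str]:
    """Return a list of problems; an empty list means every file exists and is non-empty."""
    problems = []
    for name in EXPECTED_FILES:
        path = result_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif path.stat().st_size == 0:
            problems.append(f"empty: {name}")
    # The number of coverage files depends on the testing time, so only require one.
    coverage_files = sorted(result_dir.glob("coverage_*.ec"))
    if not coverage_files:
        problems.append("missing: coverage_*.ec")
    problems.extend(
        f"empty: {p.name}" for p in coverage_files if p.stat().st_size == 0
    )
    return problems


if __name__ == "__main__":
    # Demonstration on a throwaway directory instead of a real Themis result.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp)
        for name in EXPECTED_FILES + ["coverage_1.ec"]:
            (demo / name).write_text("placeholder")
        print(check_result_dir(demo) or "quick test output looks complete")
```

To check a real run, pass the path of one result directory under ../monkey-results/ to `check_result_dir`.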
I. The supported tools
Themis now supports and maintains 6 state-of-the-art fully-automated testing tools for Android (see below). These tools can be cloned from Themis's repositories and are placed under the-themis-benchmark/tools.
- Monkey: distributed with Android SDKs
- Ape: https://github.com/the-themis-benchmarks/ape-bin
- combodroid: https://github.com/the-themis-benchmarks/combodroid
- Humanoid: https://github.com/the-themis-benchmarks/Humanoid, which depends on droidbot (https://github.com/the-themis-benchmarks/droidbot/tree/themis-branch)
- Q-testing: https://github.com/the-themis-benchmarks/Q-testing
- TimeMachine: https://github.com/the-themis-benchmarks/TimeMachine
Note that these tools are modified/enhanced versions of their originals because we coordinated with the authors of these tools to ensure a correct and rigorous setup (e.g., reporting the tool bugs we encountered to the authors for fixing). We did our best to minimize bias and ensure that each tool is at "its best state" in bug finding (see Section 3.2 in the Themis paper).
Specifically, we track the tool modifications to facilitate review and validation. Integrating Monkey, Ape, Humanoid and Q-testing into Themis required only slight effort. ComboDroid was modified by its author and integrated into Themis, while TimeMachine was modified by us and later verified by its authors (view this commit to check all the modifications/enhancements made in TimeMachine).
II. Running the supported tools
**In the following, we take Monkey as the tool and ActivityDiary-1.1.8-debug-#118.apk as the target bug to illustrate how to replicate the whole evaluation, and how to validate the artifact if you do not have enough resources/time**
[Replicate the whole evaluation]:
Step 1. run Monkey on ActivityDiary-1.1.8-debug-#118.apk for 6 hours and repeat this process for 5 runs.
This step will take 30 hours to finish because the 5 runs are executed on one emulator. We do not recommend running more than one emulator in the VM because of its limited memory and performance. If you do not have enough resources/time, see the instructions below.
python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk -n 1 --repeat 5 --time 6h -o ../monkey-results/ --monkey
Here,
- --no-headless shows the emulator GUI.
- --avd Android7.1 specifies the name of the emulator (which has already been created in the VM).
- --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk specifies the target bug, which is ActivityDiary's bug #118 in v1.1.8.
- -n 1 denotes that one emulator instance will be created (in practice, at most 16 emulators are allowed to run in parallel on one native machine).
- --repeat 5 denotes that the testing process will be repeated for 5 runs (these 5 runs will be distributed to the available emulator instances).
- --time 6h allocates 6 hours for the testing tool to find the bug.
- -o ../monkey-results/ specifies the output directory of the testing results.
- --monkey specifies the testing tool.
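The 30-hour figure in Step 1 follows from simple arithmetic: 5 runs of 6 hours each on a single emulator. The sketch below generalizes this under the assumption that runs are scheduled in waves across the available emulators. The text says only that runs are "distributed", so treat the formula as an estimate that also ignores emulator boot and install time. `estimated_wall_clock_hours` is a hypothetical helper.

```python
import math


def estimated_wall_clock_hours(repeat: int, emulators: int, hours_per_run: float) -> float:
    """Estimate total time, assuming runs execute in waves of `emulators` parallel runs."""
    if repeat < 1 or emulators < 1:
        raise ValueError("repeat and emulators must be positive")
    waves = math.ceil(repeat / emulators)
    return waves * hours_per_run


if __name__ == "__main__":
    # --repeat 5 -n 1 --time 6h: the full setting described in Step 1
    print(estimated_wall_clock_hours(repeat=5, emulators=1, hours_per_run=6))
    # --repeat 2 -n 1 --time 1h: a shortened setting
    print(estimated_wall_clock_hours(repeat=2, emulators=1, hours_per_run=1))
```

Under the same assumption, raising -n on a native machine shortens the wall-clock time roughly in proportion until the number of emulators reaches the number of runs.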
[Validate the artifact if you do not have enough resources/time: run 1-2 tools on 1-2 bugs of your choice with limited testing time]:
For example, in Step 1, we recommend shortening the testing time (e.g., use --time 1h for 1 hour or --time 30m for 30 minutes).
Thus, you can use the following command (this step will take 2 hours to finish because the 2 runs are executed on one emulator).
python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk -n 1 --repeat 2 --time 1h -o ../monkey-results/ --monkey
Step 2. When the testing terminates, you can use the command below to inspect whether the target bug was found in each run, how long it took to find the bug, and how many times the bug was found.
python3 check_crash.py --monkey -o ../monkey-results/ --app ActivityDiary --id \#118 --simple
Here,
- --app ActivityDiary specifies the target app (you can omit this option to check all the tested apps and their bugs).
- --id \#118 specifies the target bug id (you can omit this option to check all the target bugs of app ActivityDiary).
- --simple outputs the checking result to the terminal for a quick check (you can substitute --simple with --csv FILE_PATH to output the checking results into a CSV file).
- Use -h to see the detailed list of command options.
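The backslash in --id \#118 matters because, in shells such as bash, an unquoted word that begins with # starts a comment, so the bug id would otherwise be dropped. When generating such commands from a script, the standard-library `shlex` module can apply the quoting automatically. The helper below is a hypothetical sketch that uses only the check_crash.py options shown above.

```python
import shlex


def build_check_crash_command(tool_flag: str, output_dir: str, app: str, bug_id: str) -> str:
    """Assemble a check_crash.py command line with shell-safe quoting."""
    argv = [
        "python3", "check_crash.py", tool_flag,
        "-o", output_dir,
        "--app", app,
        "--id", bug_id,
        "--simple",
    ]
    # shlex.join (Python 3.8+) quotes every argument that a POSIX shell could misread.
    return shlex.join(argv)


if __name__ == "__main__":
    print(build_check_crash_command("--monkey", "../monkey-results/", "ActivityDiary", "#118"))
```

The printed command wraps the bug id in single quotes, which has the same effect as the backslash escape used in the command above.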
An example output could be (in this case, the target bug, ActivityDiary's #118, was not found by Monkey in any of the five runs):
**click to see the sample output.**
ActivityDiary
=========
[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2020-06-24-20:39:27) [ActivityDiary, #118] testing time: 2020-06-24-20:39:33 the start testing time is: 2020-06-24-20:39:33 the start testing time (parsed) is: 2020-06-24 20:39:33
[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5556.Android7.1#2020-06-24-20:39:32) [ActivityDiary, #118] testing time: 2020-06-24-20:39:36 the start testing time is: 2020-06-24-20:39:36 the start testing time (parsed) is: 2020-06-24 20:39:36
[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5558.Android7.1#2020-06-24-20:39:37) [ActivityDiary, #118] testing time: 2020-06-24-20:39:43 the start testing time is: 2020-06-24-20:39:43 the start testing time (parsed) is: 2020-06-24 20:39:43
[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5560.Android7.1#2020-06-24-20:39:42) [ActivityDiary, #118] testing time: 2020-06-24-20:39:47 the start testing time is: 2020-06-24-20:39:47 the start testing time (parsed) is: 2020-06-24 20:39:47
[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5562.Android7.1#2020-06-24-20:39:47) [ActivityDiary, #118] testing time: 2020-06-24-20:39:52 the start testing time is: 2020-06-24-20:39:52 the start testing time (parsed) is: 2020-06-24 20:39:52
Another example output could be (in this case, the target bug, AnkiDroid's #4451, was found once, after running Monkey for 55 minutes in one run):
**click to see the sample output.**
AnkiDroid
=========
[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5554.Android7.1#2020-06-26-00:59:31) [AnkiDroid, #4451] testing time: 2020-06-26-00:59:32 [AnkiDroid, #4451] testing time: 2020-06-26-06:59:32 the start testing time is: 2020-06-26-00:59:32 the start testing time (parsed) is: 2020-06-26 00:59:32
[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5576.Android7.1#2020-06-26-12:04:55) [AnkiDroid, #4451] testing time: 2020-06-26-12:04:57 [AnkiDroid, #4451] testing time: 2020-06-26-18:04:57 the start testing time is: 2020-06-26-12:04:57 the start testing time (parsed) is: 2020-06-26 12:04:57
[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5574.Android7.1#2020-06-26-12:04:55) [AnkiDroid, #4451] testing time: 2020-06-26-12:04:56 [AnkiDroid, #4451] testing time: 2020-06-26-18:04:56 the start testing time is: 2020-06-26-12:04:56 the start testing time (parsed) is: 2020-06-26 12:04:56
[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5558.Android7.1#2020-06-26-00:59:38) [AnkiDroid, #4451] testing time: 2020-06-26-00:59:39 [AnkiDroid, #4451] testing time: 2020-06-26-06:59:40 the start testing time is: 2020-06-26-00:59:39 the start testing time (parsed) is: 2020-06-26 00:59:39
[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5556.Android7.1#2020-06-26-00:59:33) [AnkiDroid, #4451] testing time: 2020-06-26-00:59:34 [AnkiDroid, #4451] testing time: 2020-06-26-06:59:35 the start testing time is: 2020-06-26-00:59:34 the start testing time (parsed) is: 2020-06-26 00:59:34 [AnkiDroid, #4451] the crash was triggered (1) times [AnkiDroid, #4451] the time duration: ['55'] (mins)
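When many runs are inspected, the two summary fields at the end of the output (the trigger count and the time durations) can be pulled out automatically. The following is a minimal sketch, assuming the summary wording is exactly as in the sample output above; the function name `summarize` and the regular expressions are illustrative and not part of Themis.

```python
import ast
import re

# Patterns copied from the sample output above; the real script's wording may differ.
TRIGGER_RE = re.compile(
    r"\[(?P<app>[^,\]]+), (?P<issue>#\d+)\] the crash was triggered \((?P<count>\d+)\) times"
)
DURATION_RE = re.compile(r"the time duration: (?P<mins>\[[^\]]*\]) \(mins\)")


def summarize(output: str) -> dict:
    """Extract app, issue id, trigger count and times-to-crash (minutes) from the output text."""
    trigger = TRIGGER_RE.search(output)
    if trigger is None:
        # No summary line: treat the target bug as not found in these runs.
        return {"found": False}
    duration = DURATION_RE.search(output)
    # The durations are printed as a Python list of strings, e.g. ['55'].
    minutes = [float(m) for m in ast.literal_eval(duration.group("mins"))] if duration else []
    return {
        "found": int(trigger.group("count")) > 0,
        "app": trigger.group("app"),
        "issue": trigger.group("issue"),
        "count": int(trigger.group("count")),
        "minutes": minutes,
    }


sample = (
    "[AnkiDroid, #4451] the crash was triggered (1) times "
    "[AnkiDroid, #4451] the time duration: ['55'] (mins)"
)
print(summarize(sample))
```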
(1) You can substitute --monkey with --ape or --combo in the command line in Step 1 to directly run the corresponding tool. You may need to change the output directory -o ../monkey-results/ to a distinct directory, e.g., -o ../ape-results. You can follow steps similar to those described in Step 2 to inspect whether the target bug was found and to view the related info.
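The substitution described in (1) can be sketched as a small helper that builds one command line per tool, each with its own output directory. This is only an illustration: the flags and paths are the ones shown in this README, while the helper `build_command` and the `TOOL_FLAGS` table are hypothetical names.

```python
import shlex

# Flags taken from the commands shown in this README.
TOOL_FLAGS = {"monkey": "--monkey", "ape": "--ape", "combo": "--combo"}


def build_command(tool: str, apk: str, avd: str = "Android7.1", time: str = "10m") -> list:
    """Return the themis.py argument list for one tool, with a per-tool output directory."""
    return [
        "python3", "themis.py", "--no-headless",
        "--avd", avd,
        "--apk", apk,
        "--time", time,
        "-o", f"../{tool}-results/",
        TOOL_FLAGS[tool],
    ]


for tool in TOOL_FLAGS:
    cmd = build_command(tool, "../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk")
    # shlex.join quotes the '#' in the APK path so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` with the scripts directory as the working directory if you prefer to launch the runs from Python.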
(2) Specifically, for humanoid, before running, you need to set up the specific virtual Python environment of Humanoid. Open a terminal, and run:
cd /home/themis/the-themis-benchmark/tools/Humanoid-tool
source venv/bin/activate # Humanoid depends on tensorflow 1.12, which requires specific Python version
cd Humanoid
python3 agent.py -c config.json # start the server of Humanoid
Open a new terminal, and run Humanoid (which internally runs droidbot) on the emulator Android7.1_Humanoid (with a specific screen size):
cd /home/themis/the-themis-benchmark/scripts/
python3 themis.py --no-headless --avd Android7.1_Humanoid --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk --time 10m -o ../humanoid-results --humanoid
Remember to execute deactivate when the run finishes to exit from the specific virtual Python environment.
(3) For Q-testing, before running, you need to set up the specific virtual Python environment of Q-testing. Open a terminal, and run:
cd /home/themis/the-themis-benchmark/tools/Q-testing
source venv/bin/activate # Q-testing depends on Python 2.7
And then, run the command line:
cd /home/themis/the-themis-benchmark/scripts/
python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk --time 10m -o ../qtesting-results/ --qtesting
Remember to execute deactivate when the run finishes to exit from the specific virtual Python environment.
(4) For TimeMachine, we cannot build TimeMachine within this VM because TimeMachine tests apps by using Docker, on which it runs another layer of VirtualBox. Thus, we strongly recommend building TimeMachine on native machines if you want to evaluate it (see the instructions in its repo: https://github.com/the-themis-benchmarks/TimeMachine).
(5) If the app under test requires user login (see the Table of bug dataset), you should specify the login script. Themis will call the login script before testing. For example, if we run Monkey on ../nextcloud/nextcloud-#5173.apk, which requires user login, the command line should be:
python3 themis.py --no-headless --avd Android7.1 --apk ../nextcloud/nextcloud-#5173.apk --time 10m -o ../monkey-results/ --login ../nextcloud/login-#5173.py --monkey
Here, --login ../nextcloud/login-#5173.py specifies the login script (which will be executed before GUI testing). In practice, we use the emulator snapshot to save the app login state directly.
In practice, we strongly recommend that users set up our artifact on local native machines or remote servers rather than virtual machines to ensure (1) optimal testing performance and (2) evaluation efficiency. Thus, we provide the instructions to set up Themis from scratch.
- Ubuntu 18.04/20.04
- Python 3
- Android environment (Android 7.1 or above)
- Docker (needed by TimeMachine)
- create an Android emulator before running Themis (see this link for creating an emulator using avdmanager).
- An example: create an Android emulator Android7.1 with SDK version 7.1 (API level 25), X86 ABI image and Google APIs:
sdkmanager "system-images;android-25;google_apis;x86"
avdmanager create avd --force --name Android7.1 --package 'system-images;android-25;google_apis;x86' --abi google_apis/x86 --sdcard 1024M --device 'Nexus 7'
- (optional) modify the emulator configuration to ensure optimal testing performance of testing tools:
In our evaluation, we set up an emulator with 2GB RAM, 1GB SdCard, 1GB internal storage and 256MB heap size (the file to modify is usually ~/.android/avd/Android7.1.avd/config.ini)
sdcard.size=1024M
disk.dataPartition.size=1024M
vm.heapSize=256
hw.ramSize=2048
- (optional but recommended) copy dummy documents into emulators to allow file access from the apps under test
emulator -avd Android7.1 -port 5554 &
cd themis/scripts
bash -x copy_dummy_documents.sh emulator-5554
adb -s emulator-5554 emu kill
- install uiautomator2, which is used for executing login scripts
pip3 install --upgrade --pre uiautomator2
- If you run Themis on remote servers, please omit the option --no-headless so that the emulator GUI is turned off.
- Install all the necessary dependencies required by the respective testing tools. Please see the README.md of each tool in its repository under Themis. We provide detailed build instructions.
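The optional emulator configuration step above (editing config.ini) can also be scripted. The following is a minimal sketch that rewrites the four keys listed in that step and appends any that are missing; the helper names `apply_settings` and `update_config` are hypothetical, and the sketch assumes the file consists of simple key=value lines.

```python
from pathlib import Path

# Values from the evaluation setup described above.
SETTINGS = {
    "sdcard.size": "1024M",
    "disk.dataPartition.size": "1024M",
    "vm.heapSize": "256",
    "hw.ramSize": "2048",
}


def apply_settings(text: str, settings: dict) -> str:
    """Replace existing key=value lines and append keys that are missing."""
    remaining = dict(settings)
    lines = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            lines.append(f"{key}={remaining.pop(key)}")
        else:
            lines.append(line)
    # Keys that were not present in the file are appended at the end.
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


def update_config(path: Path, settings: dict = SETTINGS) -> None:
    """Rewrite an AVD config file in place, e.g. ~/.android/avd/Android7.1.avd/config.ini."""
    path.write_text(apply_settings(path.read_text(), settings))


print(apply_settings("hw.ramSize=1536\nhw.lcd.density=320\n", SETTINGS))
```

Keeping the text transformation separate from the file write makes it easy to preview the result before touching the real configuration.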
Taking ActivityDiary-1.1.8-debug-#118.apk as an example, the basic steps to add such a bug into Themis's dataset include:
- build the buggy app version into an executable apk file (i.e., ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk, where ActivityDiary is the app name, 1.1.8 is the code version, and #118 is the original issue id)
- reproduce the bug and record the stack trace (i.e., ActivityDiary/crash_stack_#118.txt)
- write a bug-triggering script in uiautomator2 and record a bug-triggering video (i.e., ActivityDiary/script-#118.py and ActivityDiary/video-#118.mp4)
- add a JSON file to facilitate coverage computation at runtime (this step is only required by TimeMachine) which describes its class files and source files (i.e., ActivityDiary/class_files.json)
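The file layout in the steps above can be checked with a short script before a new bug is submitted. This is a sketch under the assumption that every bug follows the ActivityDiary naming shown here (issue id right before .apk, companion files in the same folder); the helper `expected_files` is hypothetical and not part of Themis.

```python
import re
from pathlib import PurePosixPath

# The issue id is the "#<number>" right before ".apk", as in the file names above.
ISSUE_RE = re.compile(r"#(\d+)\.apk$")


def expected_files(apk_path: str) -> dict:
    """List the per-bug artifacts that should sit next to the APK."""
    apk = PurePosixPath(apk_path)
    match = ISSUE_RE.search(apk.name)
    if match is None:
        raise ValueError(f"no issue id found in {apk.name!r}")
    issue = match.group(1)
    folder = apk.parent
    return {
        "apk": str(apk),
        "stack_trace": str(folder / f"crash_stack_#{issue}.txt"),
        "script": str(folder / f"script-#{issue}.py"),
        "video": str(folder / f"video-#{issue}.mp4"),
        "class_files": str(folder / "class_files.json"),
    }


for kind, path in expected_files("ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk").items():
    print(f"{kind}: {path}")
```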
The basic steps to add such a bug into Themis's infrastructure include:
- In scripts/check_crash.py, one should add the app name into list ALL_APPS and the crash signature (i.e., the crash type info and the partial crash trace related to the app itself) into dict app_crash_data.
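To illustrate the idea of a crash signature, the sketch below shows one possible shape for these two structures and a matching function. The layout of app_crash_data, the function `crash_found`, and the signature lines are all assumptions made for illustration; consult scripts/check_crash.py for the actual format.

```python
# Hypothetical layout: the real ALL_APPS and app_crash_data in scripts/check_crash.py may differ.
ALL_APPS = ["ActivityDiary"]

app_crash_data = {
    "ActivityDiary": {
        "#118": [
            # Placeholder signature lines, not the real stack trace of this bug.
            "java.lang.ExampleException",
            "at com.example.app.ExampleActivity.onCreate",
        ],
    },
}


def crash_found(app: str, issue: str, log_text: str) -> bool:
    """Return True if every signature line of the target bug occurs in the log."""
    signature = app_crash_data.get(app, {}).get(issue, [])
    return bool(signature) and all(line in log_text for line in signature)


log = (
    "FATAL EXCEPTION: main\n"
    "java.lang.ExampleException: something failed\n"
    "\tat com.example.app.ExampleActivity.onCreate(ExampleActivity.java:42)\n"
)
print(crash_found("ActivityDiary", "#118", log))
```

Matching on the crash type plus only the app's own stack frames keeps the signature stable across Android versions, whose framework frames may differ.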
In the future, we plan to (1) add the crash bugs from other existing benchmarks, (2) add crash bugs with different levels of severity rather than only critical ones.
Taking Monkey as an example, the basic steps to add a new tool into Themis's infrastructure include:
- add an internal shell script to invoke the new tool (see scripts/run_monkey.sh) which defines (1) the concrete command line for invoking the tool, (2) the outputs of the tool, and (3) the tool-specific configurations before running
- add the call of this shell script in scripts/themis.py (see function run_monkey)
- add the code for parsing the outputs of the new tool in scripts/check_crash.py.
In fact, we have already successfully integrated FastBot, an industrial testing tool developed by ByteDance, into Themis. See the script for supporting FastBot.
In Themis's paper, Sections 4.2 and 4.3 point out many optimization opportunities and future research directions for improving existing testing tools. By using Themis,
- Tool authors and other researchers can debug/validate tool improvements and evaluate/compare them with new testing tools.
- We can contribute new enhancement features to the original tools via pull requests, because Themis forked the testing tools from their original repositories.
Themis now supports coverage profiling by running
python3 compute_coverage.py -o ../monkey-results --monkey --app ActivityDiary --id \#118 --acc_csv
The main usage is:
usage: compute_coverage.py [-h] -o O [-v] [--monkey] [--ape] [--timemachine] [--combo] [--humandroid] [--qtesting] [--stoat] [--app APP_NAME] [--id ISSUE_ID] [--acc_csv ACC_CSV] [--single_csv SINGLE_CSV]
[--average_csv AVERAGE_CSV]
optional arguments:
-h, --help show this help message and exit
-o O the output directory of testing results
-v
--monkey
--ape
--timemachine
--combo
--humandroid
--qtesting
--app APP_NAME
--id ISSUE_ID
--acc_csv ACC_CSV compute the accumulative coverage of all runs
--single_csv SINGLE_CSV
compute the coverage of single runs
--average_csv AVERAGE_CSV
compute the average coverage of all runs
By leveraging the results, one can inspect the detailed coverage report generated by Jacoco.
Themis can also benefit other research (e.g., fault localization, program repair, etc.)
Ting Su, East China Normal University, China
Jue Wang, Nanjing University, China
Zhendong Su, ETH Zurich, Switzerland
We appreciate the contributions from:
- Enze Ma, Beijing Forestry University, China
- Weigang He, East China Normal University, China
- Shan Huang, East China Normal University, China
We welcome any feedback or questions on Themis. Feel free to share your ideas, open issues or pull requests. We are actively maintaining Themis to benefit our community.