We thank you for your time in reviewing our submission. We understand that you are performing a service to the community. In order to ensure that review of this code is as easy as possible, we have:
- Included a one-line script to recreate all results (`repro_results.sh`)
- Heavily commented and documented the code
- Included an overview of the codebase and the files.
Thank you for reviewing our submission!
The files are organized as follows:
- `data_structures` contains all of the data structures used in our experiments, e.g., forests, trees, nodes, and histograms.
  - the `wrappers` subdirectory contains convenience classes to instantiate models of various types, e.g., a `RandomForestClassifier` is a forest classifier with `bootstrap=True` (indicating to draw a bootstrap sample of the `n` datapoints for each tree), `feature_subsampling=SQRT` (indicating to consider only `sqrt(F)` features of the original `F` features at each node split), etc.
- the `experiments` subdirectory contains all the code for our core experiments
  - `experiments/runtime_exps` contains the script (`compare_runtimes.py`) to reproduce the results of Tables 1 and 2, as well as the results of running that script (the files ending in `_profile` or `_dict`)
  - `experiments/budget_exps` contains the script (`compare_budgets.py`) to reproduce the results of Tables 3 and 4, as well as the results of running that script (the files ending in `_dict`)
  - `experiments/sklearn_exps` contains the script (`compare_baseline_implementations.py`) to reproduce the results of Table 6 in Appendix 4
  - `experiments/scaling_exps` contains the scripts (`investigate_scaling.py` and `make_scaling_plot.py`) to reproduce Appendix Figure 1 in Appendix 2
- the `tests` subdirectory contains tests that we wrote to verify the correctness of our implementations
  - `tests/feature_importance_tests.py` is also used to regenerate the results in Table 5
    - You can reproduce the results for just Table 5 by running `tests/feature_importance_tests.py`. The results will be stored in the first 4 lines of the `tests/stat_test_stability_log/reproduce_stability.csv` file.
- the `utils` directory contains helper code for training forest-based models
  - `utils/solvers.py` includes the core implementation of MABSplit in the `solve_mab()` function
- To reproduce the results in all the tables, and to reproduce the figure in Appendix 2, please run `repro_script.sh`. This may take many hours.
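
For readers unfamiliar with the two options mentioned for the `wrappers` classes, the following is a minimal, standard-library sketch of what `bootstrap=True` and `feature_subsampling=SQRT` mean conceptually. It is not the code in this repository: the function names `bootstrap_indices` and `sqrt_feature_subset` are hypothetical, and rounding `sqrt(F)` down to an integer is an assumption made only for illustration.

```python
import math
import random


def bootstrap_indices(n, rng):
    """Draw n row indices uniformly with replacement (a bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def sqrt_feature_subset(num_features, rng):
    """Pick floor(sqrt(F)) distinct feature indices (at least one) for a node split.

    Rounding down is an assumption of this sketch, not a claim about the repository.
    """
    k = max(1, math.isqrt(num_features))
    return sorted(rng.sample(range(num_features), k))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = bootstrap_indices(10, rng)        # indices used to train one tree
features = sqrt_feature_subset(16, rng)  # features considered at one split
print(len(rows), len(features))  # prints "10 4"
```

Because the bootstrap sample is drawn with replacement, some datapoints typically appear more than once in `rows` and others not at all, which is what makes the trees in a forest differ from one another.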
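
After running `tests/feature_importance_tests.py`, the Table 5 results can be inspected with a few lines of Python. The sketch below is only a convenience, assuming it is run from the repository root; the helper name `first_rows` is hypothetical, and nothing is assumed about the columns of the CSV file.

```python
import csv
from itertools import islice
from pathlib import Path

# Path taken from the file overview above; relative to the repository root.
STABILITY_CSV = Path("tests/stat_test_stability_log/reproduce_stability.csv")


def first_rows(lines, num_rows=4):
    """Parse an iterable of CSV lines and return the first num_rows rows."""
    return list(islice(csv.reader(lines), num_rows))


# Print the rows only if the results file has already been generated.
if STABILITY_CSV.exists():
    with STABILITY_CSV.open(newline="") as f:
        for row in first_rows(f):
            print(row)
```

If the file does not exist yet, the script prints nothing, so it is safe to run before the tests have completed.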