Test FIL probabilities with absolute error thresholds in python (#3582)
Probabilities are bounded to the range [0.0, 1.0], and we generally care most about large probabilities, which are `O(1/n_classes)`. The largest relative probability errors are usually caused by a small ground-truth probability (e.g. 1e-3) rather than by a large absolute error, so relative probability error is not the best metric; absolute probability error is more relevant. It is also more stable, since relative errors have a long tail. When training, or even just inferring, on many rows, the chance of encountering a ground-truth probability on the order of 1e-3 or 1e-4 grows, and in some cases there is no reasonable and reliable relative threshold. Lastly, as the number of predicted probabilities (clipped values) per input row grows, so does the long tail of relative probability errors, due to less undersampling. This unfairly compares binary classification with regression, and multiclass classification with binary classification.

The changes below are based on collecting absolute errors under `--run_unit`, `--run_quality` and `--run_stress`. The resulting thresholds are violated at most a couple of times per million samples, and in most cases never.

Authors:
  - @levsnv

Approvers:
  - John Zedlewski (@JohnZed)
  - Andy Adinets (@canonizer)

URL: #3582
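As an illustration of the approach (a minimal sketch, not the code added in this PR; the helper name `assert_proba_close` and the `atol` value are assumptions), an absolute-error comparison can be expressed with `np.testing.assert_allclose` by disabling the relative tolerance:

```python
# Minimal sketch (hypothetical, not the actual cuML test code) of an
# absolute-error check on predicted probabilities.
import numpy as np

def assert_proba_close(fil_proba, ref_proba, atol=1e-3):
    """Assert that predicted probabilities match a reference within an
    absolute tolerance.

    rtol=0 disables the relative check entirely: a tiny reference
    probability (e.g. 1e-3) would otherwise inflate the relative error
    even when the absolute error is negligible.
    """
    np.testing.assert_allclose(fil_proba, ref_proba, rtol=0, atol=atol)

# Hypothetical usage: the relative error on the 0.0015 entry is ~33%,
# but its absolute error is only 5e-4, so the check passes.
fil_proba = np.array([[0.999, 0.001], [0.25, 0.75]])
ref_proba = np.array([[0.9985, 0.0015], [0.2504, 0.7496]])
assert_proba_close(fil_proba, ref_proba)
```

Under this scheme, a tiny ground-truth probability no longer dominates the error metric, which is what makes a single threshold workable across binary, multiclass, and regression test cases.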