I originally calculated the confidence intervals for Krippendorf's alpha using kripp.boot
. However the size of the confidence intervals did not appear to decline as the sample size increased. I found this surprising although I am not sure if it is expected behaviour so raised this as a github issue.
Scripts 1-3 of this repo run some simulations to calculate Krippendorf's alpha with a 95% confidence interval using kripp.boot
and the krippendorffsalpha
package. In the second case the confidence interval declines as the sample size increases.
I ran some simulations with a varying sample size but keeping the classes balanced (50% in each class). The error rate was 0.2.
I have used the phrase "error rate" to refer to the proportion of true positives and true negatives that are misclassified. For example, with a balanced dataset of 100 perfectly classified samples, the confusion matrix would be:
Rater 2 | ||||
---|---|---|---|---|
Rater 1 | Class | 0 | 1 | |
0 | 50 | 0 | ||
1 | 0 | 50 |
If we set the error rate at 0.2, the confusion matrix would be:
Rater 2 | ||||
---|---|---|---|---|
Rater 1 | Class | 0 | 1 | |
0 | 40 | 10 | ||
1 | 10 | 40 |
This would lead to Krippendorf's alpha and Cohen's Kappa of about 0.6.
Conversely in an imbalanced dataset with perfect agreement between raters, we would have this confusion matrix:
Rater 2 | ||||
---|---|---|---|---|
Rater 1 | Class | 0 | 1 | |
0 | 80 | 0 | ||
1 | 0 | 20 |
Applying a 0.2 error rate would lead to:
Rater 2 | ||||
---|---|---|---|---|
Rater 1 | Class | 0 | 1 | |
0 | 64 | 16 | ||
1 | 4 | 16 |
Note that for simplicity I have assumed that the error rate is applied in equal proportions to the negative and positive samples.
The estimate of alpha
is the same with both packages. However, the confidence intervals are not:
The size of the confidence interval decreases as the sample size decreases using the krippendorffsalpha
package but not the kripp.boot
package:
This holds true across imbalanced samples as well (although note that with imbalanced data, the value of Krippendorf's alpha is lower than with balanced data, and the confidence interval is greater):
The estimates of alpha remain the same with each package. However, the 95% confidence intervals differ.
Once again we see that with kripp.boot
the confidence interval does not decrease as the number of samples increases.
In this case, the number of samples was held constant at 300 and the classes were balanced.
Once again the estimates of alpha
is the same with both packages. In this case the confidence intervals of both packages change - although there is an absolute difference between them.
We see more clearly in the scatter plot that both estimates of the confidence interval of the estimate of alpha are a function of the error rate and follow the same distribution. However the kripp.boot
confidence interval is always significantly larger.
The same holds true with imbalanced data:
Apart from when comparing the packages, the estimates in this repo use the krippendorfsalpha
package rather than kripp.boot
.