
posthoc_conover_friedman calculates incorrect p-values #63

Closed
liborak opened this issue Nov 29, 2023 · 9 comments

liborak commented Nov 29, 2023

Describe the bug
I have identified a discrepancy between the p-values produced by 'scikit_posthocs.posthoc_conover_friedman' and the results obtained through manual calculation based on the cited references ([1] Conover & Iman, 1979; [2] Conover, 1999). Upon closer examination of '_posthocs.py', I believe the issue lies in the formula implemented on line 731 (the calculation of T2), which is inconsistent with the formula specified in references [1] and [2]:

# (...) 
# Below is the critical part of scikit_posthocs.posthoc_conover_friedman function:
x['mat'] = x.groupby(_block_col)[_y_col].rank()
R = x.groupby(_group_col)['mat'].sum()
A1 = (x['mat'] ** 2).sum()
m = 1
S2 = m/(m*k - 1.) * (A1 - m*k*n*(m*k + 1.)**2./4.)
T2 = 1. / S2 * (np.sum(R) - n * m * ((m * k + 1.) / 2.)**2.)  # line 731 in _posthocs.py
A = S2 * (2. * n * (m * k - 1.)) / (m * n * k - k - n + 1.)
B = 1. - T2 / (n * (m * k - 1.))
# (...)
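
For reference, the summation I read in [1] and [2] (each centered column rank sum is squared before summing) would look like this in the same notation; this is an illustrative sketch, not code from the library:

# Expected per [1] and [2]: square each centered rank sum R_j, then sum
T2 = 1. / S2 * np.sum((R - n * m * (m * k + 1.) / 2.) ** 2.)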

Dataset
See the code below.

To Reproduce

from scikit_posthocs import posthoc_conover_friedman

data=[
	[3.88, 30.58, 25.24, 4.44, 29.41, 38.87],
	[5.64, 30.14, 33.52, 7.94, 30.72, 33.12],
	[5.76, 16.92, 25.45, 4.04, 32.92, 39.15],
	[4.25, 23.19, 18.85, 4.40, 28.23, 28.06],
	[5.91, 26.74, 20.45, 4.23, 23.35, 38.23],
	[4.33, 10.91, 26.67, 4.36, 12.00, 26.65]
]

pval = posthoc_conover_friedman(data, p_adjust=None)

The script yields the following p-values:

          0         1         2         3         4         5
0  1.000000  0.036732  0.019296  0.771000  0.009817  0.001127
1  0.036732  1.000000  0.771000  0.067323  0.561483  0.153707
2  0.019296  0.771000  1.000000  0.036732  0.771000  0.250288
3  0.771000  0.067323  0.036732  1.000000  0.019296  0.002360
4  0.009817  0.561483  0.771000  0.019296  1.000000  0.385789
5  0.001127  0.153707  0.250288  0.002360  0.385789  1.000000
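
As a side note, the treatments can be given labels instead of the default 0-5 indices by passing a pandas DataFrame; a minimal sketch, assuming the default wide (non-melted) block-design convention (rows are blocks, columns are groups), in which case the column labels should carry through to the output matrix:

import pandas as pd
from scikit_posthocs import posthoc_conover_friedman

# "data" is the list of lists defined above; T1..T6 are hypothetical treatment labels
df = pd.DataFrame(data, columns=["T1", "T2", "T3", "T4", "T5", "T6"])
pval = posthoc_conover_friedman(df, p_adjust=None)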

Expected behavior
The correct results (as I believe) for the test data are reported on page 18 of "The Pairwise Multiple Comparison of Mean Ranks Package (PMCMR)" by Thorsten Pohlert (2016).

[image: screenshot of the unadjusted pairwise p-value table on page 18 of Pohlert (2016)]

I have verified these results using the script below and independently through manual calculation. The verification followed the methodology outlined in [2] Conover, 1999 (page 371) and shows substantial discrepancies from the p-values produced by scikit_posthocs.

import numpy as np
import scipy.stats as ss

# Data
data = np.array([
    # T1     T2     T3    T4    T5      T6      = treatments (k columns)
    [3.88, 30.58, 25.24, 4.44, 29.41, 38.87],   # Block 1
    [5.64, 30.14, 33.52, 7.94, 30.72, 33.12],   # Block 2
    [5.76, 16.92, 25.45, 4.04, 32.92, 39.15],   # Block 3
    [4.25, 23.19, 18.85, 4.40, 28.23, 28.06],   # Block 4
    [5.91, 26.74, 20.45, 4.23, 23.35, 38.23],   # Block 5
    [4.33, 10.91, 26.67, 4.36, 12.00, 26.65]    # Block 6
])

# Rank data within each block (row)
data_ranked = np.array([ss.rankdata(row) for row in data])

# Calculate Conover statistics (tval)
b, k = data.shape                      # b blocks, k treatments
A1 = np.sum(data_ranked**2)            # sum of squared within-block ranks
C1 = b*k*(k+1)**2/4
R = np.sum(data_ranked, axis=0)        # treatment (column) rank sums
T1 = (k-1)*(np.sum(R**2)-b*C1)/(A1-C1)
SE = ((A1-C1)*2*b/((b-1)*(k-1))*(1-T1/(b*(k-1))))**0.5

difR = np.zeros((k, k))
for i in range(k):
    for j in range(k):
        difR[i, j] = abs(R[i] - R[j])

tval = difR / SE

# Calculate p-values
pval = 2 * ss.t.sf(tval, df=(b*k - k - b + 1))

This yields the following p-values:

[[1.000000 0.000143 0.000030 0.555472 0.000007 0.000000]
 [0.000143 1.000000 0.555472 0.000666 0.243213 0.006214]
 [0.000030 0.555472 1.000000 0.000143 0.555472 0.024680]
 [0.555472 0.000666 0.000143 1.000000 0.000030 0.000000]
 [0.000007 0.243213 0.555472 0.000030 1.000000 0.085107]
 [0.000000 0.006214 0.024680 0.000000 0.085107 1.000000]]

These values are consistent with Pohlert's results.

System and package information (please complete the following information):

  • OS: Windows 10 Pro
  • Package version:
    • scipy 1.11.1
    • scikit_posthocs 0.8.0

Additional context
No p-value adjustment was used in the tested runs.

maximtrp (Owner) commented Nov 30, 2023

Hello! Thank you! Let's try to investigate this one more deeply. Could you please run the following R script and post your results here?

library(PMCMRplus)

y <- matrix(c(
  3.88, 5.64, 5.76, 4.25, 5.91, 4.33, 30.58, 30.14, 16.92,
  23.19, 26.74, 10.91, 25.24, 33.52, 25.45, 18.85, 20.45,
  26.67, 4.44, 7.94, 4.04, 4.4, 4.23, 4.36, 29.41, 30.72,
  32.92, 28.23, 23.35, 12, 38.87, 33.12, 39.15, 28.06, 38.23,
  26.65), nrow=6, ncol=6,
  dimnames=list(1:6, LETTERS[1:6])
)

frdAllPairsConoverTest(y, p.adjust.method = 'none')

Here are my results (equal to scikit-posthocs output):

  A      B      C      D      E     
B 0.0367 -      -      -      -     
C 0.0193 0.7710 -      -      -     
D 0.7710 0.0673 0.0367 -      -     
E 0.0098 0.5615 0.7710 0.0193 -     
F 0.0011 0.1537 0.2503 0.0024 0.3858

P value adjustment method: none

maximtrp (Owner) commented Nov 30, 2023

Also, if you look at the PMCMRplus code for frdAllPairsConoverTest, you will find that the scikit-posthocs implementation is very close to it, nearly identical in some lines.

I assume that you are using the latest version of PMCMRplus.

liborak (Author) commented Nov 30, 2023

Hello Max! I'm not familiar with R and don't have the capability to run R scripts. However, I've manually verified the results by substituting values into the formulas as outlined by Conover in his book "Practical Nonparametric Statistics" (3rd Edition, Wiley, 1999; pages 369-371). I suspect the parameter T2 is the critical piece here. The PMCMRplus code references Conover & Iman's original paper (1979), which specifies T2 as follows:

[image: excerpt from the paper showing Conover's formula for T2]

However, in PMCMRplus code (and also in scikit-posthocs), T2 is calculated as follows:

PMCMRplus:

  S2 <- m / (m * k - 1) * (sum(r ^ 2) - m * k * n *
                                 (m * k + 1) ^ 2 / 4)
  T2 <- 1 / S2 * (sum(R.sum) - n * m * ((m * k + 1) / 2) ^ 2)

scikit-posthocs:

  R = x.groupby(_group_col)['mat'].sum()
  S2 = m/(m*k - 1.) * (A1 - m*k*n*(m*k + 1.)**2./4.)
  T2 = 1. / S2 * (np.sum(R) - n * m * ((m * k + 1.) / 2.)**2.)

Indeed, the two implementations are the same. In both, however, the square is applied only to the term ((m * k + 1) / 2) and lies outside the summation over the rank sums R, whereas Conover squares the whole expression inside the summation (the one containing R and the terms with n, m, k). The two are not equivalent.
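
To make the difference concrete, here is a self-contained sketch contrasting the two variants on the data from this issue, reusing the n, m, k, S2 notation of the snippets above (an illustration only, not the library's code):

import numpy as np
import scipy.stats as ss

data = np.array([
    [3.88, 30.58, 25.24, 4.44, 29.41, 38.87],
    [5.64, 30.14, 33.52, 7.94, 30.72, 33.12],
    [5.76, 16.92, 25.45, 4.04, 32.92, 39.15],
    [4.25, 23.19, 18.85, 4.40, 28.23, 28.06],
    [5.91, 26.74, 20.45, 4.23, 23.35, 38.23],
    [4.33, 10.91, 26.67, 4.36, 12.00, 26.65]
])

n, k = data.shape   # n blocks (rows), k treatments (columns)
m = 1               # one observation per block and treatment

ranks = np.array([ss.rankdata(row) for row in data])  # rank within each block
R = ranks.sum(axis=0)                                 # treatment rank sums
A1 = np.sum(ranks ** 2)
S2 = m / (m * k - 1.) * (A1 - m * k * n * (m * k + 1.) ** 2. / 4.)

# As implemented in PMCMRplus/scikit-posthocs: the rank sums are summed
# unsquared and the constant term is squared outside the summation
T2_implemented = 1. / S2 * (np.sum(R) - n * m * ((m * k + 1.) / 2.) ** 2.)

# As I read Conover: each centered rank sum is squared inside the summation
T2_reference = 1. / S2 * np.sum((R - n * m * (m * k + 1.) / 2.) ** 2.)

print(T2_implemented, T2_reference)  # the two statistics differ

Since A and B (and hence the t-statistics and p-values) are built from T2 in the code shown above, the two variants lead to the different p-value matrices reported earlier.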

My results correspond to Pohlert's documentation for PMCMR, as shown in my bug report. Maybe the code changed in PMCMRplus (I didn't check); if so, a bug may have been introduced with the change.

I compared both of Conover's publications (1979 and 1999), and although he gives slightly different formulas in them, they produce the same results.

maximtrp (Owner) commented Dec 1, 2023

Interesting! Thank you! I need some time to look into it. If you have time, please report this in the PMCMRplus repo as well.

liborak (Author) commented Dec 4, 2023

Thank you. I have compared the code for Conover's test in PMCMR and PMCMRplus, and the calculations of T2 are indeed different in the two packages. The correct formula, i.e. the one consistent with the references, is the one in PMCMR. I have already reported this discrepancy to PMCMRplus.

maximtrp (Owner) commented Dec 5, 2023

Thank you!

liborak (Author) commented Dec 10, 2023

Here is an email response from the author & maintainer of the PMCMRplus package confirming the bug:

I can confirm the issue and I have already discovered the bug.
I will fix it as soon as possible and will submit it to CRAN.
The results from the older PMCMR package were correct.

liborak (Author) commented Dec 10, 2023

It seems that the bug has already been fixed in PMCMRplus version 1.9.10; see the release news below. Please note that the rdrr.io snippet for running the R code online uses version 1.7.0 of PMCMRplus and therefore still contains the bug.

[image: PMCMRplus 1.9.10 release news describing the fix]

maximtrp (Owner) commented:
Thank you! Fixed in the latest release.
