I've recently been playing around with t.test(), compare_means() and stat_compare_means() using R's example dataset ToothGrowth (the one used in common examples), and I have what may be two simple questions.
Here is what I've been doing
For compare_means and stat_compare_means I prepare the data like so.
data("ToothGrowth")
df = ToothGrowth
Note: I only use the "len" (containing measurements) and "supp" (containing group info, VC or OJ) columns in the analysis.
For t.test() I just split the data by the supp column of the ToothGrowth dataset into two numeric vectors like so (as per here)
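A minimal sketch of such a split, assuming the two vectors are named OJ and VC as in the t.test() calls that follow (the exact code used for this step is an assumption):

```r
# Load the built-in dataset
data("ToothGrowth")

# Split tooth length by supplement type into two numeric vectors
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Group means, for comparison with the "sample estimates" in the t.test() output
mean(OJ)
mean(VC)
```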
Here, I use these three functions to compare tooth growth in two groups, OJ (larger mean) and VC (smaller mean), from the "supp" column of the ToothGrowth dataset.
Firstly, I used t.test() as I figured it may be useful to get to know, given that it's used in compare_means and stat_compare_means. By placing VC second in the call, it is taken as the reference/control in this test (this is the only conclusion I can arrive at, given the outcome). Here I test the alternative hypothesis that tooth growth in OJ is significantly lower, which it clearly is not. So we expect the test to give no support for this alternative hypothesis (i.e. we fail to reject the null). And indeed that is the case.
t.test(OJ, VC, alternative="less", var.equal=TRUE)
Two Sample t-test
data: OJ and VC
t = 1.9153, df = 58, p-value = 0.9698
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf 6.92918
sample estimates:
mean of x mean of y
20.66333 16.96333
If I switch the order of the groups in the call, taking OJ as reference and testing the alternative hypothesis that tooth growth in the VC group is significantly lower (which is the case), I expect the test to support this alternative hypothesis. And indeed that is the case.
t.test(VC, OJ, alternative="less", var.equal=TRUE)
Two Sample t-test
data: VC and OJ
t = -1.9153, df = 58, p-value = 0.0302
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -0.4708204
sample estimates:
mean of x mean of y
16.96333 20.66333
As an extension of this, if I test a different alternative hypothesis, that tooth growth in the VC group is significantly greater, then as expected the test gives no support for this alternative hypothesis.
t.test(VC, OJ, alternative="greater", var.equal=TRUE)
Two Sample t-test
data: VC and OJ
t = -1.9153, df = 58, p-value = 0.9698
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-6.92918 Inf
sample estimates:
mean of x mean of y
16.96333 20.66333
All fine
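The three results above are tied together by the fact that t.test(x, y) tests the difference mean(x) - mean(y), so swapping x and y flips the sign of t and turns a one-sided p-value p into 1 - p. A small sketch of that check, assuming OJ and VC are built by subsetting ToothGrowth$len by supp:

```r
data("ToothGrowth")
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# One-sided p-values for the three calls shown above
p_oj_less    <- t.test(OJ, VC, alternative = "less",    var.equal = TRUE)$p.value
p_vc_less    <- t.test(VC, OJ, alternative = "less",    var.equal = TRUE)$p.value
p_vc_greater <- t.test(VC, OJ, alternative = "greater", var.equal = TRUE)$p.value

# Swapping the argument order with the same alternative gives complementary p-values
all.equal(p_oj_less + p_vc_less, 1)

# "OJ less than VC" and "VC greater than OJ" are the same hypothesis
all.equal(p_oj_less, p_vc_greater)
```

So a one-sided p-value only makes sense together with the order in which the two groups were passed.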
In agreement with this, if I use the t.test method with the same data and the same parameters to test the same hypotheses in compare_means(), I get the same answers as t.test() when using OJ as the reference group and testing the "less" or "greater" alternative hypotheses
>compare_means(len ~ supp, data = df, ref.group = "OJ", method = "t.test", alternative = "less", var.equal=TRUE)
# A tibble: 1 x 8
.y. group1 group2 p p.adj p.format p.signif method
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 len OJ VC 0.0302 0.03 0.03 * T-test
>compare_means(len ~ supp, data = df, ref.group = "OJ", method = "t.test", alternative = "greater", var.equal=TRUE)
# A tibble: 1 x 8
.y. group1 group2 p p.adj p.format p.signif method
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 len OJ VC 0.970 0.97 0.97 ns T-test
All great, everything checks out so far.
Problem with stat_compare_means()
However, if I perform the exact same tests using stat_compare_means() I get the opposite answer
The p-value from the test above is 0.97 (see PDF of resulting plot), indicating no support for the alternative hypothesis that tooth growth in the VC group is smaller than in OJ (despite the other two methods giving the opposite answer)
I then get the same reversed effect when I test the opposite alternative hypothesis, that tooth growth in the VC group is greater (a hypothesis that I know is not supported from the earlier results). In this case the alternative hypothesis appears to hold, with a p-value of 0.03 (?). The resulting plot would even suggest otherwise.
I'm unsure why this is happening; hopefully it's just something small that I'm doing wrong. Any thoughts?
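One thing that may be worth checking is group order under a formula. With base R's formula interface, t.test(len ~ supp) takes the first factor level as x and the second as y, and the levels of supp are ordered OJ, VC. Whether stat_compare_means() orders the groups this way internally is an assumption here, not something confirmed in this thread. The column name supp_vc_first is hypothetical and used only for illustration:

```r
data("ToothGrowth")
df <- ToothGrowth

# Level order decides which group is x (first level) and which is y (second level)
levels(df$supp)

# Default level order: tests mean(OJ) - mean(VC) < 0
p_default <- t.test(len ~ supp, data = df,
                    alternative = "less", var.equal = TRUE)$p.value

# Put VC first (hypothetical helper column): tests mean(VC) - mean(OJ) < 0
df$supp_vc_first <- relevel(df$supp, ref = "VC")
p_releveled <- t.test(len ~ supp_vc_first, data = df,
                      alternative = "less", var.equal = TRUE)$p.value

c(default = p_default, releveled = p_releveled)
```

If the plotting function follows the factor level order, releveling the grouping column before plotting would be one way to control the direction of a one-sided test.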
Problem with compare_means()
When I return to compare_means() to play around with the "ref.group" argument as part of troubleshooting, I noticed something else I don't understand. When I switch the reference group from the earlier tests, rather than getting the opposite results, there is no change.
Example
OJ as reference group = p-value of 0.0302
>compare_means(len ~ supp, data = df, ref.group = "OJ", method = "t.test", alternative = "less", var.equal=TRUE)
# A tibble: 1 x 8
.y. group1 group2 p p.adj p.format p.signif method
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 len OJ VC 0.0302 0.03 0.03 * T-test
Same test but with VC as reference group = p-value of 0.0302
> compare_means(len ~ supp, data = df, ref.group = "VC", method = "t.test", alternative = "less", var.equal=TRUE)
# A tibble: 1 x 8
.y. group1 group2 p p.adj p.format p.signif method
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 len VC OJ 0.0302 0.03 0.03 * T-test
And the same happens if I test with the opposite alternative hypothesis: different reference groups, but the same answer
>compare_means(len ~ supp, data = df, ref.group = "OJ", method = "t.test", alternative = "greater", var.equal=TRUE)
# A tibble: 1 x 8
.y. group1 group2 p p.adj p.format p.signif method
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 len OJ VC 0.970 0.97 0.97 ns T-test
> compare_means(len ~ supp, data = df, ref.group = "VC", method = "t.test", alternative = "greater", var.equal=TRUE)
# A tibble: 1 x 8
.y. group1 group2 p p.adj p.format p.signif method
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 len VC OJ 0.970 0.97 0.97 ns T-test
Am I missing something here? Any thoughts on this would be much appreciated.
The ref.group option was previously considered only when the grouping variable contained more than two levels. In that case, each level is compared against the specified reference group. Now, the ref.group option is also considered in two-sample mean comparisons.