-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path3_Discussion 3.Rmd
117 lines (80 loc) · 4.26 KB
/
3_Discussion 3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
---
title: "Discussion3"
author: "Jing Lyu"
date: "1/25/2023"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## One-Way ANOVA
For this section, we will use PlantGrowth dataset. It contains weights of plants produced under two distinct treatment conditions and a control condition. We will investigate the relationship between conditions and weights.
#### 1. Write down a one-way ANOVA model for this data. Use the factor-effect form.
$$Y_{i,j}=\mu + \alpha_i+ \epsilon_{i,j}, \ j=1,\ldots, n_i, i =1,\ldots, 3$$
where $\{ \alpha_i\}$ satisfies that $\sum_{i=1}^3 n_i \alpha_i=0$ and $\{\epsilon_{i,j}\}$ are i.i.d. $N(0,\sigma^2)$.
In this model, $\alpha_i$ represent the effect from the three conditions, which are control ($i=1$), treatment 1 ($i=2$) and treatment 2 ($i=3$). The outcome $Y_{i,j}$ represents the $j$th subject under $i$th condition. The mean effect $\mu$ represents the mean weight in the population. The errors $\epsilon_{i,j}$ capture any unexplained effects on weights. Values of $n_i$ can be found in the following table.
```{r table}
table(PlantGrowth$group)
```
#### 2. Obtain the main effects plots. Summarize your findings.
```{r plot,warning=F,message=F}
library(gplots)
plotmeans(weight ~ group, data = PlantGrowth,
xlab = "Condition", ylab = "Weight",
main="Main effect of treatment")
```
Example observations:
* Apparent differences in weights across condition.
* Largest variability in treatment 1.
* Treatment 1 has the lowest weight.
* Equal sample size under each condition.
#### 3. Set up the ANOVA table using R for your model. Briefly explain this table. (explain what Df, Sum Sq, Mean Sq, F value, and Pr(>F) mean in this table.)
```{r anova.plant}
res.aov <- aov(weight ~ group, data = PlantGrowth)
summary(res.aov)
```
Eg: Treatment sum of squares is 3.766. Residual sum of squares is 10.492. F test statistics is 4.846. P-value is 0.0159.
#### 4. Test whether there is any association between conditions and weights. What are the null and alternative hypotheses?
P-value is 0.0159 less than 0.05 which indicates significant difference of weights under different conditions.
$$H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0 \ \ {\rm v.s.} \ \ H_A: {\rm not \ all\ } \alpha_i\ {\rm are\ the\ zero}.$$
## Contrasts
In this section, we will use the salaries dataset. It contains data on the salaries of different professors. We will investigate the relationship between ranks of professors and salaries.
```{r salary, warning=F, message=F}
library(car)
df=Salaries
head(df)
levels(df$rank)
table(df$rank)
```
One-way ANOVA table:
```{r anova.salary, message=F, warning=F}
aov1 = aov(salary ~ rank, df)
summary.lm(aov1)
```
We access the information in the form of regression output using the *summary.lm* command.
Global test shows the difference exists among the means.
Questions:
(1) Do non-tenured position (AsstProf) and tenured position (AssocProf and Prof) have different salary?
(2) Is there a difference of salary within tenured position (AssocProf vs Prof)?
Denote mean salary of each group as $\mu_1$:AsstProf, $\mu_2$:AssocProf, $\mu_3$:Prof.
Gloal test:
$$H_0:\mu_1=\mu_2=\mu_3\quad vs.\quad H_A:\text{they are not all equal.}$$
Contrast 1: In the first contrast, we group AssocProf and Prof into the treatment condition. Then the test becomes
$$H_0:\mu_1=\frac{\mu_2+\mu_3}{2}\quad vs.\quad H_A:\mu_1\neq\frac{\mu_2+\mu_3}{2}$$
The contrast we are interested in is $\mu_1-(\mu_2+\mu_3)/2$ or $2\mu_1-(\mu_2+\mu_3)$ with $c_1=2,c_2=c_3=-1$.
Constrast 2:
$$H_0:\mu_2=\mu_3\quad vs.\quad H_A:\mu_2\neq\mu_3$$
The constrast we are interested in is $\mu_2-\mu_3$ with $c_1=1,c_2=-1$.
Assign the contrasts to the variable `rank` in the dataset df.
```{r contrasts}
contrast1 = c(2,-1,-1)
contrast2 = c(0,1,-1)
contrasts(df$rank) = cbind(contrast1, contrast2)
contrasts(df$rank) # check
```
We now analyze our contrasts by rerunning the same ANOVA command that we ran before. However, because now R has more information on the structure of the variable `rank` in the form of contrasts, the output will be different.
```{r}
aov2 = aov(salary ~ rank, df)
summary.lm(aov2)
```
Both contrasts are significant, meaning that becoming tenured affects professors’ salaries and so does moving up among tenured positions.