# DATA210-T-Tests and Regression
###
#DATA-2100
#Adefoluke Shemsu
#Week 7
###
setwd("~/Documents/Education/Penn/Classes/DATA 210/Week 7")
library(tidyverse)
library(randomizr)
set.seed(28)
# In many states, convicted felons are banned from voting while they serve their prison, parole,
# or probation sentence. In some of those states, they are able to have their voting rights restored.
# Even after voting rights are restored, former felons continue to register to vote and vote at extremely low rates.
# In this question, we will look at an experiment (co-authored by UPenn’s Dr. Marc Meredith) that sought to understand
# what caused these low rates. The researchers were specifically interested in how much of this low rate could be
# explained by felons’ not knowing that their voting rights had been or could be restored after serving their sentence.
# For these exercises, you will use the dataset ‘felons.RData’ to replicate some of the findings in the article
# “Can Incarcerated Felons be (Re)integrated into the Political System? Results from a Field Experiment.”
# a. Imagine you wanted to understand the effect that felony convictions and incarceration have on voter
# registration and turnout (after the sentence has been served).
# Somebody suggests that you compare the voter registration rates and turnout rates of former felons to people
# who never have served time for a felony. Discuss why this research design could or could not help you to get a
# good estimate of the causal effect that you are interested in.
# On its own, this design would not give a good estimate of the causal effect. Former felons and people who have
# never served time for a felony likely differ in many ways besides the conviction, and a simple comparison does not
# tell us how one variable directly influences the other. Any gap in registration or turnout could reflect an array of
# unaddressed (confounding) factors rather than the conviction itself, which would likely skew the comparison against
# ex-felons' participation. The design could become more useful with additional data on factors that tie these
# variables together. For example, we could record whether members of the comparison group served time for something
# other than a felony (even 6 months in jail for a non-felony can affect civic participation), or whether a former
# felon went to prison more than once (one felony sentence vs. three will likely affect civic participation differently).
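# A minimal simulation sketch of the confounding problem described above. It assumes a single hypothetical
# binary confounder ("disadvantaged") that makes a felony conviction more likely and independently lowers
# turnout. In this simulated world the conviction has NO effect on turnout, yet the naive former-felon vs.
# never-convicted comparison still shows a large gap. All rates are made up for illustration and do not
# come from felons.RData.
set.seed(1)
sim.n <- 10000
sim.disadvantaged <- rbinom(sim.n, 1, 0.3)                                  # hypothetical confounder (1 = disadvantaged)
sim.felon <- rbinom(sim.n, 1, ifelse(sim.disadvantaged == 1, 0.40, 0.05))  # conviction more likely if disadvantaged
sim.vote <- rbinom(sim.n, 1, ifelse(sim.disadvantaged == 1, 0.30, 0.60))   # turnout depends ONLY on the confounder
# Naive comparison: turnout rate of former felons minus turnout rate of the never-convicted
sim.naive.gap <- mean(sim.vote[sim.felon == 1]) - mean(sim.vote[sim.felon == 0])
# Controlling for the confounder moves the estimated "effect" back toward its true value of zero
sim.adjusted <- unname(coef(lm(sim.vote ~ sim.felon + sim.disadvantaged))["sim.felon"])
c(naive = sim.naive.gap, adjusted = sim.adjusted)
# The naive gap is clearly negative even though conviction does nothing here; measuring the confounder fixes it.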
# b. Read the Experimental Design section of the “Can Incarcerated Felons be (Re)integrated into the Political System?”
# (pages 915 through 917 in gerber, et al 2015.pdf).
# i. What are the causal effect(s) that the authors are interested in studying?
# The authors want to estimate the causal effect of informing ex-felons that they are eligible to vote on whether they
# register and turn out. This speaks to how much of ex-felons' low participation is explained by not knowing that their
# voting rights were restored, as opposed to other consequences of their conviction and time served.
# ii. Describe the treatment and control conditions in the experiment.
# The control group was a set of ex-felons, drawn from data provided by the state of CT, who were not contacted about
# their right to vote. The treatment group was another set of CT ex-felons who were contacted through a mailed outreach
# campaign and informed of their voting rights. This group was also narrowed down to take into account the nature of
# the crimes committed and the 40% of outreach that bounced.
# CT was chosen because it is a state that restores a felon's voting rights upon release.
# The study also builds on an earlier multi-state study that tested a similar hypothesis (without outreach) in 2008.
# iii. Describe the randomization strategy that the authors used.
# A randomly selected subset of ex-felons was assigned to treatment and sent one of two randomly assigned letters on
# Secretary of State letterhead, informing them that they were not currently registered to vote but were eligible to
# do so. The assignment was balanced on demographics, release date, and crimes committed. One treatment group's
# letter simply stated that the recipient was eligible to vote, while the other group received a certified letter
# from the Secretary of State that also explained their voting rights.
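# A minimal base-R sketch of complete random assignment to a control group and two letter treatments. The sample
# size, the equal arm sizes, and the arm labels are hypothetical; the paper's exact allocation may differ. (The
# randomizr package loaded at the top of this script provides complete_ra() for the same job.)
set.seed(2)
sim.N <- 900                                                          # hypothetical number of ex-felons
sim.arms <- c("control", "letter_eligibility", "letter_explanation")  # hypothetical arm labels
# rep() fixes exactly N / 3 units per arm, and sample() shuffles who ends up in which arm
sim.assignment <- sample(rep(sim.arms, each = sim.N / length(sim.arms)))
table(sim.assignment)
# Because assignment ignores every covariate, pre-treatment traits should be balanced across arms in expectation.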
# c. Now we’re going to analyze the results from the experiment. Begin by removing the 161 people in the dataset
# who returned to prison before the experiment was conducted. Then create a new variable called ‘treatment_collapsed’
# which tells us whether each observation in the data was in the control group (FALSE) or a treatment group (TRUE).
load("~/Documents/Education/Penn/Classes/DATA 210/Week 7/felons.RData")
count(felons, returntoprison == 1) # Confirming that 161 subjects have returned to prison
felons.exp <- subset(felons, returntoprison == 0) # Keeping only subjects who did not return to prison (drops the 161)
count(felons.exp, returntoprison == 1) # Validating the previous line (no TRUE rows should remain)
attributes(felons$treatment) # Inspecting the value labels to see which treatment code is the control group
felons.exp <- mutate(felons.exp,
                     treatment_collapsed = treatment != 1) # FALSE = control (code 1), TRUE = either letter treatment
felons.exp <- select(felons.exp, -returntoprison) # Removing "returntoprison" since no one in this subset went back
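# A toy check of the filtering and collapsing logic above, assuming (as the treatment_collapsed code implies) that
# treatment code 1 marks the control group and any other code marks a letter treatment. The data frame is
# hypothetical and only mimics the two columns involved.
sim.toy <- data.frame(treatment = c(1, 2, 3, 1, 3),
                      returntoprison = c(0, 0, 1, 0, 0))
sim.toy <- subset(sim.toy, returntoprison == 0)          # drop anyone who returned to prison
sim.toy$treatment_collapsed <- sim.toy$treatment != 1    # FALSE = control, TRUE = any letter
sim.toy
# Four rows remain, with treatment codes 1, 2, 1, 3 mapped to FALSE, TRUE, FALSE, TRUE.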
# d. The first thing you should always do before analyzing the results of an experiment is assess whether you have
# balance in your treatment and control groups. In a well-balanced experiment, no pre-treatment covariates
# (i.e. the variables that existed before you ran the experiment) would predict whether or not somebody ended up
# in the treatment or control group. For the following questions, use the treatment_collapsed variable.
# i. Use 4 t-tests to assess whether the felons’ age, number of days served in prison, time since their release
# from prison, or 2008 vote turnout is a statistically significant predictor of treatment. To do the t-tests,
# you’ll want to write code that looks like this: t.test(felons$age ~ felons$treatment_collapsed).
# Create a well-formatted table that presents the average values for each of these variables in the treatment
# and control groups, as well as the p-value associated with the difference between those averages.
# You can pull out these values from the output of the t.test() object using the $ operator.
# Is there significant imbalance for any of those four variables?
age.exp <- t.test(felons.exp$age ~ felons.exp$treatment_collapsed) # Welch two-sample t-test for each covariate
num.days.served.exp <- t.test(felons.exp$days_served ~ felons.exp$treatment_collapsed)
yrs.since.release.exp <- t.test(felons.exp$yrs_since_release ~ felons.exp$treatment_collapsed)
vote08.exp <- t.test(felons.exp$vote08 ~ felons.exp$treatment_collapsed)
# Checking the group means reported by the t-tests
mean(felons.exp$age[felons.exp$treatment_collapsed == TRUE]) # 35.23
mean(felons.exp$age[felons.exp$treatment_collapsed == FALSE]) # 35.34
mean(felons.exp$days_served[felons.exp$treatment_collapsed == TRUE]) # 369.48
mean(felons.exp$days_served[felons.exp$treatment_collapsed == FALSE]) # 370.13
mean(felons.exp$yrs_since_release[felons.exp$treatment_collapsed == TRUE]) # 1.8
mean(felons.exp$yrs_since_release[felons.exp$treatment_collapsed == FALSE]) # 1.8
mean(felons.exp$vote08[felons.exp$treatment_collapsed == TRUE], na.rm = TRUE) # .0522
mean(felons.exp$vote08[felons.exp$treatment_collapsed == FALSE], na.rm = TRUE) # .0495
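library(broom) # tidy() converts each t.test() result into a one-row data frame
library(purrr) # map_df() (also attached by tidyverse)
testgroup <- map_df(list(age.exp, num.days.served.exp, vote08.exp, yrs.since.release.exp), tidy)
# For a two-sample t-test, tidy() returns estimate1 (mean of the FALSE/control group),
# estimate2 (mean of the TRUE/treatment group), and estimate = estimate1 - estimate2
testgroup <- testgroup %>%
  mutate(variable = c("age", "num.days.served", "vote08", "time.since.release")) %>% # Tibbles do not keep row names
  select(variable,
         control = estimate1,
         treatment = estimate2,
         mean.diff = estimate, # control minus treatment
         p.value)
testgroup
# According to the table, there are no major imbalances: the control and treatment means are nearly identical
# for all four variables.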
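# A base-R sketch of the same kind of balance table, pulling each value out of a t.test() result with the $
# operator as the question suggests. It runs on simulated data with hypothetical variables (sim.age, sim.days)
# so that it works without felons.RData.
set.seed(4)
sim.df <- data.frame(treat = rep(c(FALSE, TRUE), each = 200),
                     sim.age = rnorm(400, mean = 35, sd = 10),
                     sim.days = rpois(400, lambda = 370))
sim.vars <- c("sim.age", "sim.days")
sim.balance <- do.call(rbind, lapply(sim.vars, function(v) {
  tt <- t.test(sim.df[[v]] ~ sim.df$treat)       # Welch two-sample t-test by default
  data.frame(variable  = v,
             control   = unname(tt$estimate[1]),  # mean in group FALSE
             treatment = unname(tt$estimate[2]),  # mean in group TRUE
             p.value   = tt$p.value)
}))
sim.balance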
# ii. Use linear regression to assess whether the type of crime predicts whether somebody ended up in the treatment or control group.
# Were any crimes strong predictors of the treatment?
# Checking whether felony type predicts assignment to the treatment group
summary(lm(treatment_collapsed ~ felony_type, data = felons.exp))
# According to this analysis, no felony type predicted whether someone ended up in the treatment or control group.
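# Because felony type is categorical, lm() estimates one coefficient per crime type relative to a baseline
# category, and looking only at individual t-tests can miss a joint pattern. A minimal sketch of a joint test,
# assuming a hypothetical four-level crime factor: compare the model to an intercept-only model with anova(),
# which gives an F-test of whether crime type as a whole predicts treatment. (With a single factor, this is the
# same F-test printed at the bottom of summary().)
set.seed(5)
sim.crime <- factor(sample(c("drug", "property", "violent", "other"), 600, replace = TRUE))  # hypothetical
sim.treat <- rbinom(600, 1, 0.5)      # assigned independently of crime type
sim.fit.null <- lm(sim.treat ~ 1)
sim.fit.crime <- lm(sim.treat ~ sim.crime)
sim.crime.test <- anova(sim.fit.null, sim.fit.crime)
sim.crime.test
# A large Pr(>F) is consistent with crime type being balanced across groups.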
# iii. Use linear regression to assess balance for all the variables (age, days in prison, time since release, 2008 turnout,
# crime type) simultaneously. When you do this, do you find imbalance for any of the pre-treatment covariates?
summary(lm(treatment_collapsed ~
vote08 +
age +
days_served +
yrs_since_release +
felony_type,
data = felons.exp))
# No imbalances found.
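# When all covariates enter at once, the overall F-test at the bottom of summary() asks whether they jointly
# predict treatment; a large p-value is consistent with balance. A sketch of pulling that p-value out in code,
# on simulated data with hypothetical covariates (age, days, vote08) rather than the real felons.exp columns.
set.seed(6)
sim.bal <- data.frame(treat = rbinom(800, 1, 0.5),
                      age = rnorm(800, 35, 10),
                      days = rpois(800, 370),
                      vote08 = rbinom(800, 1, 0.05))
sim.fstat <- summary(lm(treat ~ age + days + vote08, data = sim.bal))$fstatistic
# fstatistic holds the F value and its numerator and denominator degrees of freedom
sim.joint.p <- pf(sim.fstat["value"], sim.fstat["numdf"], sim.fstat["dendf"], lower.tail = FALSE)
unname(sim.joint.p)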
# e. Did the experiment have an effect on whether or not ex-felons registered to vote? Did it impact their turnout in 2012?
# If so, how much did the treatment increase or decrease the probability that they registered or turned out?
# You can use linear regression and the ‘treatment_collapsed’ variable to answer this question.
# The outcome goes on the left-hand side; the coefficient on treatment_collapsedTRUE is the estimated change
# in the probability of registering (or voting) from receiving a letter.
summary(lm(registered ~ treatment_collapsed, data = felons.exp))
# There is strong evidence that the experiment affected registration: the p-value of .0047 indicates a
# significant difference between the control and treatment groups.
summary(lm(vote12 ~ treatment_collapsed, data = felons.exp))
# The effect on 2012 turnout is also statistically significant, though less decisively (p = .048, just under .05).
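# In lm(outcome ~ treatment) with a TRUE/FALSE treatment indicator, the treatment coefficient is exactly the
# difference in outcome means between the groups, which for a 0/1 outcome is the change in probability. A check
# on simulated data; the registration rates below are hypothetical, not results from the experiment.
set.seed(7)
sim.t <- rep(c(FALSE, TRUE), each = 1000)
sim.reg <- rbinom(2000, 1, ifelse(sim.t, 0.12, 0.08))   # hypothetical registration rates
sim.effect <- coef(lm(sim.reg ~ sim.t))["sim.tTRUE"]    # logical predictors get a "TRUE" suffix
sim.diff <- mean(sim.reg[sim.t]) - mean(sim.reg[!sim.t])
c(regression = unname(sim.effect), difference = sim.diff)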
# f. Use linear regression to estimate these two treatment effects again. This time, control for the five pre-treatment
# covariates (the ones you checked for balance in part d) in your regression. What effect did the treatment have
# on registration and voting?
# Regressions with the five pre-treatment covariates included. The outcome goes on the left-hand side, and the
# coefficient on treatment_collapsedTRUE is the covariate-adjusted treatment effect. vote08 appears to contain
# missing values (hence na.rm = TRUE above), so lm() drops those rows by default.
# Registration:
summary(lm(registered ~ treatment_collapsed + age + days_served + yrs_since_release + vote08 + felony_type,
           data = felons.exp))
# 2012 turnout:
summary(lm(vote12 ~ treatment_collapsed + age + days_served + yrs_since_release + vote08 + felony_type,
           data = felons.exp))
# Even after adjusting for the pre-treatment covariates, the treatment group shows a significant increase in
# voter registration and turnout.
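# A sketch of why covariates are added in a randomized experiment, on simulated data with one hypothetical
# pre-treatment covariate that predicts the outcome. Because treatment is randomized, the adjusted and unadjusted
# estimates target the same effect, but adjusting for a prognostic covariate usually shrinks the standard error.
set.seed(8)
sim.n2 <- 5000
sim.x <- rnorm(sim.n2)                                               # hypothetical pre-treatment covariate
sim.d <- rbinom(sim.n2, 1, 0.5)                                      # randomized treatment (0/1)
sim.y <- rbinom(sim.n2, 1, plogis(-1 + 1.5 * sim.x + 0.3 * sim.d))   # hypothetical outcome model
sim.unadj <- summary(lm(sim.y ~ sim.d))$coefficients["sim.d", ]
sim.adj <- summary(lm(sim.y ~ sim.d + sim.x))$coefficients["sim.d", ]
rbind(unadjusted = sim.unadj[1:2], adjusted = sim.adj[1:2])          # estimate and standard error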