---
title : Developing Data Products Project - Slidify Presentation
subtitle : devdataprod-002
author : PavanKB
job : Student at Coursera
framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js # {highlight.js, prettify, highlight}
hitheme : tomorrow #
widgets : [mathjax] # {mathjax, quiz, bootstrap}
mode : selfcontained # {standalone, draft}
knit : slidify::knit2slides
logo : coursera.png
---
## Introduction
These slides were created as part of the course project for Developing Data Products on Coursera.

The following slides describe the Shiny app built for the project. The app demonstrates how the joint distribution of two sets of random variables changes as the correlation between them is modified.

The app is available at [ShinyApp](http://pavankb.shinyapps.io/DataProdProject).
--- .class #id
## Math
Given two independent samples $x_1$ and $x_2$ drawn from the same distribution, a variable correlated with $x_1$ can be generated as

$$ y = \rho x_1 + x_2\sqrt{1-\rho^2} $$

Then $y$ and $x_1$ have correlation $\rho$.
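
Why this works, assuming $x_1$ and $x_2$ are independent with a common variance $\sigma^2$:

$$ \mathrm{Cov}(x_1, y) = \rho\sigma^2, \qquad \mathrm{Var}(y) = \rho^2\sigma^2 + (1-\rho^2)\sigma^2 = \sigma^2 $$

$$ \mathrm{Corr}(x_1, y) = \frac{\rho\sigma^2}{\sigma \cdot \sigma} = \rho $$
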
```{r fig.width = 4, fig.height=4, fig.align='center'}
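set.seed(1); rho <- 0.5                # reproducible sample, target correlation
x1 <- rnorm(50); x2 <- rnorm(50)       # two independent standard normal samples
y <- rho * x1 + sqrt(1 - rho^2) * x2   # y is correlated with x1 at level rho
plot(x1, y, pch = 19, col = 'blue', xlab = 'x1', ylab = 'y', asp = 1, main = 'Sample Plot')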
```
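
The formula only requires $x_1$ and $x_2$ to be independent with the same variance, so the same construction should work for each of the three distributions the app offers. Below is a quick empirical check, offered as a sketch. The helper name `make_y`, the seed and the sample size are illustrative choices and are not taken from the app.

```r
# Hypothetical helper: y = rho * x1 + sqrt(1 - rho^2) * x2
make_y <- function(x1, x2, rho) rho * x1 + sqrt(1 - rho^2) * x2

set.seed(2014)  # arbitrary seed, for reproducibility only
n   <- 10000    # large n so the sample correlation sits close to rho
rho <- 0.5

# Independent draws from the three distributions offered by the app
samplers <- list(
  normal  = function(k) rnorm(k),
  t20     = function(k) rt(k, df = 20),
  chisq20 = function(k) rchisq(k, df = 20)
)

# Sample correlation between x1 and y for each distribution
sample_cor <- sapply(samplers, function(draw) {
  x1 <- draw(n)
  x2 <- draw(n)
  cor(x1, make_y(x1, x2, rho))
})
round(sample_cor, 2)
```

Each entry should land near 0.5. Shifting the mean (as the chi-squared case does) does not affect correlation, so only the equal-variance assumption matters.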
--- .class #id
## UI
The Shiny application contains the following controls:

**Seed** - Sets the random seed so that a simulation can be reproduced.

**Distribution** - Chooses the distribution from which the random variables are generated. There are three options:

* Standard normal
* t-distribution - 20 degrees of freedom
* Chi-squared distribution - 20 degrees of freedom

**Number of points** - Sets the number of points plotted on the graph.

**Correlation** - Sets the correlation between the two sets of variables.
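
Below is a minimal sketch of how these four controls could drive the simulation. It assumes the generation formula from the Math slide and the fixed parameters listed under Limits. The function `simulate_pair` and the distribution labels `"normal"`, `"t"` and `"chisq"` are hypothetical. The app's own server code is not shown in these slides and may differ.

```r
# Hypothetical sketch: map the four UI inputs onto two correlated samples.
simulate_pair <- function(seed, dist = c("normal", "t", "chisq"), n, rho) {
  dist <- match.arg(dist)
  stopifnot(n >= 10, n <= 1000, rho >= -1, rho <= 1)  # allowed input ranges
  set.seed(seed)                      # Seed: makes the simulation repeatable
  draw <- switch(dist,                # Distribution: parameters are fixed
    normal = function(k) rnorm(k, mean = 0, sd = 1),
    t      = function(k) rt(k, df = 20),
    chisq  = function(k) rchisq(k, df = 20))
  x1 <- draw(n)                       # Number of points
  x2 <- draw(n)
  # Correlation: y is built from x1 and x2 with the Math-slide formula
  data.frame(x = x1, y = rho * x1 + sqrt(1 - rho^2) * x2)
}

d <- simulate_pair(seed = 42, dist = "t", n = 200, rho = 0.8)
cor(d$x, d$y)
```

Calling it twice with the same seed returns identical data, which is what the Seed control is for.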
--- .class #id
## Output
Every time a parameter is modified, the app automatically generates a new graph, and the title summarises the parameters used to generate it.

The graph also plots the guide lines `y = x` and `y = -x`, plus the regression line (in red), so that the user can compare results as $\rho$ changes.

As $\rho$ tends to 1, the points align along `y = x`; similarly, as $\rho$ tends to -1, they align along `y = -x`.
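
Below is a base R graphics sketch of a plot like the one described, using standard normal data only. The function name `plot_corr`, the line styles and the title format are illustrative assumptions, not the app's actual code.

```r
# Hypothetical sketch of the output plot using base R graphics.
plot_corr <- function(seed = 1, n = 100, rho = 0.5) {
  set.seed(seed)
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  fit <- lm(y ~ x)                        # least-squares regression of y on x
  plot(x, y, pch = 19, col = "blue", asp = 1,
       main = sprintf("seed = %d, n = %d, rho = %.2f", seed, n, rho))
  abline(a = 0, b = 1, lty = 2)           # guide line y = x
  abline(a = 0, b = -1, lty = 2)          # guide line y = -x
  abline(fit, col = "red", lwd = 2)       # regression line in red
  invisible(coef(fit))                    # return intercept and slope
}

plot_corr(seed = 7, n = 300, rho = -0.9)
```

Setting `asp = 1` keeps the `y = x` and `y = -x` guides at 45 degrees, which makes it easier to judge how closely the points and the red line follow them.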
--- .class #id
## Limits
1. Correlation effects could be explored across many data sets, but this app only explores two.
2. The number of points is limited to the range 10 to 1000.
3. The app only shows the effect of correlation for three distributions.
4. The distribution parameters are fixed:
    * Standard normal - $\mu$ = 0, $\sigma$ = 1
    * t-distribution - 20 degrees of freedom
    * Chi-squared - 20 degrees of freedom