Commit

Merge branch 'main' into mary-episode4-changes
ailithewing authored Apr 2, 2024
2 parents 755a00a + 59e3ff8 commit 6e2ced4
Showing 15 changed files with 376 additions and 177 deletions.
3 changes: 2 additions & 1 deletion CITATION
@@ -1 +1,2 @@
FIXME: describe how to cite this lesson.
O’Callaghan A, Robertson G, LLewellyn M, Becher H, Meynert A, Vallejos C, Ewing A. (2024). High dimensional statistics with R. https://github.com/
carpentries-incubator/high-dimensional-stats-r.
20 changes: 5 additions & 15 deletions README.md
@@ -2,21 +2,7 @@

[![Create a Slack Account with us](https://img.shields.io/badge/Create_Slack_Account-The_Carpentries-071159.svg)](https://swc-slack-invite.herokuapp.com/)

**Thanks for contributing to The Carpentries Incubator!**
This repository provides a blank starting point for lessons to be developed
here.

A member of the [Carpentries Curriculum Team](https://carpentries.org/team/)
will work with you to get your lesson listed on the
[Community Developed Lessons page][community-lessons]
and make sure you have everything you need to begin developing your new lesson.

## What to do next

Before you begin developing your new lesson,
here are a few things we recommend you do:

* [ ] [Add relevant topic tags to your lesson repository][cdh-topic-tags].
This repository is part of The Carpentries Incubator, a place for The Carpentries community to collaboratively create, test, and improve lessons.

## Contributing

@@ -42,6 +28,10 @@ Look for the tag
This indicates that the maintainers will welcome a pull request fixing this
issue.

## Reviews

The lesson has been iteratively developed and improved. For information on the development process, reviews and feedback from instructors following teaching see [REVIEWS](reviews.md).

## Maintainer(s)

Current maintainers of this lesson are
12 changes: 6 additions & 6 deletions _episodes_rmd/01-introduction-to-high-dimensional-data.Rmd
@@ -142,18 +142,18 @@ of the challenges we are facing when working with high-dimensional data.
> >
> >
> > ```{r dim-prostate, eval = FALSE}
> > dim(prostate) #print the number of rows and columns
> > dim(prostate) # print the number of rows and columns
> > ```
> >
> > ```{r head-prostate, eval = FALSE}
> > names(prostate) # examine the variable names
> > head(prostate) #print the first 6 rows
> > names(prostate) # examine the variable names
> > head(prostate) # print the first 6 rows
> > ```
> >
> > ```{r pairs-prostate}
> > names(prostate) #examine column names
> > ```{r pairs-prostate, fig.cap="Pairwise plots of the 'prostate' dataset.", fig.alt="A set of pairwise scatterplots of variables in the 'prostate' dataset, namely lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45, lpsa. The plots are shown in a grid."}
> > names(prostate) # examine column names
> >
> > pairs(prostate) #plot each pair of variables against each other
> > pairs(prostate) # plot each pair of variables against each other
> > ```
> > The `pairs()` function plots relationships between each of the variables in
> > the `prostate` dataset. This is possible for datasets with smaller numbers
47 changes: 19 additions & 28 deletions _episodes_rmd/02-high-dimensional-regression.Rmd
@@ -94,7 +94,7 @@ methyl_mat <- assay(methylation)
The distribution of these M-values looks like this:

```{r histx, fig.cap="Methylation levels are generally bimodally distributed.", fig.alt="Histogram of M-values for all features. The distribution appears to be bimodal, with a large number of unmethylated features as well as many methylated features, and many intermediate features."}
hist(methyl_mat, breaks = "FD", xlab = "M-value")
hist(methyl_mat, xlab = "M-value")
```

You can see that there are two peaks in this distribution, corresponding
@@ -105,7 +105,11 @@ sample-level metadata we have relating to these data. In this case, the
metadata, phenotypes, and groupings in the `colData` look like this for
the first 6 samples:

```{r datatable}
```{r, eval=FALSE}
head(colData(methylation))
```

```{r datatable, echo=FALSE}
knitr::kable(head(colData(methylation)), row.names = FALSE)
```

@@ -1029,15 +1033,10 @@ conservative, especially with a lot of features!
```{r p-fwer, fig.cap="Bonferroni correction often produces very large p-values, especially with low sample sizes.", fig.alt="Plot of Bonferroni-adjusted p-values (y) against unadjusted p-values (x). A dashed black line represents the identity (where x=y), while dashed red lines represent 0.05 significance thresholds."}
p_raw <- toptab_age$P.Value
p_fwer <- p.adjust(p_raw, method = "bonferroni")
library("ggplot2")
ggplot() +
aes(p_raw, p_fwer) +
geom_point() +
scale_x_log10() + scale_y_log10() +
geom_abline(slope = 1, linetype = "dashed") +
geom_hline(yintercept = 0.05, linetype = "dashed", col = "red") +
geom_vline(xintercept = 0.05, linetype = "dashed", col = "red") +
labs(x = "Raw p-value", y = "Bonferroni p-value")
plot(p_raw, p_fwer, pch = 16, log="xy")
abline(0:1, lty = "dashed")
abline(v = 0.05, lty = "dashed", col = "red")
abline(h = 0.05, lty = "dashed", col = "red")
```

You can see that the p-values are exactly one for the vast majority of
@@ -1090,7 +1089,7 @@ experiment over and over.
> > \frac{0.05}{100} = 0.0005
> > $$
> >
> > 2. Trick question! We can't say what proportion of these genes are
> > 2. We can't say what proportion of these genes are
> > truly different. However, if we repeated this experiment and
> > statistical test over and over, on average 5% of the results
> > from each run would be false discoveries.
@@ -1100,25 +1099,17 @@ experiment over and over.
> >
> > ```{r p-fdr, fig.cap="Benjamini-Hochberg correction is less conservative than Bonferroni", fig.alt="Plot of Benjamini-Hochberg-adjusted p-values (y) against unadjusted p-values (x). A dashed black line represents the identity (where x=y), while dashed red lines represent 0.05 significance thresholds."}
> > p_fdr <- p.adjust(p_raw, method = "BH")
> > ggplot() +
> > aes(p_raw, p_fdr) +
> > geom_point() +
> > scale_x_log10() + scale_y_log10() +
> > geom_abline(slope = 1, linetype = "dashed") +
> > geom_hline(yintercept = 0.05, linetype = "dashed", color = "red") +
> > geom_vline(xintercept = 0.05, linetype = "dashed", color = "red") +
> > labs(x = "Raw p-value", y = "Benjamini-Hochberg p-value")
> > plot(p_raw, p_fdr, pch = 16, log="xy")
> > abline(0:1, lty = "dashed")
> > abline(v = 0.05, lty = "dashed", col = "red")
> > abline(h = 0.05, lty = "dashed", col = "red")
> > ```
> >
> > ```{r plot-fdr-fwer, fig.alt="Plot of Benjamini-Hochberg-adjusted p-values (y) against Bonferroni-adjusted p-values (x). A dashed black line represents the identity (where x=y), while dashed red lines represent 0.05 significance thresholds."}
> > ggplot() +
> > aes(p_fdr, p_fwer) +
> > geom_point() +
> > scale_x_log10() + scale_y_log10() +
> > geom_abline(slope = 1, linetype = "dashed") +
> > geom_hline(yintercept = 0.05, linetype = "dashed", color = "red") +
> > geom_vline(xintercept = 0.05, linetype = "dashed", color = "red") +
> > labs(x = "Benjamini-Hochberg p-value", y = "Bonferroni p-value")
> > plot(p_fwer, p_fdr, pch = 16, log="xy")
> > abline(0:1, lty = "dashed")
> > abline(v = 0.05, lty = "dashed", col = "red")
> > abline(h = 0.05, lty = "dashed", col = "red")
> > ```
> >
> {: .solution}