From fdf78479da8ded10eab501360ff16c34293f28af Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 14 Mar 2024 13:28:50 +0000 Subject: [PATCH 01/28] add reviews --- reviews.md | 155 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 155 insertions(+) create mode 100644 reviews.md diff --git a/reviews.md b/reviews.md new file mode 100644 index 00000000..5fb76361 --- /dev/null +++ b/reviews.md @@ -0,0 +1,155 @@ +# Reviews: high dimensional statistics with R +The purpose of this document is to summarise and track how the lesson has developed in response to peer reviews, feedback from instructors and Carpentries advice. We also detail the main changes that still need to be made and thus define a roadmap to publication. + +Note that the lesson has been developed over around 3 years and iteratively improved. This document only highlights reviews contributed by reviewers external to the main authors, except following rounds of teaching. Details of other improvements can be found throughout the repository and the list of authors is given in [AUTHORS](AUTHORS). + +Thank you to our reviewers and instructors for their feedback. If you would like to submit a review or pull request, please see our [Contribution Guide](https://github.com/carpentries-incubator/high-dimensional-stats-r/blob/main/CONTRIBUTING.md) for more information. + +## Peer reviews +**Review by Emma Rand on Episode 1: Introduction to high-dimensional data ([#39](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/39))** + +The reviewer liked this episode as an introduction to the course, particularly that high-dimensional data were defined explicitly with examples, that important points were reiterated in the text, and that the motivation for using alternative methods when considering high-dimensional data was given. 
The comments pertained to the entire episode, with the big changes relating to elaborating and expanding the questions or solutions for the challenges, inline code formatting and elaborating the reason for package use. + +☑ Changes were made in line with all the suggestions exactly, and are itemised in the issue and the associated points in [#64](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/64). + +**Review by Emma Rand on Episode 2: Regression with many outcomes ([#47](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/47))** + +The reviewer particularly liked that this episode demonstrates why we need alternative approaches to regression for high-dimensional data and the multiple testing section. Although many comments were given, the reviewer highlighted that the episode was long, that new concepts should be removed from Challenge 1 and that the smoking model figure should be corrected. The review also highlighted issues with the remote theme. + +☑ Changes were made in line with the suggestions exactly, including reducing the length of the lesson. The changes are itemised in the issue and the associated points in [#64](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/64). + +☒ The only point that remains to be addressed: + +- The first challenge makes good points but introduces new concepts rather than testing presented content. + +**Review by Emma Rand on Episode 3: Regularised regression ([#49](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/49))** + +The reviewer liked that the episode emphasises a genuine understanding of the methods. Amongst the full review comments, the reviewer commented that the episode is long and suggested some sections to remove. The reviewer also suggested several points that could be expanded to improve the use of statistical 'jargon' and drawing links between jargon to make the episode more approachable to a biological sciences audience. 
+ +☑ All the suggested changes were made and are detailed in the issue. + + +**Review by Christie Barron on Episode 5: Factor analysis ([#53](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/53))** + +Christie commented that it may be useful to discuss confirmatory factor analysis in addition to exploratory factor analysis to clarify that this is another approach that can be used. In addition, approaches to factor enumeration could be discussed and R packages that make factor analysis easier. + +☑ All of the suggested changes were made and are detailed in the issue and in commits 14584c8 and 3419337. + +**Review by Mary Llewellyn on Episode 1: Introduction to high-dimensional data ([#112](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/112))** + +The reviewer liked that this episode struck a good balance between motivating the course clearly whilst avoiding cognitive overload. The reviewer suggested changes largely related to adding signposting, foreshadowing to motivate the entire lesson from the start and re-ordering paragraphs. The reviewer also suggested minor wording changes to, for example, clarify the difference between the "Challenges" (exercises) and "challenges" in the general sense and to ensure learners, especially independent learners, had completed the setup instructions. + +☑ All the suggested changes were made, detailed in the issue. From the discussions following this review, we have also clarified the definition of high-dimensional data and plan to set up a data description page [#132](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/132). + + + +**Review by Mary Llewellyn on Episode 2: Regression with many outcomes ([#114](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/114))** + +The reviewer wrote that they really liked the episode and believed it's really valuable to explore many outcomes as well as many predictors. 
They had a few queries on the episode and suggested mainly that some of the more complex programming concepts could be removed to avoid cognitive overload, and clarification about the motivation of the episode as avoiding data dredging. + +☑ All of the suggested changes were made and are detailed in the issue. + +**Review by Mary Llewellyn on Episode 3: Regularised regression ([#115](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/115))** + +The reviewer liked the episode and commented that, although it's long, it makes challenging ideas approachable. The suggestions largely related to how regularisation is motivated and linking ideas to the previous lesson, signposting, how singularities are described and the placement of the linear regression section within the episode. Further adjustments were recommended for independent learners. + +☑ All of the suggested changes were made, detailed in the issue. + + +**Review by Mary Llewellyn on Episode 4: Principal Component Analysis ([#117](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/117))** + +The reviewer really liked the way that PCA is presented practically. The main comments related to clarifying the motivation for various parts of the episode, moving discussion of advantages and disadvantages to the end of the episode, signposting, making it clearer when examples are demonstrative and streamlining package use. + +☑ All of the suggested changes were made and are detailed in the issue. Various additional changes were made following the comments (detailed in the issue), including refining the number of PCA packages used to one, simplifying the scree plots and adding further detail to the code comments. 
+ +**Review by Mary Llewellyn on Episode 5: Factor analysis ([#118](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/118))** + +The reviewer thought that this episode was well-balanced with the previous episode and had a few suggestions to differentiate between factor analysis and PCA, how latent variables are defined, signposting, moving discussion of advantages and disadvantages to the end of the episode, some wording around the hypothesis tests, and some adaptations for the individual learner. + +☑ All of the suggested changes were made, detailed in the issue. Additional changes were made following the comments with respect to removing discussion of the rotations to reduce the likelihood of cognitive overload. + +**Review by Mary Llewellyn on Episode 6: K-means ([#119](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/119))** + +The reviewer really liked that this episode builds gradually from an initial example and stated that this makes the narrative very clear. Most of the suggestions were with respect to wording, minor re-ordering of sections, signposting and differentiating K-means from the methods already introduced. + +☑ All of the suggested changes were made and are detailed in the issue. + +**Review by Mary Llewellyn on Episode 7: Hierarchical clustering ([#120](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/120))** + +The reviewer liked how this episode built on K-means clustering on the second episode and the use of visualisation to illustrate the concepts in the episode. The suggestions related to adding further motivation for the episode, structural re-ordering, annotating plots and code, and signposting. + +☑ All of the suggested changes were made and are detailed in the issue. 
+ + + +## Instructor feedback +**Feedback from teaching 21st October 2021 ([#33](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/33))** + +Overall, the learners liked that the practical examples were clear and easily understood by biologists, the slides were informative and well-presented, and they liked how useful the lesson is. They particularly noted that they liked the pace and depth of the first two episodes and the visualisations in episode 7. + +There were some issues with equation rendering in Chrome and some learners found that the pace could be a little faster in places. Episode-specific comments noted that episode 3 was too theoretical, episodes 4 and 5 could contain more code comments for learners looking back on the course and episodes 6 and 7 could contain more examples and give an overview of the general steps of each method/when each is useful. + +☑ The lesson has been iteratively improved over time. As such, episode 3 is now presented much more practically (fewer mathematical expressions, existing theoretical concepts are more clearly and practically motivated, additional content such as Bayesian methods have been removed to focus on the concepts already introduced). Episodes 4 and 5 have almost completely changed and the code is commented and motivated much more clearly. Episodes 6 and 7 now motivate the practical uses of each method more clearly. From teaching, the course was also improved by shortening the introduction to focus only on the difficulties of high-dimensional data, episode 3 was presented from a more practical perspective and episode 5 was made more detailed. 
+ +**Additional changes following teaching from February-June 2022 ([#52](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/33), [#57](https://github.com/carpentries-incubator/high-dimensional-stats-r/pull/57), [#63](https://github.com/carpentries-incubator/high-dimensional-stats-r/pull/63), [#64](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/64), [#76](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/76))** + +Several other changes following notes and feedback from teaching are detailed in these issues. + +**Feedback from Edward Wallace from teaching September 2022 ([#86](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/86), [#88](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/88))** + +The learners found the lessons relevant to their work, particularly episodes 4 and 5, which they thought were explained really well, were practical and easy to follow, and introduced concepts they found important to their work at a level that was understandable to them. They said that the way these episodes were presented helped them to fill the gaps in their understanding from practical implementation. They also particularly liked the coding and visualisation in episode 3. + +Many of the comments related to timings (allowing more time) and clarifying wording. + +☑ Changes were made in response to most comments in [#86](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/86), [#89](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/88) and many remaining changes are evident in the current lesson. + +☒ The remaining points to be addressed are: + +- Providing function definitions or access to function documentation (i.e., clarifying arguments). This noted by both instructors and learners. +- Removing or clarifying example 2 of Challenge 1, Episode 4. 
The instructors noted that it could be interpreted as PCA-appropriate. +- Rewording Challenge 2 in Episode 4. The instructors noted that some of the learners said it seemed like a trick question. + +**Feedback from February 2024 teaching ([#145](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/145))** + +The instructors liked teaching the course and found it fun to teach. Comments largely related to timing adjustments, explaining packages, adjusting the way the factor analysis episode is presented and some structural changes to the first three episodes. + +☒ Changes in response to this feedback are ongoing and documented in the issue. The points that remain to be addressed are: + + - Possibly could describe the R packages/rationalise them in greater detail to reduce the risk of technical issues. + - The factor analysis episode could be combined into another episode (potentially doesn't need to be its own episode) since the course is pitched at biologists. This could be combined into the PCA episode in a callout. +- In episode 2, the DNA methylation data discussion could be moved to episode 1, the broom package is possibly unnecessary and we could just use summary(), the advanced content to compute the t-statistics by hand would probably be better off in a sidebar. +- In episode 3, the Introduction and coefficient estimates section could be moved to episode 2, the coefficient estimates section should probably go before discussion of singularities, more explanation of the heat map in the cross-validation section required to clarify what it shows. + +## Carpentries-specific + +☑ The lesson has been developed using The Carpentries template. As such, a number of requirements are fulfilled: + +- Alt text and captions complete for Episodes 1, 2, 5 in line with [The Carpentries guide](https://carpentries.org/blog/2022/11/accathon/). 
+- Conforms to [The Carpentries Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). +- Testing that the lesson is appropriate for the target audience identified, is accurate, descriptive and easy to understand and is structured to manage cognitive load. +- Does not use dismissive language. +- All lesson tools are open source and the data sets are accessible. +- Tools and data checked for CC0 license compatibility. Most are compatible, exceptions to check are listed below. +- Data sets are representative of data typically encountered in the domain. +- Tests that the example tasks and narrative of the lesson are appropriate and realistic. +- Tested that the solutions to all exercises are accurate and sufficiently explained, and that the tasks and formats are appropriate for the expected experience level of the target audience. +- Exercises are designed with diagnostic power. +- The learning objectives are clear, descriptive and measurable, and focus on the skills being taught and not the functions/tools e.g. "filter the rows of a data frame based on the contents of one or more columns," rather than "use the filter function on a data frame." +- The target audience identified for the lesson is specific and realistic. +- Tested that the list of required prior skills and/or knowledge is complete and accurate. +- The setup and installation instructions are complete, accurate, and easy to follow. +- It has been taught at least two times by Instructors who had not been heavily involved in the development of the +lesson before that point. + +☒ The points still to be addressed are: + +- Conversion to The Carpentries Workbench +- Alt text for Episode 3 and Challenge figures in Episode 4, 6 and 7 required. +- Ensuring that Alt text is accessible from the WAVE Web Accessibility Evaluation Tool or associated browser extensions. 
+- Ensure tools and data sets have CC0 compatible license: CC0 (required by Carpentries) vs GPL (>= 2) license ("genridge" for prostate data, limma, glmnet, cluster, pheatmap, dendextend), GPL-3 (PCAtools, SingleCellExperiment, scater, bluster, ), Artistic-2.0 (for BiocManager, minfi, SummarizedExperiment packages), GPL (knitr) compatibility, CC-BY (Horvath data), LGPL-3 (clValid). +- The lesson does not make use of superfluous data sets: ensuring that later uses of microarray and scrnaseq data is justified (as opposed to prostate/methylation). +- Ensuring that the lesson content does not make extensive use of contractions ("can’t" instead of "cannot" etc). +- The example data sets are described. +- All key terms are contained in the internal lesson glossary. +- Check that the lesson includes exercises in a variety of formats. +- All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment. Mostly complete apart from Challenges needed to test "Understand the importance of clustering in high-dimensional data" in Episode 6, and "Understand when to use hierarchical clustering on high-dimensional data" and "Explore different distance matrix methods" in Episode 7. From 037adb210e557ce02510e770192f41a4d8387dab Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 14 Mar 2024 13:33:12 +0000 Subject: [PATCH 02/28] replace default text in readme? Was this just standard text from when the repo was cloned? --- README.md | 16 +--------------- 1 file changed, 1 insertion(+), 15 deletions(-) diff --git a/README.md b/README.md index 3dcf224b..21248fd8 100644 --- a/README.md +++ b/README.md @@ -2,21 +2,7 @@ [![Create a Slack Account with us](https://img.shields.io/badge/Create_Slack_Account-The_Carpentries-071159.svg)](https://swc-slack-invite.herokuapp.com/) -**Thanks for contributing to The Carpentries Incubator!** -This repository provides a blank starting point for lessons to be developed -here. 
- -A member of the [Carpentries Curriculum Team](https://carpentries.org/team/) -will work with you to get your lesson listed on the -[Community Developed Lessons page][community-lessons] -and make sure you have everything you need to begin developing your new lesson. - -## What to do next - -Before you begin developing your new lesson, -here are a few things we recommend you do: - -* [ ] [Add relevant topic tags to your lesson repository][cdh-topic-tags]. +This repository is part of The Carpentries Incubator, a place for The Carpentries community to collaboratively create, test, and improve lessons. ## Contributing From 44e79f44cce02ce6512af023f410ec0f782a63be Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 14 Mar 2024 13:36:59 +0000 Subject: [PATCH 03/28] link to reviews in readme --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 21248fd8..6bc3efbc 100644 --- a/README.md +++ b/README.md @@ -28,6 +28,10 @@ Look for the tag This indicates that the maintainers will welcome a pull request fixing this issue. +## Reviews + +The lesson has been iteratively developed and improved. For information on the development process, reviews and feedback from instructors following teaching see [REVIEWS](reviews.md). + ## Maintainer(s) Current maintainers of this lesson are From b2fbee7da9bc690c8da1998b16bbe2cbe7c4540f Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 14 Mar 2024 13:37:30 +0000 Subject: [PATCH 04/28] change reviews title for consistency with contributing, authors etc --- reviews.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reviews.md b/reviews.md index 5fb76361..5236e787 100644 --- a/reviews.md +++ b/reviews.md @@ -1,4 +1,4 @@ -# Reviews: high dimensional statistics with R +# Reviews The purpose of this document is to summarise and track how the lesson has developed in response to peer reviews, feedback from instructors and Carpentries advice. 
We also detail the main changes that still need to be made and thus define a roadmap to publication. Note that the lesson has been developed over around 3 years and iteratively improved. This document only highlights reviews contributed by reviewers external to the main authors, except following rounds of teaching. Details of other improvements can be found throughout the repository and the list of authors is given in [AUTHORS](AUTHORS). From abf90f21b5b302245eef9f9fc47157a290a06f9b Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Fri, 15 Mar 2024 13:00:44 +0000 Subject: [PATCH 05/28] update addressed points following meeting --- reviews.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/reviews.md b/reviews.md index 5236e787..3b84fba7 100644 --- a/reviews.md +++ b/reviews.md @@ -129,7 +129,7 @@ The instructors liked teaching the course and found it fun to teach. Comments la - Testing that the lesson is appropriate for the target audience identified, is accurate, descriptive and easy to understand and is structured to manage cognitive load. - Does not use dismissive language. - All lesson tools are open source and the data sets are accessible. -- Tools and data checked for CC0 license compatibility. Most are compatible, exceptions to check are listed below. +- Tools and data checked for CC0 license compatibility. - Data sets are representative of data typically encountered in the domain. - Tests that the example tasks and narrative of the lesson are appropriate and realistic. - Tested that the solutions to all exercises are accurate and sufficiently explained, and that the tasks and formats are appropriate for the expected experience level of the target audience. @@ -140,16 +140,13 @@ The instructors liked teaching the course and found it fun to teach. Comments la - The setup and installation instructions are complete, accurate, and easy to follow. 
- It has been taught at least two times by Instructors who had not been heavily involved in the development of the lesson before that point. +- Check that the lesson includes exercises in a variety of formats. ☒ The points still to be addressed are: - Conversion to The Carpentries Workbench - Alt text for Episode 3 and Challenge figures in Episode 4, 6 and 7 required. -- Ensuring that Alt text is accessible from the WAVE Web Accessibility Evaluation Tool or associated browser extensions. -- Ensure tools and data sets have CC0 compatible license: CC0 (required by Carpentries) vs GPL (>= 2) license ("genridge" for prostate data, limma, glmnet, cluster, pheatmap, dendextend), GPL-3 (PCAtools, SingleCellExperiment, scater, bluster, ), Artistic-2.0 (for BiocManager, minfi, SummarizedExperiment packages), GPL (knitr) compatibility, CC-BY (Horvath data), LGPL-3 (clValid). - The lesson does not make use of superfluous data sets: ensuring that later uses of microarray and scrnaseq data is justified (as opposed to prostate/methylation). -- Ensuring that the lesson content does not make extensive use of contractions ("can’t" instead of "cannot" etc). - The example data sets are described. - All key terms are contained in the internal lesson glossary. -- Check that the lesson includes exercises in a variety of formats. - All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment. Mostly complete apart from Challenges needed to test "Understand the importance of clustering in high-dimensional data" in Episode 6, and "Understand when to use hierarchical clustering on high-dimensional data" and "Explore different distance matrix methods" in Episode 7. 
From 291b2f62c03184f73ba49fff137c28c95661c1b4 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Wed, 20 Mar 2024 14:06:21 +0000 Subject: [PATCH 06/28] Add how to cite --- CITATION | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/CITATION b/CITATION index 56ece3c4..c65503dd 100644 --- a/CITATION +++ b/CITATION @@ -1 +1,2 @@ -FIXME: describe how to cite this lesson. \ No newline at end of file +O’Callaghan, A., Robertson, G., Vallejos, C., Ewing, A., Meynert, A., and Becher, H. (2024). High dimensional statistics with R. https://github.com/ +carpentries-incubator/high-dimensional-stats-r. From f6aece7698868f776f38a1bde4fd60ee72076414 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Wed, 20 Mar 2024 14:07:49 +0000 Subject: [PATCH 07/28] Add empty data page file --- _extras/data.md | 1 + 1 file changed, 1 insertion(+) create mode 100644 _extras/data.md diff --git a/_extras/data.md b/_extras/data.md new file mode 100644 index 00000000..8b137891 --- /dev/null +++ b/_extras/data.md @@ -0,0 +1 @@ + From f3c49013b4cfc41f6a05282fce526b81af872b3c Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Wed, 20 Mar 2024 15:34:11 +0000 Subject: [PATCH 08/28] add prostate data description --- _extras/data.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/_extras/data.md b/_extras/data.md index 8b137891..45198ecb 100644 --- a/_extras/data.md +++ b/_extras/data.md @@ -1 +1,22 @@ +--- +title: "Data" +--- + +## Prostate cancer data +[Source](https://search.r-project.org/CRAN/refmans/bayesQR/html/Prostate.html) + +Prostate specific antigen values and clinical measures for 97 patients hospitalised for a radical prostatectomy. Prostate specimens underwent histological and morphometric analysis. 
The column names refer to + +- lcavol: log(cancer volume) +- lweight: log(prostate weight) +- age: age +- lbph: log(benign prostatic hyperplasia amount) +- svi: seminal vesicle invasion +- lcp: log(capsular penetration) +- gleason: Gleason score +- pgg45: percentage Gleason scores 4 or 5 +- lpsa: log(prostate specific antigen) + + +{% include links.md %} From 10ab7a6c7419c7ac781bdb14fe715b44bc8637d7 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Wed, 20 Mar 2024 16:01:33 +0000 Subject: [PATCH 09/28] partial fill of methylation data looking for the other information --- _extras/data.md | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/_extras/data.md b/_extras/data.md index 45198ecb..dc11d73c 100644 --- a/_extras/data.md +++ b/_extras/data.md @@ -2,7 +2,7 @@ title: "Data" --- -## Prostate cancer data +# Prostate cancer data [Source](https://search.r-project.org/CRAN/refmans/bayesQR/html/Prostate.html) Prostate specific antigen values and clinical measures for 97 patients hospitalised for a radical prostatectomy. Prostate specimens underwent histological and morphometric analysis. The column names refer to @@ -17,6 +17,28 @@ Prostate specific antigen values and clinical measures for 97 patients hospitali - pgg45: percentage Gleason scores 4 or 5 - lpsa: log(prostate specific antigen) +# Methylation data + +[Source](https://bioconductor.org/packages/release/data/experiment/html/FlowSorted.Blood.EPIC.html) + +Illumina Human Methylation data from EPIC on sorted peripheral adult blood cell populations. The data record DNA methylation assays for each individual, which measure, for many sites in the genome, the proportion of DNA that carries a methyl mark (a chemical modification that does not alter the DNA sequence). The methylation assays are recorded as normalised methylation levels (M-values), where negative values correspond to unmethylated DNA and positive values correspond to methylated DNA. 
The data object also contains phenotypic metadata for each individual such as age and BMI. Precisely, the data object contains: + +- assay(data): normalised methylation levels +- colData(data): individual-level information: + - Sample_Well: + - Sample_Name: + - purity: + - Sex: + - Age: age in years + - weight_kg: weight in kilograms + - height_m: height in metres + - bmi: BMI + - bmi_clas: BMI class + - Ethnicity_wide: + - Ethnic_self: + - smoker: yes/no indicator of smoker status + - Array: + - Slide: {% include links.md %} From c10a95b8a3abee9a9473b7aff13c4126133a82be Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 21 Mar 2024 11:29:17 +0000 Subject: [PATCH 10/28] complete methylation data description --- _extras/data.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/_extras/data.md b/_extras/data.md index dc11d73c..f6086a26 100644 --- a/_extras/data.md +++ b/_extras/data.md @@ -24,21 +24,21 @@ Prostate specific antigen values and clinical measures for 97 patients hospitali Illumina Human Methylation data from EPIC on sorted peripheral adult blood cell populations. The data record DNA methylation assays for each individual, which measure, for many sites in the genome, the proportion of DNA that carries a methyl mark (a chemical modification that does not alter the DNA sequence). The methylation assays are recorded as normalised methylation levels (M-values), where negative values correspond to unmethylated DNA and positive values correspond to methylated DNA. The data object also contains phenotypic metadata for each individual such as age and BMI. 
Precisely, the data object contains: - assay(data): normalised methylation levels -- colData(data): individual-level information: - - Sample_Well: - - Sample_Name: - - purity: - - Sex: +- colData(data): individual-level information + - Sample_Well: sample well + - Sample_Name: name of sample + - purity: sample cell purity + - Sex: sex - Age: age in years - weight_kg: weight in kilograms - height_m: height in metres - bmi: BMI - bmi_clas: BMI class - - Ethnicity_wide: - - Ethnic_self: + - Ethnicity_wide: ethnicity, wide class + - Ethnic_self: ethnicity, self-identified - smoker: yes/no indicator of smoker status - - Array: - - Slide: + - Array: type of array from the EPIC array library + - Slide: slide identifier {% include links.md %} From ee59d35e85817ca631b030e99e21ba77b2ff7d35 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 21 Mar 2024 11:39:09 +0000 Subject: [PATCH 11/28] add titles for other data sets --- _extras/data.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/_extras/data.md b/_extras/data.md index f6086a26..5a6e04d7 100644 --- a/_extras/data.md +++ b/_extras/data.md @@ -38,7 +38,13 @@ Illumina Human Methylation data from EPIC on sorted peripheral adult blood cell - Ethnic_self: ethnicity, self-identified - smoker: yes/no indicator of smoker status - Array: type of array from the EPIC array library - - Slide: slide identifier + - Slide: slide identifier + + # Horvath data + + # Breast cancer gene expression data + + # Single-cell RNA sequencing data {% include links.md %} From 674fba9670704900cac587606aec990631498960 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 21 Mar 2024 11:53:01 +0000 Subject: [PATCH 12/28] fill Horvath, needs more info --- _extras/data.md | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/_extras/data.md b/_extras/data.md index 5a6e04d7..437cf0b9 100644 --- a/_extras/data.md +++ b/_extras/data.md @@ -40,8 +40,38 @@ Illumina Human 
Methylation data from EPIC on sorted peripheral adult blood cell - Array: type of array from the EPIC array library - Slide: slide identifier - # Horvath data +# Horvath data +[Source](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0014821#s5) + +Methylation markers across different age groups. + +- CpGmarker: +- CoefficientTraining +- CoefficientTrainingShrunk: +- varByCpG: +- minByCpG: +- maxByCpG: +- medianByCpG: +- medianByCpGYoung: +- medianByCpGOld: +- Gene_ID: +- GenomeBuild: +- Chr: +- MapInfo: +- SourceVersion: +- TSS_Coordinate: +- Gene_Strand: +- Symbol: +- Synonym: +- Accession: +- GID: +- Annotation: +- Product: +- Marginal.Age.Relationship: + + Can't find these data or a full data description in the paper? It's also not clear to me what these data record? Methylation markers by age? + # Breast cancer gene expression data # Single-cell RNA sequencing data From d567d8a8b0c7b632acda817f389287b0bc5365d8 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 21 Mar 2024 11:53:48 +0000 Subject: [PATCH 13/28] is the source the right reference? --- _extras/data.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_extras/data.md b/_extras/data.md index 437cf0b9..8cf7e1cf 100644 --- a/_extras/data.md +++ b/_extras/data.md @@ -71,6 +71,7 @@ Methylation markers across different age groups. - Marginal.Age.Relationship: Can't find these data or a full data description in the paper? It's also not clear to me what these data record? Methylation markers by age? + Is the source the right reference? # Breast cancer gene expression data From 944dac09f0e239479a44e8b2d5ee75a94174b50e Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 21 Mar 2024 12:17:38 +0000 Subject: [PATCH 14/28] fill out breast cancer data based on the associated paper, are the RFS variables right? 
--- _extras/data.md | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/_extras/data.md b/_extras/data.md index 8cf7e1cf..d2cba494 100644 --- a/_extras/data.md +++ b/_extras/data.md @@ -73,9 +73,24 @@ Methylation markers across different age groups. Can't find these data or a full data description in the paper? It's also not clear to me what these data record? Methylation markers by age? Is the source the right reference? - # Breast cancer gene expression data +# Breast cancer gene expression data - # Single-cell RNA sequencing data +[Source](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2990) + +Gene expression data showing microarray results for different probes used to examine gene expression profiles in 91 different breast cancer patient samples and metadata for the sampled patients. + +- assay(data): gene expression data for each individual +- colData(data): individual-level information + - Study: study identifier + - Age: age in years + - Distant.RFS: indicator of distant relapse free survival + - ER: estrogen receptor positive or negative status + - GGI: gene expression grade index + - Grade: histologic grade + - Size: tumour size in cm + - Time.RFS: time between the date of surgery and diagnosis of relapse (time in relapse free survival, RFS) + +# Single-cell RNA sequencing data {% include links.md %} From 66cc3ba79d37955be8899dd5e1c7d67175c39028 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 21 Mar 2024 13:10:28 +0000 Subject: [PATCH 15/28] fill scrnaseq data, see extended description I can see that this was pre-processed in scrnaseq - what's going on there?
Not sure what some of the variables are or mean from the paper --- _extras/data.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/_extras/data.md b/_extras/data.md index d2cba494..bd08b2dd 100644 --- a/_extras/data.md +++ b/_extras/data.md @@ -92,5 +92,24 @@ Gene expression data showing microarray results for different probes used to exa # Single-cell RNA sequencing data +[Source](https://pubmed.ncbi.nlm.nih.gov/25700174/) + +Gene expression measurements for over 9000 genes in over 3000 mouse cortex and hippocampus cells. These data are an excerpt of the original source. + +- assay(data): gene expression data +- colData(data): individual cell-level information + - tissue: tissue type + - group #: group number + - total mRNA mol: + - well: + - sex: sex + - age: age + - diameter: + - cell_id: cell identifier + - level1class: + - level2class: + - sizeFactor: + + {% include links.md %} From 3998874634582ba6f9c648cad63727719a316f9b Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 21 Mar 2024 17:24:27 +0000 Subject: [PATCH 16/28] update reviews with new PR info --- reviews.md | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/reviews.md b/reviews.md index 3b84fba7..85bfff9c 100644 --- a/reviews.md +++ b/reviews.md @@ -101,13 +101,7 @@ The learners found the lessons relevant to their work, particularly episodes 4 a Many of the comments related to timings (allowing more time) and clarifying wording. -☑ Changes were made in response to most comments in [#86](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/86), [#89](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/88) and many remaining changes are evident in the current lesson. - -☒ The remaining points to be addressed are: - -- Providing function definitions or access to function documentation (i.e., clarifying arguments). This noted by both instructors and learners. 
-- Removing or clarifying example 2 of Challenge 1, Episode 4. The instructors noted that it could be interpreted as PCA-appropriate. -- Rewording Challenge 2 in Episode 4. The instructors noted that some of the learners said it seemed like a trick question. +☑ Changes were made in response to most comments in [#86](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/86), [#89](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/88) and [#167](https://github.com/carpentries-incubator/high-dimensional-stats-r/pull/167), and any remaining changes are evident in the current lesson. **Feedback from February 2024 teaching ([#145](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/145))** @@ -115,7 +109,6 @@ The instructors liked teaching the course and found it fun to teach. Comments la ☒ Changes in response to this feedback are ongoing and documented in the issue. The points that remain to be addressed are: - - Possibly could describe the R packages/rationalise them in greater detail to reduce the risk of technical issues. - The factor analysis episode could be combined into another episode (potentially doesn't need to be its own episode) since the course is pitched at biologists. This could be combined into the PCA episode in a callout. - In episode 2, the DNA methylation data discussion could be moved to episode 1, the broom package is possibly unnecessary and we could just use summary(), the advanced content to compute the t-statistics by hand would probably be better off in a sidebar. - In episode 3, the Introduction and coefficient estimates section could be moved to episode 2, the coefficient estimates section should probably go before discussion of singularities, more explanation of the heat map in the cross-validation section required to clarify what it shows. 
From 60b9da31a1ce1e5a6bcfe1098aef5a46c1718580 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Mon, 25 Mar 2024 14:39:32 +0000 Subject: [PATCH 17/28] remove most Horvath variables --- _extras/data.md | 29 +---------------------------- 1 file changed, 1 insertion(+), 28 deletions(-) diff --git a/_extras/data.md b/_extras/data.md index bd08b2dd..708ebb7b 100644 --- a/_extras/data.md +++ b/_extras/data.md @@ -44,34 +44,7 @@ Illumina Human Methylation data from EPIC on sorted peripheral adult blood cell [Source](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0014821#s5) -Methylation markers across different age groups. - -- CpGmarker: -- CoefficientTraining -- CoefficientTrainingShrunk: -- varByCpG: -- minByCpG: -- maxByCpG: -- medianByCpG: -- medianByCpGYoung: -- medianByCpGOld: -- Gene_ID: -- GenomeBuild: -- Chr: -- MapInfo: -- SourceVersion: -- TSS_Coordinate: -- Gene_Strand: -- Symbol: -- Synonym: -- Accession: -- GID: -- Annotation: -- Product: -- Marginal.Age.Relationship: - - Can't find these data or a full data description in the paper? It's also not clear to me what these data record? Methylation markers by age? - Is the source the right reference? +Methylation markers across different age groups. The CpGmarker variable used in this lesson contains CpG site encodings.
# Breast cancer gene expression data From 4aa4391dba2fdb919099cd1f7ffaed7a8acdccaa Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Mon, 25 Mar 2024 15:42:19 +0000 Subject: [PATCH 18/28] remove glossary, see new data glossary and #89 --- reference.md | 1 - 1 file changed, 1 deletion(-) diff --git a/reference.md b/reference.md index f7cdcb6c..24bac376 100644 --- a/reference.md +++ b/reference.md @@ -2,7 +2,6 @@ layout: reference --- -## Glossary {% include links.md %} From a6c5d4a78f18f55d3f9b5b8b2f7fafe5c41d4599 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Tue, 26 Mar 2024 09:10:24 +0000 Subject: [PATCH 19/28] alan fill data page Co-authored-by: Alan O'Callaghan --- _extras/data.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/_extras/data.md b/_extras/data.md index 708ebb7b..c725628f 100644 --- a/_extras/data.md +++ b/_extras/data.md @@ -73,15 +73,15 @@ Gene expression measurements for over 9000 genes in over 3000 mouse cortex and h - colData(data): individual cell-level information - tissue: tissue type - group #: group number - - total mRNA mol: - - well: - - sex: sex - - age: age - - diameter: + - total mRNA mol: total number of observed mRNA molecules corresponding to this cell's unique barcode identifier + - well: the well that this cell's cDNA was stored in during processing + - sex: sex of the donor animal + - age: age of the donor animal + - diameter: estimated cell diameter - cell_id: cell identifier - - level1class: - - level2class: - - sizeFactor: + - level1class: a cluster label identified using a mix of computational techniques and manual annotation + - level2class: a cluster label identified using a mix of computational techniques and manual annotation + - sizeFactor: estimated size factor calculated for scaling normalisation (e.g., using **`scran`**).
{% include links.md %} From 83fa691095ff31aef617c2714041044cbd967a79 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Tue, 26 Mar 2024 09:12:47 +0000 Subject: [PATCH 20/28] update for complete alt text --- reviews.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/reviews.md b/reviews.md index 85bfff9c..19d1de10 100644 --- a/reviews.md +++ b/reviews.md @@ -117,7 +117,7 @@ The instructors liked teaching the course and found it fun to teach. Comments la ☑ The lesson has been developed using The Carpentries template. As such, a number of requirements are fulfilled: -- Alt text and captions complete for Episodes 1, 2, 5 in line with [The Carpentries guide](https://carpentries.org/blog/2022/11/accathon/). +- Alt text and captions complete in line with [The Carpentries guide](https://carpentries.org/blog/2022/11/accathon/). - Conforms to the [The Carpentries Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). - Testing that the lesson is appropriate for the target audience identified, is accurate, descriptive and easy to understand and is structured to manage cognitive load. - Does not use dismissive language. @@ -138,7 +138,6 @@ lesson before that point. ☒ The points still to be addressed are: - Conversion to The Carpentries Workbench -- Alt text for Episode 3 and Challenge figures in Episode 4, 6 and 7 required. - The lesson does not make use of superfluous data sets: ensuring that later uses of microarray and scrnaseq data is justified (as opposed to prostate/methylation). - The example data sets are described. - All key terms are contained in the internal lesson glossary. 
From 4ca0a4d9be4d4722bb405404cb8d5442390add6f Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Tue, 26 Mar 2024 09:14:46 +0000 Subject: [PATCH 21/28] update for data page and key points glossary --- reviews.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/reviews.md b/reviews.md index 19d1de10..fe286ca9 100644 --- a/reviews.md +++ b/reviews.md @@ -133,12 +133,12 @@ The instructors liked teaching the course and found it fun to teach. Comments la - The setup and installation instructions are complete, accurate, and easy to follow. - It has been taught at least two times by Instructors who had not been heavily involved in the development of the lesson before that point. -- Check that the lesson includes exercises in a variety of formats. +- Check that the lesson includes exercises in a variety of formats. +- The example data sets are described. + - Key terms are contained in the internal glossary in the form of key points. ☒ The points still to be addressed are: - Conversion to The Carpentries Workbench - The lesson does not make use of superfluous data sets: ensuring that later uses of microarray and scrnaseq data is justified (as opposed to prostate/methylation). -- The example data sets are described. -- All key terms are contained in the internal lesson glossary. - All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment. Mostly complete apart from Challenges needed to test "Understand the importance of clustering in high-dimensional data" in Episode 6, and "Understand when to use hierarchical clustering on high-dimensional data" and "Explore different distance matrix methods" in Episode 7. 
From d023db938dc4aa856bb0332f9af068fee7d49952 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Wed, 27 Mar 2024 15:17:19 +0000 Subject: [PATCH 22/28] update reviews for completing changes in response to recent teaching feedback --- reviews.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/reviews.md b/reviews.md index fe286ca9..3eee81c9 100644 --- a/reviews.md +++ b/reviews.md @@ -107,11 +107,7 @@ Many of the comments related to timings (allowing more time) and clarifying word The instructors liked teaching the course and found it fun to teach. Comments largely related to timing adjustments, explaining packages, adjusting the way the factor analysis episode is presented and some structural changes to the first three episodes. -☒ Changes in response to this feedback are ongoing and documented in the issue. The points that remain to be addressed are: - - - The factor analysis episode could be combined into another episode (potentially doesn't need to be its own episode) since the course is pitched at biologists. This could be combined into the PCA episode in a callout. -- In episode 2, the DNA methylation data discussion could be moved to episode 1, the broom package is possibly unnecessary and we could just use summary(), the advanced content to compute the t-statistics by hand would probably be better off in a sidebar. -- In episode 3, the Introduction and coefficient estimates section could be moved to episode 2, the coefficient estimates section should probably go before discussion of singularities, more explanation of the heat map in the cross-validation section required to clarify what it shows. +☒ Changes in response to this feedback are documented in the issue. 
## Carpentries-specific From e962989537fe67931843075c7bfb9b92c468b3b0 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 28 Mar 2024 11:15:24 +0000 Subject: [PATCH 23/28] remove reintroduction of prostate data, episode 5 should add a link to the data page once live --- _episodes_rmd/05-factor-analysis.Rmd | 21 +-------------------- 1 file changed, 1 insertion(+), 20 deletions(-) diff --git a/_episodes_rmd/05-factor-analysis.Rmd b/_episodes_rmd/05-factor-analysis.Rmd index 261d925f..11674e24 100644 --- a/_episodes_rmd/05-factor-analysis.Rmd +++ b/_episodes_rmd/05-factor-analysis.Rmd @@ -80,29 +80,10 @@ components are ordered by the amount of variance they account for. # Prostate cancer patient data -The prostate dataset represents data from 97 men who have prostate cancer. -The data come from a study which examined the correlation between the level -of prostate specific antigen and a number of clinical measures in men who were -about to receive a radical prostatectomy. The data have 97 rows and 9 columns. +We revisit the prostate dataset of 97 men who have prostate cancer. Although not strictly a high-dimensional dataset, as with other episodes, we use this dataset to explore the method. - -Columns are: - - -- `lcavol`: log (cancer volume) -- `lweight`: log (prostate weight) -- `age`: age (years) -- `lbph`: log (benign prostatic hyperplasia amount) -- `svi`: seminal vesicle invasion -- `lcp`: log (capsular penetration); amount of spread of cancer in outer walls - of prostate -- `gleason`: [Gleason score](https://en.wikipedia.org/wiki/Gleason_grading_system) -- `pgg45`: percentage Gleason scores 4 or 5 -- `lpsa`: log (prostate specific antigen) - - In this example, we use the clinical variables to identify factors representing various clinical variables from prostate cancer patients. 
Two principal components have already been identified as explaining a large proportion From caf6723f3df71d8d55e430b51d87c1f1ff899183 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 28 Mar 2024 15:04:08 +0000 Subject: [PATCH 24/28] update changes made in response to Emma Rand --- reviews.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/reviews.md b/reviews.md index 3eee81c9..fe21e9a0 100644 --- a/reviews.md +++ b/reviews.md @@ -16,11 +16,7 @@ The reviewer liked this episode as an introduction to the course, particularly t The reviewer particularly liked that this episode demonstrates why we need alternative approaches to regression for high-dimensional data and the multiple testing section. Although many comments were given, the reviewer highlighted that the episode was long, that new concepts should be removed from Challenge 1 and that the smoking model figure should be corrected. The review also highlighted issues with the remote theme. -☑ Changes were made in line with the suggestions exactly, including reducing the length of the lesson. The changes are itemised in the issue and the associated points in [#64](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/64). - -☒ The only point that remains to be addressed: - -- The first challenge makes good points but introduces new concepts rather than tests presented content. +☑ Changes were made in line with the suggestions exactly, including reducing the length of the lesson. The changes are itemised in the issue, the associated points in [#64](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/64) and the current version of the lesson. 
**Review by Emma Rand on Episode 3: Regularised regression ([#49](https://github.com/carpentries-incubator/high-dimensional-stats-r/issues/49))** From 507428000809295a5a2939244601bcd8ae644edb Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 28 Mar 2024 15:05:56 +0000 Subject: [PATCH 25/28] cross to tick --- reviews.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reviews.md b/reviews.md index fe21e9a0..c65d6ca0 100644 --- a/reviews.md +++ b/reviews.md @@ -103,7 +103,7 @@ Many of the comments related to timings (allowing more time) and clarifying word The instructors liked teaching the course and found it fun to teach. Comments largely related to timing adjustments, explaining packages, adjusting the way the factor analysis episode is presented and some structural changes to the first three episodes. -☒ Changes in response to this feedback are documented in the issue. +☑ Changes in response to this feedback are documented in the issue. ## Carpentries-specific From 8f5c8d5cb75c2419c7ab8e4e0464eca5329b7a9b Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 28 Mar 2024 15:32:55 +0000 Subject: [PATCH 26/28] complete challenge alignment with objective tasks as in #171 --- reviews.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/reviews.md b/reviews.md index c65d6ca0..f011bd5d 100644 --- a/reviews.md +++ b/reviews.md @@ -127,10 +127,10 @@ The instructors liked teaching the course and found it fun to teach. Comments la lesson before that point. - Check that the lesson includes exercises in a variety of formats. - The example data sets are described. - - Key terms are contained in the internal glossary in the form of key points. +- Key terms are contained in the internal glossary in the form of key points. +- All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment. 
☒ The points still to be addressed are: - Conversion to The Carpentries Workbench - The lesson does not make use of superfluous data sets: ensuring that later uses of microarray and scrnaseq data is justified (as opposed to prostate/methylation). -- All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment. Mostly complete apart from Challenges needed to test "Understand the importance of clustering in high-dimensional data" in Episode 6, and "Understand when to use hierarchical clustering on high-dimensional data" and "Explore different distance matrix methods" in Episode 7. From fe69966f1f33fd480e385403aedb58b424e77e59 Mon Sep 17 00:00:00 2001 From: Mary Llewellyn Date: Thu, 28 Mar 2024 16:01:31 +0000 Subject: [PATCH 27/28] cars and farm data removed, update reviews document --- reviews.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/reviews.md b/reviews.md index f011bd5d..c95990d1 100644 --- a/reviews.md +++ b/reviews.md @@ -129,8 +129,9 @@ lesson before that point. - The example data sets are described. - Key terms are contained in the internal glossary in the form of key points. - All lesson and episode objectives are assessed by exercises or another opportunity for formative assessment. +- The lesson does not make use of superfluous data sets. + ☒ The points still to be addressed are: - Conversion to The Carpentries Workbench -- The lesson does not make use of superfluous data sets: ensuring that later uses of microarray and scrnaseq data is justified (as opposed to prostate/methylation). 
From 301ee1e92745d33048dd7686c5f00fa191fca6d4 Mon Sep 17 00:00:00 2001 From: Ailith Ewing <54178580+ailithewing@users.noreply.github.com> Date: Tue, 2 Apr 2024 09:17:01 +0100 Subject: [PATCH 28/28] Adjust author list --- CITATION | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CITATION b/CITATION index c65503dd..b020a963 100644 --- a/CITATION +++ b/CITATION @@ -1,2 +1,2 @@ -O’Callaghan, A., Robertson, G., Vallejos, C., Ewing, A., Meynert, A., and Becher, H. (2024). High dimensional statistics with R. https://github.com/ +O’Callaghan A, Robertson G, Llewellyn M, Becher H, Meynert A, Vallejos C, Ewing A. (2024). High dimensional statistics with R. https://github.com/ carpentries-incubator/high-dimensional-stats-r.