Third delivery suggested changes #64

Closed · 61 of 68 tasks
ailithewing opened this issue May 20, 2022 · 7 comments

ailithewing (Collaborator) commented May 20, 2022

A list of proposed changes following the May delivery of HDS

These are in addition to the changes in the pull request ailith_delivery3 and to the changes that Hannes made that have yet to be pushed to the main course materials.

Throughout

  • bold package names and include () for functions

Intro

  • Change high-dimensional data definition
  • Switch out prostate dataset or make it much clearer that it's a toy dataset for the purposes of explanation
  • Change view() to head() and dim()
  • Expand challenge 1 solution
  • Ask a more specific question than "examine the dataset" in challenge 2 (from Emma's review in Review comments: Introduction to high-dimensional data #39)
  • Check how we're referring to figures, e.g. not by number if there's no number
  • Could add a challenge question to show what happens with correlated variables (see Emma's review in Review comments: Introduction to high-dimensional data #39)
  • Take out the Bioconductor intro as we never teach it (maybe condense it and put it in a callout box?)
  • Add brackets for function names in text, e.g. pairs() (from Emma's review in Review comments: Introduction to high-dimensional data #39)
  • Explain why you are using here? (from Emma's review in Review comments: Introduction to high-dimensional data #39)
  • STRUCTURAL: Focus the challenges section on two things: (a) an ill-defined model (more predictors than observations), possibly with a figure containing only one dot, and (b) correlated predictors, perhaps with code showing unstable coefficient estimates (see the sketch after this list).
  • STRUCTURAL: Rewrite the section on which statistical methods are used so that it gives an overview of the course. Focus on the problems and on which analysis is used when: exploring one outcome with many similar features (methylation/expression); predicting outcomes with more features than observations; reducing dimensionality / grouping / making sense of similar predictors; clustering observations.
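For point (b) of the challenges item, a minimal base-R sketch on simulated data (illustrative only, not taken from the lesson) that could be adapted for the materials:

```r
# Two nearly collinear predictors make individual coefficient estimates unstable.
set.seed(66)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)  # x2 is almost a copy of x1
y  <- x1 + rnorm(n)

coef(lm(y ~ x1 + x2))           # the effect is split arbitrarily between x1 and x2

# Refitting on a bootstrap resample gives very different coefficients
idx <- sample(n, replace = TRUE)
coef(lm(y[idx] ~ x1[idx] + x2[idx]))
```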

Regression with many features (many outcomes)

  • rank results in topTable() by effect size (see the sketch after this list)
  • include small intro to feature selection to motivate why these techniques are useful as we took the feature selection lesson out of the 2-day course.
  • check exercises aren't introducing new concepts
  • check direction of smoker is consistent between model and plot
  • Add brackets for function names in text, e.g. pairs() (from Emma's review in Review comments: Introduction to high-dimensional data #39)
  • Explore whether the episode can be made shorter or divided (from Emma's review in Review comments: Regression with many features #47)
  • Add a reference for the source of the methylation data
  • Change title to regression with many outcomes and add a brief comment to distinguish between dealing with many outcomes and/or many features (we can mention that the regularisation episode will address that). Potentially, we can create a separate episode Regression in high-dimensional settings where we introduce the methylation data and the two different types of problems. However, this is outside the scope for this round of changes. Creating this separate episode would also address some of Emma's concerns.
  • Add mention of dream() from VariancePartition which is similar to limma but can handle grouping (random effects)
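For the topTable() ranking item above, a hedged sketch using simulated data; the object names (meth, smoker) are placeholders rather than the lesson's own objects:

```r
library("limma")

set.seed(1)
meth   <- matrix(rnorm(100 * 20), nrow = 100)  # 100 features, 20 samples
smoker <- rep(c(0, 1), each = 10)              # placeholder covariate
design <- model.matrix(~ smoker)

fit <- lmFit(meth, design)   # fit the same linear model to every feature
fit <- eBayes(fit)

# sort.by = "logFC" orders the results table by effect size rather than p-value
topTable(fit, coef = "smoker", sort.by = "logFC", number = 10)
```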

Regularisation

  • needs to be split up into:
    • motivation & rationale - in expanded intro
    • intro to model selection/cross validation
    • what is regularisation in general?
    • ridge and lasso
  • more explanation of Horvath
  • more explanation of the figures in the materials
  • fix overuse of Xi
  • more detail on extracting coefficients and model interpretation
  • glossary of jargon
  • add link to ML course for related materials (from Self-review notes #7)

CAV (20220206) Link added to episode 1 instead as it's general across different types of ML approaches.

CAV (20220206) I can't recall what the specific issue was, but the episode has been extensively revised and labels look ok.

CAV (20220206) Paragraph was revised, so hopefully OK now.

CAV (20220206) Notation review.

  • in exercise 2, maybe ask why mean squared error rather than sum of squared errors is used (from Self-review notes #7)
  • Add brackets for function names in text, e.g. pairs() (from Emma's review in Review comments: Introduction to high-dimensional data #39)
  • move up the section "Using regularisation to improve generalisability"
  • add reason for training and test intro, like: "Before we move on to regularised regression, we have to introduce..."
  • when talking about elastic net, say we've been using it all along: lasso (alpha = 1) and ridge (alpha = 0) are special cases
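For the elastic-net item, a small glmnet() sketch on simulated data (illustrative only), showing lasso and ridge as the alpha = 1 and alpha = 0 special cases:

```r
library("glmnet")

set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100)
y <- rnorm(100)

fit_ridge <- glmnet(x, y, alpha = 0)    # ridge: pure L2 penalty
fit_lasso <- glmnet(x, y, alpha = 1)    # lasso: pure L1 penalty (glmnet's default)
fit_enet  <- glmnet(x, y, alpha = 0.5)  # elastic net: a mixture of the two
```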

PCA

  • consider removing scaling from the gene expression PCA (include a box about gene expression normalisation to emphasise that that's not what we're talking about); see the sketch after this list
  • Add brackets for function names in text, e.g. pairs() (from Emma's review in Review comments: Introduction to high-dimensional data #39)
  • Is the equation halfway down needed at all (the one which refers to the original example)?
  • add a note that PCAtools takes data in the Bioconductor orientation
  • STRUCTURAL: add a table comparing the terms used for loadings and scores in different packages
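A hedged sketch (simulated data, not the lesson's gene expression object) touching both the scaling item and the loadings/scores terminology item; the package-specific names in the comments should be double-checked against the versions used in the lesson:

```r
set.seed(1)
mat <- matrix(rnorm(50 * 10), nrow = 50)

pca_unscaled <- prcomp(mat, scale. = FALSE)  # variables kept on their original scale
pca_scaled   <- prcomp(mat, scale. = TRUE)   # variables standardised to unit variance

# Terminology differs between functions:
#   prcomp():        loadings in $rotation, scores in $x
#   princomp():      loadings in $loadings, scores in $scores
#   PCAtools::pca(): loadings in $loadings, scores in $rotated
head(pca_unscaled$rotation)
head(pca_unscaled$x)
```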

FA

  • move the advantages and disadvantages of FA up so they appear in the introduction
  • more detail on communality and uniqueness (see the sketch after this list)
  • mention confirmatory factor analysis
  • discuss ways of determining number of factors
  • Add brackets for function names in text, e.g. pairs() (from Emma's review in Review comments: Introduction to high-dimensional data #39)
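For the communality/uniqueness item, a minimal factanal() sketch on simulated data with a two-factor structure (illustrative only, not the lesson's dataset):

```r
set.seed(1)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- cbind(v1 = f1 + rnorm(n, sd = 0.5),
             v2 = f1 + rnorm(n, sd = 0.5),
             v3 = f1 + rnorm(n, sd = 0.5),
             v4 = f2 + rnorm(n, sd = 0.5),
             v5 = f2 + rnorm(n, sd = 0.5),
             v6 = f2 + rnorm(n, sd = 0.5))

fa <- factanal(dat, factors = 2)

# factanal() reports uniquenesses; communality is the complement, i.e. the
# proportion of each variable's variance explained by the common factors.
uniqueness  <- fa$uniquenesses
communality <- 1 - uniqueness
communality
```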

K means

Hierarchical clusters

Other

ailithewing (Collaborator, Author):

@catavallejos @nathansam @hwarden162 @alanocallaghan Please add any additional things that I've missed.

nathansam (Contributor) commented May 20, 2022

K-means: set a seed for the heatmap code chunk starting with library("pheatmap") (which might be covered by the coloured-blocks to-do)
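A hedged sketch of what that might look like, assuming the chunk uses pheatmap's k-means row aggregation (kmeans_k); the actual lesson chunk may differ:

```r
library("pheatmap")

set.seed(42)  # k-means starts from random centres, so fix the seed for reproducibility
mat <- matrix(rnorm(200 * 10), nrow = 200)
pheatmap(mat, kmeans_k = 5)  # rows aggregated into 5 k-means clusters
```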

hannesbecher (Collaborator):

Challenge 1 in episode 1: I'm not sure about question 4. Is this a good example of high-dimensional data, given that it is one observation with so many features?

  1. Predicting probability of a patient's cancer progressing using gene
    expression data from 20,000 genes, as well as data associated with general patient health
    (age, weight, BMI, blood pressure) and cancer growth (tumour size,
    localised spread, blood test results).

alanocallaghan (Collaborator):

Changing that challenge from a single patient to plural patients would also be good, to avoid implying high precision from generic prediction models (i.e. precision-medicine hype).

hannesbecher (Collaborator):

I think the current uniqueness/communality explanation contradicts Wikipedia: https://en.wikipedia.org/wiki/Factor_analysis#Terminology

alanocallaghan (Collaborator):

One way of reducing the number of dependency packages is to move all the data-wrangling code to a data package and then just remotes::install_github() it (see the sketch below).
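Roughly like this; both the repository and package names below are hypothetical placeholders for a data package that does not yet exist:

```r
# install.packages("remotes")  # if not already installed
remotes::install_github("our-org/hds-data")  # hypothetical repository name
library("hdsdata")                           # hypothetical data package name
```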

hannesbecher (Collaborator):

Glossary still open, but covered by issue #89
