Feedback from September 2022 delivery #88
I don't know if these questions are rhetorical, but:
The second example preserves the variable names as-is, so `predict()` with `newdata` doesn't throw a warning. It would probably be better to work with a data frame from the start there.
I'm not 100% sure, but presumably this removes the intercept column, since glmnet adds one automatically. Again, it would probably be better to set the data up so the code is similar across the `lm` and `glmnet` calls, although I think that's actually rather difficult.
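A minimal base-R sketch of what the comment above guesses is happening. It assumes, as guessed, that `methyl_mat` carries a leading intercept column of all 1s; the toy data and names here are invented for illustration:

```r
# Toy stand-in for the lesson's methylation matrix, assuming (as guessed
# above) that its first column is an intercept column of all 1s.
set.seed(1)
methyl_mat <- cbind(intercept = 1,
                    matrix(rnorm(20), nrow = 5,
                           dimnames = list(NULL, paste0("cpg", 1:4))))

# glmnet/cv.glmnet fit their own intercept by default (intercept = TRUE),
# so a constant column adds nothing and is dropped before the call:
features <- methyl_mat[, -1]

colnames(features)   # the intercept column is gone, only the CpG columns remain
```

If that assumption is right, setting up `methyl_mat` without the constant column in the first place would let the `lm` and `glmnet` calls use the same object.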
@alanocallaghan thanks - it wasn't rhetorical, and sorry for being unclear. I agree that it would be helpful either to set up the code to be more similar, or to explain the details.
The first point is mentioned in #52, which gives a fuller explanation.
Many of these are now implemented. Others have become obsolete due to restructuring.
DRAFT TO BE UPDATED AFTER DAY 4 - saved here to get started, currently updated to day 3.
EdCarp delivery 2022-09-27 to 2022-09-30, with instructors @hannesbecher, @luciewoellenstein44, @ewallace.
https://edcarp.github.io/2022-09-27_ed-dash_high-dim-stats/
Collaborative document:
https://pad.carpentries.org/2022-09-27_ed-dash_high-dim-stats
Overall went very well, good material, happy and engaged students.
Day 1 - Introduction, Regression with many features
Learner feedback
Please list 1 thing that you liked or found particularly useful
Please list another thing that you found less useful, or that could be improved
Instructor feedback
Day 2 - Regularised regression
Learner feedback
Please list 1 thing that you liked or found particularly useful
The coding and visualisation of the results are really helpful.
Please list another thing that you found less useful, or that could be improved
Instructor feedback
Learners had several questions about extra arguments in calls to lm(), glmnet(), and so on; see the etherpad for day 2. Those should give clues to places to simplify:
- `as.data.frame`: comparing the simpler `fit_horvath <- lm(train_age ~ train_mat)` to the example's `fit_horvath <- lm(train_age ~ ., data = as.data.frame(train_mat))`
- `lasso <- cv.glmnet(methyl_mat[, -1], age, alpha = 1)`
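A small self-contained sketch of the difference between the two `lm()` call styles (toy data with invented names, standing in for the lesson's methylation matrix and ages):

```r
# Toy training and test data: 10 samples, 4 named features.
set.seed(1)
train_mat <- matrix(rnorm(40), nrow = 10,
                    dimnames = list(NULL, paste0("cpg", 1:4)))
train_age <- rnorm(10)
test_mat  <- matrix(rnorm(40), nrow = 10,
                    dimnames = list(NULL, paste0("cpg", 1:4)))

# Matrix on the right-hand side: coefficient names get the matrix name
# prefixed ("train_matcpg1", ...), so predict() cannot match them to
# columns of new data.
fit_matrix <- lm(train_age ~ train_mat)

# Data frame from the start: variable names are preserved as-is, and
# predict() with newdata works without warnings.
train_df <- as.data.frame(train_mat)
fit_df   <- lm(train_age ~ ., data = train_df)
pred     <- predict(fit_df, newdata = as.data.frame(test_mat))
```

Building the data frame once, up front, would also keep the naming consistent between the `lm()` episode and later episodes.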
Day 3 - Principal component analyses, Factor analysis
Learner feedback
Please list 1 thing that you liked or found particularly useful
Please list another thing that you found less useful, or that could be improved
Instructor feedback
PCA (Episode 4)
- `biplot` is used for both `PCAtools::biplot` and `stats::biplot`.
- Point labels like `GSMxxxxx` or `211122_s_at` are hard to read - too small and/or overlapping - and give ggrepel error messages.
- `plotloadings` was unclear to instructors and to learners. We wondered how the included variables are chosen, and whether it is important to include it. Reading `?plotloadings`, it says that the `rangeRetain` argument gives a "Cut-off value for retaining variables" in terms of the "top/bottom fraction of the loadings range". I (Edward) find that unintuitive. For example, there are still many points at 1/10000th of the loadings range: `plotloadings(pc, labSize = 3, rangeRetain = 1e-5)`
Factor analysis (Episode 5)
Day 4 - K-means clustering, Hierarchical clustering
Learner feedback
Instructor feedback