Review comments: Episode 5 - factor analysis #118
Shouldn't include the index variable in the factor analysis. Generally, use column names rather than numeric indices to select variables; imo this is R 101.
Also remove the mention of rotations, because it's not explained at all.
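To make the first point concrete, here is a minimal sketch of selecting variables by name instead of by position. The data frame `dat`, its index column `X`, and the variable names are made up for illustration and are not taken from the episode's code.

```r
# Hypothetical data: `X` is a row-index column of the kind that often
# appears when a CSV is read back in; the other names are illustrative.
set.seed(1)
dat <- data.frame(
  X       = 1:50,
  lcavol  = rnorm(50),
  lweight = rnorm(50),
  age     = rnorm(50, mean = 65, sd = 7),
  lbph    = rnorm(50)
)

# Fragile: positional indexing breaks silently if the column order changes
by_position <- dat[, 2:5]

# Clearer: name the variables, which also makes it obvious `X` is excluded
vars <- c("lcavol", "lweight", "age", "lbph")
by_name <- dat[, vars]

# Alternative: drop the index column by name and keep everything else
no_index <- dat[, setdiff(names(dat), "X")]
```

All three subsets contain the same columns here, but only the name-based versions stay correct if columns are added or reordered upstream.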
I really like this episode and think the length is good given the information in the previous episode. I have relatively few comments, listed below.
I will also submit pull requests!
Line 18 & 20/Keypoints: these key points about identifying the number of factors are discussed at the start but I think should also be mentioned where choosing the number of factors is discussed (paragraph beginning Line 198)
Line 32/Introduction: This possibly needs to differentiate between FA and PCA and when you may use them more clearly. It's sort of covered towards the end of the introduction, but I think it needs to be more explicit.
Line 35: "Here, we introduce more general set of methods..." -> "Here we introduce an alternative but related set of methods..." to clarify that they're different approaches.
Line 40/Introduction: "latent variable" not defined until later. Could just define here instead.
Line 41/Introduction: I would remove "data-driven" here as both EFA and CFA are data-driven techniques.
Line 54/An example: Call this section "Student scores" for consistency with other episodes?
Line 74/Advantages and disadvantages of Factor Analysis: As in the last episode, I think it is hard to understand the advantages and disadvantages of FA without understanding what it is. I think this should come at the end of the episode.
Line 190/Performing EFA: A brief statement summarising the interpretation of factors/loadings in this example may be useful here just to clarify why you might use EFA.
Line 198/Performing EFA: Could add a section heading here for consistency with the PCA episode. Could also refer back to PCA in Line 200 to highlight the similarities between the approaches.
Line 200/Performing EFA: "In practise, we repeat the factor analysis using different values in the `factors` argument." -> "In practice, we repeat the factor analysis for different numbers of factors (by specifying different values in the `factors` argument)", since the upshot is that we're changing the number of factors.
Line 206/Performing EFA: the hypothesis test wording. Should it be:
"If the p-value is less than 0.05, we reject the null hypothesis that the number of factors is sufficient. If the p-value is greater than 0.05, we do not reject the null hypothesis that the number of factors used captures variation in the data. We often therefore conclude that this number of factors is sufficient"
rather than
"If the p-value is less than 0.05, we reject the null hypothesis and accept that the number of factors included is too small. If the p-value is greater than 0.05, we accept the null hypothesis that the number of factors used captures variation in the data."
Would also add "and we repeat the analysis with more factors. When the p-value is greater than 0.05..." after the first sentence to make it clear that this is iterative.
Also, I know we don't want to complicate things, but it may be more accurate to say "if the p-value is less than our significance level..." instead of using 0.05 as a hard threshold. If the p-value was 0.06, you'd probably also reject in practice?
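A minimal sketch of the iterative procedure suggested above, in case it helps when rewording the text. It assumes the episode uses `stats::factanal()`; the built-in `ability.cov` dataset, the `alpha` threshold, and the variable names are my own choices for illustration, not the episode's.

```r
# Significance level chosen up front rather than hard-coding 0.05 in the prose
alpha <- 0.05

# ability.cov has 6 variables; with 3 factors there are 0 degrees of
# freedom left, so the sufficiency test is only available for 1 or 2 factors
max_factors <- 2

chosen <- NA
for (k in seq_len(max_factors)) {
  fit <- factanal(factors = k, covmat = ability.cov)
  p <- unname(fit$PVAL)  # p-value for H0: k factors are sufficient
  cat(sprintf("factors = %d, p-value = %.3g\n", k, p))

  if (p >= alpha) {
    # We do not reject H0, so we stop and treat k factors as sufficient
    chosen <- k
    break
  }
  # Otherwise we reject H0 and repeat the analysis with one more factor
}
```

If `chosen` is still `NA` after the loop, no tested number of factors was judged sufficient at the chosen significance level.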
Line 261: This feels incomplete. Maybe could include a brief statement about what this plot tells us about the relationship between variables and factors to tie things together.
Line 269/Challenge 2: "discuss in groups" should maybe be adapted for the individual learner. "Consider or discuss in groups" as proposed by my review of episode 3?
Minor comments
Captions and alt text.
Line 43/Introduction: "a-priori" -> "a priori".
Line 100/Prostate cancer patient data: I like that the prostate data is used as a simple example in these episodes but again think that it needs to be made clear that it's not high-dimensional and is used for pedagogical purposes!
Line 200/Performing EFA: "In practise" -> "In practice"
Line 203/Performing EFA: "output shows" -> "output then shows"
Line 224/Performing EFA: "explaind" -> "explained"