Review comments: Episode 5 - factor analysis #118

Closed
mallewellyn opened this issue Feb 21, 2024 · 3 comments

Comments

@mallewellyn
Contributor

I really like this episode and think the length is good given the information in the previous episode. I have relatively few comments, listed below.

I will also submit pull requests!

  • Line 18 & 20/Keypoints: these key points about identifying the number of factors are listed at the start, but I think they should also be mentioned where choosing the number of factors is discussed (paragraph beginning Line 198).

  • Line 32/Introduction: This possibly needs to differentiate more clearly between FA and PCA, and to say when you might use each. It's sort of covered towards the end of the introduction, but I think it needs to be more explicit.

  • Line 35: "Here, we introduce more general set of methods..." -> "Here we introduce an alternative but related set of methods..." to clarify that they're different approaches.

  • Line 40/Introduction: "latent variable" is not defined until later. Could just define it here instead.

  • Line 41/Introduction: I would remove "data-driven" here as both EFA and CFA are data-driven techniques.

  • Line 54/An example: Call this section "Student scores" for consistency with other episodes?

  • Line 74/Advantages and disadvantages of Factor Analysis: As in the last episode, I think it is hard to understand the advantages and disadvantages of FA without understanding what it is. I think this should come at the end of the episode.

  • Line 190/Performing EFA: A brief statement summarising the interpretation of factors/loadings in this example may be useful here just to clarify why you might use EFA.

  • Line 198/Performing EFA: Could add a section heading here for consistency with the PCA episode. Could also back-reference PCA in Line 200 to highlight the similarities between the approaches here.

  • Line 200/Performing EFA: "In practise, we repeat the factor analysis using different values in the factors argument." -> "In practice, we repeat the factor analysis for different numbers of factors (by specifying different values in the factors argument).", since the upshot is that we're changing the number of factors.

  • Line 206/Performing EFA: the hypothesis test wording. Should it be:

"If the p-value is less than 0.05, we reject the null hypothesis that the number of factors is sufficient. If the p-value is greater than 0.05, we do not reject the null hypothesis that the number of factors used captures variation in the data. We often therefore conclude that this number of factors is sufficient."

rather than

"If the p-value is less than 0.05, we reject the null hypothesis and accept that the number of factors included is too small. If the p-value is greater than 0.05, we accept the null hypothesis that the number of factors used captures variation in the data."

Would also add "and we repeat the analysis with more factors. When the p-value is greater than 0.05..." after the first sentence to make it clear that this is iterative.

Also, I know we don't want to complicate things, but it may be more accurate to say "if the p-value is less than our significance level..." instead of using 0.05 as a hard threshold. If the p-value was 0.06, you'd probably also reject in practice?

  • Line 261: This feels incomplete. Maybe could include a brief statement about what this plot tells us about the relationship between variables and factors to tie things together.

  • Line 269/Challenge 2: "discuss in groups" should maybe be adapted for the individual learner. "Consider or discuss in groups" as proposed by my review of episode 3?
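To make the iterative procedure from the Line 206 comment concrete, here is a minimal sketch of the decision rule. It's in Python rather than the episode's R, the function name `choose_n_factors` is made up, and the p-values are hypothetical placeholders, not output from the episode's data. It assumes each p-value comes from the test of H0 "k factors are sufficient".

```python
from typing import Dict, Optional


def choose_n_factors(p_values: Dict[int, float], alpha: float = 0.05) -> Optional[int]:
    """Pick the smallest number of factors whose test is not rejected.

    p_values maps a number of factors k to the p-value of the test of
    H0: "k factors are sufficient to capture the variation in the data".
    """
    for k in sorted(p_values):
        if p_values[k] > alpha:
            # Do not reject H0: k factors are judged sufficient, so stop.
            return k
        # p <= alpha: reject H0 and move on to the next (larger) k.
    return None  # none of the tested values of k was judged sufficient


# Hypothetical p-values, for illustration only (not real results).
example = {1: 0.001, 2: 0.02, 3: 0.31}
print(choose_n_factors(example))              # 3
print(choose_n_factors(example, alpha=0.01))  # 2
```

With these made-up p-values, a significance level of 0.05 stops at 3 factors while 0.01 stops at 2, which is why phrasing the rule in terms of "our significance level" rather than a hard 0.05 seems clearer.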

Minor comments

  • Captions and alt text.

  • Line 43/Introduction: "a-priori" -> "a priori".

  • Line 100/Prostate cancer patient data: I like that the prostate data is used as a simple example in these episodes but again think that it needs to be made clear that it's not high-dimensional and is used for pedagogical purposes!

  • Line 200/Performing EFA: "In practise" -> "In practice"

  • Line 203/Performing EFA: "output shows" -> "output then shows"

  • Line 224/Performing EFA: "explaind" -> "explained"

@alanocallaghan
Collaborator

alanocallaghan commented Mar 1, 2024

Shouldn't include the index variable in factor analysis...

Generally, use column names rather than numeric indices to select variables; imo this is R 101.
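The episode's code is in R, but the point is language-agnostic. Here is a small Python sketch with a hypothetical table (the columns `id`, `score_a`, `score_b`, `score_c` and both helper functions are made up for illustration). Selecting by name keeps the identifier out of the analysis even if the columns are reordered, whereas selecting by position silently picks it up.

```python
import csv
import io

# Hypothetical table: an identifier column "id" plus three made-up scores.
RAW = """id,score_a,score_b,score_c
1,0.2,1.5,3.1
2,0.4,1.1,2.9
3,0.9,0.7,3.4
"""

# The same hypothetical data with the columns in a different order.
REORDERED = """score_b,id,score_c,score_a
1.5,1,3.1,0.2
1.1,2,2.9,0.4
0.7,3,3.4,0.9
"""

ANALYSIS_COLUMNS = ["score_a", "score_b", "score_c"]


def select_by_name(text, columns):
    """Keep only the named columns; the id column is dropped by not naming it."""
    reader = csv.DictReader(io.StringIO(text))
    return [[float(row[c]) for c in columns] for row in reader]


def select_by_position(text):
    """Fragile alternative: assume the id is column 0 and keep everything else."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [[float(v) for v in row[1:]] for row in reader]


print(select_by_name(RAW, ANALYSIS_COLUMNS) == select_by_name(REORDERED, ANALYSIS_COLUMNS))  # True
print(select_by_position(RAW) == select_by_position(REORDERED))  # False
```

In the reordered table, positional selection passes the `id` values (1.0, 2.0, 3.0) into the analysis as if they were a measured variable, which is exactly the index-variable problem above.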

@alanocallaghan
Collaborator

Also, remove the mention of rotations, because it's not explained at all.

@mallewellyn
Contributor Author

mallewellyn commented Mar 5, 2024

Convert to task list:

  • 1. Line 32/Introduction: This possibly needs to differentiate more clearly between FA and PCA, and to say when you might use each. It's sort of covered towards the end of the introduction, but I think it needs to be more explicit.

  • 2. Line 35: "Here, we introduce more general set of methods..." -> "Here we introduce an alternative but related set of methods..." to clarify that they're different approaches.

  • 3. Line 40/Introduction: "latent variable" is not defined until later. Could just define it here instead.

  • 4. Line 41/Introduction: I would remove "data-driven" here as both EFA and CFA are data-driven techniques.

  • 5. Line 43/Introduction: "a-priori" -> "a priori".

  • 6. Line 54/An example: Call this section "Student scores" for consistency with other episodes?

  • 7. Line 74/Advantages and disadvantages of Factor Analysis: As in the last episode, I think it is hard to understand the advantages and disadvantages of FA without understanding what it is. I think this should come at the end of the episode.

  • 8. Line 100/Prostate cancer patient data: I like that the prostate data is used as a simple example in these episodes but again think that it needs to be made clear that it's not high-dimensional and is used for pedagogical purposes!

  • 9. Line 190/Performing EFA: A brief statement summarising the interpretation of factors/loadings in this example may be useful here just to clarify why you might use EFA.

addressed in challenge

  • 10. Line 198/Performing EFA: Could add a section heading here for consistency with the PCA episode. Could also back-reference PCA in Line 200 to highlight the similarities between the approaches here.

  • 11. Line 198: the key points about identifying the number of factors are listed at the start, but I think they should also be mentioned here, where choosing the number of factors is discussed.

  • 12. Line 200/Performing EFA: "In practise" -> "In practice"

  • 13. Line 200/Performing EFA: "In practise, we repeat the factor analysis using different values in the factors argument." -> "In practice, we repeat the factor analysis for different numbers of factors (by specifying different values in the factors argument).", since the upshot is that we're changing the number of factors.

  • 14. Line 203/Performing EFA: "output shows" -> "output then shows"

  • 15. Line 206/Performing EFA: the hypothesis test wording. Should it be:

"If the p-value is less than 0.05, we reject the null hypothesis that the number of factors is sufficient. If the p-value is greater than 0.05, we do not reject the null hypothesis that the number of factors used captures variation in the data. We often therefore conclude that this number of factors is sufficient."

rather than

"If the p-value is less than 0.05, we reject the null hypothesis and accept that the number of factors included is too small. If the p-value is greater than 0.05, we accept the null hypothesis that the number of factors used captures variation in the data."

Would also add "and we repeat the analysis with more factors. When the p-value is greater than 0.05..." after the first sentence to make it clear that this is iterative.

Also, I know we don't want to complicate things, but it may be more accurate to say "if the p-value is less than our significance level..." instead of using 0.05 as a hard threshold. If the p-value was 0.06, you'd probably also reject in practice?

  • 16. Line 224/Performing EFA: "explaind" -> "explained"

  • 17. Line 261: This feels incomplete. Maybe could include a brief statement about what this plot tells us about the relationship between variables and factors to tie things together.

addressed in challenge

  • 18. Line 269/Challenge 2: "discuss in groups" should maybe be adapted for the individual learner. "Consider or discuss in groups" as proposed by my review of episode 3?

  • 19. Captions and alt text.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants