Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correlate is incorrect with numeric vectors as x,y #120

Closed
antoine-sachet opened this issue Oct 28, 2020 · 2 comments
Closed

Correlate is incorrect with numeric vectors as x,y #120

antoine-sachet opened this issue Oct 28, 2020 · 2 comments

Comments

@antoine-sachet
Copy link
Contributor

Problem

correlate does not work at all when x and y are numeric vectors.

I am assuming it should work because the doc for correlate says:

x	a numeric vector, matrix or data frame.
y	NULL (default) or a vector, matrix or data frame with compatible dimensions to x. The default is equivalent to y = x (but more efficient).

However, I realise more and more that the whole package pretty much assumes x is a data.frame. Maybe the doc should be updated to reflect that x should be a data.frame and remove the now useless y argument? It is ok if correlate only works with data.frames and is not a drop-in replacement for cor in all situations. It would be much clearer IMO.

Reproducible example

A simple correlation between 2 vectors returns a 1x1 correlation matrix, with a single element which happens to be diagonal.
The correlation will take the value of the diagonal argument.

You won't be able to reproduce yet due to #119 (which I fixed in my fork) but you get the gist:

corrr::correlate(x = 1:10, y = 10:1, quiet = T)
#> # A tibble: 1 x 2
#>   rowname     x
#>   <chr>   <dbl>
#> 1 x          NA
corrr::correlate(x = 1:10, y = 10:1, diagonal = 0.66, quiet = T)
#> # A tibble: 1 x 2
#>   rowname     x
#>   <chr>   <dbl>
#> 1 x        0.66

Created on 2020-10-28 by the reprex package (v0.3.0)

antoine-sachet added a commit to antoine-sachet/corrr that referenced this issue Oct 28, 2020
As a side effect, correlate() now works with numeric vectors and one-column dfs.

Close tidymodels#120. Close tidymodels#121.
@thisisdaryn
Copy link
Collaborator

However, I realise more and more that the whole package pretty much assumes x is a data.frame. Maybe the doc should be updated to reflect that x should be a data.frame and remove the now useless y argument? It is ok if correlate only works with data.frames and is not a drop-in replacement for cor in all situations. It would be much clearer IMO.

Yes, this is something that needs to be resolved. I will submit a PR when there is a decision.

The original package author commented here on the usage of the y argument in response to a previous issue. As is pointed out there, removing y would be a breaking change.

I agree that the use of y was not among the common expected use cases and, additionally, is seemingly very rarely used.

@github-actions
Copy link

github-actions bot commented Mar 6, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants