add Series.corr/3 and Series.cov/2 #630
Awesome! 🚢
I think we could discuss the naming, but the features look good to me!
lib/explorer/backend/lazy_series.ex
Outdated
corr: 3,
cov: 2,
I would call them correlation and covariance, since they are short names and we usually prefer to avoid abbreviations. But let's hear what José thinks :)
Agreed!
Thank you for the feedback! I've changed the names to correlation and covariance.
Oh, there was one issue I ran into when developing this. I don't think it's related to the features here, but I may as well report it before this gets merged. This works:

DF.new(a: [1.0, 8.0, 3.0, nil], b: [4.0, 5.0, 2.0, nil])
|> DF.mutate(c: covariance(a, b), d: correlation(a, b))
# #Explorer.DataFrame<
# Polars[4 x 4]
# a float [1.0, 8.0, 3.0, nil]
# b float [4.0, 5.0, 2.0, nil]
# c float [3.0, 3.0, 3.0, 3.0]
# d float [0.5447047794019223, 0.5447047794019223, 0.5447047794019223,
# 0.5447047794019223]
# >

but this panics:

DF.new(a: [1.0, 8.0, 3.0], b: [4.0, 5.0, 2.0])
|> DF.concat_rows(DF.new(a: [nil], b: [nil]))
|> DF.mutate(c: covariance(a, b), d: correlation(a, b))
# thread '<unnamed>' panicked at 'validity must be equal to the array's length', [...]/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow2-0.17.2/src/array/primitive/mod.rs:241:5
# ** (ErlangError) Erlang error: :nif_panicked
# ...

I would've thought that
You are using it correctly. This looks like a Polars bug. It is failing to merge two dataframes when one has nils and the other does not. Can you isolate it on the Rust side?
💚 💙 💜 💛 ❤️
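For reference, the values in the working example above (3.0 for the covariance and 0.5447... for the correlation) are consistent with a sample covariance (divisor n - 1) and a Pearson correlation computed only over rows where both columns are non-nil. The sketch below is a plain-Rust illustration of that arithmetic under those two assumptions. It is not the Explorer or Polars implementation, and the helper names `paired` and `sample_cov` are made up for this example.

```rust
/// Keep only the positions where both inputs have a value.
/// Assumption: a nil in either column drops the whole row.
fn paired(a: &[Option<f64>], b: &[Option<f64>]) -> Vec<(f64, f64)> {
    a.iter()
        .zip(b.iter())
        .filter_map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some((*x, *y)),
            _ => None,
        })
        .collect()
}

/// Sample covariance (divisor n - 1) of already-paired values.
/// Returns None when fewer than two pairs are available.
fn sample_cov(pairs: &[(f64, f64)]) -> Option<f64> {
    if pairs.len() < 2 {
        return None;
    }
    let n = pairs.len() as f64;
    let mean_x = pairs.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = pairs.iter().map(|p| p.1).sum::<f64>() / n;
    let sum: f64 = pairs
        .iter()
        .map(|p| (p.0 - mean_x) * (p.1 - mean_y))
        .sum();
    Some(sum / (n - 1.0))
}

/// Covariance of two nullable columns, ignoring rows with a nil.
fn covariance(a: &[Option<f64>], b: &[Option<f64>]) -> Option<f64> {
    sample_cov(&paired(a, b))
}

/// Pearson correlation: cov(a, b) / sqrt(var(a) * var(b)),
/// with both variances taken over the same non-nil rows.
fn correlation(a: &[Option<f64>], b: &[Option<f64>]) -> Option<f64> {
    let pairs = paired(a, b);
    let cov = sample_cov(&pairs)?;
    let xx: Vec<(f64, f64)> = pairs.iter().map(|p| (p.0, p.0)).collect();
    let yy: Vec<(f64, f64)> = pairs.iter().map(|p| (p.1, p.1)).collect();
    let var_x = sample_cov(&xx)?;
    let var_y = sample_cov(&yy)?;
    Some(cov / (var_x * var_y).sqrt())
}
```

Calling `covariance` and `correlation` on `[Some(1.0), Some(8.0), Some(3.0), None]` and `[Some(4.0), Some(5.0), Some(2.0), None]` gives approximately 3.0 and 0.5447, matching columns c and d in the output above. A zero variance in either column is not handled here and would produce a non-finite result.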
Honestly, I don't know enough about Polars to debug this, but changing the following stops the panic:
I think it's something to do with the

pub fn rechunk(&mut self) -> &mut Self {
    if self.should_rechunk() {
        self.as_single_chunk_par()
    } else {
        self
    }
}

What do you think? Would it be worth fixing the bug by skipping their
I think so. By definition there is more than one chunk, so we can always force it.
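To make "always force it" concrete, here is a toy sketch of what unconditionally merging chunks means. It is only an illustration: `ToyColumn` and `force_single_chunk` are hypothetical names, the type is not Polars' chunked array, and nothing here models validity bitmaps or the actual cause of the panic.

```rust
/// Toy stand-in for a chunked column: each inner Vec is one chunk.
/// Hypothetical type for illustration only, not a Polars type.
struct ToyColumn {
    chunks: Vec<Vec<Option<f64>>>,
}

impl ToyColumn {
    /// Merge all chunks into one unconditionally, without consulting
    /// any heuristic about whether a rechunk is needed.
    fn force_single_chunk(&mut self) -> &mut Self {
        // Take ownership of the old chunks, leaving an empty Vec behind.
        let old = std::mem::take(&mut self.chunks);
        // Concatenate the chunks in order into a single contiguous Vec.
        let merged: Vec<Option<f64>> = old.into_iter().flatten().collect();
        self.chunks = vec![merged];
        self
    }
}
```

In this toy model, a column built by concatenating `[1.0, 8.0, 3.0]` with `[nil]` starts with two chunks and ends with one four-element chunk after `force_single_chunk`.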
This is based on corr and cov. For corr, I did not implement the spearman_rank_corr function, because it needs the propagate_nans Polars feature. I wonder what's the appetite for adding new Polars features? Should we do it in this case?