Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Series.in/2 function #420

Merged
merged 4 commits into from
Dec 11, 2022
Merged

Conversation

kimjoaoun
Copy link
Contributor

@kimjoaoun kimjoaoun commented Nov 19, 2022

This PR closes #273.
It creates the function Explorer.Series.in/2 that uses Polars' IsIn trait to check if values from a list are contained in a Series.

I really don't know how to describe the function in the docstring, can someone help me?

@kimjoaoun kimjoaoun changed the title Jpos/adds isin Adds Series.is_in/2 function Nov 19, 2022
@kimjoaoun kimjoaoun marked this pull request as ready for review November 19, 2022 11:49
@kimjoaoun kimjoaoun marked this pull request as draft November 19, 2022 11:50
lib/explorer/series.ex Outdated Show resolved Hide resolved
@josevalim
Copy link
Member

I really don't know how to describe the function in the docstring, can someone help me?

Yeah, this is a good question. To be honest, I am not the biggest fan of the name. What does dplyr call it? It feels like within?, included?, etc would be better but I am not sure.

@kimjoaoun
Copy link
Contributor Author

I really don't know how to describe the function in the docstring, can someone help me?

Yeah, this is a good question. To be honest, I am not the biggest fan of the name. What does dplyr call it? It feels like within?, included?, etc would be better but I am not sure.

In R we don't have this operation in dplyr, because it is natively supported by the %in% operator. But also in base R there's a rarely used function called within that does exactly the same.
Between within? and included?, I think the former is the better.

Note: Pandas has pd.Series.isin.

@josevalim
Copy link
Member

I see. So I think we should simply call it in/2. We can document it as "Check if the elements of the series in the left exists in the series in the right."

@josevalim
Copy link
Member

Note that, if we call it in, then you will have to write it as:

def %Series{} = left in %Series{} = right do

and

def %Series{} = left in right when is_list(right) do

The callback will be @callback (s() in s()) :: s() and so on.

lib/explorer/series.ex Outdated Show resolved Hide resolved
@kimjoaoun
Copy link
Contributor Author

I see. So I think we should simply call it in/2. We can document it as "Check if the elements of the series in the left exists in the series in the right."

Would not it be too similar to Kernel.in?

@josevalim
Copy link
Member

I think that's the idea! Then it will automatically work in Explorer.Query as: series in [1, 2, 3] and left in right, similar to R.

@kimjoaoun kimjoaoun changed the title Adds Series.is_in/2 function Adds Series.in/2 function Nov 28, 2022
@cigrainger
Copy link
Member

Given that this is failing for 1.13... should we go ahead and bump Explorer to minimum 1.14 for the next release? I'm surprised we're not already doing that.

@josevalim
Copy link
Member

@cigrainger fixing for v1.13 should be easy and mostly a matter of using K.in(left, right) instead of in(left, right). I believe the biggest blocker for this PR is if you actually agree with the API. :)

@cigrainger
Copy link
Member

Hm okay. Jumping into this... it's a vectorised op right? So what we get back is a boolean series where each element is true if in the right and false if in the left? I like in/2. I think the confusion is coming from the fact that it's vectorised. But yeah... let's go with in/2

In terms of the docstring:

@doc """
Checks if each element of the series in the left exists in the series in the right, returning a boolean mask.

  ## Examples

      iex> left = Explorer.Series.from_list([1, 2, 3])
      iex> right = Explorer.Series.from_list([1, 2])
      iex> Series.in(left, right)
      #Explorer.Series<
        Polars[3]
        boolean [true, true, false]
      >
"""

@josevalim josevalim marked this pull request as ready for review December 11, 2022 23:44
@josevalim josevalim merged commit f42f99d into elixir-explorer:main Dec 11, 2022
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@kimjoaoun kimjoaoun deleted the jpos/adds_isin branch December 12, 2022 13:43
liamdiprose pushed a commit to liamdiprose/explorer that referenced this pull request Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support filtering on a list of values
3 participants