Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Series.from_list([1, 2, 3], dtype: :null) #806

Closed
josevalim opened this issue Jan 9, 2024 · 14 comments
Closed

Support Series.from_list([1, 2, 3], dtype: :null) #806

josevalim opened this issue Jan 9, 2024 · 14 comments

Comments

@josevalim
Copy link
Member

We should also discuss if the default for a list of nils should still be :f64 or if we should swap it to :null:

Series.from_list([nil, nil, nil])
@josevalim
Copy link
Member Author

cc @lkarthee in case you want to take a look at this one too :)

@lkarthee
Copy link
Member

lkarthee commented Jan 9, 2024

Was just testing this exact thing out in livebook. 👍

@philss
Copy link
Member

philss commented Jan 9, 2024

FYI, we may need to use Series::new_null in the crate side, to create a null series:

use polars::prelude::*;

fn main() {
    let s = Series::new_null("a", 0);

    println!("dtype is: {}", &s.dtype())
}

@lkarthee
Copy link
Member

lkarthee commented Jan 9, 2024

I fixed from_list, then realised DF.concat_rows and Series.concat are broken.

null -> any conversion needs to be added.

I have noticed that:

  • type conversion between distinct numeric series converts to floats.
  • existing logic assumes that there are only 2 numeric types - integer and float ?
  • This logic needs to be revisited after introduction of multiple numeric types ?

@philss
Copy link
Member

philss commented Jan 9, 2024

existing logic assumes that there are only 2 numeric types - integer and float ?

Yes, but I'm changing this in #794

This logic needs to be revisited after introduction of multiple numeric types ?

Probably it's easier to revisit after I finish that work, yeah.

@lkarthee
Copy link
Member

One question about type casting of nulls when concat rows. In Polars, concat works as:

  • DataFrame has different conditions and types of concat apart from type.
  • Type ++ Null can be appended.
  • Null ++ Type fails with Polars Error.

Should we support both cases without throwing error by handling casting if needed or should we only support Type ++ Null ?

I am trying to fix Series.concat for Type ++ Null as it is failing.

@josevalim
Copy link
Member Author

josevalim commented Jan 10, 2024

There is a chance that Null + Types not working is a Polars bug, so I would take a look and perhaps do a report there. If that’s the case, I would write your code as if both styles are support and then let Polars blow up (until it is fixed).

@lkarthee
Copy link
Member

lkarthee commented Jan 10, 2024

My line of thought is Type can have value nil, while concatenation why can't it be treated a set of rows having nil values being added ? Null ++ Type failure does not make sense for me particularly if a cast can solve it. I might be missing something. Please share your thoughts on this.

Edit: did not see Jose reply. Posted this message.

@billylanchantin
Copy link
Contributor

write your code as if both styles are support and then let Polars blow up (until it is fixed).

I'd also be ok with a cast. We had to do something similar when Polars supported date + duration but not duration + date. Then we removed the cast when they accepted our patch.

@lkarthee
Copy link
Member

https://github.com/pola-rs/polars/blob/a8bdc76000c059afdac1f215e3b95654a0057712/crates/polars-core/src/datatypes/dtype.rs#L450-L455

// if returns
// `Ok(true)`: can extend, but must cast
// `Ok(false)`: can extend as is
// Error: cannot extend.
pub(crate) fn can_extend_dtype(left: &DataType, right: &DataType) -> PolarsResult<bool> {
   match(left, right) {
       ...
        (DataType::Null, DataType::Null) => Ok(false),
        // Other way around we don't allow because we keep left dtype as is.
        // We don't go to supertype, and we certainly don't want to cast self to null type.
        (_, DataType::Null) => Ok(true),
        (l, r) => {
            polars_ensure!(l == r, SchemaMismatch: "cannot extend/append {:?} with {:?}", left, right);

@lkarthee
Copy link
Member

Polars allows only concat of series with same data type. They don't get into type promotion. Only exception is Type ++ Null.

> f32 = pl.Series([1, 2, 3], dtype=pl.Float32) 
> s16 = pl.Series([1, 2, 3], dtype=pl.Int16) 
> s8= pl.Series([1, 2, 3], dtype=pl.Int8) 
> f32.append(s8)
SchemaError: cannot extend/append Float32 with Int8
> s16.append(s8)
SchemaError: cannot extend/append Int16 with Int8

We have some kind of type promotion to float for numeric types in Series.concat .

@josevalim
Copy link
Member Author

Good find. So apparently Polars logic is that it must preserve the type on the left side, rather than trying to find a magic type that satisfies both sides. I'd say that's a sane we choose to mimic if we want to. Overall, we have to choose between:

  1. Disregard Polars and define our own casting logic that aims to unify both sides
  2. Align with Polars and document that the left side is preserved

Any opinions?

@lkarthee
Copy link
Member

#812 about Series.concat. PR fallbacks to existing behaviour of casting to {:f, 64} when

Null ++ Type and Type ++ Null can be done.

We can cast to a higher type if there is no loss of information. {:s, _} to {:u, _} casting should be left to user.

@josevalim
Copy link
Member Author

Thank you, I will close this one, we can the discussion in the PR. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants