splice, concat, join, hstack and vstack #10

Open
librasteve opened this issue Feb 7, 2023 · 3 comments

Comments

@librasteve
Owner

librasteve commented Feb 7, 2023

Dan::Polars

  • now implemented hstack and vstack
  • next is join

Dan

  • has splice (row-wise & column-wise)
  • has concat

This issue weighs the benefit of implementing these as splice and concat in Dan::Polars, which would mask Dan's own splice and concat.

Series

  • Dan has ...s.concat: t;
  • Pandas has ... concat
  • Polars has ... pub fn append(&mut self, other: &Series);

The Dan and Polars methods are both "in place"; Pandas concat returns a new object.

DataFrame

Dan has ...

dfa.concat: dfc, join => 'inner';
#`[
      letter  number
 0    a       1
 1    b       2
 0⋅1  c       3
 1⋅1  d       4
#]
  • concat is the method call for join operations
  • splice is used for append, hstack, vstack

Polars has...

  • concat as an alternate interface for hstack, vstack

Options

  1. take the Polars approach, deprecate splice & concat, replace with append, hstack, vstack, join
  2. take the Dan approach, implement splice & concat as wrapper

Conclusion

In light of the implementation work, the best common solution appears to be a third path: replace this method zoo with just .concat and .join. This is detailed below.

@librasteve
Owner Author

librasteve commented Aug 14, 2023

These tables compare the API methods:

Table 1: Combining functions for DataFrames (Pandas and Polars)

| Function | Description | Pandas | Polars | Dan |
|---|---|---|---|---|
| vstack | Stack vertically | pd.concat([df1, df2], axis=0) | df1.vstack(df2) or pl.concat([df1, df2]) | df1.concat(df2) |
| hstack | Stack horizontally | pd.concat([df1, df2], axis=1) | df1.hstack(df2) or pl.concat([df1, df2], how="horizontal") | df1.concat(df2, :axis(1)) |
| concat | Concatenate along an axis | pd.concat([df1, df2], axis=0/1) | pl.concat([df1, df2], how="vertical"/"horizontal") | df1.concat(df2, axis=>0/1) |
| join | Join on a column | df1.join(df2, on="col") or pd.merge(df1, df2, how="inner", on="col") | df1.join(df2, how="inner", on="col") | df1.join(df2, how=>'inner', on=>'col') |
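
To make the axis semantics in Table 1 concrete, here is a minimal sketch in plain Python. It is not Dan's, Pandas' or Polars' implementation: the `Frame` type (a dict of column name to list of values) and the `concat` and `n_rows` helpers are hypothetical stand-ins, and the sketch assumes that axis 0 requires identical column names and axis 1 requires equal row counts and distinct column names.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Frame = Dict[str, List[Any]]


def n_rows(frame: Frame) -> int:
    """Return the row count, checking that all columns have the same length."""
    lengths = {len(col) for col in frame.values()}
    if len(lengths) > 1:
        raise ValueError("columns have unequal lengths")
    return lengths.pop() if lengths else 0


def concat(left: Frame, right: Frame, axis: int = 0) -> Frame:
    """Return a new frame; neither input is modified."""
    if axis == 0:
        # Vertical (vstack): same columns, rows of `right` go below `left`.
        if list(left) != list(right):
            raise ValueError("axis=0 needs identical column names")
        return {name: left[name] + right[name] for name in left}
    if axis == 1:
        # Horizontal (hstack): same row count, columns of `right` go beside `left`.
        if n_rows(left) != n_rows(right):
            raise ValueError("axis=1 needs equal row counts")
        if set(left) & set(right):
            raise ValueError("axis=1 needs distinct column names")
        return {name: list(col) for name, col in {**left, **right}.items()}
    raise ValueError("axis must be 0 or 1")


dfa = {"letter": ["a", "b"], "number": [1, 2]}
dfc = {"letter": ["c", "d"], "number": [3, 4]}
dfe = {"upper": ["A", "B"]}

stacked = concat(dfa, dfc)        # vstack-style
widened = concat(dfa, dfe, axis=1)  # hstack-style
```

With a single `axis` argument, separate vstack and hstack methods become redundant, which is the motivation for collapsing them into concat.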

Table 2: Combining functions for Series (Pandas and Polars)

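| Function | Description | Pandas | Polars | Dan |
|---|---|---|---|---|
| concat | Append one Series to another | pd.concat([series1, series2]) | pl.concat([series1, series2]) | series1.concat( series2 ) |
| append | Append one Series to another | series1.append(series2) (removed in pandas 2.0) | series1.append(series2) | series1.concat( series2 ) |
| join | Join Series on index | n/a (no Series.join; convert to a DataFrame first) | n/a (no Series.join; convert to a DataFrame first) | n/a |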


@librasteve
Owner Author

librasteve commented Sep 14, 2023

The solution is:

Table 1: Combining functions for DataFrames

| Function | Description | Dan |
|---|---|---|
| concat | Concatenate along an axis | df1.concat(df2, axis=>0/1) |
| join | Join on a column | df1.join(df2, how=>'inner', on=>'col') |

Table 2: Combining functions for Series

| Function | Description | Dan |
|---|---|---|
| concat | Append one Series to another | series1.concat( series2 ) |

@librasteve
Owner Author

Notes:

  1. use concat in place of hstack, vstack
    a. concat - diagonal is not provided
    b. concat - multiple is not provided
  2. merge (Python) becomes join
    a. join - right is not provided (swap the arguments and use a left join instead)
    b. join - [semi, asof] are not (yet) provided
  3. Dan and Dan::Pandas concat to be refactored out to concat and join
  4. Dan splice to be replaced with some combination of 'concat', 'with_columns' and 'drop'
  5. Dan to set ignore-index to 1 by default (?)
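
Note 2a can be illustrated with a minimal sketch in plain Python. This is not Dan's or Polars' implementation: the `join` and `columns` helpers and the list-of-dicts row representation are hypothetical, and the sketch assumes the two inputs share only the key column. It shows that a right join gives the same rows as a left join with the arguments swapped, so a separate `right` option is not strictly needed.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a DataFrame: a list of rows, each a dict.
Row = Dict[str, Any]


def columns(rows: List[Row]) -> List[str]:
    """Collect column names in first-seen order."""
    seen: List[str] = []
    for row in rows:
        for name in row:
            if name not in seen:
                seen.append(name)
    return seen


def join(left: List[Row], right: List[Row], on: str, how: str = "inner") -> List[Row]:
    """Join two row lists on the key column `on` (inner, left or outer)."""
    if how not in {"inner", "left", "outer"}:
        raise ValueError(f"unsupported how={how!r}")
    left_cols, right_cols = columns(left), columns(right)

    # Index the right-hand rows by key for quick lookup.
    by_key: Dict[Any, List[Row]] = {}
    for row in right:
        by_key.setdefault(row[on], []).append(row)

    out: List[Row] = []
    matched = set()
    for lrow in left:
        partners = by_key.get(lrow[on], [])
        if partners:
            matched.add(lrow[on])
            out.extend({**lrow, **rrow} for rrow in partners)
        elif how in ("left", "outer"):
            # Keep the unmatched left row, filling right-hand columns with None.
            out.append({**lrow, **{c: None for c in right_cols if c != on}})
    if how == "outer":
        # Also keep unmatched right rows, filling left-hand columns with None.
        for rrow in right:
            if rrow[on] not in matched:
                out.append({**{c: None for c in left_cols if c != on}, **rrow})
    return out


numbers = [{"letter": "a", "number": 1}, {"letter": "b", "number": 2}]
uppers = [{"letter": "b", "upper": "B"}, {"letter": "c", "upper": "C"}]

inner = join(numbers, uppers, on="letter")
# "Right join" of numbers with uppers, expressed as a left join with swapped arguments.
right_via_swap = join(uppers, numbers, on="letter", how="left")
```

The swapped call keeps every row of `uppers`, which is what a right join of `numbers` with `uppers` would keep; only the column order differs.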
