Docs: Polars GroupBy #1836
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,4 +6,5 @@ seaborn | |
scipy | ||
scikit-learn | ||
polars==1.1.0 | ||
pyarrow | ||
pyarrow | ||
hvplot | ||
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -100,20 +100,18 @@ The specific methods that will be demonstrated are: | |||||||
* Quantiles | ||||||||
|
||||||||
* Grouping | ||||||||
* Protected Group Keys | ||||||||
Review comment (suggested change): Sphinx needs a line break between indent levels for correct rendering.
||||||||
* Public Group Keys | ||||||||
* Public Group Lengths | ||||||||
|
||||||||
* Grouping By Multiple Variables | ||||||||
* Filtering | ||||||||
This section explains strategies for how to release statistics on grouped data. | ||||||||
|
||||||||
* Public vs. Private Grouping Lengths | ||||||||
* Data Preparation | ||||||||
|
||||||||
This section will explain the implications and limitations of having public and private keys and/or lengths when grouping. | ||||||||
* using ``with_columns`` | ||||||||
* using ``filter`` | ||||||||
|
||||||||
* Data Preparation Limitations | ||||||||
|
||||||||
* Limitations with ``with_columns`` | ||||||||
* Limitations with ``filter`` | ||||||||
|
||||||||
This section will explain the limitations and properties of common Polars functions that are unique to their usage in OpenDP. | ||||||||
This section explains how to build stable dataframe transformations with Polars. | ||||||||
Review comment: Would it make sense to use RST toctrees here? I could do that, if you don't have all the installs for the doc build.
Reply: Maybe we could do this once the misc notebooks are merged?
||||||||
|
||||||||
Compositor Overview | ||||||||
------------------- | ||||||||
|
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -66,6 +66,7 @@ where | |
lazyframe_utility(&lf, alpha) | ||
} | ||
|
||
#[derive(Clone)] | ||
Review comment: I'd prefer not to edit the Rust in this PR, if at all possible: if the examples in the docs rely on changes here, then we should get another release out before we point people to the nightly docs. It's adding more steps. (But if this is a change we really need, don't let me block!)
Reply: Broke it out into a separate PR. It also killed the commit history here, unfortunately.
Reply: Huh: GitHub seems to be confused. The base PR in the stack is merged, and then checked out I'm going to try diffing this branch with
||
struct UtilitySummary { | ||
pub name: String, | ||
pub aggregate: String, | ||
|
@@ -188,26 +189,38 @@ fn expr_utility<'a>( | |
}]); | ||
} | ||
|
||
match expr { | ||
Expr::Len => Ok(vec![UtilitySummary { | ||
name, | ||
Ok(match expr { | ||
Expr::Len => vec![UtilitySummary { | ||
name: name.clone(), | ||
aggregate: "Len".to_string(), | ||
distribution: None, | ||
scale: None, | ||
accuracy: alpha.is_some().then_some(0.0), | ||
threshold: t_value, | ||
}]), | ||
}], | ||
|
||
Expr::Function { input, .. } => Ok(input | ||
Expr::Function { input, .. } => input | ||
.iter() | ||
.map(|e| expr_utility(e, alpha, threshold.clone())) | ||
.collect::<Fallible<Vec<_>>>()? | ||
.into_iter() | ||
.flatten() | ||
.collect()), | ||
.collect(), | ||
|
||
_ => fallible!(FailedFunction, "unrecognized primitive"), | ||
Expr::BinaryExpr { left, op: _, right } => [ | ||
expr_utility(&left, alpha, threshold.clone())?, | ||
expr_utility(&right, alpha, threshold)?, | ||
] | ||
.concat(), | ||
|
||
e => return fallible!(FailedFunction, "unrecognized primitive: {:?}", e), | ||
} | ||
.into_iter() | ||
.map(|mut summary| { | ||
summary.name = name.clone(); | ||
summary | ||
}) | ||
.collect()) | ||
} | ||
|
||
fn expr_aggregate(expr: &Expr) -> Fallible<&'static str> { | ||
|
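The refactor of `expr_utility` in the diff above wraps the whole `match` in `Ok(...)`, returns early from the fallback arm, recurses into both sides of a binary expression, and then overwrites every summary's `name` with the outer expression's name. A minimal sketch of that control flow follows. It uses toy stand-ins, not the real Polars or OpenDP types: the `Expr` enum, the two-field `UtilitySummary`, the `String`-based `Fallible` alias, and the `output_name` helper are all assumptions made for illustration.

```rust
// Hypothetical stand-ins for Polars' `Expr` and the PR's `UtilitySummary`.
// None of these definitions are taken from the OpenDP source.
#[derive(Debug, Clone)]
enum Expr {
    Len,
    Function { input: Vec<Expr> },
    BinaryExpr { left: Box<Expr>, right: Box<Expr> },
    Alias(Box<Expr>, String),
    Column(String),
}

#[derive(Debug, Clone, PartialEq)]
struct UtilitySummary {
    name: String,
    aggregate: String,
}

// Simplified error type (assumption): the real code uses OpenDP's `Fallible`.
type Fallible<T> = Result<T, String>;

// Toy naming rule (assumption): an alias names the output, otherwise "len".
fn output_name(expr: &Expr) -> String {
    match expr {
        Expr::Alias(_, name) => name.clone(),
        _ => "len".to_string(),
    }
}

fn expr_utility(expr: &Expr) -> Fallible<Vec<UtilitySummary>> {
    let name = output_name(expr);

    // Every arm yields a plain Vec; only the fallback arm returns early.
    Ok(match expr {
        Expr::Len => vec![UtilitySummary {
            name: name.clone(),
            aggregate: "Len".to_string(),
        }],

        Expr::Alias(inner, _) => expr_utility(inner)?,

        // Recurse into each input, stop at the first error, then flatten.
        Expr::Function { input } => input
            .iter()
            .map(expr_utility)
            .collect::<Fallible<Vec<_>>>()?
            .into_iter()
            .flatten()
            .collect::<Vec<_>>(),

        // Summaries from both operands are concatenated.
        Expr::BinaryExpr { left, right } => {
            [expr_utility(left)?, expr_utility(right)?].concat()
        }

        e => return Err(format!("unrecognized primitive: {:?}", e)),
    }
    .into_iter()
    // Nested summaries take on the name of the outermost expression.
    .map(|mut summary| {
        summary.name = name.clone();
        summary
    })
    .collect())
}
```

In this sketch, calling `expr_utility` on an aliased binary expression whose operands both reduce to `Len` gives two summaries that share the alias as their name, while an unsupported variant such as `Column` produces an error that names the offending expression.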
Review comment: I had prepared a PR that gets rid of a separate requirements file for notebooks:
That doesn't need to move forward, but if it did, we'd probably want to be more conservative about adding new dependencies.