[BEAM-9547] Add support for xs on DataFrame and Series #15078
Codecov Report
@@ Coverage Diff @@
## master #15078 +/- ##
==========================================
+ Coverage 83.76% 83.78% +0.01%
==========================================
Files 439 439
Lines 59176 59211 +35
==========================================
+ Hits 49566 49607 +41
+ Misses 9610 9604 -6
R: @rohdesamuel
if axis in ('columns', 1):
  return frame_base.DeferredFrame.wrap(
      expressions.ComputedExpression(
          'xs',
          lambda df: df.xs(key, axis=axis, **kwargs), [self._expr],
          requires_partition_by=partitionings.Arbitrary(),
          preserves_partition_by=partitionings.Arbitrary()))
elif axis not in ('index', 0):
  raise ValueError(
Can you add some comments for these conditionals?
Done
LGTM, just needs a few more comments.
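For context on the conditionals discussed in this thread, here is a small plain-pandas sketch (not Beam code; the frame and its labels are made up for illustration) of how `xs` behaves on each axis. The reading that column selection can run on any row-wise partition, because every partition carries all column labels, is an inference from the `Arbitrary()` partitioning in the snippet above rather than something stated in the review.

```python
import pandas as pd

# Illustrative frame; the labels are arbitrary.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# axis=1 (or 'columns'): selects a column by label.
col = df.xs("a", axis=1)

# axis=0 (or 'index', the default): selects a row by index label.
row = df.xs("y")


def missing_key_raises(frame, key):
    """Return True if frame.xs(key) raises KeyError for an absent label."""
    try:
        frame.xs(key)
    except KeyError:
        return True
    return False
```

The row case is the one that needs care in a distributed setting, since whether the label exists depends on rows that may live in a different partition.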
Run Python PreCommit
This PR adds support for `xs` on DataFrame and Series instances. `xs` is really just a specialized filter on the index, which at first glance is trivially elementwise. However, it is complicated by the fact that it raises a `KeyError` when the key does not exist. To raise this error correctly (at execution time) in a distributed context, we need to know when we're processing the partition that should contain the key. To do this, we wrap the key in a DeferredSeries and use Index partitioning to co-locate it with the matching rows of the target frame. Thus, if the key is present in a partition but does not exist in that partition's slice of the target frame, we know to raise the error.
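The co-location idea can be sketched in plain pandas. This is an illustration under stated assumptions, not the Beam implementation: `NUM_PARTITIONS`, `partition_of`, `split_by_index`, and `distributed_xs` are hypothetical names, and a CRC32 hash stands in for whatever Index partitioning does internally.

```python
import zlib

import pandas as pd

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_of(label):
    """Map an index label to a partition id with a stable hash (assumption)."""
    return zlib.crc32(repr(label).encode()) % NUM_PARTITIONS


def split_by_index(df):
    """Hypothetical stand-in for Index partitioning: group rows by label hash."""
    parts = {pid: df.iloc[0:0] for pid in range(NUM_PARTITIONS)}
    for pid, chunk in df.groupby(partition_of):
        parts[pid] = chunk
    return parts


def distributed_xs(df, key):
    """Apply xs only in the partition that the key is routed to."""
    parts = split_by_index(df)
    owner = partition_of(key)  # the key is co-located with its would-be rows
    for pid, chunk in parts.items():
        if pid == owner:
            # Only this partition could hold the key, so a KeyError raised
            # here is a genuine "key does not exist" error.
            return chunk.xs(key)
        # Other partitions never see the key and must not raise.


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
```

Because the key and the rows are routed by the same function, the absence of the key in its owning partition implies its absence from the whole frame, so the error is raised exactly once and only when it should be.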