A join specification can now be created through join_by(). This allows
you to specify both the left and right hand side of a join using unquoted
column names, such as join_by(sale_date == commercial_date). Join
specifications can be supplied to any *_join() function as the by
argument.
Join specifications allow for new types of joins:
Equality joins: The most common join, specified by ==. For example,
join_by(sale_date == commercial_date).
Inequality joins: For joining on inequalities, i.e. >=, >, <, and
<=. For example, use join_by(sale_date >= commercial_date) to find
every commercial that aired before a particular sale.
Rolling joins: For "rolling" the closest match forward or backwards when
there isn't an exact match, specified by using the rolling helper,
closest(). For example,
join_by(closest(sale_date >= commercial_date)) to find only the most
recent commercial that aired before a particular sale.
Overlap joins: For detecting overlaps between sets of columns, specified
by using one of the overlap helpers: between(), within(), or
overlaps(). For example, use
join_by(between(commercial_date, sale_date_lower, sale_date)) to
find commercials that aired before a particular sale, as long as they
occurred after some lower bound, such as 40 days before the sale was made.
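To make the three non-equality join types concrete, here is a rough pure-Python sketch of their matching rules on lists of dicts. It only illustrates the semantics described above and is not how dplyr or datar implement them. The sample rows, the function names, and the 40-day `days` parameter are assumptions made up for the example, and the rolling sketch keeps a single closest row without any special handling of ties.

```python
from datetime import date, timedelta

# Hypothetical sample data, invented for this illustration.
sales = [
    {"sale_id": 1, "sale_date": date(2023, 3, 10)},
    {"sale_id": 2, "sale_date": date(2023, 1, 5)},
]
commercials = [
    {"commercial_id": "a", "commercial_date": date(2023, 1, 1)},
    {"commercial_id": "b", "commercial_date": date(2023, 2, 20)},
    {"commercial_id": "c", "commercial_date": date(2023, 4, 1)},
]


def inequality_join(x, y):
    """Inner join keeping every pair with sale_date >= commercial_date."""
    return [
        {**s, **c}
        for s in x
        for c in y
        if s["sale_date"] >= c["commercial_date"]
    ]


def rolling_join(x, y):
    """Like inequality_join, but keep only the most recent commercial per sale."""
    out = []
    for s in x:
        candidates = [c for c in y if s["sale_date"] >= c["commercial_date"]]
        if candidates:
            closest = max(candidates, key=lambda c: c["commercial_date"])
            out.append({**s, **closest})
    return out


def between_join(x, y, days=40):
    """Keep commercials dated within [sale_date - days, sale_date]."""
    out = []
    for s in x:
        lower = s["sale_date"] - timedelta(days=days)
        for c in y:
            if lower <= c["commercial_date"] <= s["sale_date"]:
                out.append({**s, **c})
    return out
```

With this data, the inequality join pairs sale 1 with commercials a and b, while the rolling join keeps only b for that sale.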
multiple is a new argument for controlling what happens when a row
in x matches multiple rows in y. For equality joins and rolling joins,
where this is usually surprising, this defaults to signalling a "warning",
but still returns all of the matches. For inequality joins, where multiple
matches are usually expected, this defaults to returning "all" of the
matches. You can also return only the "first" or "last" match, "any"
of the matches, or you can "error".
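The options for multiple can be read as a small dispatch over the list of rows in y that matched one row of x. The helper below is a hypothetical sketch of that behaviour in plain Python (the name `resolve_multiple` is not part of dplyr or datar), assuming the matches arrive in the order they appear in y.

```python
import warnings


def resolve_multiple(matches, multiple="warning"):
    """Decide which of several matching rows to keep for one row of x."""
    if multiple not in {"all", "first", "last", "any", "warning", "error"}:
        raise ValueError(f"unknown value for multiple: {multiple!r}")
    if len(matches) <= 1 or multiple == "all":
        return list(matches)
    if multiple == "first":
        return [matches[0]]
    if multiple == "last":
        return [matches[-1]]
    if multiple == "any":
        # Any single match is acceptable; this sketch just takes the first.
        return [matches[0]]
    if multiple == "error":
        raise ValueError("a row in x matches multiple rows in y")
    # "warning": signal the surprise but still return every match.
    warnings.warn("a row in x matches multiple rows in y")
    return list(matches)
```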
keep now defaults to NULL rather than FALSE. NULL implies
keep = FALSE for equality conditions, but keep = TRUE for inequality
conditions, since you generally want to preserve both sides of an
inequality join.
unmatched is a new argument for controlling what happens when a row
would be dropped because it doesn't have a match. For backwards
compatibility, the default is "drop", but you can also choose to
"error" if dropped rows would be surprising.
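As a rough illustration of unmatched, the following sketch is a single-key inner join over lists of dicts that either silently drops rows without a partner or raises. The function name and its signature are assumptions for the example only, and it checks both inputs for unmatched rows, which is one plausible reading of the description above.

```python
def inner_join(x, y, key, unmatched="drop"):
    """Inner join on one key; optionally fail if any row would be dropped."""
    if unmatched not in {"drop", "error"}:
        raise ValueError(f"unknown value for unmatched: {unmatched!r}")
    if unmatched == "error":
        x_keys = {row[key] for row in x}
        y_keys = {row[key] for row in y}
        # A key present on only one side means a row would be dropped.
        if x_keys != y_keys:
            raise ValueError("some rows have no match and would be dropped")
    return [{**a, **b} for a in x for b in y if a[key] == b[key]]
```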
consecutive_id() creates groups based on contiguous runs of the
same values.
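The idea of a run-based id can be sketched in plain Python with itertools.groupby. This is only an illustration of the expected output for a single vector, not the dplyr or datar implementation.

```python
from itertools import groupby


def consecutive_id(values):
    """Return a 1-based id that increments whenever the value changes."""
    ids = []
    for run_id, (_, run) in enumerate(groupby(values), start=1):
        ids.extend(run_id for _ in run)
    return ids


print(consecutive_id([1, 1, 2, 1]))  # [1, 1, 2, 3]
```

Note that the last 1 gets a new id rather than rejoining the first run, which is what distinguishes this from grouping by value.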
case_match() is a "vectorised switch" variant of case_when() that matches
on values rather than logical expressions. It is like a SQL "simple"
CASE WHEN statement, whereas case_when() is like a SQL "searched"
CASE WHEN statement
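A value-based switch of this kind might look like the sketch below. The signature (a list of `(candidates, result)` pairs plus a `default`) is an assumption chosen for the illustration and does not reflect the formula-based interface of dplyr or whatever interface datar ends up with.

```python
def case_match(values, cases, default=None):
    """Map each value to the result of the first case whose candidates contain it."""
    out = []
    for value in values:
        for candidates, result in cases:
            if value in candidates:
                out.append(result)
                break
        else:
            # No case matched this value.
            out.append(default)
    return out


labels = case_match(
    ["a", "b", "z"],
    [(("a", "b"), "early"), (("c",), "late")],
    default="other",
)
print(labels)  # ['early', 'early', 'other']
```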
cross_join() is a more explicit and slightly more correct replacement for
using by = character() during a join
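Conceptually, a cross join pairs every row of x with every row of y. A minimal pure-Python sketch over lists of dicts, with made-up column names, could be:

```python
from itertools import product


def cross_join(x, y):
    """Return every combination of a row from x with a row from y."""
    return [{**a, **b} for a, b in product(x, y)]


sizes = [{"size": "S"}, {"size": "M"}]
colours = [{"colour": "red"}, {"colour": "blue"}, {"colour": "green"}]
combos = cross_join(sizes, colours)
print(len(combos))  # 6
```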
pick() makes it easy to access a subset of columns from the current group.
pick() is intended as a replacement for across(.fns = NULL), cur_data(),
and cur_data_all(). We feel that pick() is a much more evocative name when
you are just trying to select a subset of columns from your data.
symdiff() computes the symmetric difference.
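For reference, the symmetric difference contains the rows that appear in exactly one of the two inputs. The sketch below shows this for rows represented as hashable values such as tuples. Dropping duplicates and listing the rows from x first are assumptions of this example.

```python
def symdiff(x, y):
    """Rows present in x or y but not in both, without duplicates."""
    in_x, in_y = set(x), set(y)
    seen = set()
    out = []
    for row in list(x) + list(y):
        if (row in in_x) != (row in in_y) and row not in seen:
            seen.add(row)
            out.append(row)
    return out


print(symdiff([1, 2, 3, 3], [3, 4]))  # [1, 2, 4]
```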
cur_data() and cur_data_all() are soft-deprecated in favour of
pick()
across(), c_across(), if_any(), and if_all() now require the _cols and _fns arguments. In general, we now recommend that you use
pick() instead of an empty across() call or across() with no _fns
(e.g. across(c(x, y))). See also "Quietly deprecate optional .cols and .fns cases" (tidyverse/dplyr#6523).
Feature Type
Adding new functionality to datar
Changing existing functionality in datar
Removing existing functionality in datar
Problem Description
Feature Description
Passing **kwargs to across() is deprecated because it's ambiguous when
those arguments are evaluated. See also "Deprecate across(, ...)" (tidyverse/dplyr#6073).
Additional Context
No response