[ENH] Support dplyr 1.1.0~1.1.2 #189

Closed
7 of 12 tasks
pwwang opened this issue Sep 4, 2023 · 0 comments
Labels
enhancement New feature or request

pwwang (Owner) commented Sep 4, 2023

Feature Type

  • Adding new functionality to datar

  • Changing existing functionality in datar

  • Removing existing functionality in datar

Problem Description

Feature Description

  • *_join()

    • A join specification can now be created through join_by(). This allows
      you to specify both the left and right hand side of a join using unquoted
      column names, such as join_by(sale_date == commercial_date). Join
      specifications can be supplied to any *_join() function as the by
      argument.

      Join specifications allow for new types of joins:

      • Equality joins: The most common join, specified by ==. For example,
        join_by(sale_date == commercial_date).

      • Inequality joins: For joining on inequalities, i.e. >=, >, <, and
        <=. For example, use join_by(sale_date >= commercial_date) to find
        every commercial that aired on or before a particular sale.

      • Rolling joins: For "rolling" the closest match forwards or backwards when
        there isn't an exact match, specified by using the rolling helper,
        closest(). For example, use
        join_by(closest(sale_date >= commercial_date)) to find only the most
        recent commercial that aired on or before a particular sale.

      • Overlap joins: For detecting overlaps between sets of columns, specified
        by using one of the overlap helpers: between(), within(), or
        overlaps(). For example, use
        join_by(between(commercial_date, sale_date_lower, sale_date)) to
        find commercials that aired before a particular sale, as long as they
        occurred after some lower bound, such as 40 days before the sale was made.

      • multiple is a new argument for controlling what happens when a row
        in x matches multiple rows in y. For equality joins and rolling joins,
        where this is usually surprising, this defaults to signalling a "warning",
        but still returns all of the matches. For inequality joins, where multiple
        matches are usually expected, this defaults to returning "all" of the
        matches. You can also return only the "first" or "last" match, "any"
        of the matches, or you can "error".

      • keep now defaults to NULL rather than FALSE. NULL implies
        keep = FALSE for equality conditions, but keep = TRUE for inequality
        conditions, since you generally want to preserve both sides of an
        inequality join.

      • unmatched is a new argument for controlling what happens when a row
        would be dropped because it doesn't have a match. For backwards
        compatibility, the default is "drop", but you can also choose to
        "error" if dropped rows would be surprising.

  • consecutive_id() for creating groups based on contiguous runs of the
    same values.

  • case_match() is a "vectorised switch" variant of case_when() that matches
    on values rather than logical expressions. It is like a SQL "simple"
    CASE WHEN statement, whereas case_when() is like a SQL "searched"
    CASE WHEN statement.

  • cross_join() is a more explicit and slightly more correct replacement for
    using by = character() during a join.

  • pick() makes it easy to access a subset of columns from the current group.
    pick() is intended as a replacement for across(.fns = NULL), cur_data(),
    and cur_data_all(). We feel that pick() is a much more evocative name when
    you are just trying to select a subset of columns from your data.

  • symdiff() computes the symmetric difference.

  • cur_data() and cur_data_all() are soft-deprecated in favour of
    pick().

  • across(), c_across(), if_any(), and if_all() now require the
    _cols and _fns arguments. In general, we now recommend that you use
    pick() instead of an empty across() call or across() with no _fns
    (e.g. across(c(x, y))). See also tidyverse/dplyr#6523 ("Quietly deprecate optional .cols and .fns cases").

  • Passing **kwargs to across() is deprecated because it's ambiguous when
    those arguments are evaluated. See also tidyverse/dplyr#6073 ("Deprecate across(, ...)").
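The new join types in the *_join() item can be illustrated with a small plain-Python sketch over lists of dicts. This is not datar's or dplyr's API: `inequality_join`, `rolling_join`, `in_window`, `ids`, and the sample `sales`/`commercials` rows are hypothetical. The overlap case is written as a pair of inequalities (`lower <= value <= upper`), which is what `between()` checks assuming closed bounds, and ties in the rolling join simply go to the first candidate.

```python
from datetime import date, timedelta
from typing import Any, Callable, Optional

Row = dict[str, Any]


def inequality_join(x: list[Row], y: list[Row],
                    cond: Callable[[Row, Row], bool]) -> list[tuple[Row, Row]]:
    """Inner join: keep every (x_row, y_row) pair for which cond(x_row, y_row) is true."""
    return [(xr, yr) for xr in x for yr in y if cond(xr, yr)]


def rolling_join(x: list[Row], y: list[Row], x_col: str,
                 y_col: str) -> list[tuple[Row, Optional[Row]]]:
    """Left join: pair each x row with the y row whose y_col is the largest
    value <= x_row[x_col]. Ties go to the first such y row; x rows with no
    candidate are paired with None."""
    out = []
    for xr in x:
        candidates = [yr for yr in y if yr[y_col] <= xr[x_col]]
        out.append((xr, max(candidates, key=lambda r: r[y_col], default=None)))
    return out


def in_window(s: Row, c: Row) -> bool:
    """Overlap-style condition: the lower bound is 40 days before the sale."""
    return s["sale_date"] - timedelta(days=40) <= c["commercial_date"] <= s["sale_date"]


def ids(pairs: list[tuple[Row, Optional[Row]]]) -> list[tuple[int, Optional[str]]]:
    """Summarise join output as (sale_id, ad_id) pairs for easy reading."""
    return [(s["sale_id"], c["ad_id"] if c is not None else None) for s, c in pairs]


sales = [
    {"sale_id": 1, "sale_date": date(2023, 3, 10)},
    {"sale_id": 2, "sale_date": date(2023, 1, 5)},
]
commercials = [
    {"ad_id": "a", "commercial_date": date(2023, 1, 1)},
    {"ad_id": "b", "commercial_date": date(2023, 2, 20)},
    {"ad_id": "c", "commercial_date": date(2023, 3, 15)},
]

# Like join_by(sale_date >= commercial_date): every commercial on or before the sale.
print(ids(inequality_join(sales, commercials,
                          lambda s, c: s["sale_date"] >= c["commercial_date"])))
# → [(1, 'a'), (1, 'b'), (2, 'a')]

# Like join_by(closest(sale_date >= commercial_date)): only the most recent one.
print(ids(rolling_join(sales, commercials, "sale_date", "commercial_date")))
# → [(1, 'b'), (2, 'a')]

# Like join_by(between(commercial_date, sale_date_lower, sale_date)).
print(ids(inequality_join(sales, commercials, in_window)))
# → [(1, 'b'), (2, 'a')]
```

Note how the rolling join reduces each sale's set of inequality matches to a single row, while the window condition trims it from below.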
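The `multiple` and `unmatched` arguments of the *_join() item change only what happens after the matches are found. The plain-Python sketch below shows that idea for an inner equality join; `equality_join` is a hypothetical helper, not datar's API. It is a simplification: the default "warning" behaviour and the "any" option are not modelled, and only rows of x are checked for `unmatched`.

```python
from typing import Any

Row = dict[str, Any]


def equality_join(x: list[Row], y: list[Row], key: str,
                  multiple: str = "all", unmatched: str = "drop") -> list[Row]:
    """Inner equality join on `key` with simplified `multiple`/`unmatched` options.

    multiple:  "all" keeps every match, "first"/"last" keep one, "error" raises.
    unmatched: "drop" silently skips x rows without a match, "error" raises.
    """
    out: list[Row] = []
    for xr in x:
        matches = [yr for yr in y if yr[key] == xr[key]]
        if not matches:
            if unmatched == "error":
                raise ValueError(f"x row {xr!r} has no match in y")
            continue  # "drop": the row simply disappears from the result
        if len(matches) > 1:
            if multiple == "error":
                raise ValueError(f"x row {xr!r} matches {len(matches)} rows in y")
            if multiple == "first":
                matches = matches[:1]
            elif multiple == "last":
                matches = matches[-1:]
        out.extend({**xr, **yr} for yr in matches)
    return out


x = [{"id": 1, "a": "x1"}, {"id": 2, "a": "x2"}, {"id": 3, "a": "x3"}]
y = [{"id": 1, "b": "y1"}, {"id": 1, "b": "y1b"}, {"id": 2, "b": "y2"}]

print([r["b"] for r in equality_join(x, y, "id")])                    # ['y1', 'y1b', 'y2']
print([r["b"] for r in equality_join(x, y, "id", multiple="first")])  # ['y1', 'y2']
print([r["b"] for r in equality_join(x, y, "id", multiple="last")])   # ['y1b', 'y2']
```

With `unmatched="error"`, the x row with id 3 raises instead of being dropped.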
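The grouping rule behind consecutive_id() can be sketched in a few lines of plain Python. `consecutive_id` here is a hypothetical stand-in, not datar's implementation: the id starts at 1 and increases whenever a value differs from the previous one. Several columns can be handled by passing zipped tuples; float NaN is not treated specially.

```python
from typing import Any, Iterable


def consecutive_id(values: Iterable[Any]) -> list[int]:
    """Return a 1-based id that increases by one whenever the value differs
    from the previous one, so each contiguous run shares an id."""
    ids: list[int] = []
    current = 0
    previous: Any = None
    for i, v in enumerate(values):
        if i == 0 or v != previous:
            current += 1  # a new run starts here
        ids.append(current)
        previous = v
    return ids


print(consecutive_id([1, 1, 2, 2, 1, 3]))                    # [1, 1, 2, 2, 3, 4]
print(consecutive_id(zip(["a", "a", "a"], [1, 1, 2])))        # [1, 1, 2]
```

Unlike a plain group-by on the value, the repeated `1` at position 5 starts a new group because the run was interrupted.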
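The difference between a "simple" and a "searched" CASE WHEN, as described in the case_match() item, is easiest to see side by side. Both functions below are hypothetical plain-Python sketches, not datar's case_match()/case_when(): the first looks each value up in lists of match values, the second evaluates a predicate per case.

```python
from typing import Any, Callable, Iterable


def case_match(values: Iterable[Any], *cases: tuple[tuple[Any, ...], Any],
               default: Any = None) -> list[Any]:
    """'Simple CASE': compare each value against the listed match values."""
    out = []
    for v in values:
        for matches, result in cases:
            if v in matches:
                out.append(result)
                break
        else:  # no case matched this value
            out.append(default)
    return out


def case_when(values: Iterable[Any], *cases: tuple[Callable[[Any], bool], Any],
              default: Any = None) -> list[Any]:
    """'Searched CASE': evaluate a predicate for each case instead."""
    return [next((result for pred, result in cases if pred(v)), default)
            for v in values]


animals = ["cat", "dog", "cow", "fish"]
print(case_match(animals, (("cat", "dog"), "pet"), (("cow",), "farm"), default="other"))
# → ['pet', 'pet', 'farm', 'other']
print(case_when(animals,
                (lambda a: a in ("cat", "dog"), "pet"),
                (lambda a: a == "cow", "farm"),
                default="other"))
# → ['pet', 'pet', 'farm', 'other']
```

Both give the same result here; case_match() just avoids spelling out the comparison in every case.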
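A cross join, as in the cross_join() item, pairs every row of one table with every row of the other, independent of any column values. The sketch below uses `itertools.product`; `cross_join` is a hypothetical helper, not datar's API, and it assumes the two tables share no column names (clashes are not handled here).

```python
from itertools import product
from typing import Any

Row = dict[str, Any]


def cross_join(x: list[Row], y: list[Row]) -> list[Row]:
    """Pair every row of x with every row of y, in x-major order.
    Assumes x and y share no column names."""
    return [{**xr, **yr} for xr, yr in product(x, y)]


sizes = [{"size": "S"}, {"size": "L"}]
colours = [{"colour": "red"}, {"colour": "blue"}, {"colour": "green"}]

rows = cross_join(sizes, colours)
print(len(rows))   # 6, i.e. len(sizes) * len(colours)
print(rows[:2])    # [{'size': 'S', 'colour': 'red'}, {'size': 'S', 'colour': 'blue'}]
```

The result always has len(x) * len(y) rows, which is the intent that `by = character()` only implied.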
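The symmetric difference from the symdiff() item keeps values that appear in exactly one of the two inputs. Below is a hypothetical order-preserving sketch, not datar's implementation: x-only values come first, then y-only values, each kept once. Values must be hashable, so data-frame rows would be represented as tuples.

```python
from typing import Hashable, Iterable


def symdiff(x: Iterable[Hashable], y: Iterable[Hashable]) -> list[Hashable]:
    """Values in x or y but not in both, x-only first, each kept once."""
    xs, ys = list(x), list(y)
    in_x, in_y = set(xs), set(ys)
    out: list[Hashable] = []
    seen: set[Hashable] = set()
    for v in xs + ys:
        # keep v only if it occurs in exactly one input and hasn't been emitted yet
        if (v in in_x) != (v in in_y) and v not in seen:
            seen.add(v)
            out.append(v)
    return out


print(symdiff([1, 2, 3, 3], [3, 4, 2, 5]))                     # [1, 4, 5]
print(symdiff([("a", 1), ("b", 2)], [("b", 2), ("c", 3)]))     # [('a', 1), ('c', 3)]
```

Python's built-in `set(x) ^ set(y)` gives the same members but no ordering guarantee.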

Additional Context

No response

pwwang added the enhancement label Sep 4, 2023
pwwang changed the title from "[ENH] Support dplyr 1.1.0" to "[ENH] Support dplyr 1.1.0~1.1.2" Sep 4, 2023
pwwang added a commit that referenced this issue Oct 5, 2023
pwwang added four commits to pwwang/datar-pandas that referenced this issue Oct 8, 2023
pwwang closed this as completed Dec 7, 2023