cur_group() and size zero grouped data frame edge case bug #6304

Closed
DavisVaughan opened this issue Jun 19, 2022 · 1 comment · Fixed by #6423
Labels
bug an unexpected problem or unintended behavior grouping 👨‍👩‍👧‍👦

DavisVaughan commented Jun 19, 2022

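This comes down to the number of rows returned by group_data(), and therefore by group_keys(). An ungrouped data frame has 1 row of keys, but a grouped data frame with zero groups has 0 rows, so there is no key to slice out for the "current" group.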

library(dplyr)

df <- tibble(x = integer())
gdf <- group_by(df, x)

mutate(df, y = cur_group())
#> # A tibble: 0 × 2
#> # … with 2 variables: x <int>, y <tibble[,0]>

mutate(gdf, y = cur_group())
#> Error in `mutate()`:
#> ! Problem while computing `y = cur_group()`.
#> Caused by error in `vec_slice()`:
#> ! Can't subset elements past the end.
#> ℹ Location 1 doesn't exist.
#> ℹ There are only 0 elements.

# Has 1 row
group_keys(df)
#> # A tibble: 1 × 0

# Has 0 rows
group_keys(gdf)
#> # A tibble: 0 × 1
#> # … with 1 variable: x <int>

We already have a workaround for when there are zero groups, but it only applies to the group rows:

dplyr/R/data-mask.R

Lines 4 to 8 in 55dfc1c

rows <- group_rows(data)
# workaround for when there are 0 groups
if (length(rows) == 0) {
rows <- list(integer())
}

It seems like we need to make a similar kind of patch to group_keys() as well, which is currently stored unchanged:

private$keys <- group_keys(data)
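
To see the mismatch outside the data mask, here is a minimal sketch that applies the same rows workaround by hand and then tries to slice the keys for group 1. It assumes vctrs is attached alongside dplyr; `rows`, `keys`, and `err` are illustrative names, not dplyr internals.

```r
library(dplyr)
library(vctrs)

gdf <- group_by(tibble(x = integer()), x)

rows <- group_rows(gdf)
# Same workaround as in data-mask.R: pretend there is one empty group
if (length(rows) == 0) {
  rows <- list(integer())
}

keys <- group_keys(gdf)

length(rows)   # one (empty) group after the patch
vec_size(keys) # but the keys still have zero rows

# Slicing the keys for group 1 fails, mirroring the error in the reprex
err <- tryCatch(vec_slice(keys, 1L), error = function(e) e)
inherits(err, "error")
```

The rows are patched to look like a single empty group, while the keys are left at size zero, so any code that indexes the keys by the current group goes out of bounds.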

Maybe it should be set to vec_init(group_keys(data), n = 1) if there are no groups? That would allow cur_group() to return a size 1 result, which would then be recycled back to size 0.

That would give this result, where you can see the initialized 1-row keys if you really want to:

library(dplyr)

df <- tibble(x = integer())
gdf <- group_by(df, x)

mutate(gdf, y = print(cur_group()))
#> # A tibble: 1 × 1
#>       x
#>   <int>
#> 1    NA

#> # A tibble: 0 × 2
#> # Groups:   x [0]
#> # … with 2 variables: x <int>, y <tibble[,1]>
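
The vec_init() idea can also be tried on its own, away from the data mask. This is only a sketch under the assumption that the patch would be a simple size check on the keys; the `if` block and the names `key` and `recycled` are hypothetical, not what dplyr does internally.

```r
library(dplyr)
library(vctrs)

gdf <- group_by(tibble(x = integer()), x)
keys <- group_keys(gdf)

# Hypothetical patch: fall back to a single row of missing keys
if (vec_size(keys) == 0L) {
  keys <- vec_init(keys, n = 1L)
}

# Slicing group 1 now succeeds and gives one row where x is NA
key <- vec_slice(keys, 1L)
vec_size(key)

# A size 1 result can be recycled to the size of the (empty) data
recycled <- vec_recycle(key, size = 0L)
vec_size(recycled)
```

The trade-off is that user code calling cur_group() would see a made-up row of missing keys, even though no such group exists in the data.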

DavisVaughan (Member, Author) commented

Or maybe current_key(), used by cur_group(), needs to be aware of the case where the keys are size 0, and be implemented like this:

    current_key = function() {
      keys <- private$keys

      if (vec_size(keys) == 0L) {
        # Zero groups: return the size 0 keys as-is instead of slicing
        keys
      } else {
        vec_slice(keys, self$get_current_group())
      }
    },

That would give this result, which feels more intuitive in this case:

library(dplyr)

df <- tibble(x = integer())
gdf <- group_by(df, x)

mutate(gdf, y = print(cur_group()))
#> # A tibble: 0 × 1
#> # … with 1 variable: x <int>

#> # A tibble: 0 × 2
#> # Groups:   x [0]
#> # … with 2 variables: x <int>, y <tibble[,1]>
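
For experimenting with this second approach without touching the R6 data mask, a free-standing version might look like the following. It is a sketch only: `current_key()` here is a hypothetical plain function that takes the keys and the current group index as arguments, rather than reading `private$keys` and `self$get_current_group()`.

```r
library(dplyr)
library(vctrs)

# Hypothetical free-standing version of the proposed method
current_key <- function(keys, current_group) {
  if (vec_size(keys) == 0L) {
    # Zero groups: nothing to slice, return the empty keys unchanged
    keys
  } else {
    vec_slice(keys, current_group)
  }
}

empty_keys <- group_keys(group_by(tibble(x = integer()), x))
some_keys <- group_keys(group_by(tibble(x = c(1L, 1L, 2L)), x))

# Size 0 keys come back as size 0, with the x column preserved
vec_size(current_key(empty_keys, 1L))

# The usual case still slices out the key of the requested group
current_key(some_keys, 2L)$x
```

Compared with the vec_init() idea, this keeps the column types of the keys but never invents a row, so the result of cur_group() already has the same size as the data.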

DavisVaughan added a commit to DavisVaughan/dplyr that referenced this issue Jun 28, 2022
@hadley hadley added bug an unexpected problem or unintended behavior grouping 👨‍👩‍👧‍👦 labels Jul 21, 2022