[R] dplyr 1.1.0 compatibility #33894

Closed
DavisVaughan opened this issue Jan 26, 2023 · 3 comments

@DavisVaughan
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

dplyr 1.1.0 is being sent to CRAN tomorrow. I reran the revdeps today and arrow suddenly popped up as a newly broken revdep. I wish we could give you all more time, but unfortunately there were only 3 new revdep failures detected today, so we plan to continue with the release tomorrow anyway.

There seem to be quite a few test failures, many of which look to be overly strict tests. I've put the full test output in the details section at the very bottom of this issue. I'll also list a few problems I see along with the corresponding NEWS bullet to try and help you out:


Error in `union_all(as_tibble(sub_df1), as_tibble(sub_df2))`: `x` and `y` are not compatible.
x Cols in `y` but not `x`: `z`.
x Cols in `x` but not `y`: `y`.

union_all(), like union(), now requires that data frames be compatible: i.e. they have the same columns, and the columns have compatible types.
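
As a rough illustration of that change (not part of the original report), the sketch below uses two hypothetical tibbles, `df1` and `df2`, whose columns differ in the same way as in the failing test. It assumes dplyr >= 1.1.0 is installed; `bind_rows()` is shown only as one possible alternative when NA-filling is actually wanted.

```r
library(dplyr)

# Hypothetical inputs: `y` exists only in df1, `z` only in df2
df1 <- tibble(x = 1:2, y = c("a", "b"))
df2 <- tibble(x = 3:4, z = c("c", "d"))

# With dplyr >= 1.1.0, union_all() refuses incompatible data frames
err <- tryCatch(union_all(df1, df2), error = function(e) e)
inherits(err, "error")

# bind_rows() still stacks the rows and fills missing columns with NA
combined <- bind_rows(df1, df2)
names(combined)
```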


A few distinct() failures, probably related to:

distinct() returns columns ordered the way you request, not the same as the input data (#6156).
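
A small sketch of the ordering change, assuming dplyr >= 1.1.0. The tibble `df` is made up for illustration; its columns are deliberately stored in the opposite order to the one requested.

```r
library(dplyr)

# Hypothetical input: `lgl` comes before `some_grouping` in the data
df <- tibble(
  lgl = c(TRUE, TRUE, FALSE),
  some_grouping = c(1, 1, 2)
)

# Columns come back in the order requested, not the order stored in `df`
out <- distinct(df, some_grouping, lgl)
names(out)
```

A test that compares against dplyr output could either request the columns in input order or compare after reordering with `select()`.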


── Failure ('test-dplyr-join.R:30'): left_join ─────────────────────────────────
`compare_dplyr_binding(...)` did not throw the expected message.

Probably related to the fact that joins in dplyr now throw a warning by default if multiple matches are detected. Set multiple = "all" to silence this.
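
For example, with two hypothetical tibbles where one key in `x` matches two rows in `y`, passing `multiple = "all"` states the intent explicitly. This is only a sketch and assumes dplyr >= 1.1.0, where the `multiple` argument exists.

```r
library(dplyr)

# Hypothetical inputs: key 1 appears twice in `y`
x <- tibble(k = c(1, 2))
y <- tibble(k = c(1, 1, 2), v = c("a", "b", "c"))

# Explicitly ask for every match to be returned
out <- left_join(x, y, by = "k", multiple = "all")
nrow(out)
```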


── Error ('test-dplyr-join.R:178'): semi_join ──────────────────────────────────
<rlib_error_dots_nonempty/rlib_error_dots/rlang_error/error/condition>
Error in `semi_join(., to_join, by = "some_grouping", keep = TRUE)`: `...` must be empty.
x Problematic argument:
* keep = TRUE

Seems like a possible typo on your end, as semi_join() doesn't have the keep arg and we actually check for empty dots now.
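
A minimal reproduction, assuming dplyr >= 1.1.0. The tibbles `x` and `to_join` are hypothetical stand-ins for the test data.

```r
library(dplyr)

x <- tibble(some_grouping = c(1, 2, 3))
to_join <- tibble(some_grouping = c(1, 2))

# `keep` is not an argument of semi_join(), so it lands in `...` and errors
err <- tryCatch(
  semi_join(x, to_join, by = "some_grouping", keep = TRUE),
  error = function(e) e
)
inherits(err, "rlib_error_dots_nonempty")

# Dropping the stray argument works as before
out <- semi_join(x, to_join, by = "some_grouping")
```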


── Error ('test-dplyr-slice.R:145'): slice_* not supported with groups ─────────
Error in `slice_min(grouped, n = 5)`: `order_by` is absent but must be supplied.

slice_min() and friends do better checking for absent arguments, so maybe you were expecting a different error message. We also do this error checking in the generic, which runs before anything on your end can.
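
To make that concrete, here is a sketch on a plain (hypothetical) tibble, assuming dplyr >= 1.1.0. Omitting `order_by` fails immediately, while supplying it lets the call proceed, so a test that expects a package-specific error would presumably need to pass `order_by` to get past the generic.

```r
library(dplyr)

# Hypothetical data
df <- tibble(g = c("a", "a", "b"), v = c(3, 1, 2))

# `order_by` is required and is checked up front
err <- tryCatch(slice_min(df, n = 1), error = function(e) e)
inherits(err, "error")

# With `order_by` supplied, the call gets past the argument check
out <- slice_min(df, order_by = v, n = 1)
```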


── Error ('test-dplyr-summarize.R:301'): Functions that take ... but we only accept a single arg ──
Error in `summarize(., distinct = n_distinct())`: i In argument: `distinct = n_distinct()`.
Caused by error in `n_distinct()`:
! `...` is absent, but must be supplied.

n_distinct() now errors if 0 inputs are provided, because that is almost always a mistake.
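
A short sketch of the new behavior, assuming dplyr >= 1.1.0. The input vector is made up for illustration.

```r
library(dplyr)

# Calling n_distinct() with no inputs is now an error
err <- tryCatch(n_distinct(), error = function(e) e)
inherits(err, "error")

# Supplying at least one vector works as usual
n <- n_distinct(c(1, 1, 2))
```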


R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements.  See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership.  The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License.  You may obtain a copy of the License at
> #
> #   http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing,
> # software distributed under the License is distributed on an
> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> # KIND, either express or implied.  See the License for the
> # specific language governing permissions and limitations
> # under the License.
> 
> library(testthat)
> library(arrow)

Attaching package: 'arrow'

The following object is masked from 'package:testthat':

    matches

The following object is masked from 'package:utils':

    timestamp

> library(tibble)
> 
> verbose_test_output <- identical(tolower(Sys.getenv("ARROW_R_DEV", "false")), "true") ||
+   identical(tolower(Sys.getenv("ARROW_R_VERBOSE_TEST", "false")), "true")
> 
> if (verbose_test_output) {
+   arrow_reporter <- MultiReporter$new(list(CheckReporter$new(), LocationReporter$new()))
+ } else {
+   arrow_reporter <- check_reporter()
+ }
> test_check("arrow", reporter = arrow_reporter)
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
Joining with `by = join_by(some_grouping)`
[ FAIL 13 | WARN 16 | SKIP 71 | PASS 8541 ]

══ Skipped tests ═══════════════════════════════════════════════════════════════
• ARROW-12632: ExecuteScalarExpression cannot Execute non-scalar expression (1)
• ARROW-13364 (1)
• ARROW-14045 (1)
• ARROW-17043 (date/datetime arithmetic with integers) (1)
• Arrow C++ not built with substrait (1)
• Flight server is not running (1)
• Implement more aggressive implicit casting for scalars (ARROW-11402) (1)
• Ingest_POSIXct only implemented for REALSXP (1)
• Need halffloat support: https://issues.apache.org/jira/browse/ARROW-3802 (1)
• On CRAN (47)
• Parquet test data missing (1)
• RE2 does not support backreferences in pattern (https://github.com/google/re2/issues/101) (1)
• TODO (ARROW-16630): make sure BottomK can handle NA ordering (1)
• TODO: (if anyone uses RangeEquals) (1)
• TODO: ARROW-14071 (1)
• Table with 0 cols doesn't know how many rows it should have (1)
• This OS either does not support changing languages to fr or it caches translations (2)
• Work around masking of data type functions (ARROW-12322) (1)
• environment variable ARROW_LARGE_MEMORY_TESTS (1)
• floor_date(as.Date(NA), '1 day') is no longer NA on latest R-devel (1)
• https://issues.apache.org/jira/browse/ARROW-7653 (1)
• minio is not installed. (1)
• pyarrow not available for testing (1)
• tolower(Sys.info()[["sysname"]]) != "windows" is TRUE (1)

══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test-dataset.R:584'): UnionDataset can merge schemas ────────────────
Error in `union_all(as_tibble(sub_df1), as_tibble(sub_df2))`: `x` and `y` are not compatible.
x Cols in `y` but not `x`: `z`.
x Cols in `x` but not `y`: `y`.
Backtrace:
 1. ├─arrow:::expect_equal(actual, union_all(as_tibble(sub_df1), as_tibble(sub_df2))) at test-dataset.R:584:2
 2. │ └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
 3. │   └─testthat::quasi_label(enquo(expected), expected.label, arg = "expected")
 4. │     └─rlang::eval_bare(expr, quo_get_env(quo))
 5. ├─dplyr::union_all(as_tibble(sub_df1), as_tibble(sub_df2))
 6. └─dplyr:::union_all.data.frame(as_tibble(sub_df1), as_tibble(sub_df2))
 7.   └─dplyr:::check_compatible(x, y)
 8.     └─rlang::abort(c("`x` and `y` are not compatible.", compat), call = error_call)
── Failure ('test-dplyr-distinct.R:24'): distinct() ────────────────────────────
`object` (`actual`) not equal to `expected` (`expected`).

`names(actual)`:   "lgl"           "some_grouping"
`names(expected)`: "some_grouping" "lgl"          
Backtrace:
 1. └─arrow:::compare_dplyr_binding(...) at test-dplyr-distinct.R:24:2
 2.   └─arrow:::expect_equal(via_batch, expected, ...) at tests/testthat/helper-expectation.R:115:4
 3.     └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
── Failure ('test-dplyr-distinct.R:24'): distinct() ────────────────────────────
`object` (`actual`) not equal to `expected` (`expected`).

`names(actual)`:   "lgl"           "some_grouping"
`names(expected)`: "some_grouping" "lgl"          
Backtrace:
 1. └─arrow:::compare_dplyr_binding(...) at test-dplyr-distinct.R:24:2
 2.   └─arrow:::expect_equal(via_table, expected, ...) at tests/testthat/helper-expectation.R:129:4
 3.     └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
── Failure ('test-dplyr-distinct.R:54'): distinct() can retain groups ──────────
`object` (`actual`) not equal to `expected` (`expected`).

`names(actual)`:   "int"           "lgl" "some_grouping"
`names(expected)`: "some_grouping" "int" "lgl"          
Backtrace:
 1. └─arrow:::compare_dplyr_binding(...) at test-dplyr-distinct.R:54:2
 2.   └─arrow:::expect_equal(via_batch, expected, ...) at tests/testthat/helper-expectation.R:115:4
 3.     └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
── Failure ('test-dplyr-distinct.R:54'): distinct() can retain groups ──────────
`object` (`actual`) not equal to `expected` (`expected`).

`names(actual)`:   "int"           "lgl" "some_grouping"
`names(expected)`: "some_grouping" "int" "lgl"          
Backtrace:
 1. └─arrow:::compare_dplyr_binding(...) at test-dplyr-distinct.R:54:2
 2.   └─arrow:::expect_equal(via_table, expected, ...) at tests/testthat/helper-expectation.R:129:4
 3.     └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
── Failure ('test-dplyr-distinct.R:64'): distinct() can retain groups ──────────
`object` (`actual`) not equal to `expected` (`expected`).

`names(actual)`:   "int" "y"   "x"
`names(expected)`: "y"   "int" "x"
Backtrace:
 1. └─arrow:::compare_dplyr_binding(...) at test-dplyr-distinct.R:64:2
 2.   └─arrow:::expect_equal(via_batch, expected, ...) at tests/testthat/helper-expectation.R:115:4
 3.     └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
── Failure ('test-dplyr-distinct.R:64'): distinct() can retain groups ──────────
`object` (`actual`) not equal to `expected` (`expected`).

`names(actual)`:   "int" "y"   "x"
`names(expected)`: "y"   "int" "x"
Backtrace:
 1. └─arrow:::compare_dplyr_binding(...) at test-dplyr-distinct.R:64:2
 2.   └─arrow:::expect_equal(via_table, expected, ...) at tests/testthat/helper-expectation.R:129:4
 3.     └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
── Failure ('test-dplyr-distinct.R:83'): distinct() can contain expressions ────
`object` (`actual`) not equal to `expected` (`expected`).

`names(actual)`:   "int" "lgl" "x"
`names(expected)`: "lgl" "int" "x"
Backtrace:
 1. └─arrow:::compare_dplyr_binding(...) at test-dplyr-distinct.R:83:2
 2.   └─arrow:::expect_equal(via_batch, expected, ...) at tests/testthat/helper-expectation.R:115:4
 3.     └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
── Failure ('test-dplyr-distinct.R:83'): distinct() can contain expressions ────
`object` (`actual`) not equal to `expected` (`expected`).

`names(actual)`:   "int" "lgl" "x"
`names(expected)`: "lgl" "int" "x"
Backtrace:
 1. └─arrow:::compare_dplyr_binding(...) at test-dplyr-distinct.R:83:2
 2.   └─arrow:::expect_equal(via_table, expected, ...) at tests/testthat/helper-expectation.R:129:4
 3.     └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:42:4
── Failure ('test-dplyr-join.R:30'): left_join ─────────────────────────────────
`compare_dplyr_binding(...)` did not throw the expected message.
── Error ('test-dplyr-join.R:178'): semi_join ──────────────────────────────────
<rlib_error_dots_nonempty/rlib_error_dots/rlang_error/error/condition>
Error in `semi_join(., to_join, by = "some_grouping", keep = TRUE)`: `...` must be empty.
x Problematic argument:
* keep = TRUE
Backtrace:
  1. ├─arrow:::compare_dplyr_binding(...) at test-dplyr-join.R:178:4
  2. │ └─rlang::eval_tidy(expr, rlang::new_data_mask(rlang::env(.input = tbl))) at tests/testthat/helper-expectation.R:97:2
  3. ├─... %>% collect()
  4. ├─dplyr::collect(.)
  5. ├─dplyr::semi_join(., to_join, by = "some_grouping", keep = TRUE)
  6. └─dplyr:::semi_join.data.frame(., to_join, by = "some_grouping", keep = TRUE)
  7.   └─rlang::check_dots_empty0(...)
  8.     └─rlang::check_dots_empty(call = call)
  9.       └─rlang:::action_dots(...)
 10.         ├─base (local) try_dots(...)
 11.         └─rlang (local) action(...)
── Error ('test-dplyr-slice.R:145'): slice_* not supported with groups ─────────
Error in `slice_min(grouped, n = 5)`: `order_by` is absent but must be supplied.
Backtrace:
 1. ├─testthat::expect_error(slice_min(grouped, n = 5), "Slicing grouped data not supported in Arrow") at test-dplyr-slice.R:145:2
 2. │ └─testthat:::expect_condition_matching(...)
 3. │   └─testthat:::quasi_capture(...)
 4. │     ├─testthat (local) .capture(...)
 5. │     │ └─base::withCallingHandlers(...)
 6. │     └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
 7. └─dplyr::slice_min(grouped, n = 5)
 8.   └─rlang::check_required(order_by)
 9.     └─rlang::abort(msg, call = call)
── Error ('test-dplyr-summarize.R:301'): Functions that take ... but we only accept a single arg ──
Error in `summarize(., distinct = n_distinct())`: i In argument: `distinct = n_distinct()`.
Caused by error in `n_distinct()`:
! `...` is absent, but must be supplied.

[ FAIL 13 | WARN 16 | SKIP 71 | PASS 8541 ]
Error: Test failures
Execution halted

Component(s)

R

@paleolimbot
Member

I'm happy to report that - at least as of today - we've fixed all of those! (I also just ran pak::pak("tidyverse/dplyr") and devtools::test() to double check). We're in the process of preparing a CRAN release in the next week or so (if all goes well).

@kou changed the title from "dplyr 1.1.0 compatibility" to "[R] dplyr 1.1.0 compatibility" on Jan 27, 2023

@DavisVaughan
Contributor Author

That's great! Thanks so much

@DavisVaughan
Contributor Author

Ah it looks like Lionel did the bulk of the work back in December #14947
