-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R] Permission error on Windows when deleting file previously accessed with open_dataset #31796
Comments
Will Jones / @wjones127: |
Will Jones / @wjones127: |
Weston Pace / @westonpace: |
Dewey Dunnington / @paleolimbot: |
Weston Pace / @westonpace: |
Will Jones / @wjones127: |
Will Jones / @wjones127: library(arrow)
#> Warning: package 'arrow' was built under R version 4.1.3
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
library(readr)
# warning: 5 MB
test_data <- read_csv("https://cn.dataone.org/cn/v2/resolve/knb.92098.1")
#> Rows: 104497 Columns: 9
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (7): Y_CNAREA, CITY, A_RESID, local, Y_Rural, P_FSHY, P_TYPE
#> dbl (2): YEAR, Permit_Count
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
write_dataset(test_data, "test_data", partitioning = "CITY")
# Original example was with DuckDB, but that's not necessarily the issue
# con <- open_dataset("test_dataset") |> to_duckdb()
con <- open_dataset("test_data")$NewScan()$Finish()$ToRecordBatchReader()
files <- dir("test_data", full.names = TRUE, recursive = TRUE)
rm(con)
gc()
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 1052314 56.2 2227298 119 1220182 65.2
#> Vcells 2795421 21.4 8388608 64 3736313 28.6
lapply(files, file.remove)
#> Warning in FUN(X[[i]], ...): cannot remove file 'test_data/CITY=CORNER BAY/
#> part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file 'test_data/CITY=COUNCIL/
#> part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file 'test_data/CITY=CRAIG/
#> part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file 'test_data/CITY=CROOKED CREEK/
#> part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file 'test_data/CITY=DEEP BAY/
#> part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file 'test_data/CITY=DEERING/
#> part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file 'test_data/CITY=DELTA JUNCTION/
#> part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file 'test_data/CITY=DENALI PARK/
#> part-0.parquet', reason 'Permission denied'
#> [[1]]
#> [1] TRUE
#>
#> [[2]]
#> [1] TRUE
#>
#> [[3]]
#> [1] TRUE
#>
#> ...
#>
#> [[343]]
#> [1] TRUE
#>
#> [[344]]
#> [1] TRUE |
|
Weston Pace / @westonpace: |
Weston Pace / @westonpace: |
Miles Granger / @milesgranger: Is my understanding correct? If so, it seems like it could be related to this issue, but suppose there are reasons for not doing such? |
Weston Pace / @westonpace: (I'm specifically referring to IpcFileFormat::Inspect here) |
Joris Van den Bossche / @jorisvandenbossche: But given the reported issues on Windows, that might not work sufficiently in all cases? |
Miles Granger / @milesgranger: |
Antoine Pitrou / @pitrou: |
Yes. In C++ there is no "with" or "using" statement like there is in other languages so relying on scope is the safe alternative. Explicitly calling close usually leaves open the risk of missing a close on a failure path or especially if an exception is thrown.
I'm not convinced this is the case. I'm pretty sure we are calling close on Windows too. How are we confirming the file is open? |
Weston Pace / @westonpace:
|
Joris Van den Bossche / @jorisvandenbossche: And the workaround we can give to users: put some kind of try-except around the delete, and try again after some pause? |
Antoine Pitrou / @pitrou: |
Antoine Pitrou / @pitrou: For example, our filesystem tests create, write and delete files and we don't seem to get spurious deletion errors on them. |
Antoine Pitrou / @pitrou: |
Apache Arrow JIRA Bot: |
Hello! This issue has reemerged with Arrow v11.0.0.3. It was previously fixed in an older arrow version but we are now encountering it again. Please let me know if you'd prefer I open a new issue. The issue is a bit stranger now in that it seems to require multiple dplyr and base-R functions to be triggered. I played around with it a little and was only able to trigger the issue with a combination of a library(arrow)
#> Warning: package 'arrow' was built under R version 4.2.3
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.2.3
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
test_data1 <- expand.grid(A=1:150, B=1:10, C=1:50)
test_data2 <- data.frame(A=1:100, D=1:100)
write_dataset(test_data1, "test_data1", partitioning = "A")
write_dataset(test_data2, "test_data2", partitioning = "A")
test<-open_dataset("test_data1")%>%
dplyr::inner_join(open_dataset("test_data2")) %>%
head()%>%
collect()
rm(list=ls())
gc()
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 1184245 63.3 2479418 132.5 1365350 73.0
#> Vcells 2047220 15.7 8388608 64.0 3304441 25.3
files1 <- dir("test_data1", full.names = TRUE, recursive = TRUE)
files2 <- dir("test_data2", full.names = TRUE, recursive = TRUE)
files1_leftover<-lapply(files1, file.remove)
#> Warning in FUN(X[[i]], ...): cannot remove file
#> 'test_data1/A=96/part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file
#> 'test_data1/A=97/part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file
#> 'test_data1/A=98/part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file
#> 'test_data1/A=99/part-0.parquet', reason 'Permission denied'
which(!unlist(files1_leftover))
#> [1] 147 148 149 150
files1[!unlist(files1_leftover)]
#> [1] "test_data1/A=96/part-0.parquet" "test_data1/A=97/part-0.parquet"
#> [3] "test_data1/A=98/part-0.parquet" "test_data1/A=99/part-0.parquet"
files2_leftover<-lapply(files2, file.remove)
#> Warning in FUN(X[[i]], ...): cannot remove file
#> 'test_data2/A=96/part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file
#> 'test_data2/A=97/part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file
#> 'test_data2/A=98/part-0.parquet', reason 'Permission denied'
#> Warning in FUN(X[[i]], ...): cannot remove file
#> 'test_data2/A=99/part-0.parquet', reason 'Permission denied'
which(!unlist(files2_leftover))
#> [1] 97 98 99 100
files2[!unlist(files2_leftover)]
#> [1] "test_data2/A=96/part-0.parquet" "test_data2/A=97/part-0.parquet"
#> [3] "test_data2/A=98/part-0.parquet" "test_data2/A=99/part-0.parquet" Created on 2023-03-29 with reprex v2.0.2 Session infosessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.2 (2022-10-31 ucrt)
#> os Windows 10 x64 (build 19045)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.utf8
#> ctype English_United States.utf8
#> tz America/Los_Angeles
#> date 2023-03-29
#> pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> arrow * 11.0.0.3 2023-03-08 [1] CRAN (R 4.2.3)
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.1)
#> bit 4.0.5 2022-11-15 [1] CRAN (R 4.2.2)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.2)
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.2.3)
#> digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.2)
#> dplyr * 1.1.1 2023-03-22 [1] CRAN (R 4.2.3)
#> evaluate 0.20 2023-01-17 [1] CRAN (R 4.2.2)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.1)
#> fs 1.6.0 2023-01-23 [1] CRAN (R 4.2.2)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.2)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.2)
#> htmltools 0.5.4 2022-12-07 [1] CRAN (R 4.2.2)
#> knitr 1.42 2023-01-25 [1] CRAN (R 4.2.2)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.2)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.2)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.2.3)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.2)
#> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.2.2)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.3)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.2)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.2)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.2.3)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.2)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.1)
#> rlang 1.1.0 2023-03-14 [1] CRAN (R 4.2.3)
#> rmarkdown 2.20 2023-01-19 [1] CRAN (R 4.2.2)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
#> styler 1.9.1 2023-03-04 [1] CRAN (R 4.2.3)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.2.3)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.2)
#> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.2)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.2)
#> vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.2.3)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.2)
#> xfun 0.37 2023-01-31 [1] CRAN (R 4.2.2)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.2)
#>
#> [1] C:/Users/sbashevkin/AppData/Local/R/win-library/4.2
#> [2] C:/Program Files/R/R-4.2.2/library
#>
#> ────────────────────────────────────────────────────────────────────────────── |
The error does seem to be related to the head() with `[`(1:6,) the error disappears. |
On Windows this fails:
But on MacOS it does not:
Reporter: Will Jones / @wjones127
Related issues:
Note: This issue was originally created as ARROW-16421. Please see the migration documentation for further details.
The text was updated successfully, but these errors were encountered: