[R] write_dataset gets a "zsh: illegal hardware instruction R" error #37034

Closed
devinrkeane opened this issue Aug 7, 2023 · 4 comments

devinrkeane commented Aug 7, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Hi - we use arrow in R and Python for a wide range of projects at my company, but we are running into a dataset issue with the R arrow package that I am not sure how to track down; it may be something specific to my system. I recently updated to arrow 12.0.1.1 on an M1 Max Mac running Ventura 13.4, using the x86 build of R.

library(arrow)
library(testthat)
library(ggplot2)

Running the following, I am able to write and read a Parquet file with no problem:


data <- ggplot2::diamonds
tmpdir <- tempdir()

# writing and reading Parquet works fine
write_parquet(data, file.path(tmpdir, "test.parquet"))
data2 <- read_parquet(file.path(tmpdir, "test.parquet"))
expect_equal(data$carat, data2$carat)

But when I try to write a dataset, RStudio immediately crashes with a "Fatal Error". Running the same code in plain R, this happens:

# fatal error
write_dataset(data, path = tmpdir, partitioning = "cut")

I get what appears to be a prompt asking me to make a Selection (see image), but with no options listed. I just enter "1" and get the following error:

zsh: illegal hardware instruction R

Tracing the error, I got as far as Schema__WithMetadata() before it crashed.

Screenshot 2023-08-07 at 11 59 06 AM

Some context:

open_dataset gives the same result on some preexisting datasets we have created at my company. This is actually where I first came across the error; I then found that write_dataset does something similar.

I don't know why sessionInfo (below) says I'm running under Big Sur/Monterey; maybe that has something to do with it? My company uses a software manager (Kandji), which updated this Mac to Ventura a while ago. It could just be me :/

sessionInfo()

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur/Monterey 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] ggplot2_3.3.6   testthat_3.1.10 arrow_12.0.1.1 

loaded via a namespace (and not attached):
 [1] magrittr_2.0.1   tidyselect_1.1.0 bit_4.0.4        munsell_0.5.0   
 [5] colorspace_1.4-1 R6_2.4.1         rlang_1.1.1      fansi_0.4.1     
 [9] dplyr_1.0.7      tools_4.1.3      grid_4.1.3       gtable_0.3.0    
[13] utf8_1.1.4       DBI_1.1.0        cli_3.6.1        withr_2.5.0     
[17] ellipsis_0.3.2   yaml_2.2.1       bit64_4.0.5      assertthat_0.2.1
[21] tibble_3.1.7     lifecycle_1.0.3  crayon_1.4.1     brio_1.1.2      
[25] purrr_1.0.1      vctrs_0.6.2      glue_1.6.2       pillar_1.7.0    
[29] compiler_4.1.3   generics_0.1.3   scales_1.1.1     renv_0.17.3     
[33] pkgconfig_2.0.3 


arrow_info()

Arrow package version: 12.0.1.1

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait FALSE
parquet    TRUE
json       TRUE
s3         TRUE
gcs       FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc   TRUE

Memory:
                  
Allocator mimalloc
Current    0 bytes
Max        0 bytes

Runtime:
                          
SIMD Level          sse4_2
Detected SIMD Level sse4_2

Build:
                                    
C++ Library Version           12.0.1
C++ Compiler              AppleClang
C++ Compiler Version 14.0.3.14030022

  • I have no problems running this test script on my other Mac, a Mac Pro with an Intel chip running Ventura 13.5. Others have had no issues either, but they are all on Intel Macs. We all run the same renv project.

  • I also ran brew install apache-arrow, which installed/updated to 12.0.1_4. My understanding is that the macOS binary of the R arrow package bundles everything it needs, so I assume that version would not be an issue, but could there be some confusion there?

Component(s)

R

@paleolimbot
Member

We have had many reports of this error (#36685), which primarily surfaces on the x86 flavour of R running on M1 Macs. I believe a fix has been merged; however, it won't be released for a few months (it's slated for 14.0.0). A few workarounds:

  • export ARROW_RUNTIME_SIMD_LEVEL=NONE before starting the R process
  • Use the arm64 build of R for macOS
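The first workaround needs ARROW_RUNTIME_SIMD_LEVEL=NONE to be in the environment before R starts. A minimal sketch, assuming Rscript sits in R.home("bin") as in standard installs, that launches a fresh R process with the variable already set and returns its output; run_rscript_with_env() is a hypothetical helper, not part of arrow.

```r
# Hypothetical helper (not part of arrow): run an R expression in a fresh
# Rscript process with extra "NAME=value" environment variables and return
# the lines it writes to stdout.
run_rscript_with_env <- function(expr_text, env = character()) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # system2() prepends the `env` strings to the shell command it builds, so
  # the child R process sees them from the moment it starts.
  system2(rscript, c("-e", shQuote(expr_text)), env = env, stdout = TRUE)
}

# Check that the child process actually sees the variable.
run_rscript_with_env(
  'writeLines(Sys.getenv("ARROW_RUNTIME_SIMD_LEVEL"))',
  env = "ARROW_RUNTIME_SIMD_LEVEL=NONE"
)
```

The same helper could run the failing write_dataset() call in the child process, so that a crash ends only that process rather than the whole RStudio session.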

@paleolimbot
Member

I would be curious whether the ARROW_RUNTIME_SIMD_LEVEL=NONE workaround works (one user reported that it did not; however, in theory it should prevent this issue from occurring).

@devinrkeane
Author

devinrkeane commented Aug 18, 2023

@paleolimbot thanks for the suggestion. I added the export to my .zshrc file, made sure zsh is my default shell, confirmed the environment variable was set, restarted, and tried again, but I still got the error. This time it was prefixed with a number, perhaps the process ID (not sure):

 12898 illegal hardware instruction  R
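Confirming the variable in a terminal does not by itself show that the R process that crashed inherited it. A minimal sketch of a check run from inside the R session; simd_env_status() is a hypothetical name.

```r
# Hypothetical helper: report whether ARROW_RUNTIME_SIMD_LEVEL is visible to
# the current R process, not just to the shell it was started from.
simd_env_status <- function() {
  value <- Sys.getenv("ARROW_RUNTIME_SIMD_LEVEL", unset = NA)
  if (is.na(value)) {
    "ARROW_RUNTIME_SIMD_LEVEL is not set in this R process"
  } else {
    paste0("ARROW_RUNTIME_SIMD_LEVEL=", value)
  }
}

simd_env_status()
```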

jonkeane added a commit that referenced this issue Sep 19, 2023
)

Resolves #33807 and #37034

### Rationale for this change

If someone is running R under emulation, arrow segfaults without an error message. We can detect this at load time, so we can also warn people that this is not recommended. Although the version of R being run is not directly an arrow issue, arrow fails very quickly in this configuration.

### What changes are included in this PR?

Detect when running under Rosetta (on macOS only) and warn when the library is attached.

### Are these changes tested?

No, given the paucity of ARM-based mac CI, testing this organically would be difficult. But the logic is straightforward.

### Are there any user-facing changes?

Yes, a warning when someone loads arrow under emulation.
* Closes: #33807

Authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
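The detection described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: it assumes macOS's sysctl.proc_translated key (1 for a Rosetta-translated process, 0 for a native one, absent on Intel Macs) and that a sysctl child launched from a translated R process is itself translated. running_under_rosetta() and warn_if_rosetta() are hypothetical names.

```r
# Hypothetical sketch, NOT the code merged in the PR: detect whether this R
# process is being translated by Rosetta 2 on macOS.
running_under_rosetta <- function() {
  # Rosetta only exists on macOS, so report FALSE everywhere else.
  if (!identical(Sys.info()[["sysname"]], "Darwin")) {
    return(FALSE)
  }
  # Assumption: `sysctl -n sysctl.proc_translated` prints "1" for a translated
  # process and "0" for a native one; on Intel Macs the key does not exist and
  # sysctl fails, which is treated as "not translated". This also assumes the
  # sysctl child inherits R's translation status.
  out <- tryCatch(
    suppressWarnings(
      system2("sysctl", c("-n", "sysctl.proc_translated"),
              stdout = TRUE, stderr = FALSE)
    ),
    error = function(e) character(0)
  )
  identical(as.vector(out), "1")
}

# Hypothetical attach-time check: emit a startup message instead of failing.
warn_if_rosetta <- function() {
  translated <- running_under_rosetta()
  if (translated) {
    packageStartupMessage(
      "R appears to be running under Rosetta (x86_64 emulation) on Apple silicon; ",
      "arrow can crash in this configuration. Consider the native arm64 build of R."
    )
  }
  invisible(translated)
}
```

Calling warn_if_rosetta() from a package's .onAttach() hook would emit the message when the library is attached, matching the behaviour the commit describes; packageStartupMessage() lets users silence it with suppressPackageStartupMessages().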
@thisisnic
Member

From what others have reported, I'm not sure changing environment variables will get this to work if you're running an Intel (x86_64) version of R on a non-Intel Mac. Other similar issues have been resolved by installing a native (arm64) build of R. Assuming the above is correct, I'm going to close this issue, but feel free to reopen it if that's not the case and we can discuss further!
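To check which build of R is actually running, R.version$arch can be inspected from within the session; R.version$platform is the value sessionInfo() reports on its Platform line (x86_64-apple-darwin17.0 in the original report). A minimal sketch; describe_r_build() is hypothetical, and the "aarch64" value for the native Apple silicon build is an assumption based on that build's aarch64-apple-darwin platform string.

```r
# Hypothetical helper: describe which build of R this session is running.
# Assumption: R.version$arch is "x86_64" for the Intel build and "aarch64"
# for the native Apple silicon build of R on macOS.
describe_r_build <- function(arch = R.version$arch) {
  if (identical(arch, "x86_64")) {
    "x86_64 (Intel) build of R: on an Apple silicon Mac it runs under Rosetta"
  } else if (arch %in% c("aarch64", "arm64")) {
    "native arm64 build of R"
  } else {
    paste("R build for architecture", arch)
  }
}

R.version$platform  # the value shown on sessionInfo()'s Platform line
describe_r_build()
```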

loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
apache#37777)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
apache#37777)
