.onLoad fails in loadNamespace on read-only filesystem (AWS lambda) #611

Closed
FMKerckhof opened this issue Apr 26, 2022 · 10 comments · Fixed by #748
Labels
bug an unexpected problem or unintended behavior

Comments

@FMKerckhof

I am trying to read a pin from an RStudio Connect board in a containerized AWS Lambda function (runtime: R 4.1.3, pins 1.0.1).
I set the cache directory to Lambda's ephemeral storage (/tmp) as follows: board <- pins::board_rsconnect(server = Sys.getenv("CONNECT_SERVER"), cache = "/tmp") (CONNECT_SERVER and CONNECT_API_KEY have been added as environment variables in the Lambda function configuration). Even so, loading the namespace fails because it attempts to create a directory in /home, which is not writable. Is there a way to circumvent this behavior? I am loading at least 20 other packages in the runtime without any problems, so it is specific to the behavior of pins:

Error: package or namespace load failed for ‘pins’:
.onLoad failed in loadNamespace() for 'pins', details:
call: NULL
error: [EROFS] Failed to make directory '/home/sbx_user1051': read-only file system

I checked the r-pkgs section on side effects on load, and in zzz.R I noticed that the function board_register_local gets called in .onLoad without any options:
https://github.com/rstudio/pins/blob/2a1a0d89af8e9e3b696c656d5bb166502cec1ee0/R/zzz.R#L12 . Is there a way to pass /tmp as a cache directory here? Could it be read from an environment variable?
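To confirm which directories the sandbox actually allows writing to before loading pins, a check along these lines may help. This is a minimal base-R sketch; the helper name is_writable is hypothetical and not part of pins, and file.access() can be unreliable on some filesystems, so treat the result as a hint rather than proof.

```r
# Hypothetical helper (not part of pins): TRUE if `path` is an existing
# directory that the current user appears to be able to write to.
is_writable <- function(path) {
  dir.exists(path) && unname(file.access(path, mode = 2)) == 0
}

home <- path.expand("~")
message("home (", home, ") writable: ", is_writable(home))
message("tempdir (", tempdir(), ") writable: ", is_writable(tempdir()))
```

On the Lambda image described above, one would expect the home directory check to report FALSE and the tempdir() check to report TRUE.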

@FMKerckhof
Author

Based on zzz.R, it would appear I could "trick" the Lambda runtime into pretending to be R CMD check by setting _R_CHECK_PACKAGE_NAME_ to a non-empty value (cf. https://github.com/rstudio/pins/blob/2a1a0d89af8e9e3b696c656d5bb166502cec1ee0/R/utils.R#L115 ), which would make pins call tempdir(), which on a Linux system uses /tmp. However, Lambda does not allow environment variables that start with "_".

@FMKerckhof
Author

I have been able to resolve this by setting the following environment variables in the Lambda runtime before loading the pins library:

Sys.setenv(R_USER_CACHE_DIR = tempfile())
Sys.setenv(R_USER_DATA_DIR = tempfile())

library(pins)

This was inspired by the R CMD CHECK fix of zzz.R .

I am closing the issue (since it is resolved), but it may be useful to document these environment variables more clearly, since they determine the .onLoad behavior of the pins namespace and can lead to errors on read-only filesystems.

@FMKerckhof
Author

FMKerckhof commented Jul 18, 2022

I am re-opening this issue since the fix above no longer appears to work. I used to get only warnings from normalizePath about the missing home directory, which still allowed the code to run:

normalizePath("~") :
path[1]="/home/sbx_user1051": No such file or directory
Warning message:
In normalizePath("~") :
path[1]="/home/sbx_user1051": No such file or directory

Now, with more recent versions of R and some dependent packages (and maybe updates on the AWS side), I get an error that stops code execution:

.onLoad failed in loadNamespace() for 'pins', details:
--
call: NULL
error: [EROFS] Failed to make directory '/home/sbx_user1051': read-only file system

I know it is a pretty niche case - but is there a way to load the pins package on a read-only file system? Afterwards the cache dirs (above) can take over, but how can I ensure that loading the package itself does not become an issue?

@juliasilge
Member

Thanks for reopening @FMKerckhof! I don't believe there is currently a workaround if the environment variables don't work anymore. 😞 We will work on a fix for this after our big conference, so you can look out for that in August. We'd definitely appreciate your input at that time.

@juliasilge juliasilge added the bug an unexpected problem or unintended behavior label Jul 18, 2022
@mdneuzerling

mdneuzerling commented Aug 14, 2022

I've had success by setting the PINS_USE_CACHE environment variable to "true" (any other value would work). The relevant function is board_cache_path. I still get some warnings about normalizePath("~") (a directory that doesn't exist) but no errors.
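A minimal sketch of this workaround, assuming it behaves as described in the comment above for the pins version in use at the time. The variable has to be set before the pins namespace is loaded, because the failure happens in .onLoad. The name pins_loaded is only for illustration.

```r
# Set the variable before pins is loaded; per the comment above, any
# non-empty value should have the same effect as "true".
Sys.setenv(PINS_USE_CACHE = "true")

# requireNamespace() returns FALSE instead of raising an error when the
# package is not installed or its namespace fails to load.
pins_loaded <- requireNamespace("pins", quietly = TRUE)
if (!pins_loaded) {
  message("pins is not installed or its namespace could not be loaded")
}
```

When Lambda is configured through its console or a deployment template, the same variable could instead be set in the function configuration, since its name does not start with "_".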

@juliasilge
Member

Having worked a little more with AWS Lambda, I do think that the PINS_USE_CACHE env variable is working pretty well, and seems like a reasonable use for an env var. However, it is weird/confusing that PINS_USE_CACHE = "true" works when what we are doing is really turning off the cache in favor of using the temp directory. Here are some questions:

  • Should we change the board_cache_path() function so PINS_USE_CACHE = "true" does in fact opt in to using cache_dir() (vs. not using it)?
  • Alternatively or in addition, should we add one more env variable inside of that board_cache_path() function that has a better/clearer name? Like PINS_USE_TEMP or PINS_NO_CACHE_DIR?
  • Where should we document this behavior? We could put it in this caching section? @FMKerckhof or @mdneuzerling would you have thought to look there based on the errors you have dealt with? Any ideas on a better place to document this?
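To make the second option concrete, here is one way a clearer opt-out could look. This is a hypothetical sketch and not the pins implementation: the function name resolve_cache_root is invented for illustration, PINS_USE_TEMP is only one of the candidate names floated above, and tools::R_user_dir() stands in for whatever cache_dir() does internally.

```r
# Hypothetical sketch (not the pins source): choose the root directory
# under which board caches would be created.
resolve_cache_root <- function() {
  if (nzchar(Sys.getenv("PINS_USE_TEMP"))) {
    # Explicit opt-out of the persistent cache: use the session temp dir,
    # which is writable on Lambda's ephemeral storage.
    return(tempdir())
  }
  # Default: the per-user cache directory (base R >= 4.0.0), which honours
  # R_USER_CACHE_DIR when that variable is set.
  tools::R_user_dir("pins", which = "cache")
}
```

With a name like this, the variable states what actually happens (the temp directory is used), which avoids the confusion of PINS_USE_CACHE = "true" turning the persistent cache off.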

@FMKerckhof
Author

Thanks for including me @juliasilge

In the end, setting the following environment variables in my runtime before loading the pins package (or packages that call pins) did the trick in an image based on lambda/provided:al2 :

Sys.setenv(R_USER_CACHE_DIR = tempdir())
Sys.setenv(R_USER_DATA_DIR = tempdir())
Sys.setenv(HOME = tempdir())

While this is admittedly not ideal, from my perspective it is a bit more sensible (?) than the PINS_USE_CACHE env var because it provides pins, along with other packages that may rely on those env vars, with a writable directory on the Lambda's ephemeral storage.

That being said, I haven't verified whether this would work on actual read-only filesystems or on other Linux flavors. W.r.t. documentation - maybe the caching section of getting started is targeted a bit too much at a general audience for this? It seems like quite a niche problem, maybe better suited to the articles section (e.g. "using pins on read-only filesystems/with lambda")?
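The environment-variable approach above can be wrapped so that it only kicks in when the home directory is not usable. This is a sketch under assumptions: the helper name redirect_if_readonly and its arguments are hypothetical, and, as noted above, it has not been verified on truly read-only filesystems.

```r
# Hypothetical helper: if `home` is missing or not writable, point R's
# per-user directories and HOME at `root`. Returns TRUE (invisibly) when
# the environment was changed, FALSE otherwise.
redirect_if_readonly <- function(root = tempdir(), home = path.expand("~")) {
  home_ok <- dir.exists(home) && unname(file.access(home, mode = 2)) == 0
  if (!home_ok) {
    # Make sure the target directory exists before pointing anything at it.
    dir.create(root, recursive = TRUE, showWarnings = FALSE)
    Sys.setenv(R_USER_CACHE_DIR = root, R_USER_DATA_DIR = root, HOME = root)
  }
  invisible(!home_ok)
}

redirect_if_readonly()
# library(pins)  # load pins (or packages that call pins) only after this point
```

Keeping the redirect conditional means the same script can run unchanged on a developer machine, where the regular user directories stay in use.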

@juliasilge
Member

Thanks so much @FMKerckhof! 🙌 I decided to go ahead and add a new pins-specific env variable as well, and to do some documentation on a function page. Take a look at #748 if you are interested in giving feedback.

@juliasilge
Member

You can check out new documentation (including the new env var for the pins cache) here.

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 22, 2023