
Support deployment through AWS Lambda #128

Closed
mdneuzerling opened this issue Jul 28, 2022 · 20 comments
Labels
feature a feature request or enhancement

Comments

@mdneuzerling

I'm following up on a tweet from a long time ago about implementing this. All the rstudio::conf tweets make me want to build something!

The end goal of this feature would be to simplify and automate the creation of a Dockerfile whose image can be deployed as an AWS Lambda function. Users could then access the model through any of the supported AWS integrations (including an HTTP API Gateway).

As with vetiver_write_docker, the goal is not to do the actual deployment and configuration of the endpoint, but to produce a Dockerfile that can be built into an image and deployed:

| Deployment | First use... | Then create a Dockerfile with... |
|---|---|---|
| Docker | vetiver_write_plumber | vetiver_write_docker |
| AWS Lambda | vetiver_write_lambda_runtime | vetiver_write_lambda |

If the maintainers are okay with this proposal I'm happy to implement this (with an unknown time frame!). The following changes would need to be made:

  • introduce lambdr as a "Suggests" dependency.
  • write the vetiver_write_lambda_runtime function with an API identical to that of vetiver_write_plumber. The generated runtime file would need to read the model pin, source packages, and then run some sort of predict function.
  • create an internal vetiver_write_dockerfile function that would be used by both vetiver_write_docker and vetiver_write_lambda. This is so we can have more flexible options for writing Dockerfiles, especially using different parent images. We're aiming for a Dockerfile like the one shown here.
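The proposed runtime file (read the model pin, then predict) could be fairly small. Below is a minimal sketch of what a generated runtime.R might contain, not vetiver's actual implementation. It assumes lambdr's convention, as in the get_pin_contents example later in this thread, that the handler named in the Dockerfile's CMD receives the JSON event's top-level keys as named arguments. The lm() model is a stand-in for the pinned model, predict_handler() is a hypothetical name, and checking AWS_LAMBDA_RUNTIME_API is just one way to avoid starting the event loop when the file is sourced locally.

```r
# runtime.R (sketch): load the model once per container, define a handler,
# then hand control to lambdr's event loop.

# Stand-in for reading the pinned model. A generated file would instead do
# something along the lines of:
#   board <- pins::board_s3(bucket = "<bucket>", region = "<region>")
#   v <- vetiver::vetiver_pin_read(board, "<model-name>")
model <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical handler. Assumes each top-level key of the JSON event arrives
# as a named argument holding one column of new data, e.g.
#   {"wt": [2.5, 3.1], "hp": [110, 150]}
predict_handler <- function(...) {
  newdata <- as.data.frame(list(...))
  list(.pred = unname(predict(model, newdata = newdata)))
}

# Start the lambdr event loop only inside Lambda, where the runtime API
# address is provided through AWS_LAMBDA_RUNTIME_API; sourcing the file
# locally just defines the model and the handler.
if (nzchar(Sys.getenv("AWS_LAMBDA_RUNTIME_API"))) {
  lambdr::start_lambda()
}
```

The Dockerfile would then point lambdr at the handler with something like CMD ["predict_handler"]. Loading the model at the top level means it is read once per container rather than once per request.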
@juliasilge
Member

juliasilge commented Jul 28, 2022

This would be fantastic @mdneuzerling and we would love to work together with you to get this implemented. I would want to use renv for package management instead of install.packages() to ensure the model has the right versions for predictions.

@juliasilge juliasilge added the feature a feature request or enhancement label Jul 28, 2022
@mdneuzerling
Author

Absolutely. The Dockerfile will be very similar to the one for plumber, except that it will use a different parent image. There are some additional complexities there, like installing R and using CentOS commands instead of Debian ones.

I'll put something together!

@jonthegeek

Have you made progress on this @mdneuzerling? I'd very much like to help/test!

@mdneuzerling
Author

Sorry, I've been held up with personal stuff (lots of corgis).

I've made a start on the Dockerfile work (https://github.com/mdneuzerling/vetiver-r/tree/lambda) and I expect to finish this over the weekend. However, this feature will be 20% code and 80% testing. If you have a Vetiver model handy then I would truly appreciate you trying to deploy when I'm done with the vetiver_write_lambda function.

@jonthegeek

@mdneuzerling unfortunately my REAL use cases are going to need some other work before {vetiver} will be happy with them, I think... but right now I'm learning how things work, so I plan to prep a couple simple models that work with {vetiver} to use as part of my testing/learning process working toward my real models... which is all to say, ya, I think I can do that! My goal today is to deploy a model like the vetiver deploy example but without actual {vetiver}, using {lambdr}, so I can grok how the basics of predict on lambda will work. In theory we should be able to use that as a direct comparison for the workflow you're putting together.

@jonthegeek

Check out rstudio/pins-r#611 if you haven't seen that! I beat my head against that a bit today. From Lambda we'd presumably want to read the model from S3 anyway, so maybe there's a better option than using {pins}, but that might make things complicated.

@mdneuzerling
Author

Oof, that may be an issue. A Lambda instance in a container can only write to the /tmp directory. Thanks for the heads up about this.

I think pins and vetiver are meant to work with one another, but I agree that someone deploying a model on Lambda is probably storing their model on S3 too.
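Since only /tmp is writable, one possible mitigation is to redirect the pins cache there before any board is created. This is a minimal sketch, assuming pins reads its cache location from the PINS_CACHE_DIR environment variable and that the presence of AWS_LAMBDA_RUNTIME_API is a reliable sign of running inside Lambda; configure_lambda_cache() is a hypothetical helper. It may not address every write that pins attempts at load time.

```r
# Hypothetical helper: send the pins cache to /tmp when running on Lambda,
# because a containerised Lambda function can only write under /tmp.
# Assumption: pins honours the PINS_CACHE_DIR environment variable.
configure_lambda_cache <- function(tmp = "/tmp") {
  on_lambda <- nzchar(Sys.getenv("AWS_LAMBDA_RUNTIME_API"))
  if (on_lambda) {
    cache_dir <- file.path(tmp, "pins-cache")
    dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
    Sys.setenv(PINS_CACHE_DIR = cache_dir)
  }
  invisible(on_lambda)
}

# Call this near the top of runtime.R, before creating the board.
configure_lambda_cache()
```

Outside Lambda the helper does nothing, so the same runtime.R can still be sourced on a laptop for testing.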

@mdneuzerling
Author

It's not quite there yet, but there should be enough material for you to get started, @jonthegeek: https://github.com/mdneuzerling/vetiver-r/tree/lambda

I'm using the repo below to test vetiver. At the moment I'm struggling to build the Dockerfile because renv isn't picking up on a knitr dependency, but I'm sure that will be solvable. See the deploy.R file, which defines the vetiver object and creates the Lambda runtime.R and Dockerfile.
https://github.com/mdneuzerling/simpsons-vetiver

I'm using the following build command. I've created an ECR repository by this point, so if you plan to follow along you'll need to do the same and use the URI below:

docker build --platform amd64 -t <aws_account_number>.dkr.ecr.<region>.amazonaws.com/<repository>:latest .

@mdneuzerling
Author

After a few more changes I can build the Dockerfile. I've been able to deploy the Lambda and I'm now hitting the pins error as above. I'll look into that next.

I noticed that the individual boards have package dependencies which may need to be captured when writing the renv lockfile. For Lambda I just hardcoded a paws.storage dependency.
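One way to capture those board dependencies is to derive them from the board object and add them to the package list used for the lockfile. A minimal sketch, assuming boards created by pins::board_s3() inherit from the class pins_board_s3; board_required_pkgs() is a hypothetical helper, and only the S3 mapping comes from this thread.

```r
# Hypothetical helper: packages a pins board needs at read time, beyond the
# model's own dependencies. Only the S3 case is known from this thread;
# other board types would need their own entries.
board_required_pkgs <- function(board) {
  if (inherits(board, "pins_board_s3")) {
    return("paws.storage")
  }
  character(0)
}

# The result could then be merged into the lockfile's package list, e.g.
# unique(c(model_pkgs, board_required_pkgs(board))), where model_pkgs is
# whatever set of packages the model itself requires.
```

Keying on the board's class keeps the mapping in one place instead of hardcoding paws.storage in the Lambda-specific writer.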

@jonthegeek

@mdneuzerling I haven't had a chance to play since Friday, but I'd try a fork of pins without the .onLoad hook to see what happens. If we can hard-code our way past that, we'll know what needs to change.

@juliasilge
Member

juliasilge commented Aug 15, 2022

For now in my demos, I am manually adding the required pins packages as well. I'll need to look into what is needed to make that work in the long term.

@jonthegeek

Woot!


(I installed your dev version and updated things to hit the ::: versions of your functions, and I had to manually tell it to install knitr for some reason that I haven't fully diagnosed yet... but it worked with the demo function from https://vetiver.rstudio.com/get-started/deploy.html )

@mdneuzerling
Author

I’ll write up the docstrings and unit tests before submitting a PR. I’d also like to make it so that lambdr_predict wraps handler_predict. Otherwise we’re going to be doubling up on a lot of code.
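One way a Lambda predict handler could reuse the plumber code path is to wrap the Lambda event in a plumber-like request and pass it to the handler that handler_predict would build. A minimal sketch, assuming that handler only reads req$body; lambdr_wrap_handler() and plumber_style_predict() are hypothetical stand-ins, not vetiver functions.

```r
# Hypothetical wrapper: turn a plumber-style handler, function(req) that
# reads req$body, into a lambdr-style handler receiving named arguments.
lambdr_wrap_handler <- function(plumber_handler) {
  function(...) {
    # Build a minimal request object; only the body is populated.
    req <- list(body = as.data.frame(list(...)))
    plumber_handler(req)
  }
}

# Hypothetical stand-in for the prediction handler built for plumber.
model <- lm(mpg ~ wt, data = mtcars)
plumber_style_predict <- function(req) {
  list(.pred = unname(predict(model, newdata = req$body)))
}

# The Lambda-facing handler reuses the plumber handler unchanged.
lambda_predict <- lambdr_wrap_handler(plumber_style_predict)
```

If the real plumber handler reads other request fields (headers, query arguments), the fake request would need those too, which is the main risk of this approach.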

@juliasilge
Member

I'm working to see if I can get this at least running myself and have a question that I hope one of you can help me with; I haven't worked with AWS Lambda much.

I'm running into problems getting Lambda to have access to a pin on S3.

Let's say I have pinned to an S3 bucket called "pins-testing-julia". Locally, I would use something like this to access it:

## `my-sso-profile` is in ~/.aws/config:
b <- board_s3(bucket = "pins-testing-julia", profile = "my-sso-profile", region = "us-east-2")

What do I do so that the pin is accessible to Lambda? I have a runtime.R that currently looks like this:

Sys.setenv(PINS_USE_CACHE = "true")

library(pins)
library(lambdr)

get_pin_contents <- function(name) {
  b <- board_s3(bucket = "pins-testing-julia", region = "us-east-2")
  pin_read(b, name)
}

start_lambda()

I made a Dockerfile with CMD ["get_pin_contents"], etc., and created a Lambda function. I gave it the s3:GetObject permission for arn:aws:s3:::pins-testing-julia/* (I also tried bumping up the S3 permission to, say, all read and write). If I make a call with "name": "name-of-my-pin-that-already-exists", the call fails, with nothing very helpful in the logs. (I do see warning messages starting with In normalizePath("~") :, which indicate that it at least loaded the pins library.)

@jonthegeek

@juliasilge Hmm. I assumed it was a permission thing, but you did that step. I'm technically doing mine via an IAM role that has the Lambda basics + a wide set of S3 read permissions, but I'm pretty sure you're doing something equivalent.

Can you share the full error message when you do a test (redacting anything that needs to be redacted but as far as I can remember there's nothing private there)? It might not be much but it'd be helpful to see what it DOES do.

@mdneuzerling
Author

@juliasilge I was able to get something almost identical working: https://github.com/mdneuzerling/test-get-pins

After building the image, pushing to ECR, and creating the Lambda function I made the following changes in the AWS web GUI:

  • Bumped the resources up to 512 MB of memory with a 15-second timeout
  • Attached an inline policy to the automatically generated IAM role for the Lambda function, granting read and list privileges on all objects in the mdneuzerling bucket. For some reason this had to be an inline policy, since attaching an existing policy didn't work (I have no idea why it needs to be inline; could this be the issue?)
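The thread doesn't settle why one policy worked where another didn't. One plausible factor is that reading a pin also lists objects in the bucket, which needs s3:ListBucket on the bucket ARN itself, whereas s3:GetObject is granted on the objects (bucket/*). The sketch below renders such a policy as JSON; s3_read_list_policy() is a hypothetical helper, and the bucket name comes from the earlier runtime.R.

```r
# Hypothetical helper: render an inline IAM policy granting list and read
# access to a single bucket. s3:ListBucket is granted on the bucket ARN;
# s3:GetObject is granted on the objects inside it.
s3_read_list_policy <- function(bucket) {
  bucket_arn <- paste0("arn:aws:s3:::", bucket)
  sprintf(
    paste(
      '{',
      '  "Version": "2012-10-17",',
      '  "Statement": [',
      '    {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": "%s"},',
      '    {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "%s/*"}',
      '  ]',
      '}',
      sep = "\n"
    ),
    bucket_arn, bucket_arn
  )
}

writeLines(s3_read_list_policy("pins-testing-julia"))
```

Granting s3:GetObject alone on bucket/* leaves listing denied, which would be consistent with a call that fails even though the object exists.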

Running the test event {"name":"penguins"} through either the AWS console or a direct invocation of the Lambda via the AWS CLI returned the best data set ever.

As for the PR, I'm pretty happy with the code, but now I need to take care of the tests and documentation. That's a huge task.

@juliasilge
Copy link
Member

AH OK I got this to work! 🎉

Before, I had been trying to add a second permission policy to the execution role, but this did not work for me. Instead, I had to edit the existing permission policy of the default Lambda execution role to add read and list access for the S3 bucket.

Thank you both so much! 🙌

@mdneuzerling If you are interested in opening a draft PR soon, I don't mind starting to test it and writing some documentation. No rush, but whenever you are ready!

@mdneuzerling
Author

Apologies folks, but I very rarely code in my free time these days and I feel guilty for leaving an issue open on such an important repository. Given that this has stagnated for over a year I think I will close this.

@juliasilge
Member

No worries at all @mdneuzerling! For my own context setting, do you plan to continue maintaining lambdr, for example if folks report bugs and all that? Or should we think about that as not actively maintained? Because I believe this vetiver support was pretty close to working and we could possibly finish it off, especially if other folks are interested in using it.

@mdneuzerling
Author

I hope to keep lambdr on CRAN (and I've just submitted a patch to satisfy their new documentation requirements).

I did make some progress, although the vetiver package has changed a lot since then: main...mdneuzerling:vetiver-r:lambda
