A target archetype that runs on AWS through Metaflow #8
Comments
I need to figure out how to write… A bigger issue is probably the way…
Quick note on this: Metaflow doesn't support anonymous functions as written here. I think it's an easy, non-breaking change and I've drafted some code. I'll clean it up and submit that as a pull request to the repo.
Seems straightforward to work around if we define a function from inside the command for the target.
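For concreteness, a minimal _targets.R sketch of that workaround (fit_model() and model_summary are placeholder names): the helper is defined and named inside the command, so a Metaflow-backed runner would receive a named function rather than an anonymous one.

```r
library(targets)
# Sketch of a _targets.R file: the command defines a named helper first
# and then calls it, so nothing anonymous needs to be handed to Metaflow.
list(
  tar_target(
    model_summary,
    {
      fit_model <- function(data) summary(lm(mpg ~ wt, data = data))
      fit_model(mtcars)
    }
  )
)
```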
Thank you so much, David! Really looking forward to this! If it works out, it could be a huge win-win.
My opinion is changing on this one. I think…
I read up more on AWS ParallelCluster, AWS Batch, and Metaflow's HPC support, and I no longer think ParallelCluster is something that makes sense to integrate with directly. I think a Metaflow target archetype makes more sense to start with, and the versioning could still help even after…

Some future development ideas:
Just learned some neat stuff from experimenting with metaflow.org/sandbox. It gave me another idea for AWS S3 integration in targets.
Update: thanks to http://metaflow.org/sandbox, I think I figured out what AWS S3 integration in targets should look like. AWS Batch integration is going to be a lot harder. What we really need is a… If/when we get that far, the value added from…
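For reference, a minimal sketch of what S3-backed target storage can look like with the repository/resources interface in recent releases of targets; the bucket name and prefix are placeholders, and this is only an illustration, not the specific design alluded to above.

```r
library(targets)
# Sketch of a _targets.R file with S3-backed storage, assuming the
# repository/resources interface available in recent targets releases.
tar_option_set(
  format = "rds",
  repository = "aws",  # store target outputs in an S3 bucket
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket = "my-bucket",     # placeholder bucket name
      prefix = "targets-store"  # placeholder key prefix
    )
  )
)
list(
  tar_target(raw_data, read.csv("data.csv")),
  tar_target(model, lm(y ~ x, data = raw_data))
)
```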
On reflection, I am closing this issue. The maintainers of…
Prework
- tarchetypes' code of conduct.
- tarchetypes' contributing guidelines.

Proposal
targets (and drake) are currently tied to traditional HPC (Slurm, SGE, etc.). That's enough for me and my team right now, but not for the increasing number of people like @MilesMcBain who rely on AWS and other cloud platforms. There is not yet much cloud support native to the R ecosystem, and since I don't use AWS for my own work, I am not prepared to do much at a low level.

Metaflow not only runs computation on the cloud and stores the results on S3, it also abstracts away the devops overhead that comes along with that, and it supports a sophisticated versioning system for code and data. I think we will gain a ton of power and convenience if we leverage Metaflow's rich and potentially complementary feature set in situations where targets needs the cloud.

Earlier, I proposed a targets-within-Metaflow approach, which I think would be useful for people with multiple {targets} pipelines in the same project. Here, I would like to explore the reverse: a target archetype that runs some R code as a single AWS Metaflow step; a rough sketch follows below.

cc @savingoyal (for when you return), @jasonge27, @bcgalvin.
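As a rough sketch of the idea, assuming a hypothetical tar_metaflow() archetype: the function name, arguments, and resource settings below are all placeholders rather than a settled interface.

```r
library(targets)
# Hypothetical interface: tar_metaflow() does not exist yet; this only
# illustrates the kind of archetype the proposal describes. The idea is
# that it expands to an ordinary tar_target() whose command runs remotely
# as a single Metaflow step on AWS, with the result stored on S3.
list(
  tar_metaflow(
    name = model,
    command = fit_model(raw_data),  # arbitrary R code; fit_model() and raw_data are placeholders
    memory = 16000,                 # placeholder resource hints passed through to Metaflow/AWS Batch
    cpu = 4
  )
)
```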