dvc machine: implement initial terraform support #6462
Not really sure what we want to do with the terraform CLI output. It seems like something we should suppress, but there's no other useful way for us to display progress since we have to wrap their CLI (and operations to manage cloud machines will generally be slow). I left out a @dberenbaum take a look and see what you think so far
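One possible middle ground for the CLI output question above, sketched in Python: keep the wrapped command's output off the console but forward every line to a debug-level logger, and surface the captured text only when the command fails. The helper name `run_quiet` and the logger name are hypothetical and are not part of this PR; the sketch is not specific to terraform and works for any wrapped CLI.

```python
import logging
import subprocess
import sys

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("machine.sketch")


def run_quiet(cmd):
    """Run cmd, sending each output line to the debug log instead of stdout.

    Returns the captured lines. Raises CalledProcessError (with the captured
    output attached) if the command exits with a non-zero status.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
    )
    lines = []
    for line in proc.stdout:
        line = line.rstrip("\n")
        lines.append(line)
        logger.debug("%s", line)
    returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(
            returncode, cmd, output="\n".join(lines)
        )
    return lines


if __name__ == "__main__":
    # Any command works here; the Python interpreter stands in for terraform.
    print(run_quiet([sys.executable, "-c", "print('hello')"]))
```

With this shape, a verbose flag could map the debug logger to the console, so slow operations still show progress for users who ask for it.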
I'm also not sold on the I think the eventual workflow will basically be something along the lines of
Also we probably don't actually need the jinja dependency, since the tf files we will be generating are simple enough, but I left it in for now since the initial dvc-tpi example also used it.
Great stuff! Responses to your comments:
Not sure either. Should it take a long time to create/destroy machines? Maybe @skshetry can weigh in here.
Why not start with just the repo config and we can add more later? Even if the output format has to change, I don't see much downside unless I'm missing something. See also https://www.notion.so/iterative/Remote-Executors-215a0767aa0341d892b52767e58e2ac6#2281a6b533e4433496d0dfb8db6bf96b for ideas.
No strong feelings here. We can get some other opinions on this before we commit to the name.
Okay. We might need jinja anyway when we introduce Other thoughts:
This is fine with me, but this type of stuff can also be split out into separate tasks
I'm not really sure, it will depend on where we want to go with usage I think. I was thinking along the lines of
We will probably want a
Mostly just for config schema validation reasons. It's possible to modify it with
It seems like we will need one, but I left it out for now for the same reasons as
Also, I should note that for anyone testing this, I'm pretty sure that creating more than one executor will not work properly since the current (this will need to be addressed in the future)
Also, regarding jinja, we could also drop it if we opt to make DVC-generated terraform configs non-human-readable (or at least less readable) and output them as JSON ( I guess the relevant question here would be whether or not we intentionally want the DVC-generated terraform configs to be user-editable. Users are always going to be able to manually modify the files, but writing them in JSON will make it more obvious that they are not supposed to be manually edited (and that users should use
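A minimal sketch of the JSON option, assuming nothing beyond the Python standard library: Terraform accepts configuration files in JSON syntax when they are named `*.tf.json`, so the config can be built as a plain dict and serialized without a template engine. The resource type `iterative_machine`, the function names, and the output file name `main.tf.json` are assumptions for illustration; only the `name` and `cloud` options come from this PR, and provider setup is omitted.

```python
import json
from pathlib import Path


def machine_config_json(name: str, cloud: str) -> str:
    """Serialize a Terraform JSON-syntax config for a single machine.

    The "iterative_machine" resource type is an assumption made for this
    sketch; "name" and "cloud" are the two options mentioned in the PR.
    """
    config = {
        "resource": {
            "iterative_machine": {
                name: {"name": name, "cloud": cloud},
            }
        }
    }
    # sort_keys keeps the generated file stable between runs.
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


def write_machine_config(directory: Path, name: str, cloud: str) -> Path:
    """Write the generated config into directory and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "main.tf.json"  # hypothetical file name
    path.write_text(machine_config_json(name, cloud))
    return path
```

Because the file is regenerated from the DVC config each time, manual edits would be overwritten, which matches the intent that these files are not meant to be edited by hand.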
Regarding managing multiple executors, it seems like what we will need is separate terraform directories per executor (so We want to individually provision/destroy the machines per DVC executor (i.e. a single terraform resource), but terraform is intended to be used on a per-configuration (encompassing all possible resources) basis rather than a per-resource basis. The thing to note here would be that doing separate configurations per executor means we will have multiple terraform states to manage rather than a single state w/all of our possible resources.
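The per-executor layout could look roughly like the sketch below: each executor name maps to its own working directory, and terraform is pointed at that directory with its global `-chdir` option (available in Terraform 0.14 and later), so every executor gets its own configuration and state. The helper names and the base directory are hypothetical, not what this PR implements, and each directory would still need `terraform init` before the first apply.

```python
from pathlib import Path

ALLOWED_ACTIONS = ("apply", "destroy")


def executor_tf_dir(base_dir: Path, name: str) -> Path:
    """Return the terraform working directory dedicated to one executor."""
    # Reject names that would escape or alias the base directory.
    if not name or name in {".", ".."} or "/" in name or "\\" in name:
        raise ValueError(f"invalid executor name: {name!r}")
    return base_dir / name


def terraform_command(base_dir: Path, name: str, action: str) -> list:
    """Build the argv for provisioning or destroying a single executor."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    workdir = executor_tf_dir(base_dir, name)
    # -chdir makes terraform use this executor's own config and state.
    return ["terraform", f"-chdir={workdir}", action, "-auto-approve"]
```

The trade-off noted above still applies: one directory per executor means one terraform state per executor, so listing all machines requires walking every directory rather than reading a single state.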
Sounds good.
Consistent with above, let's leave it out for now and come back to it.
My concern with the usage is that there could be times when
We can leave
/CC @iterative/cml
If we plan to stick with Go, as per iterative/terraform-provider-iterative#164 (comment), we could provide a Python wrapper package including precompiled wheels.
In this case I think
@@ -224,10 +225,16 @@ class RelPath(str):
"row_limit": All(Coerce(int), Range(1)), # obsoleted
"row_cleanup_quota": All(Coerce(int), Range(0, 100)), # obsoleted
},
"machine": {
str: {
"cloud": All(Lower, Choices("aws", "azure")),
maybe
- "cloud": All(Lower, Choices("aws", "azure")),
+ "cloud": All(Lower, Choices("aws", "azure", "gcp", "kubernetes")),
/CC @iterative/cml
See also
Line 91 in f77a5d2
CLOUD_BACKENDS = {
Do gcp and kubernetes work with the existing TPI? I was just going off of what's documented in the readme right now
The Iterative Provider currently supports AWS and Azure. Google Cloud Platform is not currently supported.
Either way, this kind of thing can also be addressed later (#6480)
This PR is not meant to be a complete/finished implementation in any way
The main purpose here is to get the general API/workflow laid out so we aren't blocking other DVC team members that will be filling out the actual implementation
Yes, gcp & kubernetes are in the same category as dvclive - they should work even though the docs aren't there yet.
The question is, should we support DVC logging in through SSH to an unknown environment and acting like it knows what is there and how to find it? Cloud execution is "easy" because we provide the system images and know how to provision them, but accessing a custom system is a completely different scenario. We can provide reference Ansible roles for do-it-yourself users, though. I'm inclined to think that Kubernetes should be our go-to abstraction layer for on-premises executor support; trying to operate at lower levels could be risky for maintenance, support and user experience.
We aren't going to ssh into an existing machine anytime soon most likely, but keeping it in mind as a possible option is what matters for now. Both
Yeah, it's
It would be better to have this kind of discussion in notion - in this case there's a dedicated story for managing existing/custom resources, but it's not a requirement that we are planning on addressing any time in the near future.
windows tests should pass now in theory :) iterative/terraform-provider-iterative#179 (comment)
This initial PR should be ready to merge, parts of
How should we document
Using a wiki page makes the most sense for now I think
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible.
Related to #6440
Related to: #6263
To test:
pip install dvc[terraform] or pip install -e .[terraform]
dvc config feature.executor true (--global config recommended)
Currently the only supported TPI config options are name and cloud (both required as a part of dvc executor add). Eventually all of the TPI options should be supported in .dvc/config