Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept scripts without requiring additional Base64 encoding #91

Closed
0x2b3bfa0 opened this issue Mar 21, 2021 · 24 comments
Closed

Accept scripts without requiring additional Base64 encoding #91

0x2b3bfa0 opened this issue Mar 21, 2021 · 24 comments
Assignees
Labels
p2-nice-to-have Low priority resource-machine iterative_machine TF resource resource-runner iterative_runner TF resource technical-debt Refactoring, linting & tidying ui/ux User interface/experience

Comments

@0x2b3bfa0
Copy link
Member

Terraform support for heredocs and files would play nicely with literal scripts and wouldn't require users to encode them in advance or resort to file and string encoding functions. We might still use the latter internally for foolproof escaping of templated files, but users should be able of writing plain scripts manually.

@DavidGOrtega
Copy link
Contributor

DavidGOrtega commented Mar 23, 2021

@0x2b3bfa0 I think is not a great idea.
All the vendors enforces you to base64 is not nothing new. Also for the cml-runner command is going to be a mess

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Mar 23, 2021

Ideally, performing the following modification to the pertaining template line would be enough on the CML side:
${startup_script ? `startup_script = base64decode("${startup_script}")` : ''}
I thought it could be a good move for the user experience, but... 🤷‍♂️

@DavidGOrtega
Copy link
Contributor

@0x2b3bfa0 Its breaking CML but also basic terraform provider. Now the TPI is not accepting the base64 encoded script.

@DavidGOrtega
Copy link
Contributor

@0x2b3bfa0 one of the main reasons also is that base64 is far more robust than this heredoc. It's terribly painful to make this work through a template

@DavidGOrtega
Copy link
Contributor

DavidGOrtega commented Jul 26, 2021

but users should be able of writing plain scripts manually.

Users have been able to use custom script in Azure and AWS in base64 so we are not worse!

@0x2b3bfa0
Copy link
Member Author

@DavidGOrtega, given that the provider still is in its early stages of development, I think that we can afford introducing this change without breaking anything.

As stated above, CML wouldn't have to use the here document syntax and can continue to provide scripts in Base64 to avoid string escaping issues.

@0x2b3bfa0 0x2b3bfa0 self-assigned this Jul 26, 2021
@0x2b3bfa0 0x2b3bfa0 added enhancement New feature or request ui/ux User interface/experience labels Jul 26, 2021
@0x2b3bfa0 0x2b3bfa0 added bug Something isn't working p1-important High priority and removed enhancement New feature or request labels Sep 4, 2021
@0x2b3bfa0
Copy link
Member Author

Follow-up of iterative/cml#168

Currently, resource_cml_runner takes Base64-encoded scripts, while resource_machine takes raw scripts. Independently of the approach we choose, it should be the same for both resources. I would prefer to modify the former to behave like the latter.

@omesser
Copy link
Contributor

omesser commented Apr 10, 2022

@0x2b3bfa0 @DavidGOrtega
Why not have both in a consistent, explicit manner - 2 separate fields: startup_script and startup_script_base64 .
If both fields are non-"" it will fail (no guessing!)

@0x2b3bfa0
Copy link
Member Author

Why not have both in a consistent, explicit manner - 2 separate fields

Because Terraform has an inbuilt base64decode function with this exact purpose.

@omesser
Copy link
Contributor

omesser commented Apr 10, 2022

Yeah (obviously 😬 ) but not sure I follow what are you suggesting wrt the choice of whether to expect file contents encoded or not (meaning: from the user).
b64 codec'ing is easy both for the user and for us, so why take away the benefit of giving the user the flexibility to provide the script either base64 encoded or not?

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Apr 10, 2022

not sure I follow what are you suggesting wrt the choice of whether to expect file contents encoded or not

I propose1 to accept raw scripts in both iterative_machine & iterative_cml_runner and drop support for Base64–encoded data.

Currently, iterative_machine expects raw scripts and iterative_cml_runner expects Base64–encoded scripts. 🙃

b64 codec'ing is easy both for the user and for us, so why take away the benefit of giving the user the flexibility to provide the script either base64 encoded or not?

Expecting only raw scripts doesn't “take away any benefits”, it just avoids polluting the API with a redundant argument. See the collapsed comment below for some examples.

Footnotes

  1. https://github.com/iterative/terraform-provider-iterative/issues/91#issuecomment-912951829, not that important since chore: dependancron update dependencies cml#461 anyway.

@0x2b3bfa0

This comment was marked as outdated.

@omesser
Copy link
Contributor

omesser commented Apr 11, 2022

@0x2b3bfa0 - due to this I suggest to think whether it's beneficial to drop the b64 encoded field support. Hence the suggestion to have both co-exist.
I can think of a few scenarios where it's worth to have this base64 support in place, I'm not sure how relevant they are though:

  • When programmatically generating tf configs and templating values into them, much easier to work and template in a b64 string
  • CLI support - If this can be populated in any scenario by an input var e.g. terraform apply -var 'X.Y.startup_script_base64=<stuff>

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Apr 11, 2022

When programmatically generating tf configs and templating values into them

See #91 (comment); startup_script = base64decode("{{startup_script_base64}}") is equally convenient. In fact, that comment proposes a verbatim drop–in replacement for our current template.

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Apr 11, 2022

CLI support - If this can be populated in any scenario by an input var e.g. terraform apply -var 'startup_script_base64=<stuff>'

When using terraform -var UTF–8 strings which don't contain \0 don't need Base64 encoding. Anyway, the inbuilt functions provide the required functionality:

variable "input" {
}

output "output" {
  value = base64decode(var.input)
}
terraform apply -var input="$(base64 input)"

@omesser
Copy link
Contributor

omesser commented Apr 11, 2022

With the risk of repeating the point a bit 😄

Agreed the bas64decode function exists and things can be decoded in various places and achieved either way.
The point I'm making is about user experience - Giving users what they expect and keeping their syntax clean and flat.

I don't think that this

startup_script = base64decode("{{startup_script_base64}}")

is an equally convenient alternative to this:

startup_script_base64 = "{{startup_script_base64}}"

And saving the user the need to convert their input is a good perk UX wise, when hitting a common use cases, and if I catch the drift here, it can be very common.

It might be worth "polluting the api" with another (optional) field.
Granted, this is opinionated, but do consider please - making lives easier for users with a clean interface they will actually look at vs 1 less optional field (what are the operational costs of this? for them? for us?).

Same thing in CLI -var input="$(base64 input)" vs -var input_base64="input" - first option is cumbersome and most people don't enjoy shell fun-times. This would have been much worse in a non-toy example (i.e. real contents in input) - escaping a script's content to put it as a raw string in a shell command is not a nice sight to behold (hence not a good practice), base64 is very often used as a catch-all to avoid escaping issues even though the contents can be properly escaped if you care to spend the time (ML engineers won't, right?).

This might save the users a few minutes scratching their heads building the tf config and the code/commands around it, and more importantly, keeping terraform configs as flat+simple as possible on the user end, so they can debug their inputs outside of terraform and get full easy visibility to what the provider gets as inputs / vars.
Also, there's the upside of eliminating ambiguity as per what does startup_script expect (raw or base64), due to the existence of startup_script_base64 - seeing both in the docs makes it all clear. Same goes, btw for startup_script_file - these are basically convenient wrappers. The fact that 2 approaches were mixed means that there is place for ambiguity and it might be nice to reduce it

@casperdcl
Copy link
Contributor

casperdcl commented Apr 11, 2022

I'm also leaning towards this

  • startup_script: change to expect raw (affects CML)
  • startup_script_base64: new, expects base64 (CML would switch to using this)
  • error on both defined (conflicts)

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Apr 13, 2022

The point I'm making is about user experience - Giving users what they expect and keeping their syntax clean and flat.

If we're talking of existing Terraform power users, it looks like we have irreconcilliable differences with regard to their alleged expectations. ⚔️ 😅

I don't think that this [startup_script = base64decode("data")] is an equally convenient alternative to this [startup_script_base64 = "data"]

Neither do I: to my mind, the former is much more convenient both for users and for ourselves. 😈

And saving the user the need to convert their input is a good perk UX wise, when hitting a common use cases, and if I catch the drift1 here, it can be very common.

Providing a Base64–encoded script is not a valid2 use case, unless proven otherwise. It's just something we do internally on CML to avoid the need of escaping our runner templates.

If this argument accepted arbitrary binary data (as opposed to UTF–8 text), it would make sense to use Base64 as the only/preferred input format, but this is not the case.

This argument is not mapped to the cloud–init user data value read by cloud instances. It's just being injected into a bigger templated script, and providing binary data wouldn't have other effect beyond upsetting the shell interpreter:

{{- if .runner_startup_script}}
{{.runner_startup_script}}
{{- end}}

Moreover, the current approach is doubly inconvenient: the primary3 consumer of the iterative_cml_runner resource is the cml runner command, and this particular argument is exposed through the --cloud-startup-script command–line option.

Currently, this option “run[s] the provided Base64-encoded Linux shell script during the instance initialization”, so users have4 to execute cml runner --cloud-startup-script="$(base64 --wrap 0 <<< 'echo hello world')" and take care of encoding for no reason; that is bad user experience.

User Data

Just for reference, the official AWS provider accepts both forms, exactly like in your proposal, although other providers only take plain text.

Footnotes

  1. The best cloud infrastructure pun I have ever seen! 😂

  2. Replace with “common” if it sounds like a baseless value judgement

  3. And probably the only, by the way (insert tumbleweed emoji)

  4. See cml#536 (comment) and cml#772 (comment) for hitorical evidence

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Apr 13, 2022

The fact that 2 approaches were mixed means that there is place for ambiguity and it might be nice to reduce it

Indeed, it might be nice to reduce the ambiguity, by aligning their behavior 😄

@0x2b3bfa0 0x2b3bfa0 added resource-runner iterative_runner TF resource resource-machine iterative_machine TF resource labels Apr 24, 2022
@0x2b3bfa0
Copy link
Member Author

Given the status of the current API, this can be safely demoted to p2-nice-to-have and will be addressed tangentially when reworking the cml runner --cloud internals.

@0x2b3bfa0 0x2b3bfa0 added technical-debt Refactoring, linting & tidying p2-nice-to-have Low priority and removed bug Something isn't working p1-important High priority labels Oct 10, 2022
@dacbd
Copy link
Contributor

dacbd commented Oct 10, 2022

I think this can just be closed.

  • iterative_machine is essentially deprecated / will be significantly reworked if used when we make another go at the dvc machine ideas.
  • iterative_runner is not intended to be used by a user directly and this type of script logic can just be handled by cml when it builds it's main.tf internally.

@dacbd dacbd closed this as not planned Won't fix, can't repro, duplicate, stale Oct 10, 2022
@casperdcl
Copy link
Contributor

casperdcl commented Oct 12, 2022

I thought this issue about allowing cml runner launch --cloud-startup-script to be plaintext?

@dacbd
Copy link
Contributor

dacbd commented Oct 12, 2022

I thought this issue about allowing cml runner launch --cloud-startup-script to be plaintext?

even if it is, that shouldn't require any changes to tpi like has been discussed in this thread. There is already a related issue on cml to accept a file for that parameter. plaintext, base64, or file don't need different interfaces in the terraform scheme for the iterative_runner resource, they can all use the same one.

@casperdcl
Copy link
Contributor

iterative/cml#1167 indeed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p2-nice-to-have Low priority resource-machine iterative_machine TF resource resource-runner iterative_runner TF resource technical-debt Refactoring, linting & tidying ui/ux User interface/experience
Projects
None yet
Development

No branches or pull requests

5 participants