Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String interpolation #4733

Open
anderseknert opened this issue Jun 3, 2022 · 11 comments
Open

String interpolation #4733

anderseknert opened this issue Jun 3, 2022 · 11 comments

Comments

@anderseknert
Copy link
Member

anderseknert commented Jun 3, 2022

Having string interpolation similar to Python f-strings in Rego would allow for more succint policy, while arguably improving readability by removing the ceremony associated with sprintf.

Another issue potentially addressed by this would be the common mistake of not handling undefined when referencing input/data in sprintf arguments. These will halt evaluation as any other undefined reference, but in the likely most common use case for sprintf — building a "message" or "reason" string to populate a partial rule set, whether a value is undefined or not is likely not interesting enough that you'd want it to affect the actual outcome of the evalutation.

deny[reason] {
    not "developer" in input.user.groups

    # fails if input.user.name is undefined, and the anonymous user will not be denied
    # this is likely not what the policy author intended
    reason := sprintf("%v must have role 'developer'", [input.user.name])
}
# less ceremony, and always evaluates if conditions in the body holds
# interpolation step could handle undefined (similar to `print`)
deny["{input.user.name} must have role 'developer'"] {
    not "developer" in input.user.groups
}

This would arguably improve string handling not just with regards to sprintf, but improve things like simple concatenation, which is currently somewhat verbose:

name := concat(", ", [input.first_name, input.last_name])

# vs.

name := "{input.first_name}, {input.last_name}"

Some design decisions would obviously need to be made here, like how to differentiate an "interpolation string" from a regular one, whether (simple?) expressions should be allowed in the interpolated values or only direct references, and perhaps there could be an "interpolation form" where undefined does fail evaluation, etc..

I don't have all the answers, but here's at least a starting point :)

@ericjkao
Copy link
Contributor

ericjkao commented Jun 6, 2022

+1 on the benefits of both readability and having a form which tolerates undefined input!

Here are a couple considerations/opinions I'd like to write down which might help us in the design process (Anders probably thought through already).

  • Since all built-in functions currently (I think) return undefined if any argument is undefined, it is a good idea to introduce this as a syntax which is not a function. So we can continue to make the sharp statement that all built-in functions behave tis way.
  • For backwards compatibility, we would probably want a new syntax that is not the same as a regular string today.

Brainstorming syntaxes:

  • f"{input.user.name} must have role 'developer'" (Python f-string) Con: A bit incongruous to suddenly have the concept of prefixed literal in Rego? Are there other areas we would like to use prefixed literal in Rego?
  • '{input.user.name} must have role \'developer\'' (Use '-enclosed literal) Con: Deviates from convention in other languages, where ' is either the same as " or does LESS interpolation than ".
  • `{input.user.name} must have role 'developer'` (Use `-enclosed literal) Con: complex expressions or even deep references can make the string hard to read)
  • "%s must have role %s" % (input.user.name, role) (Python % syntax) Con: does not allow the expression in the string itself.
  • "{{input.user.name}} must have role {0} or {1}" % (role1, role2) (Extension of python % syntax) Con: Requires the introduction of a ()-tuple in Rego which may complicate other things. The overloading of the % operator may make implementation of compiler/fmt/linter/syntax highlighting more complex?
  • "{{input.user.name}} must have role {role1} or {role2}" % {"role1": role1, "role2": role2} (Extension of python % syntax) Con: Does not allow the use of unnamed {} or {0} or %v.

Update: reference to syntax from other languages: https://petamind.com/string-interpolation-in-different-programming-languages/
Also PHP: "${year}-${month}-${day}"
Also Ruby: "#{year}-#{month}-#{day}"

@anderseknert
Copy link
Member Author

Awesome feedback, @EKCs! I had not given too much thought to syntax before, so your brainstorming was much appreciated. Some spontaneous thoughts on the above:

For backwards compatibility, we would probably want a new syntax that is not the same as a regular string today.

Or we could simply require import future.interpolation (or whatever we'll call it)? We'd probably not want all strings to be interpolated (i.e none of those provided in the input by the user), but maybe any string literal / constant defined in policy? If not, I think I'm leaning towards the first option, i.e. a prefix to distinguish these strings from others. I'd definitely prefer if the values are embedded directly in the string. That to me makes it feel more natural that undefined isn't going to halt evaluation, since technically "{input.foo}" is still just a string, while input.foo is a direct reference.

Are there other areas we would like to use prefixed literal in Rego?

I could imagine one option to be how undefined should be handled in interpolation, i.e. if it should be printed or halt evaluation.

As for the backticks option, that actually exists in Rego already :) Used for multiline strings and regex.

@srenatus
Copy link
Contributor

srenatus commented Jun 7, 2022

We'd probably not want all strings to be interpolated (i.e none of those provided in the input by the user), but maybe any string literal / constant defined in policy?

💯 Definitely only string literals.

My preferred way here would be to not introduce new string delimiters, but go with some import future thing like @anderseknert proposed. But I also doubt that anyone has been using {foo} is {bar} or ${foo} is ${bar} in polices right now, without interpolation; so the path forward should not be breaking many users. (Meaning maybe we can even not do this via future import.) WDYT?

@anderseknert
Copy link
Member Author

Yeah, I agree that it's very unlikely. But maybe better to play it safe? I don't have a strong opinion on that. One concern with having this enabled for all string literals is if this would have any noticeable impact on performance? I doubt the scan is going to be expensive, but maybe there are some policies with a lot of strings defined? For a large JSON string, there's going to be a lot of { in there, but I guess the interpolation would just stop at the next character anyway, since it's going to be a " (+/- whitespace). Probably negligible.

@srenatus
Copy link
Contributor

srenatus commented Jun 7, 2022

🤔 Well this might be a good point for using something that's not valid JSON.

I.e. if we used foo is ${input.foo}, we'd not have to worry about large JSON strings.

I'm mostly thinking that at some point, I'm confident we'll want to put more than plain refs into that... print("is foo in xs: ${"foo" in xs}") (oh nested "... eek...)

Thinking more about this, maybe we should pick a common implementation (JS? Ruby? Python?), and mimic that closely. For one thing, much thought has probably already been put into this; for another thing, we'll be able to say "it just like JS" 🙃

@anderseknert
Copy link
Member Author

Sounds like a good way forward! And yeah, I can definitely see how people would want to have at least some simple logic included in interpolation. I think resolving refs would be a very good first step though! Thinking more about it, it's probably not very common to have large JSON strings in literals 🤔 The only use case I could think of is tests, but even there there aren't many reasons I can think of to do that instead of providing deserialized objects directly.

Hah! Had not thought of nested " :D I suppose having to escape those would be OK, or simply assign "foo" to a var outside of the string and reference that.

Backtick-strings could also remain "uninterpolated", so that there's an alternative to use if someone really wants to have {input.foo} interpreted literally.

@srenatus
Copy link
Contributor

srenatus commented Jun 7, 2022

Thinking more about it, it's probably not very common to have large JSON strings in literals 🤔

I think JWKS are the exception here. I think some built-ins expect their JWK in string-of-json format 😬

@anderseknert
Copy link
Member Author

That's true, but I wonder how common it is to have those hard coded in the policy, as opposed to provided either via http.send or somewhere in data?

@stale
Copy link

stale bot commented Jul 7, 2022

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@stale stale bot added the inactive label Jul 7, 2022
@anderseknert
Copy link
Member Author

Bumping for 1.0 @ashutosh-narkar @johanfylling :)

Most of the concerns raised about this have either been about backwards compatibility (i.e. can't use existing string syntax) or about introducing new syntax. Neither should be a problem for a 1.0 release.

My vote is on using existing string syntax for this, and something like "foo is {foo}" to reference a var.

Copy link

stale bot commented Nov 17, 2023

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.

@stale stale bot added the inactive label Nov 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants