-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify that malformed Unicode code points result in an error #420
Comments
Perhaps a worse facet of this in As a consequence, this returns env, _ := cel.NewEnv(cel.Variable("name", cel.StringType))
ast, _ := env.Compile("bytes(name) == b'\\x80'")
prg, _ := env.Program(ast)
out, _, err := prg.Eval(map[string]any{
"name": "\x80",
})
if err != nil {
log.Fatalln(err)
}
str, _ := out.ConvertToNative(reflect.TypeOf(true));
fmt.Println(str.(bool)) And this results in a runtime error1: env, _ := cel.NewEnv(cel.Variable("name", cel.StringType))
ast, _ := env.Compile("string(bytes(name))")
prg, _ := env.Program(ast)
out, _, err := prg.Eval(map[string]any{
"name": "\x80",
})
if err != nil {
log.Fatalln(err)
}
str, _ := out.ConvertToNative(reflect.TypeOf(""));
fmt.Println(str.(string)) Footnotes
|
Hi @hudlow, There's an expectation that strings are valid Unicode code points (I could spell out UTF-8 specifically, perhaps). From the langdef.md file:
Regarding, where invalid utf-8 strings are provided to the CEL runtime, CEL's position would be that this is undefined behavior. The cel-go behavior of evaluating I've filed google/cel-go#1093 to track the Golang issue. Is there a specific update to the language regarding UTF-8 string expectations? |
Actually, the language is probably pretty good since string literals accept escaped code points, and not escaped UTF-8 bytes. I think I would like the spec to be more explicit that a CEL parser ought to reject a string literal with invalid code points (versus it somehow only getting rejected during validation or evaluation). If you're amenable to this, I can try to draft a clarification. Thank you for the quick fix!
Okay, I think it makes sense to me that this is out of scope for the language definition because by providing invalid UTF-8 strings as string data, you're essentially providing corrupt data. Still, in terms of the I guess what I'm getting at is what the best practice for an implementation is with respect to ensuring that you don't inadvertently pass corrupt data from the implementing environment to the CEL environment. Thinking through it more though, since Golang doesn't have a native type for "only well-formed Unicode strings," I'm struggling to think of a way that you could catch the issue at compile time (for either the Go code or the CEL expression), and Unicode validation could be an enormous performance tax at runtime, especially considering that it is not particularly unlikely that your runtime values will be coming from a source which itself guarantees correct Unicode and only transiently be handled as Go strings that don't. Still, the lack of guardrails makes me twitch a little. Is there nothing more that can be done? |
Currently,
cel-go
suppresses errors by silently coercing bad code points to\ufffd
. As a result, this expression evaluates totrue
:In contrast,
cel-java
throws an explicit exception:I believe the Java behavior here is far more desirable, at least as a default, and should be the specified behavior.
The text was updated successfully, but these errors were encountered: