Is your feature request related to a problem? Please describe.
The representation of engine-provided strings is not consistent, which makes it hard for plugins to get searching/matching correct when dealing with non-ASCII strings.
Engine-provided strings are sent to the container using a Protobuf message, which is then unpacked by the Delphix wrapper code. This gives us string objects which we pass along to the plugin code as-is. According to the Protobuf documentation, these strings can be in the following formats:
1. If all of the characters in the string are ASCII-representable, then the string object will be of type str and will contain the ASCII-encoded bytes that represent the string.
2. If there is at least one non-ASCII-representable character in the string, then the string object can be one of two types (it's not guaranteed which one we might get):
   a) A unicode object, containing the characters in the string.
   b) A str object, containing the UTF-8-encoded bytes that represent the string.
So, imagine a string that begins with the character ë. And, imagine a plugin wants to check that, indeed, the string begins with that character. You might think the plugin could just do this:
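For illustration, a naive check along these lines might look as follows. The helper name `starts_with_e_diaeresis` is hypothetical, and the `u"..."` escape is used so the sketch runs on both Python 2 and Python 3 (where Python 2's `str` corresponds to `bytes`):

```python
import re

def starts_with_e_diaeresis(s):
    # Hypothetical helper: assumes `s` is a text (unicode) string.
    # u"\u00eb" is the single character "e with diaeresis".
    return re.match(u"\u00eb", s) is not None
```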
This will work fine for case (2a). But, it will not work for case (2b). After all, in case (2b) we've only got a str object. The str object does not contain characters; it contains bytes. So, the first two bytes here are c3-ab (the UTF-8 encoding for our character ë).
Also, there's no way for the re module to know what encoding might be in play. So, the re module cannot know that c3-ab should be interpreted as ë. So, for case (2b), the plugin would need to do something like this:
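One way this could look, as a sketch only: decode the bytes explicitly before matching. The helper name `starts_with_e_diaeresis_utf8` is hypothetical, and the sketch assumes the bytes really are UTF-8:

```python
import re

def starts_with_e_diaeresis_utf8(raw):
    # Hypothetical helper: assumes `raw` holds UTF-8-encoded bytes
    # (a Python 2 `str`, or `bytes` on Python 3).
    text = raw.decode("utf-8")  # turn bytes into characters first
    return re.match(u"\u00eb", text) is not None
```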
But, of course, this code does not work for case (2a). So, now the plugin needs to have special code to do different things for cases (2a) and (2b). For example, they could write a function like this that they call for every single string that they ever receive from the engine:
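A minimal sketch of such a normalizing function, under the assumption that any byte string from the engine is ASCII or UTF-8. The name `to_text` is hypothetical; `bytes` is an alias for `str` on Python 2, so the same check covers both cases there:

```python
def to_text(value):
    # Hypothetical helper: returns a text (unicode) string
    # regardless of which representation was received.
    if isinstance(value, bytes):
        # Case (1) or (2b): encoded bytes, assumed ASCII or UTF-8.
        return value.decode("utf-8")
    # Case (2a): already a text string.
    return value
```

With this in place, every search or match would first have to pass the engine string through `to_text`, which is exactly the kind of boilerplate this request is trying to avoid.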
Describe the solution you'd like
The plugin shouldn't have to jump through hoops like the above just to do string searching. It'd be better if the Delphix wrappers could give a consistent string representation to the plugin.
I think the rules should be:
When the wrapper provides a string to the plugin, it will always supply a unicode string to the plugin. Never a str string.
When the plugin provides a string to the wrapper, the wrapper will accept either a unicode string, or an ASCII- or UTF8-encoded str string. (The wrapper already supports this)
Describe alternatives you've considered
Another alternative would be for the wrapper to always provide UTF8-encoded str objects. At least that would be consistent. However, this still makes searching/matching a bit cumbersome, since now the plugin needs to worry about encoding and decoding rules.
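To illustrate why always-encoded strings stay cumbersome, here is a small sketch (the variable names are made up for the example). The pattern has to be encoded the same way as the data, and length or indexing operations count bytes rather than characters:

```python
import re

# What the plugin would receive under this alternative: UTF-8 bytes.
encoded = u"\u00ebtoile".encode("utf-8")

# The pattern must be encoded to UTF-8 as well before matching.
pattern = re.compile(u"\u00eb".encode("utf-8"))
starts_with = pattern.match(encoded) is not None

# Lengths differ: bytes versus characters.
byte_length = len(encoded)
char_length = len(encoded.decode("utf-8"))
```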