
Support UTF-8 strings for read and lookup outputs while using ProtocolBuffer encoding #170

Open
jaeyeol-moloco opened this issue Dec 26, 2023 · 2 comments
Labels
priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@jaeyeol-moloco

jaeyeol-moloco commented Dec 26, 2023


Is your feature request related to a problem? Please describe.
When I use ProtocolBuffer encoding, I'm frustrated that string values are printed as unreadable escaped bytes.
For example, the Korean string "경동나비엔" is printed as "\352\262\275\353\217\231\353\202\230\353\271\204\354\227\224".
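The escaped sequence is simply the UTF-8 encoding of the string, written as one three-digit octal escape per byte. The stdlib-only Go sketch below reproduces that output and decodes it back with `strconv.Unquote`, whose string-literal rules accept `\nnn` octal byte escapes. The helpers `escapeNonASCII` and `decodeEscaped` are hypothetical and are not taken from cbt or protoreflect.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// escapeNonASCII is a hypothetical helper that mimics the escaped output:
// printable ASCII bytes are kept, every other byte (including each byte of a
// multi-byte UTF-8 character) is written as a three-digit octal escape.
func escapeNonASCII(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ { // index by byte, not by rune
		c := s[i]
		if c >= 0x20 && c < 0x7f && c != '"' && c != '\\' {
			b.WriteByte(c)
		} else {
			fmt.Fprintf(&b, "\\%03o", c)
		}
	}
	return b.String()
}

// decodeEscaped turns the escaped form back into the original bytes by
// treating it as the body of a Go double-quoted string literal.
func decodeEscaped(escaped string) (string, error) {
	return strconv.Unquote(`"` + escaped + `"`)
}

func main() {
	original := "경동나비엔"
	escaped := escapeNonASCII(original)
	fmt.Println(escaped) // \352\262\275\353\217\231\353\202\230\353\271\204\354\227\224

	decoded, err := decodeEscaped(escaped)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded, decoded == original) // 경동나비엔 true
}
```

Because the round trip is lossless, the escaped output carries the same bytes as the original string; only its readability is affected.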

Describe the solution you'd like
I would like cbt to print UTF-8 strings as-is in `read` and `lookup` output.

Describe alternatives you've considered
I found that the unreadable byte sequence comes from message.MarshalTextIndent(). If message.MarshalJSONIndent() is used instead, the UTF-8 string is printed correctly as "경동나비엔". So it would also be good if cbt let users choose prototext or protojson as the output format: prototext would still output the bytes in octal, but I could choose protojson to see UTF-8 strings.
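To illustrate why JSON output sidesteps the problem, here is a stdlib-only Go sketch that uses `encoding/json` as a stand-in for protojson. protojson itself lives in the non-stdlib module `google.golang.org/protobuf`, and its exact formatting is not reproduced here. The helper `renderJSON` and the cell value are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// renderJSON pretty-prints v as JSON. encoding/json writes valid UTF-8
// text unescaped (by default it only escapes a few characters such as
// <, >, & and U+2028/U+2029), so Hangul stays readable.
func renderJSON(v any) (string, error) {
	out, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Hypothetical cell value standing in for a decoded protobuf message.
	cell := map[string]string{"name": "경동나비엔"}
	s, err := renderJSON(cell)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// {
	//   "name": "경동나비엔"
	// }
}
```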

Additional context

@jaeyeol-moloco jaeyeol-moloco added the priority: p3 and type: feature request labels Dec 26, 2023
@jaeyeol-moloco jaeyeol-moloco changed the title Support UTF-8 strings while using ProtocolBuffer encoding Support UTF-8 strings for read and lookup outputs while using ProtocolBuffer encoding Dec 27, 2023
@jaeyeol-moloco
Author

I think the octal escapes for UTF-8 characters are intended behavior according to https://protobuf.dev/reference/protobuf/textformat-spec/, so changing the output of the ProtocolBuffer format isn't an option for this issue. In #171, I added a new format, ProtocolBufferJSON, which marshals a protocol buffer value as JSON and prints UTF-8 strings normally.

@jaeyeol-moloco
Author

I realized that the text format spec itself supports UTF-8, so I closed #171 and opened a new PR, #172, which switches the formatter package from https://github.com/jhump/protoreflect to https://pkg.go.dev/google.golang.org/protobuf/encoding/prototext.
