Length validators: Support all unicode characters #3211

Merged
merged 5 commits into from
Sep 29, 2023

Conversation

DybekK
Contributor

@DybekK DybekK commented Sep 27, 2023

Follow-up to #3201. This PR adds optional support for counting Unicode characters (code points) during length validation.

- Added an optional parameter to the length validators
@adamw
Member

adamw commented Sep 27, 2023

There are some compilation failures. Some user-facing scaladoc on what the parameter means would also be nice.

Can you maybe also share what the specs (RFC) say about emojis as query/header/path values? Are they allowed, should they be encoded?

Member

@kciesielski kciesielski left a comment

How about naming the PR "Length validators: Support all unicode characters"? "Add support for unicode characters" doesn't really say where exactly we're adding this support, and it suggests that Unicode wasn't supported at all.

@DybekK
Contributor Author

DybekK commented Sep 28, 2023

RFC 3986 specifies that URIs consist of a limited set of characters: basic Latin letters, digits, and a few special characters. It doesn't explicitly mention emojis, but according to Section 2.1 of RFC 3986, any character outside the allowed set should be percent-encoded, so emojis can be part of a URI when they're properly encoded.

@DybekK DybekK changed the title Add support for unicode characters Length validators: Support all unicode characters Sep 28, 2023
@kciesielski
Member

Emojis can be encoded, but then we get, for example, %F0%9F%98%8D to represent "😍" in a URI, so the length of such text should equal 12, because for a URI we treat it as the length in ASCII characters, right?
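To make the numbers in this comment concrete, here is a minimal sketch using the JDK's `java.net.URLEncoder` and `URLDecoder`. The object name `PercentEncodingLengths` is made up for illustration. Note that `URLEncoder` implements form encoding rather than RFC 3986 itself; the two only differ for characters such as spaces, so for this emoji the output is the same.

```scala
import java.net.{URLDecoder, URLEncoder}

object PercentEncodingLengths {
  // U+1F60D ("😍"), written as its UTF-16 surrogate pair.
  val emoji: String = "\uD83D\uDE0D"

  // Percent-encodes the four UTF-8 bytes of the emoji (F0 9F 98 8D).
  val encoded: String = URLEncoder.encode(emoji, "UTF-8")

  // Decoding gives back the original string.
  val decoded: String = URLDecoder.decode(encoded, "UTF-8")

  // Number of ASCII characters in the encoded form.
  val encodedLength: Int = encoded.length
}
```

The encoded form is what travels in the URI; the decoded form is what application code usually sees.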

@adamw
Member

adamw commented Sep 28, 2023

@DybekK thanks, I would put that in the docs - that the only scenario where this should be used is for body validators, where one may wish to count code points instead of characters (maybe giving an example of emojis ;) )

@adamw
Member

adamw commented Sep 28, 2023

@kciesielski ah good point, we validate decoded strings. So this is also useful for queries etc., just depends on what you want to count.
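A rough sketch of the two ways to count a decoded string on the JVM, assuming nothing about tapir's actual implementation. `maxLengthOk` and its `countCodePoints` flag are hypothetical names used only for illustration; `String.length` and `String.codePointCount` are standard JDK methods.

```scala
object CountingDecodedStrings {
  // U+1F60D ("😍"), written as its UTF-16 surrogate pair.
  val emoji: String = "\uD83D\uDE0D"

  // String.length counts UTF-16 code units, so a surrogate pair counts twice.
  val utf16Units: Int = emoji.length

  // codePointCount counts Unicode code points, so the pair counts once.
  val codePoints: Int = emoji.codePointCount(0, emoji.length)

  // Hypothetical helper (not tapir's API): a max-length check with a switch
  // for the counting mode.
  def maxLengthOk(value: String, max: Int, countCodePoints: Boolean): Boolean = {
    val size =
      if (countCodePoints) value.codePointCount(0, value.length)
      else value.length
    size <= max
  }
}
```

With a maximum of 1, the emoji fails the check when counting UTF-16 code units and passes when counting code points.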

MaxLength and MinLength validators have been extended with custom this, copy, apply methods in order to maintain binary compatibility after adding new field
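The general idea behind that commit message can be sketched as follows. This is an assumption-laden illustration, not the code from this PR: `MaxLen` and `countCodePoints` are hypothetical names, and the real validators have a different shape. Adding a field to a case class changes the signatures of its constructor, `apply`, and `copy`, so the old one-argument entry points are written out by hand next to the new ones.

```scala
// Hypothetical stand-in for a length validator; not tapir's actual definition.
final case class MaxLen(value: Int, countCodePoints: Boolean) {
  // Old single-argument constructor, delegating to the new primary one.
  def this(value: Int) = this(value, false)

  // Defining any `copy` suppresses the compiler-generated one,
  // so both arities are spelled out explicitly.
  def copy(value: Int, countCodePoints: Boolean): MaxLen =
    new MaxLen(value, countCodePoints)

  // Old single-argument copy; keeps the current counting mode.
  def copy(value: Int): MaxLen =
    new MaxLen(value, this.countCodePoints)
}

object MaxLen {
  // Old single-argument factory; the two-argument apply is still generated.
  def apply(value: Int): MaxLen = new MaxLen(value, false)
}
```

Whether hand-written overloads like these are enough for full binary compatibility depends on details such as the synthetic methods generated for default arguments; a checker such as MiMa is the usual way to confirm it.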
@adamw adamw merged commit b0213bb into softwaremill:master Sep 29, 2023
@adamw
Member

adamw commented Sep 29, 2023

Thanks!

@DybekK DybekK deleted the support-unicode branch September 29, 2023 10:42
3 participants