Clarify behavior of strings with invalid UTF-8 byte sequences #13314

Merged

HertzDevil
Contributor

Adds a blanket statement that `Regex` may not handle `String`s containing invalid UTF-8 byte sequences (this applies not just to PCRE and PCRE2, but also to any future replacement engine, if that day ever comes), and encourages using `#scrub` where applicable.
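For illustration only, here is a minimal sketch of the situation the new doc comments describe. The byte values and variable names are made up for this example and are not taken from the PR; it assumes the standard `String.new(Bytes)`, `String#valid_encoding?`, and `String#scrub` methods.

```crystal
# Hypothetical input: 0xFF can never appear in well-formed UTF-8,
# so this String holds an invalid byte sequence.
raw = String.new(Bytes[0x61, 0xFF, 0x62])

raw.valid_encoding? # => false

# `#scrub` returns a new String with the invalid sequence replaced
# (by default with the Unicode replacement character).
clean = raw.scrub

clean.valid_encoding? # => true

# The scrubbed String is safe to hand to Regex or to byte-level APIs
# such as `#to_slice`.
clean.matches?(/b/) # => true
```

Whether scrubbing is appropriate depends on the caller: it discards the original invalid bytes, so code that needs to round-trip raw data should keep working on `Bytes` instead.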

@@ -5254,11 +5261,17 @@ class String
# Returns the underlying bytes of this String.
#
# The returned slice is read-only.
#
# May contain invalid UTF-8 byte sequences; `#scrub` may be used to first
Contributor

I'd suggest being more explicit.

Suggested change
- # May contain invalid UTF-8 byte sequences; `#scrub` may be used to first
+ # WARNING: May contain invalid UTF-8 byte sequences; `#scrub` may be used to first

def to_slice : Bytes
  Slice.new(to_unsafe, bytesize, read_only: true)
end

# Returns a pointer to the underlying bytes of this String.
#
# May contain invalid UTF-8 byte sequences; `#scrub` may be used to first
Contributor

ditto

@straight-shoota added this to the 1.8.0 milestone on Apr 13, 2023
@straight-shoota merged commit 2c9dba7 into crystal-lang:master on Apr 13, 2023
@HertzDevil deleted the doc/string-invalid-utf8 branch on April 14, 2023 01:59
3 participants