You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Following up on #19194 and discussion with @aturon, I took a look at how things in the std::ascii module are used in the Rust repository and in Servo.
The std::ascii::Ascii type is a newtype of u8 that enforces (unless unsafe code is used) that the value is in the ASCII range, similar to char with u32 and the range of Unicode scalar values. [Ascii] is naturally a string of bytes entirely in the ASCII range.
Using the type system like this to enforce data invariants is interesting, but in practice [Ascii] is not that useful. Data (such as from the network) is rarely guaranteed to be ASCII only nor is it desirable to remove or replace non-ASCII bytes, even if ASCII-range-only operations are used. (E.g. “ASCII case-insensitivity” is common in HTML and CSS.)
Every single use of the Ascii type that I’ve found was only to use the to_lowercase or to_uppercase method, then immediately convert back to u8 or char.
Therefore, I suggest:
Moving the Ascii type as well as the AsciiCast, OwnedAsciiCast, AsciiStr, and IntoBytes traits into a new ascii Cargo package on crates.io
Marking them as deprecated in std::ascii, and removing them at some point before 1.0
Reworking the rest of the module to provide the functionality on u8, char, [u8] and str. Specifically:
Keep the AsciiExt and OwnedAsciiExt traits. (Maybe rename them?)
Implement AsciiExt on char and u8 (in addition to the existing impls for str and [u8])
Add is_ascii() -> bool. Maybe on AsciiExt? It’s mostly used on u8 and char, but it also makes sense on str and [u8].
Maybeis_ascii_lowercase, is_ascii_uppercase, is_ascii_alphabetic, or is_ascii_alphanumeric could be useful, but I’d be fine with dropping them and reconsider if someone asks for them. The same result can be achieved with .is_ascii() && and the corresponding UnicodeChar method, which in most cases has an ASCII fast path.
I don’t think the remaining Ascii methods are valuable.
is_digit and is_hex are identical to Char::is_digit(10) and Char::is_digit(16).
is_blank, is_control, is_graph, is_print, and is_punctuation are never used.
How does this sound? I can help with the implementation work. Should this go through the RFC process?
The text was updated successfully, but these errors were encountered:
@SimonSapin this sounds great to me, thanks for taking charge on this! For most major redesigns recently we tend to prefer pushing it through the RFC process to get comments rather than pushing a PR and getting comments, would you be ok writing an RFC for this?
Following up on #19194 and discussion with @aturon, I took a look at how things in the
std::ascii
module are used in the Rust repository and in Servo.The
std::ascii::Ascii
type is a newtype ofu8
that enforces (unlessunsafe
code is used) that the value is in the ASCII range, similar tochar
withu32
and the range of Unicode scalar values.[Ascii]
is naturally a string of bytes entirely in the ASCII range.Using the type system like this to enforce data invariants is interesting, but in practice
[Ascii]
is not that useful. Data (such as from the network) is rarely guaranteed to be ASCII only nor is it desirable to remove or replace non-ASCII bytes, even if ASCII-range-only operations are used. (E.g. “ASCII case-insensitivity” is common in HTML and CSS.)Every single use of the
Ascii
type that I’ve found was only to use theto_lowercase
orto_uppercase
method, then immediately convert back tou8
orchar
.Therefore, I suggest:
Ascii
type as well as theAsciiCast
,OwnedAsciiCast
,AsciiStr
, andIntoBytes
traits into a newascii
Cargo package on crates.iostd::ascii
, and removing them at some point before 1.0u8
,char
,[u8]
andstr
. Specifically:AsciiExt
andOwnedAsciiExt
traits. (Maybe rename them?)AsciiExt
onchar
andu8
(in addition to the existing impls forstr
and[u8]
)is_ascii() -> bool
. Maybe onAsciiExt
? It’s mostly used onu8
andchar
, but it also makes sense onstr
and[u8]
.is_ascii_lowercase
,is_ascii_uppercase
,is_ascii_alphabetic
, oris_ascii_alphanumeric
could be useful, but I’d be fine with dropping them and reconsider if someone asks for them. The same result can be achieved with.is_ascii() &&
and the correspondingUnicodeChar
method, which in most cases has an ASCII fast path.Ascii
methods are valuable.is_digit
andis_hex
are identical toChar::is_digit(10)
andChar::is_digit(16)
.is_blank
,is_control
,is_graph
,is_print
, andis_punctuation
are never used.How does this sound? I can help with the implementation work. Should this go through the RFC process?
The text was updated successfully, but these errors were encountered: