std::ascii reform #19350

SimonSapin · 2014-11-27T00:05:47Z

Following up on #19194 and discussion with @aturon, I took a look at how things in the std::ascii module are used in the Rust repository and in Servo.

The std::ascii::Ascii type is a newtype of u8 that enforces (unless unsafe code is used) that the value is in the ASCII range, similar to how char relates to u32 and the range of Unicode scalar values. [Ascii] is naturally a string of bytes entirely in the ASCII range.
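
To make the invariant concrete, here is a minimal sketch of such a newtype. It is a hypothetical reconstruction for illustration only: the method names `from_byte`, `from_byte_unchecked`, `to_byte`, `to_char` and `to_lowercase` are assumptions, not necessarily the historical std::ascii API. The only safe constructor checks the range, so every `Ascii` value holds a byte below 0x80 unless `unsafe` code bypasses the check.

```rust
/// Hypothetical sketch of an ASCII-only newtype over `u8`
/// (not the historical `std::ascii::Ascii` source).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Ascii(u8);

impl Ascii {
    /// Checked constructor: returns `None` for bytes outside 0x00..=0x7F.
    pub fn from_byte(b: u8) -> Option<Ascii> {
        if b < 0x80 { Some(Ascii(b)) } else { None }
    }

    /// Unchecked constructor: the caller must guarantee `b < 0x80`.
    /// This is the `unsafe` escape hatch that can break the invariant.
    pub unsafe fn from_byte_unchecked(b: u8) -> Ascii {
        Ascii(b)
    }

    pub fn to_byte(self) -> u8 {
        self.0
    }

    /// Every ASCII byte is also a Unicode scalar value, so this never fails.
    pub fn to_char(self) -> char {
        self.0 as char
    }

    /// Maps 'A'..='Z' to 'a'..='z'; the result is still ASCII.
    pub fn to_lowercase(self) -> Ascii {
        Ascii(self.0.to_ascii_lowercase())
    }
}
```

Note that a caller who only wants a lowercase `u8` or `char` still has to go through `from_byte`, handle the `None` case, and convert back, which is the round trip described below.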

Using the type system like this to enforce data invariants is interesting, but in practice [Ascii] is not that useful. Data (such as from the network) is rarely guaranteed to be ASCII-only, nor is it desirable to remove or replace non-ASCII bytes, even when only ASCII-range operations are used. (E.g. “ASCII case-insensitivity” is common in HTML and CSS.)
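
As an illustration, here is a minimal sketch of ASCII case-insensitive matching (as used for HTML tag names) that accepts arbitrary bytes. The function name `eq_ascii_case_insensitive` is hypothetical; later Rust versions provide the same behaviour as `eq_ignore_ascii_case` on `[u8]` and `str`. Only ASCII letters are folded and non-ASCII bytes are compared exactly rather than rejected, which is why an ASCII-only input type would not help here.

```rust
/// Hypothetical helper: compares two byte strings, treating ASCII letters
/// case-insensitively and every other byte (including non-ASCII UTF-8
/// bytes) as-is. No input is rejected for containing non-ASCII data.
fn eq_ascii_case_insensitive(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| x.to_ascii_lowercase() == y.to_ascii_lowercase())
}
```

With this definition `"CAFé"` matches `"café"`, but `"CAFÉ"` does not: `É` and `é` are non-ASCII, so their UTF-8 bytes are compared as-is.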

Every single use of the Ascii type that I’ve found was only to use the to_lowercase or to_uppercase method, then immediately convert back to u8 or char.

Therefore, I suggest:

- Moving the Ascii type as well as the AsciiCast, OwnedAsciiCast, AsciiStr, and IntoBytes traits into a new ascii Cargo package on crates.io
- Marking them as deprecated in std::ascii, and removing them at some point before 1.0
- Reworking the rest of the module to provide the functionality on u8, char, [u8] and str. Specifically:
  - Keep the AsciiExt and OwnedAsciiExt traits. (Maybe rename them?)
  - Implement AsciiExt on char and u8 (in addition to the existing impls for str and [u8])
  - Add is_ascii() -> bool. Maybe on AsciiExt? It’s mostly used on u8 and char, but it also makes sense on str and [u8].
  - Maybe is_ascii_lowercase, is_ascii_uppercase, is_ascii_alphabetic, or is_ascii_alphanumeric could be useful, but I’d be fine with dropping them and reconsidering if someone asks for them. The same result can be achieved by combining .is_ascii() && with the corresponding UnicodeChar method, which in most cases has an ASCII fast path.
  - I don’t think the remaining Ascii methods are valuable.
    - is_digit and is_hex are identical to Char::is_digit(10) and Char::is_digit(16).
    - is_blank, is_control, is_graph, is_print, and is_punctuation are never used.
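
The reworked module could look roughly like the following sketch: one trait, implemented for u8, char, [u8] and str, carrying the ASCII check and the case mappings, with no ASCII-only type involved. The trait and method names (`AsciiOps`, `is_ascii_only`, `ascii_lower`, `ascii_upper`) are hypothetical, chosen so they do not collide with the inherent `is_ascii` and `to_ascii_lowercase` methods that later Rust versions added to these types; this is not the API the proposal settled on.

```rust
/// Hypothetical trait sketching the proposal: ASCII-range operations
/// provided directly on u8, char, [u8] and str.
trait AsciiOps {
    /// Type returned by the case-mapping methods.
    type Owned;
    /// True if every byte or char is in the ASCII range (< 0x80).
    fn is_ascii_only(&self) -> bool;
    /// Maps 'A'..='Z' to 'a'..='z' and leaves everything else unchanged.
    fn ascii_lower(&self) -> Self::Owned;
    /// Maps 'a'..='z' to 'A'..='Z' and leaves everything else unchanged.
    fn ascii_upper(&self) -> Self::Owned;
}

impl AsciiOps for u8 {
    type Owned = u8;
    fn is_ascii_only(&self) -> bool {
        *self < 0x80
    }
    fn ascii_lower(&self) -> u8 {
        if matches!(*self, b'A'..=b'Z') { *self + 32 } else { *self }
    }
    fn ascii_upper(&self) -> u8 {
        if matches!(*self, b'a'..=b'z') { *self - 32 } else { *self }
    }
}

impl AsciiOps for char {
    type Owned = char;
    fn is_ascii_only(&self) -> bool {
        (*self as u32) < 0x80
    }
    // Non-ASCII chars are returned as-is; ASCII ones reuse the u8 impl.
    fn ascii_lower(&self) -> char {
        if self.is_ascii_only() { (*self as u8).ascii_lower() as char } else { *self }
    }
    fn ascii_upper(&self) -> char {
        if self.is_ascii_only() { (*self as u8).ascii_upper() as char } else { *self }
    }
}

impl AsciiOps for [u8] {
    type Owned = Vec<u8>;
    fn is_ascii_only(&self) -> bool {
        self.iter().all(|b| b.is_ascii_only())
    }
    fn ascii_lower(&self) -> Vec<u8> {
        self.iter().map(|b| b.ascii_lower()).collect()
    }
    fn ascii_upper(&self) -> Vec<u8> {
        self.iter().map(|b| b.ascii_upper()).collect()
    }
}

impl AsciiOps for str {
    type Owned = String;
    fn is_ascii_only(&self) -> bool {
        self.as_bytes().is_ascii_only()
    }
    // Mapping char by char keeps non-ASCII text (e.g. "é") intact.
    fn ascii_lower(&self) -> String {
        self.chars().map(|c| c.ascii_lower()).collect()
    }
    fn ascii_upper(&self) -> String {
        self.chars().map(|c| c.ascii_upper()).collect()
    }
}
```

In later Rust versions, std provides these operations as inherent methods (`is_ascii`, `to_ascii_lowercase`, `to_ascii_uppercase`, `eq_ignore_ascii_case`, and so on) on the same four types.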

How does this sound? I can help with the implementation work. Should this go through the RFC process?


alexcrichton · 2014-11-27T02:21:28Z

@SimonSapin this sounds great to me, thanks for taking charge on this! For most major redesigns recently we've preferred pushing them through the RFC process to get comments, rather than opening a PR and getting comments there. Would you be OK writing an RFC for this?

SimonSapin · 2014-11-27T13:21:03Z

Moving to rust-lang/rfcs#486.

nodakai mentioned this issue Nov 27, 2014 in Improve the AsciiStr trait #17790 (Closed)

SimonSapin closed this as completed Nov 27, 2014
