Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::ascii reform #19350

Closed
SimonSapin opened this issue Nov 27, 2014 · 2 comments
Closed

std::ascii reform #19350

SimonSapin opened this issue Nov 27, 2014 · 2 comments

Comments

@SimonSapin
Copy link
Contributor

Following up on #19194 and discussion with @aturon, I took a look at how things in the std::ascii module are used in the Rust repository and in Servo.

The std::ascii::Ascii type is a newtype of u8 that enforces (unless unsafe code is used) that the value is in the ASCII range, similar to char with u32 and the range of Unicode scalar values. [Ascii] is naturally a string of bytes entirely in the ASCII range.

Using the type system like this to enforce data invariants is interesting, but in practice [Ascii] is not that useful. Data (such as from the network) is rarely guaranteed to be ASCII only nor is it desirable to remove or replace non-ASCII bytes, even if ASCII-range-only operations are used. (E.g. “ASCII case-insensitivity” is common in HTML and CSS.)

Every single use of the Ascii type that I’ve found was only to use the to_lowercase or to_uppercase method, then immediately convert back to u8 or char.

Therefore, I suggest:

  • Moving the Ascii type as well as the AsciiCast, OwnedAsciiCast, AsciiStr, and IntoBytes traits into a new ascii Cargo package on crates.io
  • Marking them as deprecated in std::ascii, and removing them at some point before 1.0
  • Reworking the rest of the module to provide the functionality on u8, char, [u8] and str. Specifically:
    • Keep the AsciiExt and OwnedAsciiExt traits. (Maybe rename them?)
    • Implement AsciiExt on char and u8 (in addition to the existing impls for str and [u8])
    • Add is_ascii() -> bool. Maybe on AsciiExt? It’s mostly used on u8 and char, but it also makes sense on str and [u8].
    • Maybe is_ascii_lowercase, is_ascii_uppercase, is_ascii_alphabetic, or is_ascii_alphanumeric could be useful, but I’d be fine with dropping them and reconsider if someone asks for them. The same result can be achieved with .is_ascii() && and the corresponding UnicodeChar method, which in most cases has an ASCII fast path.
    • I don’t think the remaining Ascii methods are valuable.
      • is_digit and is_hex are identical to Char::is_digit(10) and Char::is_digit(16).
      • is_blank, is_control, is_graph, is_print, and is_punctuation are never used.

How does this sound? I can help with the implementation work. Should this go through the RFC process?

@alexcrichton
Copy link
Member

@SimonSapin this sounds great to me, thanks for taking charge on this! For most major redesigns recently we tend to prefer pushing it through the RFC process to get comments rather than pushing a PR and getting comments, would you be ok writing an RFC for this?

@SimonSapin
Copy link
Contributor Author

Moving to rust-lang/rfcs#486.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants