Skip to content

Commit

Permalink
Rollup merge of rust-lang#93686 - dbrgn:trim-on-byte-slices, r=joshtr…
Browse files Browse the repository at this point in the history
…iplett

core: Implement ASCII trim functions on byte slices

Hi `````@rust-lang/libs!````` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead.

This PR adds three new methods to byte slices:

- `trim_ascii_start`
- `trim_ascii_end`
- `trim_ascii`

I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose.

As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun.

Tracking issue: rust-lang#94035
  • Loading branch information
matthiaskrgr authored Feb 19, 2022
2 parents 2df232c + f7448a7 commit 460a8b4
Showing 1 changed file with 78 additions and 0 deletions.
78 changes: 78 additions & 0 deletions library/core/src/slice/ascii.rs
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,84 @@ impl [u8] {
pub fn escape_ascii(&self) -> EscapeAscii<'_> {
EscapeAscii { inner: self.iter().flat_map(EscapeByte) }
}

/// Returns a byte slice with leading ASCII whitespace bytes removed.
///
/// 'Whitespace' refers to the definition used by
/// `u8::is_ascii_whitespace`.
///
/// # Examples
///
/// ```
/// #![feature(byte_slice_trim_ascii)]
///
/// assert_eq!(b" \t hello world\n".trim_ascii_start(), b"hello world\n");
/// assert_eq!(b" ".trim_ascii_start(), b"");
/// assert_eq!(b"".trim_ascii_start(), b"");
/// ```
#[unstable(feature = "byte_slice_trim_ascii", issue = "94035")]
pub const fn trim_ascii_start(&self) -> &[u8] {
let mut bytes = self;
// Note: A pattern matching based approach (instead of indexing) allows
// making the function const.
while let [first, rest @ ..] = bytes {
if first.is_ascii_whitespace() {
bytes = rest;
} else {
break;
}
}
bytes
}

/// Returns a byte slice with trailing ASCII whitespace bytes removed.
///
/// 'Whitespace' refers to the definition used by
/// `u8::is_ascii_whitespace`.
///
/// # Examples
///
/// ```
/// #![feature(byte_slice_trim_ascii)]
///
/// assert_eq!(b"\r hello world\n ".trim_ascii_end(), b"\r hello world");
/// assert_eq!(b" ".trim_ascii_end(), b"");
/// assert_eq!(b"".trim_ascii_end(), b"");
/// ```
#[unstable(feature = "byte_slice_trim_ascii", issue = "94035")]
pub const fn trim_ascii_end(&self) -> &[u8] {
let mut bytes = self;
// Note: A pattern matching based approach (instead of indexing) allows
// making the function const.
while let [rest @ .., last] = bytes {
if last.is_ascii_whitespace() {
bytes = rest;
} else {
break;
}
}
bytes
}

/// Returns a byte slice with leading and trailing ASCII whitespace bytes
/// removed.
///
/// 'Whitespace' refers to the definition used by
/// `u8::is_ascii_whitespace`.
///
/// # Examples
///
/// ```
/// #![feature(byte_slice_trim_ascii)]
///
/// assert_eq!(b"\r hello world\n ".trim_ascii(), b"hello world");
/// assert_eq!(b" ".trim_ascii(), b"");
/// assert_eq!(b"".trim_ascii(), b"");
/// ```
#[unstable(feature = "byte_slice_trim_ascii", issue = "94035")]
pub const fn trim_ascii(&self) -> &[u8] {
self.trim_ascii_start().trim_ascii_end()
}
}

impl_fn_for_zst! {
Expand Down

0 comments on commit 460a8b4

Please sign in to comment.