RFC: str methods that accept/return byte offsets should have byte in the name #10044
Comments
(Or maybe these methods could go away and we could require users to call |
The current string methods check that they're only operating on byte-boundaries, which
From a safety point-of-view, what about defining a new-type wrapper around
struct ByteIndex {
    priv idx: uint
}
impl ByteIndex {
    /* unsafe? */ fn new(idx: uint) -> ByteIndex { ByteIndex { idx: idx } }
}
fn slice(&'a str, from: ByteIndex, to: ByteIndex) -> &'a str { ... }
fn find_char(&'a str, c: char) -> Option<ByteIndex> { ... }
struct CharRange {
    ch: char,
    next: ByteIndex
}
fn char_range_at(&str, idx: ByteIndex) -> CharRange { ... }
so one either gets a "blackbox" index from functions like
Cons: it makes manipulating strings by hand more complicated. |
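For readers on current Rust, the newtype idea above might look roughly like the sketch below. This is only an illustration: `ByteIndex`, `find_char` and `slice` are the hypothetical names from the comment, not standard library API, and the checked constructor is an assumption about how the "unsafe?" question could be answered. It relies only on `str::is_char_boundary`, `str::find` and `str::get`.

```rust
/// Hypothetical newtype from the comment above: a byte offset that was
/// either produced by a search function or validated against a string.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ByteIndex(usize);

impl ByteIndex {
    /// Checked constructor (an assumption, not part of the proposal):
    /// returns `None` if `idx` is not on a char boundary of `s`.
    pub fn new(s: &str, idx: usize) -> Option<ByteIndex> {
        if s.is_char_boundary(idx) {
            Some(ByteIndex(idx))
        } else {
            None
        }
    }

    /// Exposes the raw byte offset.
    pub fn get(self) -> usize {
        self.0
    }
}

/// Returns an opaque index for the first occurrence of `c`, if any.
pub fn find_char(s: &str, c: char) -> Option<ByteIndex> {
    s.find(c).map(ByteIndex)
}

/// Slices `s` between two indices. Returns `None` when the indices are out
/// of order or do not fit this particular string.
pub fn slice(s: &str, from: ByteIndex, to: ByteIndex) -> Option<&str> {
    s.get(from.0..to.0)
}
```

Note that nothing ties a `ByteIndex` to the string it came from, which is why `slice` in this sketch still returns an `Option`.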
Would there be merit to creating a StringIndex type which is either a ByteIndex or CharIndex and getting rid of all the public _byte/_char variants? |
That might be a nice simplification/unification, but that seems like it might be a source of hidden performance problems if everything returns/takes (Another downside: using an enum would knock it over one word, which would also be slower.) |
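The size concern can be checked directly. A minimal sketch, assuming a hypothetical `StringIndex` enum shaped like the one suggested above; exact enum layout is not guaranteed by the language, but the tagged form has to store a discriminant next to the offset, so it cannot be as small as a bare `usize`.

```rust
use std::mem::size_of;

/// Hypothetical unified index type from the suggestion above.
#[allow(dead_code)]
pub enum StringIndex {
    Byte(usize),
    Char(usize),
}

/// Returns (size of a plain offset, size of the tagged index) in bytes.
pub fn index_sizes() -> (usize, usize) {
    (size_of::<usize>(), size_of::<StringIndex>())
}
```

On common 64-bit targets this typically reports 8 and 16 bytes, i.e. the extra word mentioned in the comment.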
I second the sentiment of putting
let s = "ë"; // two combined code points
s.byte_len(); // length of the string in bytes (3)
s.char_len(); // length of the string in code points (2)
s.rune_len(); // length of the string in grapheme clusters (1)
All of these have their uses, and assuming that we're going to provide them all it doesn't make sense to silently and misleadingly bless the |
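The first two counts can be reproduced with today's standard library. In this sketch `byte_len` and `char_len` are just illustrative wrappers using the names from the comment, not real `str` methods; counting grapheme clusters is left out because the standard library does not provide it (an external crate is needed for that).

```rust
/// Length in bytes of the UTF-8 encoding (what `str::len` reports).
pub fn byte_len(s: &str) -> usize {
    s.len()
}

/// Length in Unicode scalar values ("code points" in the comment above).
pub fn char_len(s: &str) -> usize {
    s.chars().count()
}
```

For the decomposed form `"e\u{308}"` (an `e` followed by a combining diaeresis) these give 3 and 2, matching the numbers in the comment. The precomposed form `"\u{eb}"` gives 2 and 1, so the answers depend on normalization as well.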
FWIW, Go uses "rune" to be what we call |
(NB. |
@huonw: |
@kballard: The |
@kballard with optimisations LLVM will likely perform SROA (scalar replacement of aggregates, i.e. split a struct into its fields) and optimise each part separately, including the actual construction. (Also, the memory cost (The only performance problem might be that |
@thestinger Oh those are iterators? Then that's fine. |
@kballard: Yeah, there are still some allocating methods left around but the iterator ones recently had |
This issue should be addressed (if it hasn't already) as part of the stabilization of cc @aturon |
This thread isn't the place to track this kind of thing today. Many of these methods have already undergone stabilization, and for those that haven't, that's the correct place. So I'm giving this a close. If anyone still feels strongly about this, please pursue in the RFCs repo, thanks. |
People shouldn't accidentally write code that works only for ascii input and 'surprisingly' fails when users input non-ascii-only text, so I figure the byte-offset methods shouldn't have the shorter names or be the more obvious choice than the char-offset methods.
So, like, len => byte_len, slice => slice_bytes, but also find => find_byte_pos or something.
Of course all the docs already mention that offsets are given in bytes, but it doesn't always occur to me that I might even be passing the wrong kind of offset to make me look it up in the first place.
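A small sketch of the failure mode described here, written against current Rust. `find_byte_pos` is the hypothetical name from the proposal and `find_char_pos` is an assumed counterpart; neither is a real `str` method. The two offsets agree on ASCII input and silently diverge once a multi-byte character appears before the match.

```rust
/// Byte offset of the first `c` in `s` (what `str::find` returns).
pub fn find_byte_pos(s: &str, c: char) -> Option<usize> {
    s.find(c)
}

/// Char offset of the first `c` in `s`, counted in Unicode scalar values.
pub fn find_char_pos(s: &str, c: char) -> Option<usize> {
    s.chars().position(|x| x == c)
}

/// Everything before the first `c`, slicing with the byte offset.
pub fn prefix_before(s: &str, c: char) -> Option<&str> {
    let i = find_byte_pos(s, c)?;
    Some(&s[..i])
}
```

With `"key=1"` both offsets are 3. With `"na\u{ef}ve=1"` the byte offset of `=` is 6 and the char offset is 5, so slicing with the char offset yields `"naïv"` instead of `"naïve"`, with no error to flag the mistake.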