ACP: Add FromByteStr trait with blanket impl FromStr #287
Comments
cc original reviewer @joshtriplett zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/addr_parse_ascii.20feature |
FromBytes feels like it might be too general of a name, it implies binary deserialization rather than textual parsing to me. |
Agreed after rereading it. I had |
The lack of parsing working on My inclination is to approve this, but since it was just opened, I figure I'll give it a little time to bake to give others a chance to chime in. |
One complaint I’ve heard about |
That is an interesting thought. Do you mean a signature like this?

    trait FromByteStr<'a>: Sized {
        type Err;
        fn from_byte_str(bytes: &'a [u8]) -> Result<Self, Self::Err>;
    }

    struct Foo<'a>(&'a [u8]);

    impl<'a> FromByteStr<'a> for Foo<'a> {
        type Err = ();
        fn from_byte_str(bytes: &'a [u8]) -> Result<Self, Self::Err> {
            Ok(Self(bytes))
        }
    }

    impl FromByteStr<'_> for i32 {
        type Err = std::num::ParseIntError;
        fn from_byte_str(bytes: &[u8]) -> Result<Self, Self::Err> {
            std::str::from_utf8(bytes).unwrap().parse()
        }
    }

    // ...I think static is correct here?
    impl<T: FromByteStr<'static>> FromStr for T {
        type Err = T::Err;
        fn from_str(s: &str) -> Result<Self, Self::Err> {
            todo!()
        }
    }

That seems reasonable to me if we are okay with it not being a 1:1 matchup with I think that |
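For illustration, here is one way the `todo!()` body of such a blanket impl could be filled in. This is only a sketch under several assumptions: it uses the simpler lifetime-free form of the trait rather than the `<'a>` variant above, and it uses a local stand-in trait (`FromStrLike`, a hypothetical name) in place of `std::str::FromStr`, because the orphan rules do not allow a blanket impl of `FromStr` to be written outside the standard library. The `Flag` type is likewise made up for the example.

```rust
/// Lifetime-free sketch of the proposed trait (shape assumed from the proposal text).
trait FromByteStr: Sized {
    type Err;
    fn from_byte_str(bytes: &[u8]) -> Result<Self, Self::Err>;
}

/// Hypothetical local stand-in for `std::str::FromStr`.
trait FromStrLike: Sized {
    type Err;
    fn from_str_like(s: &str) -> Result<Self, Self::Err>;
}

/// Blanket impl: anything parseable from bytes is also parseable from `&str`,
/// because a `&str` can always be viewed as its underlying bytes.
impl<T: FromByteStr> FromStrLike for T {
    type Err = <T as FromByteStr>::Err;

    fn from_str_like(s: &str) -> Result<Self, Self::Err> {
        T::from_byte_str(s.as_bytes())
    }
}

/// Hypothetical example type used only to exercise the blanket impl.
#[derive(Debug, PartialEq)]
struct Flag(bool);

impl FromByteStr for Flag {
    type Err = ();

    fn from_byte_str(bytes: &[u8]) -> Result<Self, Self::Err> {
        // Byte string literals can be matched directly against a `&[u8]`.
        match bytes {
            b"true" => Ok(Flag(true)),
            b"false" => Ok(Flag(false)),
            _ => Err(()),
        }
    }
}
```

With these definitions, `Flag::from_str_like("true")` goes through the byte-based parser without `Flag` ever implementing the string-based trait by hand.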
I like the naming direction of "from byte str", since it helps emphasize it's like |
I updated the proposal to It seems like everyone is either neutral or in favor, so I'll make a PR for this in the next week just to see how well everything fits together. @shepmaster's question is probably the biggest thing to answer. |
cc @epage ; would this work for clap, as a basis for parsing user types without going through strings? |
We discussed this in today's @rust-lang/libs-api meeting. We think it's important to distinguish between two possible interpretations of this:
We do still need to make it clear that this is parsing text, not binary; Please consider those two cases, document the methods accordingly, and we'll re-evaluate it accordingly. Thank you. |
Speaking for myself here (not the team consensus): I think case 2 is something that there's pent-up demand for in the ecosystem, and it seems unlikely that we'd be able to enforce exclusively case 1. I think there's value in providing and embracing case 2. |
@joshtriplett maybe? clap would be working with
If someone made |
I'm mostly against this because I think that something better should be designed than
Overall, this motivates me to conclude that a more generic parsing trait would be a fit replacement for At one point I was trying to experiment with what a useful API would look like for this, and this is about as far as I got. It's not complete or battle-tested, which is why I haven't submitted any ACPs or RFCs yet, but it's probably a decent example of why I think this approach as-is isn't very desirable. |
@clarfonthey I'm not sure that direction is the right path for std. That's a ton of infrastructure and surface area. (I realize what you have is a draft, so I'm speaking in broad strokes here.) I think there's room for a simpler solution that gets us 80% of the way there in std. |
What I have is 100% a draft, and that's the main reason why I haven't shared it until now. ;) Honestly, the only reason I share it at all is because I want to demonstrate that an alternative is possible, it just needs some extra work. I mostly share what I have to demonstrate the kind of direction I was going in. You're right that the goal should be simple, and that's why I think the current version I have is insufficient. However, I think that there are two simple operations which should be possible under a theoretical final version:
Neither of these is a particularly big ask for any particular implementation, and they don't provide the ability to create complex parser combinators on their own. However, they're both vital to a proper parser implementation, and I don't think that anyone writing their own parsers should have to reimplement libstd parsers (for things like IP addresses) or hack in weird cases just because the API isn't really compatible. To counter the API surface: I wouldn't consider these much more complex than iterators: the potential API surface is large, but it doesn't have to be. I think that an ideal API would be one that offers a relatively simple trait that's implemented by libstd types, but leaves out all the extra adapters and fancy methods to ecosystem crates. |
I decided to actually write up #296 to explain my design process behind the example I shared, particularly what features I felt would be useful to have from a potential design and why it was designed the way it was. If you follow the motivation, you'll get something very similar to what I designed, with a few potential changes. It mostly runs counter to this ACP since it explicitly explains why I don't think that adding differently-named copies of Ultimately, hopefully it'll help us decide what we finally want to do with |
While I'm unsure if I personally have a use case for With #296, you are effectively proposing taking on the core of a full, general purpose parser into the standard library. I feel like a case would need to be made for why this is important vs third-party, why a specific design is picked (especially a new design) compared to the different third-party parser trait designs, etc. As an outsider to the libs team, I assume this would need to be vetted in a third-party package and likely collaborated on with parser maintainers. This feels like a completely different beast than |
The thing is, what exactly does Speaking of which… I believe that most things that implement Standalone methods for IP addresses and such would also be nice. |
Proposal
Problem statement
Many data forms that can be parsed from a string representation do not need UTF-8. Here, FromStr is unnecessarily restrictive because a byte slice &[u8] cannot be parsed directly. Instead, a UTF-8 &str must be constructed to use .parse().

This is inconvenient when working with any raw buffers where one cannot assume that str::from_utf8 will be successful, nor is there any reason to incur the UTF-8 verification overhead. An example is IP addresses, for which there is an unstable from_bytes function: rust-lang/rust#101035

Motivating examples or use cases
- Any input data where UTF-8 cannot be guaranteed: stdin, file paths, data from Read, network packets, no_std without UTF tables, any data read one byte at a time, etc.
- Any output data that doesn't require specific knowledge of UTF-8: integers, floating point, IP/socket addresses, MAC addresses, UUIDs, etc.
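To make the problem concrete, the following sketch shows the two-step detour that stable Rust requires today when an IPv4 address arrives as raw bytes. The helper name `parse_ipv4` is made up for this illustration; only `str::from_utf8` and the existing `FromStr` impl for `Ipv4Addr` are real standard library APIs.

```rust
use std::net::Ipv4Addr;
use std::str;

/// Hypothetical helper: parse an IPv4 address from a raw byte buffer
/// using only stable APIs.
fn parse_ipv4(bytes: &[u8]) -> Option<Ipv4Addr> {
    // Step 1: validate the whole buffer as UTF-8, even though a textual
    // IPv4 address can only ever consist of ASCII digits and dots.
    let text = str::from_utf8(bytes).ok()?;
    // Step 2: run the actual parser through `FromStr`.
    text.parse().ok()
}
```

A call such as `parse_ipv4(b"192.168.0.1")` succeeds, while a buffer containing invalid UTF-8 is rejected in step 1 before the address parser ever sees it.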
Solution sketch
Add a trait that mirrors FromStr but works with &[u8] byte slices. This will get a corresponding parse on &[u8].
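As a rough sketch of what this could look like, assuming the trait mirrors `FromStr` one-to-one with the `from_byte_str` and `Err` names used in this thread: since an inherent `parse` method cannot be added to `[u8]` outside the standard library, the example uses a hypothetical extension trait `ByteStrExt` in its place. The `Port` type and its error are also invented for the illustration.

```rust
/// Sketch of the proposed trait; assumed to mirror `std::str::FromStr`.
pub trait FromByteStr: Sized {
    type Err;
    fn from_byte_str(bytes: &[u8]) -> Result<Self, Self::Err>;
}

/// Hypothetical extension trait standing in for an inherent `parse` on `[u8]`.
pub trait ByteStrExt {
    fn parse<F: FromByteStr>(&self) -> Result<F, F::Err>;
}

impl ByteStrExt for [u8] {
    fn parse<F: FromByteStr>(&self) -> Result<F, F::Err> {
        F::from_byte_str(self)
    }
}

/// Hypothetical example type: a port number written as ASCII decimal digits.
#[derive(Debug, PartialEq)]
pub struct Port(pub u16);

#[derive(Debug, PartialEq)]
pub struct ParsePortError;

impl FromByteStr for Port {
    type Err = ParsePortError;

    fn from_byte_str(bytes: &[u8]) -> Result<Self, Self::Err> {
        if bytes.is_empty() {
            return Err(ParsePortError);
        }
        let mut value: u16 = 0;
        for &b in bytes {
            // Any byte that is not an ASCII digit (including non-ASCII bytes)
            // is an error, so no separate UTF-8 validation pass is needed.
            if !b.is_ascii_digit() {
                return Err(ParsePortError);
            }
            // Accumulate with overflow checks so values above u16::MAX fail.
            value = value
                .checked_mul(10)
                .and_then(|v| v.checked_add(u16::from(b - b'0')))
                .ok_or(ParsePortError)?;
        }
        Ok(Port(value))
    }
}
```

Given `let input: &[u8] = b"8080";`, the call `input.parse::<Port>()` reads much like `str::parse` does today, and invalid bytes surface as an ordinary parse error instead of a UTF-8 error.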
Since &str is necessarily represented as &[u8], we can provide a blanket impl so no types need to be duplicated.

If this is done, almost all types in std that implement FromStr will be able to switch to a FromByteStr implementation:
- NonZeroX
- IpAddr and SocketAddr (Tracking Issue for addr_parse_ascii feature rust#101035)

Alternatives
- TryFrom - this was decided against in the case of IpAddr, see Support parsing IP addresses from a byte string rust#94890 (comment)
- FromBytes::from_bytes: from_byte_str is proposed instead to make it clear that this parses text-encoded data, as opposed to binary serialization (#287 (comment) and #287 (comment))
- FromAscii and place it in std::ascii: if this name were selected, users may expect this to parse something like [ascii::Char; N] rather than &[u8]. I don't think we want this since &[u8] -> &[ascii::Char] requires a validation step, and most implementations should be able to just raise an error if input is invalid.

Open Questions
- Should Err be able to reference the source bytes? (#287 (comment) and its followup)
- Should the associated type be named Error rather than Err? This isn't consistent with FromStr but is more consistent with TryFrom and TryInto, as well as the rest of the ecosystem (where Err is usually only Result::Err, Error is an error type)

Links and related work
- IpAddr::from_bytes: Tracking Issue for addr_parse_ascii feature rust#101035 and its pre-implementation discussion Support parsing IP addresses from a byte string rust#94890
- unsafe needs in conversions #179

What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: