Add from_bytes_radix function #469

Closed
Dushistov opened this issue Oct 26, 2024 · 1 comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@Dushistov

Proposal

Problem statement

If you have bytes (&[u8]) that contain an ASCII-encoded number, you cannot parse them directly with the current standard library.
You need to call str::from_utf8 before calling from_str_radix.

This conversion is redundant, because from_str_radix must check that the input contains only digits (0-9, a-z, A-Z),
which means the input is effectively validated as UTF-8 a second time.

And, in fact, internally from_str_radix works with bytes: https://github.com/rust-lang/rust/blob/80d0d927d5069b67cc08c0c65b48e7b6e0cdeeb5/library/core/src/num/mod.rs#L1477

So from an ergonomics point of view you have to do something useless (call std::str::from_utf8 and handle its errors),
and from a performance point of view you pay for that useless step as well.
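To make the two-step workaround concrete, here is a minimal sketch using only stable standard library calls (str::from_utf8 and u32::from_str_radix). The function name parse_u32_via_str and the error enum ViaStrError are hypothetical names introduced for illustration only.

```rust
use std::num::ParseIntError;
use std::str::Utf8Error;

/// Hypothetical error type for this sketch: either of the two steps can fail.
#[derive(Debug)]
pub enum ViaStrError {
    NotUtf8(Utf8Error),
    NotANumber(ParseIntError),
}

/// Parses ASCII digits held in a byte slice with today's stable APIs.
pub fn parse_u32_via_str(bytes: &[u8], radix: u32) -> Result<u32, ViaStrError> {
    // Step 1: validate the whole slice as UTF-8, although only ASCII digits
    // can be accepted by the next step anyway.
    let text = std::str::from_utf8(bytes).map_err(ViaStrError::NotUtf8)?;
    // Step 2: from_str_radix inspects every byte again while converting digits.
    u32::from_str_radix(text, radix).map_err(ViaStrError::NotANumber)
}
```

The caller ends up handling two error kinds, even though any input that fails the first check would also have been rejected by the second.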

Motivating examples or use cases

The need to convert &[u8] to i16, i32, etc. frequently occurs in
parsers for ASCII-based protocols and file formats (such as nmea or arinc-424). In these formats and protocols you need to convert ASCII digits to numbers.

For example, the Rust nmea parser converts its input to UTF-8 before any other parsing starts: https://github.com/AeroRust/nmea/blob/832895945a82f5248473d0809dca46d805541132/src/parse.rs#L187

However, NMEA is not a UTF-8-based protocol. It is simply too cumbersome to call str::from_utf8 on every byte range
that holds an encoded number, so it is simpler to convert the whole input to UTF-8 first and only then parse it.
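The trade-off described above can be sketched on a made-up record format. The comma-separated layout and both function names below are assumptions for illustration and are not taken from the nmea crate.

```rust
/// Whole-input approach: validate everything as UTF-8 once, then parse fields.
/// Hypothetical record layout: comma-separated decimal fields.
pub fn parse_fields_whole_input(input: &[u8]) -> Option<Vec<u16>> {
    let text = std::str::from_utf8(input).ok()?;
    // Collecting into Option<Vec<_>> yields None as soon as one field fails.
    text.split(',').map(|field| field.parse::<u16>().ok()).collect()
}

/// Per-range approach: stay on bytes and call from_utf8 for every numeric field.
pub fn parse_fields_per_range(input: &[u8]) -> Option<Vec<u16>> {
    input
        .split(|&byte| byte == b',')
        .map(|field| std::str::from_utf8(field).ok()?.parse::<u16>().ok())
        .collect()
}
```

Both functions return the same result for the same input. The second one repeats the from_utf8 call and its error handling at every numeric field, which is the overhead a byte-based parsing function would remove.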

Solution sketch

Add <NumberType>::from_bytes_radix(src: &[u8], radix: u32) -> Result<NumberType, ParseIntError>.
This function almost exists already: the main part of from_str_radix deals with bytes, not UTF-8 characters.
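As a rough illustration of the proposed behaviour, the following sketch parses a u32 straight from bytes. It is not the standard library implementation. The free function u32_from_bytes_radix and the ParseBytesError enum are hypothetical, since ParseIntError cannot be constructed outside the standard library. Accepting a leading '+' is an assumption that mirrors what from_str_radix does.

```rust
/// Hypothetical error type; std's ParseIntError has no public constructor.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseBytesError {
    Empty,
    InvalidDigit,
    Overflow,
}

/// Sketch of a byte-based counterpart to u32::from_str_radix.
pub fn u32_from_bytes_radix(src: &[u8], radix: u32) -> Result<u32, ParseBytesError> {
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");

    // Assumption: an optional leading '+' is accepted.
    let digits = match src {
        [] => return Err(ParseBytesError::Empty),
        [b'+', rest @ ..] => rest,
        _ => src,
    };
    if digits.is_empty() {
        // A lone sign carries no digits.
        return Err(ParseBytesError::InvalidDigit);
    }

    let mut value: u32 = 0;
    for &byte in digits {
        // Any byte outside 0-9, a-z, A-Z (including all non-ASCII bytes)
        // is rejected here, so no separate UTF-8 validation is needed.
        let digit = (byte as char)
            .to_digit(radix)
            .ok_or(ParseBytesError::InvalidDigit)?;
        value = value
            .checked_mul(radix)
            .and_then(|v| v.checked_add(digit))
            .ok_or(ParseBytesError::Overflow)?;
    }
    Ok(value)
}
```

The digit check alone rejects every non-ASCII byte, which is why a separate from_utf8 pass adds nothing for this kind of input.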

Links and related work

#287 suggests a similar thing, but with a much wider scope that introduces a new trait.
I suggest adding just one more function.

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@Dushistov Dushistov added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Oct 26, 2024
@Amanieu
Member

Amanieu commented Nov 5, 2024

We discussed this in the libs-api meeting. We're happy to add this method, but we feel that it should include ascii somewhere in the name, which is in line with other functions that treat bytes as ASCII characters.

@Amanieu Amanieu added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Nov 12, 2024
@Amanieu Amanieu closed this as completed Nov 12, 2024