Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix consistency of from_bytes, try_from_bytes, from_str, try_from_str #4931

Closed
1 of 4 tasks
sffc opened this issue May 23, 2024 · 28 comments
Closed
1 of 4 tasks

Fix consistency of from_bytes, try_from_bytes, from_str, try_from_str #4931

sffc opened this issue May 23, 2024 · 28 comments
Labels
C-meta Component: Relating to ICU4X as a whole S-medium Size: Less than a week (larger bug fix or enhancement)

Comments

@sffc
Copy link
Member

sffc commented May 23, 2024

In our docs there are 95 and 63 hits for from_bytes and try_from_bytes (the former includes counts from the latter), and there's also a mix of from_str and try_from_str.

We should try to fix this in 2.0. I suggest:

  • By default, fallible conversions (they are usually fallible) should have try_from_bytes and FromStr::from_str. In the cases where they are not fallible, from_bytes is fine.
  • Special cases can be discussed when they are encountered.

Alternatively, we could rename everything to try_from_utf8 instead of try_from_bytes.

EDIT: I think my preference has changed to also include try_from_str constructors. See discussion.

Feedback needed from:

@sffc sffc added C-meta Component: Relating to ICU4X as a whole discuss Discuss at a future ICU4X-SC meeting labels May 23, 2024
@sffc sffc added this to the ICU4X 2.0 milestone May 23, 2024
@sffc sffc added this to icu4x 2.0 May 23, 2024
@sffc sffc moved this to Needs discussion to unblock in icu4x 2.0 May 23, 2024
@robertbastian robertbastian self-assigned this May 23, 2024
@robertbastian
Copy link
Member

robertbastian commented May 23, 2024

I think FromStr::from_str should dictate naming. It is not an ideal name, but it's the name Rust uses, and we should follow convention.

I like the TinyAsciiStr model: it has a FromStr::from_str, a pub const fn from_str(s: &str) -> Result<Self>, which shadows the trait method, and an analogous from_bytes. Having an inherent from_str shadowing the trait is good for two reasons: it can be const, which the trait method cannot be, and it can be called without importing FromStr (which is not in the prelude). I think this method should not be called try_from_str, because it is FromStr::from_str. Giving it a second name is confusing, and it removes the shadow, so callers would have to either use the different name, or import core::str::FromStr.

So my suggestion is:

  • Types with fallible conversions
    • Implement (const) fn from_bytes(&[u8]) -> Result<Self, E> and FromStr
    • If conversion can be const, add an inherent const fn from_str
    • Maybe add an inherent fn from_str anyway, for calling convenience
  • Types with infallible conversions
    • Implement (const) fn from_str(&str) -> Self, (const) fn from_bytes(&[u8]) -> Self
    • Maybe implement FromStr with Err = Infallible
  • Types will either have fallible or infallible conversions, so these won't clash

These methods would have to be renamed to follow that scheme:

  • icu::locid::{lots of types}::try_from_bytes (2.0-breaking)
  • icu::timezone::{CustomTimeZone, GmtOffset}::try_from_bytes (2.0-breaking)
  • zerovec::ZeroSlice::try_from_bytes (util)

These methods should be FromStr::from_str anyway, will send a PR:

  • icu::experimental::relativetime::provider::SingularSubPattern::try_from_str (experimental, provider)
  • icu_pattern::Pattern::try_from_str (util, unreleased)

@Manishearth
Copy link
Member

I roughly agree with Robert's stance with the caveat that I'm not a huge fan of having inherent methods named the same as trait methods that are implemented by the type (so the "name something the same to get it const" is not that great, for me)

@sffc
Copy link
Member Author

sffc commented May 24, 2024

I strongly prefer using try_ for fallible non-trait methods.

It is not an ideal name, but it's the name Rust uses, and we should follow convention.

I consider FromStr to be a bit legacy not only because of the lack of try_ but also because its error type is called Err instead of Error as used in newer traits like TryFrom, whose convention I prefer to follow.

Also, people don't usually call FromStr::from_str directly because (1) it's not in the prelude and (2) people use .parse(). I see it as a means to an end to be more consistent with the Rust ecosystem, but we should define our own constructors on top of impl FromStr.

Anecdotally, I have found it misleading when the only constructor for various ICU4X types is impl FromStr. I would rather look at the docs and see a try_from_... and then I instantly know that's how to construct the thing.

const fn from_str is interesting; I hadn't thought of that case in the OP. I think I prefer try_from_str because the function shows up in our docs as a constructor.

@sffc
Copy link
Member Author

sffc commented May 24, 2024

with the caveat that I'm not a huge fan of having inherent methods named the same as trait methods that are implemented by the type

@Manishearth If you hold this position, should we reconsider #4590?

@sffc
Copy link
Member Author

sffc commented May 24, 2024

(const) fn from_str(&str) -> Self

This is problematic because the signature is different than the trait function FromStr::from_str. I think if a function shadows, it should have the same signature.

How about this revised proposal:

  • Fallible String Constructors:
    • impl FromStr
    • pub fn try_from_bytes -- const if possible
    • pub fn try_from_str for discoverability -- const if possible.
  • Infallible String Constructors:
    • impl FromStr with type Err = Infallible
    • pub from from_bytes -- const if possible
    • Do not implement from_str because the signature is different than the trait function. May optionally consider a name such as new_from_str or from_id or something.

@robertbastian
Copy link
Member

(const) fn from_str(&str) -> Self

This is problematic because the signature is different than the trait function FromStr::from_str. I think if a function shadows, it should have the same signature.

I'm fine not implementing FromStr in the infallible case.

@robertbastian
Copy link
Member

Actually I'm wholly against FromStr for infallible. from_str should be an inherent method, and it should not implement the trait method because the signature is different than the inherent function, we don't have infallible unwrapping (so this will be annoying to use), and there is usually no "parsing", just rewrapping (cf UnvalidatedStr), so this being available through str::parse is weird.

Also, people don't usually call FromStr::from_str directly because (1) it's not in the prelude and (2) people use .parse(). I see it as a means to an end to be more consistent with the Rust ecosystem, but we should define our own constructors on top of impl FromStr.

We actually use from_str in our docs a lot. From these three declarations, I vastly prefer the first:

  • let foo = Foo::from_str("foo")?;
  • let foo = "foo".parse::<Foo>()?;
  • let foo: Foo = "foo".parse()?;

@zbraniecki
Copy link
Member

I think try_from_utf8 is slightly better than try_from_bytes as it makes a normalized API space for utf8/utf16 (which is what we'll want in some hot paths like Locale parsing) and aligns with https://doc.rust-lang.org/std/str/fn.from_utf8.html .

Suggestion:

  • try_from_utf8 / try_from_utf16 + TryFrom<&[u8]> / TryFrom<&[u16]>
  • from_utf8 / from_utf16 + From<&[u8]> / From<&[u16]>
  • FromStr + TryFrom<&str>/From<&str>

We don't need to implement all of them for all constructors - just normalize the naming scheme so that we can introduce the ones we find useful incrementally.

@robertbastian robertbastian removed their assignment May 24, 2024
@robertbastian
Copy link
Member

+1 on using _utf8 instead of _bytes.

@Manishearth
Copy link
Member

@Manishearth If you hold this position, should we reconsider #4590?

It's a very weakly held position, I think #4590 is okay. I'm also fine with the proposal here for similar reasons.

@zbraniecki
Copy link
Member

I'm working on locid now in context of icu_preferences, and once I'm done I'd like to unify and improve docs on handling of to/from u8/u16/str for all subtags for preparation for locid &[u16] handling.

Any thoughts on my proposal several comments above?

@sffc
Copy link
Member Author

sffc commented May 30, 2024

  • @hsivonen - I think the try prefix makes sense when fallible. I think utf8 makes sense for consistency with the standard library and pairing with utf16.
  • @robertbastian - I think, from a purity point, it should be utf8, but it seems like it would make it less discoverable.
  • @robertbastian not a fan of duplicating methods, try_from_str would be the same method as from_str
  • @sffc i think FromStr is nice to have. I don't actually like it much since it's not in the prelude and it doesn't have the try prefix
  • @robertbastian i agree from_str is bad. we should use it as little as possible, and this includes not using it in examples. I would be happy with this if we always used .parse or try_from_str but never from_str.
  • @sffc also don't like that it's undiscoverable
  • @Manishearth: would really like rust to allow you to "promote" trait methods to inherent (in docs and in resolution behavior)
  • @sffc - I'm a bit worried about making utf16 a requirement; it seems like it would be an implementation challenge that isn't jusatified all the time
  • @robertbastian - I don't think it needs to be a requirement; we should add it if we want to
  • @nekevss - It might be easy sometimes but it depends on the architecture of the parse function
  • @sffc - Should we include TryFrom<[u8]> as @zbraniecki suggested?
  • @robertbastian - It's not clear from such an impl that it expects utf-8. Also, try_into is not a very ergonomic method, so if it's called as Foo::try_from(&[u8]), they should just call try_from_utf8
  • @sffc - TryFrom<[u8]> implies that there is one way to build this from a [u8], but we have multiple ways, such as Postcard deserialization.
  • @robertbastian - FromStr gives us parse, which some users might expect, what does TryFrom<[u8]> give us? I don't see a use case for using it generically
  • @hsivonen - Is it useful for using generically, like if you have potentially ill-formed UTF-8 or UTF-16?
  • @sffc - In such a case, we should implement TryFrom<PotentialUtf8>. We can add this later, perhaps.

Concrete proposal:

All types that are fallibly created from a string have the following functions:

  • try_from_str -> Result<Self>
  • try_from_utf8 -> Result<Self>
  • FromStr::from_str -> Result<Self> (only ever called through parse in documentation, but only if it can be done without turbofishes, otherwise Foo::try_from_str)
  • Optional: try_from_utf16 -> Result<Self> (only if the impl would benefit from such a function)

All types that are infallibly created from a string have the following functions:

  • from_str -> Self
  • from_utf8 -> Self
  • Optional: from_utf16 -> Self (only if the impl would benefit from such a function)

LGTM: @hsivonen @sffc @Manishearth @robertbastian

@sffc
Copy link
Member Author

sffc commented Jun 4, 2024

I didn't realize I signed off on the "(only ever called through parse in documentation, but only if it can be done without turbofishes, otherwise Foo::try_from_str)" -- I agree with the "only ever called through parse in documentation" but not necessarily the "only if it can be done without turbofishes"

@zbraniecki
Copy link
Member

LGTM with the same alternation as @sffc pointed ou in the last comment. I'm not convinced that turbofish specifier is bad enough to have a blank rule against it in docs.

@sffc sffc removed the discuss Discuss at a future ICU4X-SC meeting label Jun 6, 2024
@sffc sffc moved this from Needs discussion to unblock to Unclaimed for sprint in icu4x 2.0 Jun 6, 2024
zbraniecki added a commit that referenced this issue Jun 10, 2024
Introduces TinyAsciiStr constructors from utf16 and converges on the
consensus from #4931.

---------

Co-authored-by: Robert Bastian <[email protected]>
@robertbastian
Copy link
Member

What about this pattern:

pub struct Era(pub TinyStr16);

impl From<TinyStr16> for Era {
    fn from(x: TinyStr16) -> Self {
        Self(x)
    }
}

impl FromStr for Era {
    type Err = <TinyStr16 as FromStr>::Err;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.parse().map(Self)
    }
}

Both the impl From<TinyStr> as well as the impl FromStr are kind of pointless, as TinyStr already has try_from_utf8/try_from_utf16/try_from_str/from_str, and the field is public.

@robertbastian robertbastian self-assigned this Jun 19, 2024
robertbastian added a commit that referenced this issue Jun 20, 2024
#4931

Covers all types that had a `try_from_bytes`.

There are still a lot of types that implement `FromStr`, but don't have
any inherent methods. Those will be next.
robertbastian added a commit that referenced this issue Aug 13, 2024
@robertbastian robertbastian removed their assignment Sep 17, 2024
@robertbastian
Copy link
Member

This is done for all stable components and utils, except for the datetime and plurals reference and runtime modules.

@sffc sffc added the needs-approval One or more stakeholders need to approve proposal label Sep 17, 2024
@sffc
Copy link
Member Author

sffc commented Sep 17, 2024

@zbraniecki WDYT about doing the facelift to those modules?

@sffc
Copy link
Member Author

sffc commented Sep 17, 2024

Also in 2.0 (can be stretch), apply this style to all stringy APIs such as canonicalize, normalize, ...

@zbraniecki
Copy link
Member

I think we should align them with the decision here.

@sffc sffc removed the needs-approval One or more stakeholders need to approve proposal label Sep 21, 2024
@sffc
Copy link
Member Author

sffc commented Oct 3, 2024

  • @sffc I think the reason I had this on the agenda was that I think we should fix all functions that take strings, not just try_from functions
  • @robertbastian - yeah some things like normalize accept AsRef<[u8]> to be generic over &[u8] and &str, but that's not great, both documentation-wise, and because the lifetime gets lost through AsRef
  • @sffc And how about the plurals::reference and datetime::reference modules?
  • @robertbastian If they are truly datagen-only, I don't think we should touch them
  • @sffc They are doc(hidden) currently, so let's hold off until we have a decision on Move icu_plurals::rules::reference to datagen? #5181
  • @sffc Is it normalize or normalize_str? Maybe we say that we do _str only for from functions or functions that could take other types of arguments

Conclusion:

  • Update all string-accepting functions to be foo(&str), foo_utf8(&[u8]), foo_utf16(&[u16])
  • Don't touch doc(hidden) modules at this time

LGTM: @sffc (@robertbastian) @Manishearth

@Manishearth
Copy link
Member

@robertbastian How much of this is done? Does this just need a final audit, or are there pieces that still need doing?

@Manishearth Manishearth added the S-medium Size: Less than a week (larger bug fix or enhancement) label Jan 7, 2025
@robertbastian
Copy link
Member

No change since the last update from me.

@robertbastian
Copy link
Member

Blocked on #5181

@Manishearth
Copy link
Member

@robertbastian is that the only case? I'd say we can close this issue then.

@robertbastian
Copy link
Member

Again, #4931 (comment)

@sffc
Copy link
Member Author

sffc commented Feb 12, 2025

datetime::pattern::reference is now datetime::provider::pattern::reference (an unstable module)

@Manishearth
Copy link
Member

Assuming #6103, I think we can close this out once that lands.

@Manishearth
Copy link
Member

That landed, nothing left is in stable code.

@github-project-automation github-project-automation bot moved this from Unclaimed for sprint to Done in icu4x 2.0 Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-meta Component: Relating to ICU4X as a whole S-medium Size: Less than a week (larger bug fix or enhancement)
Projects
Status: Done
Development

No branches or pull requests

4 participants