-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Needle API (née Pattern API) #2500
Conversation
kennytm
commented
Jul 14, 2018
•
edited by Centril
Loading
edited by Centril
- Rendered
- Pre-RFC
- Prototype:
- Source: https://github.com/kennytm/pattern-3
- Docs: https://docs.rs/pattern-3/
🎉 Happy to see you got some use out of my link to https://crates.io/crates/galil-seiferas and cc @bluss who's the author of that crate. |
SharedHaystack seems like a general concept. Could we call it CheapClone or something like that - I'm sure there's lots of other places we'd like to know if a clone is expensive or not. |
@gilescope Interesting, though if we generalize the concept it would raise the question what is meant by "cheap", e.g. is the Suppose we do introduce the pub trait ShallowClone: Clone {}
impl<'a, T: ?Sized + 'a> ShallowClone for &'a T {}
impl<T: ?Sized> ShallowClone for Rc<T> {}
impl<T: ?Sized> ShallowClone for Arc<T> {} the #[deprecated]
trait SharedHaystack = Haystack + ShallowClone; ... as long as Anyway |
text/0000-pattern-3.md
Outdated
|
||
### Consumer | ||
|
||
A consumer provides the `.consume()` method to implement `starts_with()` and `trim_start()`. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to implement
starts_with()
Is that missing from this or outdated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the starts_with
algorithm, not the trait method :)
It's the most 🚲🏚 thing, but I don't find |
Experience report — implementing
|
Thanks @shepmaster! I'll update the descriptions in the
Could you elaborate what do you mean by "subtle"?
Yes this problem also exists for my unsafe impl<'p> Consumer<str> for RegexSearcher<'p> {
fn consume(&mut self, span: Span<&str>) -> Option<usize> {
let (hay, range) = span.into_parts();
let m = self.regex.find_at(hay, range.start)?;
if m.start() == range.start {
Some(m.end())
} else {
None
}
}
} I don't know if this could be improved performance-wise other than asking the Searcher implementor to provide such primitive too. (If we disregard the performance issue we could default impl a |
Experience report — implementing
|
Explained why Pattern cannot be merged into Searcher. Block on RFC 1672.
UpdateAddressing #2500 (comment).
A general An unsafe version can be done as let (hay, range) = span.into_parts();
let hay = hay.as_inner();
unsafe { Span::from_parts(hay, range) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a generally great RFC and I'm overall quite happy with the API and definitely happy with the demonstrated performance improvements.
I would be interested in seeing more detail and benchmarks in regards to the behaviour with owned haystacks. From what I can tell, this makes split
on Vec
take quadratic time, since the trisection needs to copy the entire tail of the Vec
on every split. This seems like a hard problem, and it would be bad to back ourselves into a corner.
One (verbose) solution would be to introduce an intermediate data structure, a PartialVec<T>
that owns the memory of a whole Vec
but only owns a small range worth of elements. It would be possible to convert it into a Vec
by shifting the elements to the start. Then, split_around
would be turned into two variants (where PartialVec
would presumably be specified in another associated type), split_around_forward(...) -> (Vec, Vec, PartialVec)
and split_around_backward(...) -> (PartialVec, Vec, Vec)
. split
would use split_around_forward
and rsplit
would use split_around_backward
.
text/0000-pattern-3.md
Outdated
A hay can *borrowed* from a haystack. | ||
|
||
```rust | ||
pub trait Haystack: Deref<Target: Hay> + Sized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that, due to its unsafe methods being called from safe code (e.g. in trim_start), Haystack
needs to be an unsafe trait
. Otherwise, without ever writing an unsafe block and therefore promising to uphold the invariants of safe code, an invalid implementation of Haystack
could violate memory safety. The fact that split_around
and split_unchecked
are unsafe capture the fact that the caller, to preserve memory safety, must pass in valid indices, but it does nothing to prevent the callee from doing arbitrary bad behaviour even if the indices are valid.
Hay
probably also needs to be an unsafe trait
. It looks like in practice, Searcher
s are implemented for specific Hay
types, indicating trust of just those implementations, and , so it may not be strictly necessary. Additionally, one of the requirements of a valid Haystack
implementation could be the validity of the associated Hay
type. However, with the proposed impl<'a, H: Hay> Haystack for &'a H
, this is impossible to promise, and I think it would be necessary for Hay
to be an unsafe trait
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, both are now unsafe.
text/0000-pattern-3.md
Outdated
|
||
A `SharedHaystack` is a marker sub-trait which tells the compiler this haystack can cheaply be | ||
cheaply cloned (i.e. shared), e.g. a `&H` or `Rc<H>`. Implementing this trait alters some behavior | ||
of the `Span` structure discussed next section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm somewhat uncomfortable by the use of specialization to modify the behaviour of Span
instead of just providing more optimized functions. Admittedly, this changed behaviour seems hard to directly observe, since into_parts
is only available on shared spans. This definitely isn't a big deal.
text/0000-pattern-3.md
Outdated
with invalid ranges. Implementations of these methods often start with: | ||
|
||
```rust | ||
fn search(&mut self, span: SharedSpan<&A>) -> Option<Range<A::Index>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is SharedSpan
a relic of a previous version of this proposal? I don't see it defined anywhere and it sounds like Span<H> where H: SharedHaystack
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. (In an ancient version there was SharedSpan<H>
and UniqueSpan<H>
where Haystack
has an associated type Span
to determine which span to use. The resulting code was quite ugly.)
text/0000-pattern-3.md
Outdated
let span = unsafe { Span::from_parts("CDEFG", 3..8) }; | ||
// we can find "CD" at the start of the span. | ||
assert_eq!("CD".into_searcher().search(span.clone()), Some(3..5)); | ||
assert_eq!("CD".into_searcher().consume(span.clone()), Some(5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this (and the other examples calling .into_searcher().consume(...)
) be .into_consumer().consume(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Copy-and-paste error.
text/0000-pattern-3.md
Outdated
let mut searcher = pattern.into_searcher(); | ||
let mut rest = Span::from(haystack); | ||
while let Some(range) = searcher.search(rest.borrow()) { | ||
let [left, _, right] = unsafe { rest.split_around(range) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems very common to call split_around
and then throw away one or more of the components. For owned containers like Vec
, at least, this involves allocating a vector for the ignored elements, copying them to their new location, then finally dropping and deallocating. Would it be possible to add more methods to Haystack
that only return some of the parts? They could have default definitions in terms of split_around
, so they shouldn't cause any more difficulty for implementers, but owned containers would be able to override them for better performance.
It also occurs to me that slice_unchecked
is actually one of these specialized methods, returning only the middle component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need 3 more names for these 😝 ([left, middle, _]
, [left, _, right]
, [_, middle, right]
)
text/0000-pattern-3.md
Outdated
pub trait Haystack: Deref<Target: Hay> + Sized { | ||
fn empty() -> Self; | ||
unsafe fn split_around(self, range: Range<Self::Target::Index>) -> [Self; 3]; | ||
unsafe fn slice_unchecked(self, range: Range<Self::Target::Index>) -> Self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could slice_unchecked
have a default implementation as follows?
unsafe fn slice_unchecked(self, range: Range<Self::Target::Index>) -> Self {
let [_, middle, _] = self.split_around(range);
middle
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Added.
text/0000-pattern-3.md
Outdated
|
||
* Implement `Hay` to `str`, `[T]` and `OsStr`. | ||
|
||
* Implement `Haystack` to `∀H: Hay. &H`, `&mut str` and `&mut [T]`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern_3
crate also has an implementation for Vec<T>
(though not String
or OsString
). Are those owned implementations intended eventually? Is that just out of scope for this particular RFC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't intend to add these into the standard library, due to the efficiency concern you've raised.
pattern_3
does implement for Vec<T>
just to illustrate that it can transfer owned data type correctly.
I separated this comment out because it is far more questionable than the rest. I know that I personally tend to go overboard with squeezing out tiny and inconsequential bits of runtime performance at the expense of compile time and ergonomics. That said,
This doesn't feel like the right trade-off to me. Reducing the number of types is definitely useful for implementers of patterns that don't have a special optimization for
If getting the implementation right is an issue, there could just be a wrapper type along the lines of pub struct SearcherConsumer<S> {
inner: S
}
impl<H, S: Searcher<H>> Consumer<H> for SearcherConsumer {
// ...
} This could then be the default type for |
This I disagree. People seldom use
There is zero performance penalty in string matching caused by merging consumer and searcher, because we already have a runtime selection between empty and non-empty needle. In |
@kennytm: Thank you, thank you, thank you! 🎉🎉🎉 This is the kind of end state that I attempted to reach with the various Pattern API sketches, but never could put enough effort into. In general huge 👍 from me, but I had a few thoughts when reading through the RFC and comments right now.
|
Okay, so @Kimundi If impl<H, S> Pattern<H> for S
where
H: Haystack,
S: Searcher<H::Target> + Consumer<H::Target>,
{
type Searcher = Self;
type Consumer = Self;
fn into_searcher(self) -> Self { self }
fn into_consumer(self) -> Self { self }
} because due to backward compatibility we have a different blanket impl to support: impl<'h, F> Pattern<&'h str> for F
where
F: FnMut(char) -> bool,
{ ... } and these two will conflict when a type implements Given these details I'm mildly against renaming |
Experience report — consuming
|
Thanks for the report @shepmaster !
We could rename the trait cc @Centril (1) and @Kimundi (2) who want to rename The name fn contains<H, P>(haystack: H, needle: P) -> bool
where
H: Haystack,
P: Needle<H>; so for those not directly working with Searcher we are indeed "searching for a needle in a haystack".
maybe reading it as "search inside of a ____ of haystack" is better?
This is unfortunately impossible because it will conflict with: impl<'h, T, F> Pattern<&'h [T]> for F
where
F: FnMut(&T) -> bool, as we could |
My thinking is that anything is better than Other than that I don't have any strong opinions (or any opinions at all). |
@kennytm I buy that :) |
👍 for Re: Ability to search a single element: We could provide a newtype-like wrapper type: ext::find(&b"alpha"[..], Needle(b'a')); Would be kind of ugly, but at least be doable. Alternatively, we put the |
This reminds me that the module name
This unfortunately will conflict with the stabilized APIs like |
@SimonSapin Thanks!
Without #1672 some third-party types are not covered by blanket This is fine if we only focus on built-in needle types like
Updated the RFC. These are explained in details in the library docs:
previously docs.rs didn't show them due to outdated compiler, but it has just been fixed and are now visible 🎉
The simplification of ignoring |
Has the It might be interesting to experiment with designing the entire API on generative typing, but without ATC it can be trickier, and there might not be a solution to containers where not every index is valid (e.g. |
@eddyb I tried it before (as something like I also don't think with today's Rust one could safely use |
@kennytm I've prototyped an existential wrapper solution for the AFAIK it's sound when used with I'll have to look into making the construction limited to calling |
@rfcbot resolve double-ended vs reverse |
@rfcbot resolve yagni I still feel that this is an unprecedented amount of complexity for a standard library feature. However, stabilizing some set of traits to "explain" the existing behavior of |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Minor nitpick, but I think a case could be made that tasks and futures are similarly complex. |
Since |
The final comment period, with a disposition to merge, as per the review above, is now complete. |
🎉 Huzzah! This RFC has been merged! 🎉 Tracking issue: rust-lang/rust#56345 |
We discussed this RFC and specifically the implementation PR in the last few libs team meetings and have decided to revert the decision to stabilize this interface as indicated in rust-lang/rust#76901 (comment). |
Withdrawing PR ready at #3154 |