Regex::replace* methods taking &str rather than Cow<str> harms efficiency #676
Comments
Note that, as a workaround, you can write your own helper to expand
fn replace_cow<'a>(cow: Cow<'a, str>, regex: &Regex, replacement: &str) -> Cow<'a, str> {
    match cow {
        // A borrowed input can be passed straight through.
        Cow::Borrowed(s) => regex.replace(s, replacement),
        // An owned input is replaced via a temporary borrow, then re-owned.
        Cow::Owned(s) => Cow::Owned(regex.replace(&s, replacement).into_owned()),
    }
} |
@chris-morgan Could you please post a complete example that compiles? Along with your desired variant that doesn't compile? It would also help if you could suggest your desired fix. |
@RReverser That still does an unnecessary clone in the owned case when there are no matches.
@BurntSushi Full trivial example that makes an unnecessary allocation in order to compile (otherwise
use std::borrow::Cow;
fn sub(input: &str) -> String {
    let regex = regex::Regex::new("f").unwrap();
    let aleph = regex.replace(input, "m");
    let beth = regex.replace(&aleph, "x");
    // `beth` may borrow from the local `aleph`, so it has to be made owned to be returned.
    beth.into_owned()
}
fn main() {
    println!("{}", sub("foo"));
}
What I would like to work, and which would make no allocations in the second
use std::borrow::Cow;
fn sub(input: &str) -> Cow<str> {
    let regex = regex::Regex::new("f").unwrap();
    let aleph = regex.replace(input, "m");
    // Desired form: pass the `Cow` itself. This does not compile today because `replace` takes `&str`.
    let beth = regex.replace(aleph, "x");
    beth
}
fn main() {
    println!("{}", sub("foo"));
}
The general nature of the proposed alteration would be that the type of the I’m not confident that this would not be a breaking change, because I think it could conceivably mess with inference. If it’s deemed a breaking change, then adding new methods is all I can think of, which is eww. The changes within the body of |
Do you mean when input is already a |
Yes it does: |
Right, good point, the second case also needs to be expanded. FWIW a very similar problem is the reason I've created cow-utils. While it's not strictly Regex-oriented, you can use its If there is enough interest, it should be quite easy to add a special feature to support |
One limitation is that it still intentionally accepts only The reason for that is that consuming |
I see. Thanks for elaborating. That makes things a lot clearer. My general opinion on I think the suggestion of In terms of non-breaking changes, I think that only leaves us with adding new replacement APIs. As you hinted at, I also find this unsatisfactory. I think there are probably already too many replacement APIs. Adding more for a fairly specialized use case would be unfortunate. Personally, if I were going to add more replacement APIs, then I think I would ditch
fn replace_with(&self, haystack: &str, replacement: R, dst: &mut String)
This way, you don't need to mess with With all of that said, I actually think your solution is much simpler: don't use regex's replace APIs. Implementing your own replace should be pretty easy. It should only be a few lines of code. |
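A hand-rolled replacement along the lines suggested above might look like the following minimal sketch. The name `replace_into` and its signature are assumptions made for illustration and are not part of the regex crate. It takes the match ranges from any iterator so that the sketch stays std-only, and it handles literal replacement text only (no capture-group expansion). With the regex crate, the ranges would presumably come from something like `regex.find_iter(haystack).map(|m| m.start()..m.end())`.

```rust
use std::ops::Range;

/// Hypothetical helper, not a regex crate API.
/// Appends `haystack` to `dst`, substituting `replacement` for every range
/// yielded by `matches`. The ranges are assumed to be non-overlapping, in
/// ascending order, and aligned to char boundaries.
/// Returns the number of replacements that were made.
pub fn replace_into<I>(haystack: &str, matches: I, replacement: &str, dst: &mut String) -> usize
where
    I: IntoIterator<Item = Range<usize>>,
{
    let mut last_end = 0;
    let mut count = 0;
    for range in matches {
        // Copy the unmatched text before this match, then the replacement.
        dst.push_str(&haystack[last_end..range.start]);
        dst.push_str(replacement);
        last_end = range.end;
        count += 1;
    }
    // Copy whatever follows the final match (or everything, if there was none).
    dst.push_str(&haystack[last_end..]);
    count
}
```

Because the output goes into a caller-supplied `String`, the same buffer can be cleared and reused across calls, and the returned count tells the caller whether anything changed at all.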
I'm going to close this since it's partially not feasible because of it being a breaking change, and I also personally find the increased complexity of the API not worth it. |
@chris-morgan , |
Here is a workaround, based on rust-lang/rust#65143 (comment). It's ugly because it relies on the implementation detail that a borrowed string is currently only ever returned when nothing was replaced. In theory, replacing the beginning or end of a string with the empty string could return a borrowed substring of the original, but I feel pretty confident that optimizing for these cases is never going to happen in the
let orig = String::from("Hello");
let result = regex.replace(&orig, "foo");
let result = if let Cow::Owned(_) = result {
    result.into_owned()
} else {
    orig
};
As I see it, the Applied to the earlier example by @chris-morgan at #676 (comment), assuming I understood the request correctly:
use std::borrow::Cow;
fn sub(input: &str) -> String {
    let regex = regex::Regex::new("f").unwrap();
    let aleph = regex.replace(input, "m").into_owned();
    let beth = regex.replace(&aleph, "x");
    // Borrowed means nothing was replaced, so `aleph` can be returned as-is.
    if let Cow::Owned(_) = beth {
        beth.into_owned()
    } else {
        aleph
    }
} |
Good thinking, I like that workaround. As written, you’ve ended up with an unnecessary
let result = match result {
    Cow::Owned(result) => result,
    Cow::Borrowed(_) => orig,
};
As you say, the
use std::borrow::Cow;
use regex::{Regex, Replacer};

/// Extension methods for `Regex` that operate on `Cow<str>` instead of `&str`.
pub trait RegexCowExt {
    /// [`Regex::replace`], but taking text as `Cow<str>` instead of `&str`.
    fn replace_cow<'t, R: Replacer>(&self, text: Cow<'t, str>, rep: R) -> Cow<'t, str>;

    /// [`Regex::replace_all`], but taking text as `Cow<str>` instead of `&str`.
    fn replace_all_cow<'t, R: Replacer>(&self, text: Cow<'t, str>, rep: R) -> Cow<'t, str>;

    /// [`Regex::replacen`], but taking text as `Cow<str>` instead of `&str`.
    fn replacen_cow<'t, R: Replacer>(&self, text: Cow<'t, str>, limit: usize, rep: R) -> Cow<'t, str>;
}

impl RegexCowExt for Regex {
    fn replace_cow<'t, R: Replacer>(&self, text: Cow<'t, str>, rep: R) -> Cow<'t, str> {
        match self.replace(&text, rep) {
            Cow::Owned(result) => Cow::Owned(result),
            Cow::Borrowed(_) => text,
        }
    }

    fn replace_all_cow<'t, R: Replacer>(&self, text: Cow<'t, str>, rep: R) -> Cow<'t, str> {
        match self.replace_all(&text, rep) {
            Cow::Owned(result) => Cow::Owned(result),
            Cow::Borrowed(_) => text,
        }
    }

    fn replacen_cow<'t, R: Replacer>(&self, text: Cow<'t, str>, limit: usize, rep: R) -> Cow<'t, str> {
        match self.replacen(&text, limit, rep) {
            Cow::Owned(result) => Cow::Owned(result),
            Cow::Borrowed(_) => text,
        }
    }
}
And sample usage (playground) with my earlier example plus returning
use path::to::RegexCowExt; // if not in the same module
fn sub(input: &str) -> Cow<str> {
    let regex = regex::Regex::new("f").unwrap();
    let aleph = regex.replace(input, "m");
    let beth = regex.replace_cow(aleph, "x");
    beth
}
fn main() {
    println!("{}", sub("foo"));
}
A few possible options here:
Option four it is, so far as I’m concerned! |
I thought the contents of the |
Enum variants and their fields are always public. The closest you get to private is (Also if it was private Rust wouldn’t tell you what the fields were at all, only that there were other private fields.) |
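The visibility rule described above can be illustrated with a small std-only sketch. The module `demo`, the types `Payload` and `Opaque`, and the functions `kind` and `weight` are all made up for this example. It only shows that variant fields are reachable from outside the defining module, while a struct field without `pub` is not.

```rust
use std::borrow::Cow;

pub mod demo {
    /// Variant fields cannot be made private; they share the enum's visibility.
    pub enum Payload {
        Text(String),
        Pair { left: u8, right: u8 },
    }

    /// Struct fields, by contrast, are private unless marked `pub`.
    pub struct Opaque {
        secret: u8,
    }

    impl Opaque {
        pub fn new(secret: u8) -> Self {
            Opaque { secret }
        }

        /// The only way for outside code to read the private field.
        pub fn secret(&self) -> u8 {
            self.secret
        }
    }
}

/// Code outside `std` can name and destructure `Cow`'s variants directly.
pub fn kind(cow: &Cow<'_, str>) -> &'static str {
    match cow {
        Cow::Borrowed(_) => "borrowed",
        Cow::Owned(_) => "owned",
    }
}

/// Code outside `demo` can likewise read the fields of `Payload`'s variants.
pub fn weight(payload: &demo::Payload) -> usize {
    match payload {
        demo::Payload::Text(text) => text.len(),
        demo::Payload::Pair { left, right } => usize::from(*left) + usize::from(*right),
    }
}
```

This is why the workarounds in this thread can match on `Cow::Owned` and `Cow::Borrowed` without any cooperation from the standard library.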
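Re @chris-morgan 's #676 (comment): I wonder if something like this, using
fn replace_all_cow<'t, R: Replacer>(&self, text: Cow<'t, str>, rep: R) -> Cow<'t, str> {
    match self.replace_all(&text, rep) {
        // Same address and length as the input: nothing changed, so hand back `text` itself.
        Cow::Borrowed(borrow) if std::ptr::eq(borrow, &*text) => text,
        // A borrow of the local `text` cannot be returned as `Cow<'t, str>`, so copy it.
        Cow::Borrowed(borrow) => Cow::Owned(borrow.to_owned()),
        Cow::Owned(owned) => Cow::Owned(owned),
    }
}
This could be extended to convert any function |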
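A generic adapter of the kind hinted at in the last comment could be sketched as follows. This is only an illustration under stated assumptions: `map_cow` and `strip_leading_ws` are hypothetical names, and `strip_leading_ws` is a std-only stand-in for a replacement function that returns a borrowed substring, which is the case the pointer comparison is meant to guard against. A borrowed input is passed straight to `f`, so only the owned case needs the check.

```rust
use std::borrow::Cow;

/// Hypothetical adapter (illustrative name, not from any crate).
/// Lifts a `&str -> Cow<str>` function into a `Cow<str> -> Cow<str>` one
/// without reallocating when `f` returns its whole input unchanged.
pub fn map_cow<'t, F>(text: Cow<'t, str>, f: F) -> Cow<'t, str>
where
    F: for<'s> FnOnce(&'s str) -> Cow<'s, str>,
{
    match text {
        // Borrowed input: the result of `f` already has the right lifetime.
        Cow::Borrowed(s) => f(s),
        Cow::Owned(owned) => match f(owned.as_str()) {
            Cow::Owned(changed) => Cow::Owned(changed),
            Cow::Borrowed(borrowed) => {
                // Fat-pointer equality compares both address and length.
                if std::ptr::eq(borrowed, owned.as_str()) {
                    // `f` returned the entire input: reuse the existing buffer.
                    Cow::Owned(owned)
                } else {
                    // `f` returned a proper substring: it must be copied out.
                    Cow::Owned(borrowed.to_owned())
                }
            }
        },
    }
}

/// Stand-in for a regex replacement that may return a borrowed substring.
pub fn strip_leading_ws(s: &str) -> Cow<'_, str> {
    Cow::Borrowed(s.trim_start())
}
```

With the regex crate, the closure passed as `f` would presumably be something like `|s| regex.replace_all(s, "x")`. Unlike matching on `Cow::Borrowed(_)` alone, this does not depend on the implementation detail that a borrowed result always means "nothing was replaced".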
The Regex::replace* methods return a Cow<str> for efficiency, but can't take a Cow<str>. This harms efficiency in cases like multiple sequential regular expressions:
Instead you’re stuck with this, which necessarily wastes an allocation: