Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make many extend into other collections that Vec #1621

Merged
merged 9 commits into from
Jan 17, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 8 additions & 11 deletions benchmarks/benches/ini.rs
Original file line number Diff line number Diff line change
Expand Up @@ -32,19 +32,16 @@ fn key_value(i: &[u8]) -> IResult<&[u8], (&str, &str)> {
}

fn categories(i: &[u8]) -> IResult<&[u8], HashMap<&str, HashMap<&str, &str>>> {
map(
many(
0..,
separated_pair(
category,
opt(multispace),
map(
many(0.., terminated(key_value, opt(multispace))),
|vec: Vec<_>| vec.into_iter().collect(),
),
many(
0..,
separated_pair(
category,
opt(multispace),
map(
many(0.., terminated(key_value, opt(multispace))),
|vec: Vec<_>| vec.into_iter().collect(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come this into_iter().collect() was left in?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed it, fixed in 4c6e45e

),
),
|vec: Vec<_>| vec.into_iter().collect(),
)(i)
}

Expand Down
18 changes: 2 additions & 16 deletions benchmarks/benches/ini_str.rs
Original file line number Diff line number Diff line change
Expand Up @@ -39,30 +39,16 @@ fn key_value(i: &str) -> IResult<&str, (&str, &str)> {
Ok((i, (key, val)))
}

fn keys_and_values_aggregator(i: &str) -> IResult<&str, Vec<(&str, &str)>> {
many(0.., key_value)(i)
}

fn keys_and_values(input: &str) -> IResult<&str, HashMap<&str, &str>> {
match keys_and_values_aggregator(input) {
Ok((i, tuple_vec)) => Ok((i, tuple_vec.into_iter().collect())),
Err(e) => Err(e),
}
many(0.., key_value)(input)
}

fn category_and_keys(i: &str) -> IResult<&str, (&str, HashMap<&str, &str>)> {
pair(category, keys_and_values)(i)
}

fn categories_aggregator(i: &str) -> IResult<&str, Vec<(&str, HashMap<&str, &str>)>> {
many(0.., category_and_keys)(i)
}

fn categories(input: &str) -> IResult<&str, HashMap<&str, HashMap<&str, &str>>> {
match categories_aggregator(input) {
Ok((i, tuple_vec)) => Ok((i, tuple_vec.into_iter().collect())),
Err(e) => Err(e),
}
many(0.., category_and_keys)(input)
}

fn bench_ini_str(c: &mut Criterion) {
Expand Down
80 changes: 66 additions & 14 deletions src/multi/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,6 @@ use crate::error::ParseError;
use crate::internal::{Err, IResult, Needed, Parser};
use crate::lib::std::num::NonZeroUsize;
#[cfg(feature = "alloc")]
use crate::lib::std::ops::Bound;
#[cfg(feature = "alloc")]
use crate::lib::std::vec::Vec;
use crate::Input;
use crate::{
Expand Down Expand Up @@ -1053,7 +1051,7 @@ where
}
}

/// Repeats the embedded parser and returns the results in a `Vec`.
/// Repeats the embedded parser and collects the results in a type implementing `Extend + Default`.
/// Fails if the amount of time the embedded parser is run is not
/// within the specified range.
/// # Arguments
Expand All @@ -1062,6 +1060,7 @@ where
/// * A single `usize` value is equivalent to `value..=value`.
/// * An empty range is invalid.
/// * `parse` The parser to apply.
///
/// ```rust
/// # #[macro_use] extern crate nom;
/// # use nom::{Err, error::ErrorKind, Needed, IResult};
Expand All @@ -1078,29 +1077,82 @@ where
/// assert_eq!(parser(""), Ok(("", vec![])));
/// assert_eq!(parser("abcabcabc"), Ok(("abc", vec!["abc", "abc"])));
/// ```
///
/// This is not limited to `Vec`, other collections like `HashMap`
/// can be used:
///
/// ```rust
/// # #[macro_use] extern crate nom;
/// # use nom::{Err, error::ErrorKind, Needed, IResult};
/// use nom::multi::many;
/// use nom::bytes::complete::{tag, take_while};
/// use nom::sequence::{separated_pair, terminated};
/// use nom::AsChar;
///
/// use std::collections::HashMap;
///
/// fn key_value(s: &str) -> IResult<&str, HashMap<&str, &str>> {
/// many(0.., terminated(
/// separated_pair(
/// take_while(AsChar::is_alpha),
/// tag("="),
/// take_while(AsChar::is_alpha)
/// ),
/// tag(";")
/// ))(s)
/// }
///
/// assert_eq!(
/// key_value("a=b;c=d;"),
/// Ok(("", HashMap::from([("a", "b"), ("c", "d")])))
/// );
/// ```
///
/// If more control is needed on the default value, [fold] can
/// be used instead:
///
/// ```rust
/// # #[macro_use] extern crate nom;
/// # use nom::{Err, error::ErrorKind, Needed, IResult};
/// use nom::multi::fold;
/// use nom::bytes::complete::tag;
///
///
/// fn parser(s: &str) -> IResult<&str, Vec<&str>> {
/// fold(
/// 0..=4,
/// tag("abc"),
/// // preallocates a vector of the max size
/// || Vec::with_capacity(4),
/// |mut acc: Vec<_>, item| {
/// acc.push(item);
/// acc
/// }
/// )(s)
/// }
///
///
/// assert_eq!(parser("abcabcabcabc"), Ok(("", vec!["abc", "abc", "abc", "abc"])));
/// ```
#[cfg(feature = "alloc")]
#[cfg_attr(feature = "docsrs", doc(cfg(feature = "alloc")))]
pub fn many<I, O, E, F, G>(range: G, mut parse: F) -> impl FnMut(I) -> IResult<I, Vec<O>, E>
pub fn many<I, O, E, Collection, F, G>(
range: G,
mut parse: F,
) -> impl FnMut(I) -> IResult<I, Collection, E>
where
I: Clone + InputLength,
F: Parser<I, O, E>,
Collection: Extend<O> + Default,
E: ParseError<I>,
G: NomRange<usize>,
{
let capacity = match range.bounds() {
(Bound::Included(start), _) => start,
(Bound::Excluded(start), _) => start + 1,
_ => 4,
};

move |mut input: I| {
if range.is_inverted() {
return Err(Err::Failure(E::from_error_kind(input, ErrorKind::Many)));
}

let max_initial_capacity =
MAX_INITIAL_CAPACITY_BYTES / crate::lib::std::mem::size_of::<O>().max(1);
let mut res = crate::lib::std::vec::Vec::with_capacity(capacity.min(max_initial_capacity));
let mut res = Collection::default();

for count in range.bounded_iter() {
let len = input.input_len();
Expand All @@ -1111,7 +1163,7 @@ where
return Err(Err::Error(E::from_error_kind(input, ErrorKind::Many)));
}

res.push(value);
res.extend(Some(value));
input = tail;
}
Err(Err::Error(e)) => {
Expand Down
4 changes: 3 additions & 1 deletion src/str.rs
Original file line number Diff line number Diff line change
Expand Up @@ -487,11 +487,13 @@ mod test {
#[test]
#[cfg(feature = "alloc")]
fn recognize_is_a() {
use crate::lib::std::vec::Vec;

let a = "aabbab";
let b = "ababcd";

fn f(i: &str) -> IResult<&str, &str> {
recognize(many(1.., alt((tag("a"), tag("b")))))(i)
recognize::<_, Vec<&str>, _, _>(many(1.., alt((tag("a"), tag("b")))))(i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been going back and forth on this idea partly because of type inference issues when fold_many* easily let's people do the same thing. I was surprised; they came up less than I expected.

This will likely remove the optimizatio of preallocating vector capacity, but we can get this back with a wrapper type if needed, and being able to created directly a hashmap or other collections instead of going through a vec first will be way more useful

fold_many is likely lighter weight than a newtype (code wise). Maybe we should provide a fold_many example that sets the capacity and have many link out to it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense that there would be inference issues in the tests calling many directly and the ones using recognize , but it worked surprisingly well everywhere else, so this is giving me confidence to make this change.

The example pointing towards fold_many is a good idea, that will let people define their own capacity allocation

}

assert_eq!(f(a), Ok((&a[6..], a)));
Expand Down
2 changes: 1 addition & 1 deletion tests/fnmut.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ fn parse() {
let mut counter = 0;

let res = {
let mut parser = many::<_, _, (), _, _>(0.., |i| {
let mut parser = many::<_, _, (), Vec<&str>, _, _>(0.., |i| {
counter += 1;
tag("abc")(i)
});
Expand Down
4 changes: 1 addition & 3 deletions tests/ini.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,9 +28,7 @@ fn key_value(i: &[u8]) -> IResult<&[u8], (&str, &str)> {
}

fn keys_and_values(i: &[u8]) -> IResult<&[u8], HashMap<&str, &str>> {
map(many(0.., terminated(key_value, opt(multispace))), |vec| {
vec.into_iter().collect()
})(i)
many(0.., terminated(key_value, opt(multispace)))(i)
}

fn category_and_keys(i: &[u8]) -> IResult<&[u8], (&str, HashMap<&str, &str>)> {
Expand Down
2 changes: 1 addition & 1 deletion tests/issues.rs
Original file line number Diff line number Diff line change
Expand Up @@ -172,7 +172,7 @@ fn issue_942() {
fn issue_many_m_n_with_zeros() {
use nom::character::complete::char;
use nom::multi::many;
let mut parser = many::<_, _, (), _, _>(0..=0, char('a'));
let mut parser = many::<_, _, (), Vec<char>, _, _>(0..=0, char('a'));
assert_eq!(parser("aaa"), Ok(("aaa", vec!())));
}

Expand Down