get_many_mut with variable number of keys #332

Ten0 · 2022-05-09T17:18:11Z

get_many_mut on slices (rust-lang/rust#83608) does not need to handle a variable number of keys (although it might be nice) because, worse case scenario, you can sort your input array of indexes and iterate over it with successive split_at_mut.

In this case however, that's not an option, as HashMaps don't have the equivalent of split_at_mut.
So it may be useful to be able to lookup a variable (somewhat large) number of distinct keys at the same time in a HashMap, and get mutable references back.

That is in particular needed when we have a storage of many entries, and we want to run a round of updates on distinct subsets of keys, and want to run these in parallel.

Having get_many_mut follow a similar model to what is described at rust-lang/rust#83608 (review) and providing an interface that allows for this would consequently be useful - in fact, that is something I currently would use.

The text was updated successfully, but these errors were encountered:

Make HashMap extension fn like get_many_mut but for variable sizes See also rust-lang/hashbrown#332

What42Pizza · 2023-02-15T23:23:18Z

I'd also really like to have this, I'm in a situation right now where I need to have an arbitrary number of mutable references and the way I'm doing it for now isn't great

Update: I've made some code that allows you to do this, I can share it if anyone needs it

Amanieu · 2023-02-22T14:27:46Z

The problem with returning a variable number of mutable references is that it requires returning a Vec<&mut T> which requires a dynamic allocation. It might end up being faster to just keep the keys around and doing a hash table lookup every time the value is needed.

Ten0 · 2023-02-22T18:36:06Z

It might end up being faster to just keep the keys around and doing a hash table lookup every time the value is needed.

It seems that this would be for the end user to decide, no?
In particular if the follow-up is e.g. rayon parallel processing or that kind of stuff it quickly becomes worth it.

Amanieu · 2023-02-23T11:53:59Z

Keep in mind that get_many_mut has a complexity of O(n^2) where n is the number of keys to look up. This is because each returned value must be checked against every other returned value to ensure that none of them point to the same entry, which is necessary to satisfy Rust's safety guarantees (mutable references are unique). This can quickly add up when n becomes large.

Ten0 · 2023-02-23T12:42:24Z

each returned value must be checked against every other returned value

That seems reasonable on the const size version because those are unlikely to ever be very large, but here indeed that would seem a bit too expensive, so probably flagging entry as currently being borrowed would fix this?

By the way, on the const size version, how do you deal with Eq implementations that are inconsistent? ("It's ok I'm equal to nobody - oh wait in fact I'm the same key" -> yields several mutable borrows to the same entry)
It looks like the flag at the entry level would be the solution for this as well, no?

Amanieu · 2023-02-23T12:55:29Z

We compare the pointers to the entries that are actually returned by the lookups.

Ten0 · 2023-02-23T13:05:18Z

Ah of course! ^^
Then I guess if n is too large just sorting the pointers and checking for no consecutive dupes works in O(n* log(n)).

The problems becomes similar to this one:
rust-lang/rust#83608 (review)

JustForFun88 · 2023-02-23T13:08:39Z

Ah of course! ^^ Then I guess if n is too large just sorting the pointers and checking for no consecutive dupes works in O(n* log(n)).

Unlikely, since you still need to take into account the sort itself

Amanieu · 2023-02-23T13:09:56Z

Currently the implementation is optimized for small n (the most common case is expected to be n = 2).

Ten0 · 2023-02-23T13:24:12Z

Unlikely, since you still need to take into account the sort itself

The sort itself is O(n* log(n)) where n is the number of queried entries.

Currently the implementation is optimized for small n

For the const-size version it's pretty clear that we want to ensure that we are as optimized as possible for small slice sizes. It's also very practical to be able to pattern-match directly on the result (so we don't want to remove a const size version), and I would imagine as well that the most common case is 2.

However it looks like we could provide different APIs through the same function via a GetManyMut trait, as was considered for the std's version on slices.

Then we could have different versions based on the slice size:

let ptrs: Vec<*mut ??> = /* ... */;
if n < 10 { // 10 is the limit for insertion sort in std
    // do n^2
} else {
    let mut ptrs_sorted = ptrs.clone();
    ptrs_sorted.sort_unstable(); // n * log(n)
    assert!(ptrs_sorted.windows(2).all(|w| w[0] != w[1]); // n
}

having optimal perf for both cases. (That was actually also suggested for the std const version here)

With pointers, it looks like the problem becomes essentially exactly the same as they have in std with indexes.
rust-lang/rust#83608 (review)
rust-lang/rust#83608 (comment)

JustForFun88 · 2023-02-23T14:14:33Z

Unlikely, since you still need to take into account the sort itself

The sort itself is O(n* log(n)) where n is the number of queried entries.

Well, as you can see from your own example, you first need to sort, and then go through the array again, looking for the equal pointers. In addition, you do not take into account that sorting will violate the order of the values. That is, the corresponding key will not point to the corresponding value.

Ten0 · 2023-02-23T14:31:22Z

Well, as you can see from your own example, you first need to sort, and then go through the array again, looking for the equal pointers

I'm not sure why that is an issue.

In addition, you do not take into account that sorting will violate the order of the values. That is, the corresponding key will not point to the corresponding value.

I do, that's what the ptrs.clone() is for. I'm returning ptrs at the end.
If we want to have a single alloc we could also with_capacity(n*2) and duplicate the contents at the end of the vec, sort the subslice for the check, then truncate it.

JustForFun88 · 2023-02-23T19:02:36Z

I do, that's what the ptrs.clone() is for. I'm returning ptrs at the end. If we want to have a single alloc we could also with_capacity(n*2) and duplicate the contents at the end of the vec, sort the subslice for the check, then truncate it.

Hmm... it might work. It looks like the method you suggested will improve the existing implementation as well.

As for variable length. Why not consider using something like https://doc.rust-lang.org/std/array/struct.IntoIter.html. For example, create a separate search array and a separate output array. Then push only unique pointers into the output array and return it as IntoIter?

Ten0 · 2023-02-23T19:06:22Z

Why not consider...

I'm afraid I don't understand what you have in mind. Perhaps that would be easier with a code sample?

JustForFun88 · 2023-02-24T11:10:01Z

get_many_mut on slices (rust-lang/rust#83608) does not need to handle a variable number of keys (although it might be nice) because, worse case scenario, you can sort your input array of indexes and iterate over it with successive split_at_mut.

Actually I figure out how it can be done. I try to implement it in next two days.

JustForFun88 · 2023-02-25T09:03:41Z

@Amanieu, @Ten0 What if we provide the following API (draft pull request #408):

pub fn try_get_many_key_value_mut<'a, Q, I, const N: usize>(
    &mut self,
    iter: &mut I,
) -> ArrayIter<(&K, &mut V), N>
where
    I: Iterator<Item = &'a Q>,
    Q: ?Sized + Hash + Equivalent<K> + 'a,
{
    /* implementation */
}

pub unsafe fn try_get_many_key_value_unchecked_mut<'a, Q, I, const N: usize>(
    &mut self,
    iter: &mut I,
) -> ArrayIter<(&K, &mut V), N>
where
    I: Iterator<Item = &'a Q>,
    Q: ?Sized + Hash + Equivalent<K> + 'a,
{
    /* implementation */
}

Where ArrayIter practically repeats the implementation of IntoIter from https://doc.rust-lang.org/std/array/struct.IntoIter.html. Unfortunately, using IntoIter directly does not work due to the instability of new_unchecked. The ArrayIter itself is not only an iterator, it has a few more methods:

impl<T, const N: usize> ArrayIter<T, N> {
    /// Returns an immutable slice of all elements that have not been yielded yet.
    #[inline]
    pub fn as_slice(&self) -> &[T] {
         /* implementation */
    }

    /// Returns a mutable slice of all elements that have not been yielded yet.
    #[inline]
    pub fn as_mut_slice(&mut self) -> &mut [T] {
         /* implementation */
    }

    /// Returns an [`ArrayIter`] of the same size as `self`, with function `f`
    /// applied to each element in order.
    pub fn convert<F, U>(self, mut f: F) -> ArrayIter<U, N>
    where
        F: FnMut(T) -> U,
    {
         /* implementation */
    }
}

Amanieu · 2023-03-01T01:58:31Z

While your implementation is very clever, I don't think this is an API I actually want in hashbrown. It just ends up being very hard to use correctly (due to the interactions between the N constant and the iterator size, for example).

As I've said before, you really should just do separate lookups when you actually need the values instead of keeping an arbitrary number of mutable references around.

Amanieu mentioned this issue May 20, 2022

Expose get_many_mut and get_many_unchecked_mut to HashMap rust-lang/rust#94647

Merged

PieKing1215 added a commit to PieKing1215/FallingSandEngine that referenced this issue Feb 1, 2023

Do population stage 0 in parallel

bea84f0

Make HashMap extension fn like get_many_mut but for variable sizes See also rust-lang/hashbrown#332

This comment was marked as outdated.

Sign in to view

JustForFun88 mentioned this issue Feb 25, 2023

Initial implementation of try_get_many #408

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get_many_mut with variable number of keys #332

get_many_mut with variable number of keys #332

Ten0 commented May 9, 2022 •

edited

Loading

What42Pizza commented Feb 15, 2023 •

edited

Loading

Amanieu commented Feb 22, 2023

Ten0 commented Feb 22, 2023 •

edited

Loading

Amanieu commented Feb 23, 2023

Ten0 commented Feb 23, 2023

Amanieu commented Feb 23, 2023

Ten0 commented Feb 23, 2023 •

edited

Loading

JustForFun88 commented Feb 23, 2023

Amanieu commented Feb 23, 2023

Ten0 commented Feb 23, 2023 •

edited

Loading

JustForFun88 commented Feb 23, 2023

Ten0 commented Feb 23, 2023 •

edited

Loading

JustForFun88 commented Feb 23, 2023

Ten0 commented Feb 23, 2023

This comment was marked as outdated.

JustForFun88 commented Feb 24, 2023

JustForFun88 commented Feb 25, 2023

Amanieu commented Mar 1, 2023

get_many_mut with variable number of keys #332

get_many_mut with variable number of keys #332

Comments

Ten0 commented May 9, 2022 • edited Loading

What42Pizza commented Feb 15, 2023 • edited Loading

Amanieu commented Feb 22, 2023

Ten0 commented Feb 22, 2023 • edited Loading

Amanieu commented Feb 23, 2023

Ten0 commented Feb 23, 2023

Amanieu commented Feb 23, 2023

Ten0 commented Feb 23, 2023 • edited Loading

JustForFun88 commented Feb 23, 2023

Amanieu commented Feb 23, 2023

Ten0 commented Feb 23, 2023 • edited Loading

JustForFun88 commented Feb 23, 2023

Ten0 commented Feb 23, 2023 • edited Loading

JustForFun88 commented Feb 23, 2023

Ten0 commented Feb 23, 2023

This comment was marked as outdated.

JustForFun88 commented Feb 24, 2023

JustForFun88 commented Feb 25, 2023

Amanieu commented Mar 1, 2023

Ten0 commented May 9, 2022 •

edited

Loading

What42Pizza commented Feb 15, 2023 •

edited

Loading

Ten0 commented Feb 22, 2023 •

edited

Loading

Ten0 commented Feb 23, 2023 •

edited

Loading

Ten0 commented Feb 23, 2023 •

edited

Loading

Ten0 commented Feb 23, 2023 •

edited

Loading