Deterministic results in custom return types with IndexMap
#386
The PyO3 implementation will be pretty simple. I did the impl for hashbrown's HashMap in PyO3/pyo3#1114 (although obviously check the current HEAD, as I'm sure there have been updates; that's just a good starting point), and it'll probably be more or less identical to that PR. If this does require a PyO3 change, two options in the interim are to create a custom wrapper struct that contains a [...]
Pull Request Test Coverage Report for Build 1222342788
💛 - Coveralls |
I made progress on this: the trait is now in PyO3, so in the future we will be able to use it. I still have to consider whether it is worth the effort to create temporary code until PyO3 makes a release. With regard to the benchmarks, I will have to redo them. Originally I was using the default hasher for IndexMap, and it was slower than hashbrown's HashMap because hashbrown's ahash is super fast. I've switched to using hashbrown's default hasher with [...]
I ran the benchmarks from https://github.com/mtreinish/retworkx-bench and it showed an improvement on the time_graph_greed_coloring benchmarks and a small regression on Floyd-Warshall:
Everything else in the benchmark suite remained basically the same. So it looks like indexmap is a good solution for retaining insertion order without any performance penalty. (Although I will say that the parametric benchmarks where these differences show up are kind of bogus; I need to rewrite them to use more representative graphs.)
Right now functions that use or wrap a HashMap for their return type to Python have a non-deterministic iteration order. This is because HashMap is by definition an unordered type. However, the expectation for Python users is that, since Python 3.6, dictionaries are ordered because they preserve insertion order. To meet this expectation for graph_greedy_color(), this commit leverages IndexMap from the indexmap crate [1], which offers similar performance but a consistent order (and slightly faster iteration, although for our use case it might not matter). This will provide a consistent insertion order for the output dict from graph_greedy_color(). Wider use of IndexMap in other places where we return or wrap HashMaps to Python is being done in #386 to get this advantage more broadly, but it depends on a new PyO3 release to work. So in the meantime this just fixes the issue for graph_greedy_color(), where there is a reported issue and a need for deterministic iteration order (it's also easy and doesn't require the PyO3-side changes, as it returns a PyDict instead of relying on the pyfunction macro to convert for us).
Fixes #347
Co-Authored-By: Ivan Carvalho <[email protected]>
[1] https://crates.io/crates/indexmap
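To make the ordering argument concrete, here is a minimal sketch of how an insertion-ordered map can work, using only the standard library. It is not the indexmap crate's implementation; `OrderedMap` and its methods are hypothetical names chosen for illustration. Entries live in a Vec in insertion order, and a HashMap is used only to look up positions, so iteration never depends on the hasher.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical, minimal insertion-ordered map (illustration only):
/// `entries` keeps (key, value) pairs in insertion order and
/// `index` maps each key to its position in `entries`.
pub struct OrderedMap<K, V> {
    entries: Vec<(K, V)>,
    index: HashMap<K, usize>,
}

impl<K: Hash + Eq + Clone, V> OrderedMap<K, V> {
    pub fn new() -> Self {
        OrderedMap {
            entries: Vec::new(),
            index: HashMap::new(),
        }
    }

    /// Insert or overwrite. An existing key keeps its original position
    /// and the previous value is returned.
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        if let Some(&pos) = self.index.get(&key) {
            return Some(std::mem::replace(&mut self.entries[pos].1, value));
        }
        self.index.insert(key.clone(), self.entries.len());
        self.entries.push((key, value));
        None
    }

    /// Look up a value through the position index.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.index.get(key).map(|&pos| &self.entries[pos].1)
    }

    /// Iterate in insertion order, independent of the hasher's seed.
    pub fn iter<'a>(&'a self) -> impl Iterator<Item = (&'a K, &'a V)> + 'a {
        self.entries.iter().map(|(k, v)| (k, v))
    }
}
```

Inserting node 2, then 0, then 1 and iterating yields the keys in exactly that order on every run, which is the behaviour a Python user expects from a dict.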
So with #407 merging I think this is unblocked now (PyO3 0.14.2 includes the indexmap support). But since we're close to 0.10.0, I think we should hold this until after the release and wait until we open 0.11.0 to make this change, just so we have more time to live with the potential performance impact of switching to indexmap and make adjustments.
This is ready for review and no longer WIP. I will let you make the call on which release this goes in.
Great work!
There are still a few places, besides custom return types, where we return a HashMap and we could switch to IndexMap. I have spotted graph::adj, digraph::adj, digraph::adj_direction, graph::compose, digraph::compose, core_number and max_weight_matching.
My guess is that we will not see a significant performance regression since most of the algorithms are still using a HashMap and they build an IndexMap only at the return phase.
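The "HashMap internally, ordered container only at the return phase" pattern can be sketched as follows. This is an illustration under assumptions, not code from the PR: `degrees_in_node_order` is a hypothetical helper, and a Vec of pairs stands in for the IndexMap so the example needs only the standard library.

```rust
use std::collections::HashMap;

/// Hypothetical helper: degree of every node of an undirected graph
/// given as an edge list over nodes `0..num_nodes`.
pub fn degrees_in_node_order(
    num_nodes: usize,
    edges: &[(usize, usize)],
) -> Vec<(usize, usize)> {
    // Internal scratch map: we never iterate over it, so its
    // (unspecified) iteration order cannot leak into the result.
    let mut degree: HashMap<usize, usize> = HashMap::with_capacity(num_nodes);
    for &(u, v) in edges {
        *degree.entry(u).or_insert(0) += 1;
        *degree.entry(v).or_insert(0) += 1;
    }

    // Return phase: walk the nodes in index order and only now build
    // the ordered output (an IndexMap in the real code, a Vec here).
    (0..num_nodes)
        .map(|node| (node, degree.get(&node).copied().unwrap_or(0)))
        .collect()
}
```

The hot loop keeps the fastest available map, and the one-time cost of building the ordered result is linear in the number of returned entries.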
src/dictmap.rs
pub type DictMap<K, V> = indexmap::IndexMap<K, V, ahash::RandomState>;

#[macro_export]
macro_rules! _dictmap_new {
A macro works fine here. In case we want to keep the common syntax DictMap::new(), we could define a custom trait like
    pub trait Init {
        fn new() -> Self
        where
            Self: Sized;
        fn with_capacity(n: usize) -> Self
        where
            Self: Sized;
    }
    impl<K, V> Init for DictMap<K, V> {
        fn new() -> Self {
            indexmap::IndexMap::with_capacity_and_hasher(
                0,
                ahash::RandomState::default(),
            )
        }
        fn with_capacity(n: usize) -> Self {
            indexmap::IndexMap::with_capacity_and_hasher(
                n,
                ahash::RandomState::default(),
            )
        }
    }
I like this idea and I implemented it. One thing worth noting is that the trait must be in scope, otherwise we get:
= help: items from traits can only be used if the trait is in scope
= note: the following trait is implemented but not in scope; perhaps add a `use` for it:
`use crate::dictmap::Init;`
To overcome that, I replaced the imports with use crate::dictmap::* to guarantee the trait would be in scope.
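The scoping rule can be reproduced with the standard library alone. The sketch below is an assumption-laden stand-in, not the PR's code: std's HashMap with a non-default hasher replaces indexmap::IndexMap with ahash, and `algo`, `squares` and `empty` are hypothetical names. As with the real alias, the inherent `new()` and `with_capacity()` exist only for the default hasher, so the trait supplies them.

```rust
mod dictmap {
    use std::collections::hash_map::DefaultHasher;
    use std::collections::HashMap;
    use std::hash::{BuildHasherDefault, Hash};

    /// Stand-in for the PR's alias: a map with a non-default hasher,
    /// for which the inherent `new()`/`with_capacity()` are unavailable.
    pub type DictMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

    pub trait Init {
        fn new() -> Self
        where
            Self: Sized;
        fn with_capacity(n: usize) -> Self
        where
            Self: Sized;
    }

    impl<K: Hash + Eq, V> Init for DictMap<K, V> {
        fn new() -> Self {
            HashMap::with_capacity_and_hasher(0, BuildHasherDefault::default())
        }
        fn with_capacity(n: usize) -> Self {
            HashMap::with_capacity_and_hasher(n, BuildHasherDefault::default())
        }
    }
}

mod algo {
    // The glob import brings both the alias and the `Init` trait into
    // scope. Importing only `DictMap` would make the calls below fail
    // with "items from traits can only be used if the trait is in scope".
    // (`super::dictmap` is the same module as `crate::dictmap` when both
    // modules sit at the crate root.)
    use super::dictmap::*;

    pub fn squares(n: u32) -> DictMap<u32, u32> {
        let mut out = DictMap::with_capacity(n as usize);
        for i in 0..n {
            out.insert(i, i * i);
        }
        out
    }

    pub fn empty() -> DictMap<u32, u32> {
        DictMap::new()
    }
}
```

A glob import is the least intrusive fix when many files already import from the module; naming `Init` explicitly works just as well and makes the dependency on the trait visible.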
Indeed, we can replace those. I started by replacing the ones for custom types, but because of #347 I think we can also extend the use to where we have return types like HashMap or PyDict. Performance will not be affected and we'd have a more Pythonic result.
I have incorporated the feedback. Now more functions are deterministic, and the diff is just replacing [...]
Thanks for the updates.
I have left some inline comments that suggest reverting back to HashMap for some variables that do not affect the output result. The same goes for the internal changes in vf2.rs and max_weight_matching.rs, i.e., we don't really need the determinism that IndexMap offers there, and even if it's only slightly slower we should avoid using it.
In the previous comment, I meant to replace the HashSet used in max_weight_matching.rs https://github.com/Qiskit/retworkx/blob/main/src/matching/max_weight_matching.rs#L867 with an IndexSet.
Also, we probably need an IndexMap for the path variable in dijkstra::dijkstra (https://github.com/Qiskit/retworkx/blob/main/src/shortest_path/dijkstra.rs#L105), since we iterate over it in some places (e.g. here) and this might result in non-deterministic behavior.
I removed DictMap where it wasn't necessary. For [...] Also, [...]
Great! Could you also revert the changes in vf2.rs and max_weight_matching.rs? I don't see a reason to use an IndexMap there.
Removed DictMap from [...]
LGTM, thanks for the updates
Related to the discussions in #347 about deterministic behaviour
Replaces HashMap with IndexMap from the indexmap crate as the default hashmap in custom return types. IndexMap is inspired by Python's dict implementation, hence it preserves insertion order as long as items are not removed. This leads to deterministic behaviour and removes the need to fix the seed in the hashmap to reproduce the results.
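The caveat about removals can be illustrated on a plain Vec of entries, which is a reasonable mental model for the ordered part of such a map. This is a sketch under that assumption, not indexmap's code; the two helper functions are hypothetical. A constant-time removal swaps the last entry into the gap (the strategy of `Vec::swap_remove`, and of the indexmap method of the same name), while an order-preserving removal has to shift every later entry.

```rust
/// Remove the entry for `key` by moving the last entry into its slot.
/// O(1) after the lookup, but the relative order of entries changes.
pub fn swap_remove_entry(
    entries: &mut Vec<(u32, char)>,
    key: u32,
) -> Option<(u32, char)> {
    let pos = entries.iter().position(|&(k, _)| k == key)?;
    Some(entries.swap_remove(pos))
}

/// Remove the entry for `key` and shift all later entries left.
/// O(n), but the remaining entries keep their insertion order.
pub fn shift_remove_entry(
    entries: &mut Vec<(u32, char)>,
    key: u32,
) -> Option<(u32, char)> {
    let pos = entries.iter().position(|&(k, _)| k == key)?;
    Some(entries.remove(pos))
}
```

Starting from keys 1, 2, 3, 4, removing key 2 with the swap strategy leaves the order 1, 4, 3, whereas the shifting strategy leaves 1, 3, 4. Code that only inserts and then returns the map, as the return types in this PR do, is not affected.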
TODO List:
- Benchmark code to compare the performance
- Make all_pairs_dijkstra parallel again (that was easy, just had to enable rayon feature in indexmap)
- Contribute the traits for indexmap::IndexMap to PyO3 (done in "Add optional support for conversion from indexmap::IndexMap", PyO3/pyo3#1728)
Comments
If this gets approved, we'd need to wait for a new PyO3 release. To make the conversions between PyDict and IndexMap work, I had to implement the traits on their crate.