perf!: use a priority queue #104

Eh2406 · 2021-07-11T02:50:53Z

BREAKING CHANGE: Changes the api of DependencyProvider

Still wip, it needs a lot of documentation, and some thinking about the ergonomics of the api. For example what should be able to return an error? How will this work with pubgrub-rs/advanced_dependency_providers#6?

This is a big perf win on larger benchmarks. Some previous discussion:

For example Generating a ron file from all crates in a snapshot of Crates.io went from 21,648s with v0.2.1 to 3,209s with this branch.

mpizenberg · 2021-08-20T17:36:41Z

Instead of having one choose_package_version function, giving full choice to the dependency provider everytime a decision needs to be taken, the decision process is split in two.

prioritize which ranks (Package, Range) pairs in a priority list
choose_version which given a (Package, Range) pair simply pick a version

The big advantage of the prioritize function is that picking a given package is much more efficient when there is a lot of choice. For example, without priority queues, when a package introduces 10 new dependencies, we will have to choose one package among 10, then among 9, etc. Basically a complexity in N^2 if N is the number of potential packages. If instead we sort them in a priority queue, the complexity is N*log(N) since we just need to sort the dependencies once according to a computed priority. Then we can simply pick them one after the other.

One drawback I can think of is that the evaluated priority is not re-evaluated when gaining new knowledge after picking new dependencies (is it?). If I understood correctly though, any backtrack clears the prioritized potential packages, so the drawback is not that problematic. I didn't understood fully though the play with indices and decision levels and exactly how you choose to update the packages in the priority list.

So I have one main question after reading this. How much of this idea can be implemented without changing the current API of choose_package_version? Could we build a priority list fully in the dependency provider? If not exactly like this, how similar in terms of compute complexity?

By the way, the changes in extract_solution to return the solution instead of an Option is unrelated to the PR, right? The change basically assumes that the PubGrubError::Failure("How did we end up with no package to choose but no solution?") cannot happen if I understand correctly.

Eh2406 · 2021-08-20T20:34:46Z

I didn't understood fully though the play with indices and decision levels and exactly how you choose to update the packages in the priority list.

indexmap gives us a HashMap with the added property that all keys have an index and the iteration is in index order. I use this for two things.

When a decision is made, then I move the package to the index of the current_decision_level. This means that the beginning of the iteration will be all the decisions made in order. extract_solution takes advantage of that to get slightly more efficient.
When a new derivation comes in we move the pointer changed_this_decision_level so that it is smaller or equal to the index of the changed package.

The result is that we can efficiently update the prioritized_potential_packages:

If we just backtracked, then we start over and re-prioritize everything that does not already have a decision.
else re-prioritize everything after the changed_this_decision_level, witch includes all the things we modified

Could we build a priority list fully in the dependency provider?

I don't think so. To get the perf benefits choose_package_version needs to do work proportional to the number of changed ranges in packages, not proportional to the len(packages). But it has no way of knowing what has changed.

Eh2406 · 2021-09-13T15:44:40Z

Theoretically if there are never two things with the same Priority and the heuristic implemented by choose_package_version is the same as the one in prioritize+choose_version then every step of the entire algorithm should be exactly the same. Using my WIP-POC crates runner (based on the crates2 branch) I have started to experiment to see if this works in practice. The first try was to record the wonderful new logging data (thanks #107), but it filled up my SSD. 😢 So next I tried adding a impl Hasher as an argument to resolve and calling hash.write(format!("{}:{}:{}:{:?}", decision.0, decision.1, state.incompatibility_store.len(), state.partial_solution.current_decision_level).as_bytes()), just after the call to log::info!("DP chose: {} @ {:?}", decision.0, decision.1);. I then recorded the result for each of 285204 crates with both versions dev and que. The results, all but 4 crates (solana-stake-o-matic v1.2.18, v1.2.19, and solana-install v1.2.18, v1.2.19) had the same hash! I will look into how and way they are different, but it is clear that the time savings is not because this PR is calculating something different!

mpizenberg · 2021-09-14T09:26:04Z

That is great! and indeed a little weird that 4 crates had a different output.

Eh2406 · 2021-10-02T23:22:49Z

One objection to this PR is that there are algorithms that can be implemented by choose_package_version that can not be implemented in this API. Two examples (with not all the details worked out):

let v: Vec<_> = packages.collect();
let mut hasher = FxHasher::new();
v.iter().forEatch(|(p, r)| hasher.hash((p.borrow(), v.borrow()));
let hash = hasher.finish();
let (p, r) = v[hash % (v.len() - 1)];

or

let v: Vec<_> = packages.collect();
let has_a = v..iter().find(|(p, _)| p == package_named_a));
let has_b = v..iter().find(|(p, _)| p == package_named_b));
let has_c = v..iter().find(|(p, _)| p == package_named_c));

if has_a.is_sum() &&  has_b.is_sum() {
   return has_b;
} else if has_b.is_sum() &&  has_c.is_sum() {
   return has_c;
} else if has_a.is_sum() &&  has_c.is_sum() {
   return has_a;
} else {
   return v.last();
}

These examples do prove that there are things limiting about the new API. But they are also chaotic and incoherent; they are not examples of someone doing something reasonable or justified. So are there reasonable examples; examples where someone could explain why they are writing the code? I don't think so. Mostly because I cannot think of any examples. But I would not be surprised if the argument from this video can be adopted to prove that all implementations which do not have a consistent prioritization are internally contradictory.

Eh2406 · 2022-05-28T19:07:30Z

rebased.

mpizenberg · 2022-10-22T11:38:26Z

But I would not be surprised if the argument from this video can be adopted to prove that all implementations which do not have a consistent prioritization are internally contradictory.

This is assuming that the priority computed by the "utility function" (as the video calls it) has access to all the parts of the world state it needs until we ask to re-compute it again. Which is not the case, it's a tradeoff of when do we ask package priorities to be recomputed.

mpizenberg · 2022-10-22T18:51:48Z

I thought this PR needed rebasing but it seems ok actually. What would be nice is having better documentation of the new process detailing under which conditions a package priority is evaluated and re-evaluated, and finding the best suited place in doc comments to put these explanations. @Eh2406 once you've added more doc and feel this is ready for review, let me know and I'll make a more thorough reading and review.

mpizenberg · 2022-11-01T19:30:02Z

Thanks for the new doc comments! I'll start reviewing and making some naming changes during this week. Is it ok with you @Eh2406 or do you still want to apply some changes here before to avoid conflicts.

Eh2406 · 2022-11-01T19:33:53Z

I think its in good shape for review.
It would be really nice to have a helper like the old choose_package_with_fewest_versions but I am unlikely to find time to figure that out before you have time to review.

Eh2406 · 2023-10-26T02:03:13Z

I will try to get this updated and cleaned up soon. But I would like to get away from long lived branches. It there are problems with this, or ways it can be improved (and there are) lets merge it as is and open issues to follow up.

Eh2406 · 2023-10-26T23:12:09Z

While reviewing and rebasing the code today it's clear that one complicated part of this PR is the many clever things I have done with the fact that package_assignments is ordered. I have showed a lot of information into that structure. (Which packages have had decisions, and which packages have been modified since last time they were prioritized.) If that is what is preventing us merging this code, it would be possible to store that data in other forms. I haven't figured out the details, but a HashSet for "needs re-prioritization" would be pretty direct. If that would be more mergeable, please let me know.

charliermarsh · 2023-10-29T20:48:44Z

I migrated our project over to this branch without much trouble. I think the separation in the API is actually pretty nice, even ignoring the potential performance benefits.

mpizenberg · 2023-10-29T21:12:23Z

Just a thought that came to mind. The priority list is a nice way of knowing which concurrent requests to prioritize, if we are to enable concurrent dependency provider requests while waiting for the next solver's demand. Also, since the solver would always re-ask the dependency provider for an updated priority when backtracking happens, it also gives a way for the dependency provider to know which queued requests should be cancelled. Right?

Pulls in pubgrub-rs/pubgrub#104.

Eh2406 · 2023-10-30T02:32:34Z

The priority list is a nice way of knowing which concurrent requests to prioritize, if we are to enable concurrent dependency provider requests while waiting for the next solver's demand.

The API in this PR so far allows a pretty good approximation of this. If a request has already been cashed it gets a high priority, if the request has not yet returned it gets a low priority. The resolver will work through all of the known results before it asks for an unknown result. Unfortunately, when the resolver runs out of known results it will pick an item at random to block on which may or may not have completed in the meantime. This could be improved by providing a way for a command to say "by the way could we reprioritize x" (maybe something on should cancel?) or a way to say "if the highest priority is below x then reprioritize everything". I will look into whether there is a dependency queuing implementation that handles the async -> conversion for us.

Also, since the solver would always re-ask the dependency provider for an updated priority when backtracking happens, it also gives a way for the dependency provider to know which queued requests should be cancelled. Right?

Theoretically yes. It would be more ergonomic if we provided a reliable way for the dependency provider to know that we backtracked.

Using PubGrub-rs with async code is not intuitive under 0.2 nore dev nore this PR. Let's not let the conversation about how to make the async situation better block merging this PR.

aleksator

I don't see anything that should be blocking this PR. Great job guys!
I'll use my superpowers to press the "Approve" button.

BREAKING CHANGE: Changes the API of DependencyProvider

aleksator · 2023-11-05T04:59:35Z

I took a liberty to rebase and fix conflicts. Feels free to merge if you are satisfied @Eh2406

Eh2406 · 2023-11-06T02:05:37Z

I'm going to try this new merged queue thing! If/when people find problems, we can open issues and get them addressed!

BREAKING CHANGE: Changes the API of DependencyProvider

Eh2406 force-pushed the priority-queue branch from 8b847a6 to 7f15520 Compare August 21, 2021 22:16

Eh2406 force-pushed the priority-queue branch from 7f15520 to 3852caa Compare September 11, 2021 01:32

mpizenberg mentioned this pull request Oct 10, 2021

Use a VersionSet trait instead of Range #108

Merged

Eh2406 added this to the v0.3 milestone May 27, 2022

Eh2406 force-pushed the priority-queue branch 2 times, most recently from 5c072c7 to 6863f93 Compare May 28, 2022 19:07

Eh2406 force-pushed the priority-queue branch from 6863f93 to c931a9b Compare October 22, 2022 19:00

mpizenberg mentioned this pull request May 23, 2023

Project status? #128

Closed

Eh2406 marked this pull request as ready for review October 2, 2023 00:56

Eh2406 force-pushed the priority-queue branch 2 times, most recently from 86f7568 to a00c3a6 Compare October 2, 2023 19:35

Eh2406 force-pushed the priority-queue branch from a00c3a6 to a4f0f9b Compare October 19, 2023 18:23

Eh2406 mentioned this pull request Oct 22, 2023

Solving slows down dramatically when testing hundreds of versions #135

Closed

Eh2406 force-pushed the priority-queue branch from a4f0f9b to 20233c7 Compare October 23, 2023 01:47

Eh2406 force-pushed the priority-queue branch from 20233c7 to 1663bcd Compare October 26, 2023 21:53

charliermarsh mentioned this pull request Oct 29, 2023

Add PubGrub's priority queue astral-sh/uv#221

Merged

charliermarsh added a commit to astral-sh/uv that referenced this pull request Oct 29, 2023

Add PubGrub's priority queue (#221)

2ba85bf

Pulls in pubgrub-rs/pubgrub#104.

aleksator approved these changes Nov 4, 2023

View reviewed changes

aleksator force-pushed the priority-queue branch from 1663bcd to c3a4f8c Compare November 5, 2023 00:38

aleksator changed the title ~~perf: Priority queue~~ perf!: use a priority queue Nov 5, 2023

perf!: use a priority queue

7cf095e

BREAKING CHANGE: Changes the API of DependencyProvider

aleksator force-pushed the priority-queue branch from c3a4f8c to 7cf095e Compare November 5, 2023 00:42

Eh2406 added this pull request to the merge queue Nov 6, 2023

Merged via the queue into dev with commit fe309ff Nov 6, 2023
5 checks passed

Eh2406 deleted the priority-queue branch November 6, 2023 02:06

zanieb pushed a commit to astral-sh/pubgrub that referenced this pull request Nov 8, 2023

perf!: use a priority queue (pubgrub-rs#104)

7883d9b

BREAKING CHANGE: Changes the API of DependencyProvider

This was referenced Nov 15, 2023

Add incompatibility for unusable dependencies #153

Closed

fork maint #154

Closed

Eh2406 mentioned this pull request Mar 18, 2024

Benchmark on crates.io index #44

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf!: use a priority queue #104

perf!: use a priority queue #104

Eh2406 commented Jul 11, 2021

mpizenberg commented Aug 20, 2021

Eh2406 commented Aug 20, 2021

Eh2406 commented Sep 13, 2021

mpizenberg commented Sep 14, 2021

Eh2406 commented Oct 2, 2021

Eh2406 commented May 28, 2022

mpizenberg commented Oct 22, 2022

mpizenberg commented Oct 22, 2022

mpizenberg commented Nov 1, 2022

Eh2406 commented Nov 1, 2022

Eh2406 commented Oct 26, 2023

Eh2406 commented Oct 26, 2023

charliermarsh commented Oct 29, 2023

mpizenberg commented Oct 29, 2023 •

edited

Loading

Eh2406 commented Oct 30, 2023

aleksator left a comment

aleksator commented Nov 5, 2023

Eh2406 commented Nov 6, 2023

perf!: use a priority queue #104

perf!: use a priority queue #104

Conversation

Eh2406 commented Jul 11, 2021

mpizenberg commented Aug 20, 2021

Eh2406 commented Aug 20, 2021

Eh2406 commented Sep 13, 2021

mpizenberg commented Sep 14, 2021

Eh2406 commented Oct 2, 2021

Eh2406 commented May 28, 2022

mpizenberg commented Oct 22, 2022

mpizenberg commented Oct 22, 2022

mpizenberg commented Nov 1, 2022

Eh2406 commented Nov 1, 2022

Eh2406 commented Oct 26, 2023

Eh2406 commented Oct 26, 2023

charliermarsh commented Oct 29, 2023

mpizenberg commented Oct 29, 2023 • edited Loading

Eh2406 commented Oct 30, 2023

aleksator left a comment

Choose a reason for hiding this comment

aleksator commented Nov 5, 2023

Eh2406 commented Nov 6, 2023

mpizenberg commented Oct 29, 2023 •

edited

Loading