-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement random greedy descent Ising solver #1
Comments
Randomized greedy descent means that spins are addressed at random. This could mean several different things:
By contrast one might consider two non-random orderings of high interest:
In summary, the two most important generalizations of steepest greedy are probably [1] and [2] IMO. But accommodating other cases might be useful. |
Note that the above comments also apply to variations upon dwave-neal. This might be considered a seperate feature request. dwave-neal currently uses sequential ordering, but could be generalized to accept random ordering. |
Thanks @jackraymond. I considered some of these (along with a few other), but it's great to have a specific use case in mind when thinking which one to implement first (and how). |
It's not an exhaustive list. I could definitely make regular use of [1,2] though. |
I believe we should aim for a highly efficient implementation that doesn't completely trash the guarantees of randomness. This would puts the kibosh on [1] (a call to the system RNG inside a tight innermost loop is out of the question efficiency-wise, if that's what you really meant) and [2] (fixed iteration order is most vulnerable to worst-case instead of average-case performance that randomness is supposed to guarantee). I would also rule out the colored-node version due to the overhead costs of computing the coloring. If you want to spend time finding a good-quality coloring, don't waste it on a greedy approach that is going to stop at the first local minimum it finds. |
To my mind the option of generating a random permutation of nodes, during initialization, then stepping through it in the innermost loop, offers the best combination of efficiency and randomized robustness. For variety, each new sample could start at a random location in the permutation, and/or could incorporate random step sizes each time. This would provide the desired variation among samples, and choosing the ``next'' node in the greedy iteration would take a couple nanoseconds for an array access instead of, idk, hundreds of microseconds for a system call? |
Re: random number generator speed -- we don't have to hard-code it to a slow(er) one. We can use |
Sure fast RNG is fine with me, as long as there is decent variation from sample to sample.
It might be worth checking whether it really is faster than building single permutation array and re-using it with only the overhead cost of array access.
|
I guess given Jack's preferences maybe this should be a user parameter? |
I don't see why slower speed would be a killer objection to [1], but it could be an argument against it being defaulted or being implemented as a first priority, particularly for optimization use cases. I think that following best-practice, common and simple conventions for randomness is more important. Particularly for comparison of classical dynamics and quantum dynamics, or for evaluation of pathological cases (those where randomization matters most). Obtaining a heuristic coloring can be achieved in O(N) operations, and for some common special cases (like coloring bipartite graphs, e.g. Chimera, RBM) the colorings are optimal. So I don't see that as a killer objection either. |
Fair enough. I want to use this code for benchmarking against the QPU. I want to claim truthfully that this is a state-of-the art implementation that runs as efficiently as we (i.e. Rad) could make go. I want to defuse any suspicion that D-Wave deliberately made it run slower than it should. In the larger field of classical solvers, Greedy is not going to win prizes for |
I don't believe parallel updating based on pre-computing a coloring would lead to any real speedup if the code itself is not actually parallelized. But I'm assuming that the updating is done on the fly so there can be multiple greedy bit flips per sweep -- maybe that assumption is wrong? |
In my experience, on lattices and bicliques for a variety of models, a colored ordering is typically worse for heuristic optimization than a random permutation (because correlated block moves are obstructed) when evaluated on a per sweep basis. Using a fixed ordering, I can achieve a colored ordering by relabeling my variables, if I wanted to. I am not particularly interested in that use case, unless it is exploited at hardware level. In matlab, I have exploited heuristic+exact colored orderings for very large speedups in the past. |
We do one bit flip per sweep. At each step (sweep) the best direction/dimension/variable is picked (by energy contribution) and flipped. Parallel updates based on coloring sound nice, but I doubt we would see a significant speedup on graphs of our typical scale (~10k nodes). |
One other place we need to consider the role of randomness is the case of tied energies in SGD. How is a tie broken when steepness is tied, we should as a minima document how this is done given that we often study models with degenerate energy levels (the source of ties). Several possibilities exist, probably the most natural random and systematic choices are : broken uniformly at random, broken systematically towards the highest (numeric rank) variable. |
@mhlr expressed an interest in the case of select uniformly at random amongst spin flips conditioned on them lowering energy. In my enumeration above, I was implicitely assuming that a selected spin is only flipped if it lowers energy. [1] is equal to this case. Although, the notion of 'time' becomes a bit different. The way I wrote everything falls into a 'sweep-based' notion of time, whereas thinking about small sets of moves is more like the 'n-fold way' detached from physical notions of time. This brings up a question of how we judge convergence in [1], and also whether we allow zero-energy change flips. Allowing equal-energy flips is more consistent with the physical dynamics interpretation - (i.e. limit of zero temperature metropolis or gibbs sampling). However, in this case we have no guarantees of convergence, so we might need to think about this more. The set of zero or downward energy spin flips may not become empty when degenerate energy levels are present. |
I was thinking of only non-zero flips since I would like the process to self terminate. |
@jackraymond, our current SGD implementation is detached from any physical interpretation (as long as we consider the Hamiltonian to be nothing more than a function we're minimizing). At each step, we pick the first dimension (in BQM variable order) that improves objective the most. We stop when we reach a (local) minimum. If you think we need something more physics-inspired, we can consider generalizing simulated annealing sampler as well. |
My presentation of methods was more suitable for the sweep based methods (simulated annealing). greedy routines can be considered a 'quenching' limit of an SA routine, but it was pretty unhelpful to do it that way. |
Hi Rad, so is the only randomness in the SGD you describe due to initialization? I would prefer a version with a random tie-breaker (based on sweep order perhaps) rather than always preferring the lowest-index node --- more robust against worst-case scenarios. Any objections to doing that? |
Whether systematic or random tie-breaking applies, most important thing is to document. If only randomized were available, it is possible in general to recover the rank ordering behaviour by adding a perturbation on the bqm linear terms of the form eps*range(num_var), (eps small compared to all gaps) so the randomized ordering is sufficient for me. |
@CatherineCMcGeoch, I think I'd rather keep the current SGD deterministic, and implement a new random greedy descent solver (as this issue was originally about). Makes sense? |
Sure, whatever you like. Are you going to keep the loop that sweeps through all n nodes? To my mind this is a weird coding quirk that physicists enjoy. In CS (greedy, SA, whatever) there is one loop that samples nodes in random order, flipping and updating on the fly. I assume that the extra loop makes it easier to simulate natural processes, but it doesn't seem like a good idea efficiency-wise. |
And I think the concept of implementing separate solvers rather than one highly parameterized version that pleases everybody, is a good idea too. A broad-ranging benchmark test would want to try them all, anyway. |
No, I agree, a random sampler does not need the sweeping loop. I mean, we actually have a little known version of the SGD that uses a red-black tree to keep energy gains per variable flip sorted, so the selection of the steepest flip is O(1). But due to larger constant factor, this version is faster only for very large sparse problems. |
Red black trees! I used to teach them! Or sometimes splay trees, just to mix things up! Though I did see a talk a several years ago suggesting that plain vanilla BSTs typically work just fine in most applications. |
No description provided.
The text was updated successfully, but these errors were encountered: