
Nxopt is an optimal solver for the Rubik's cube.

Optimally solving a position of the Rubik's cube is done by using a refinement of double-ended search. In double-ended search, we make a list of all positions at distance a from solved and at distance b from the target position, for increasing values of a and b, until we find a position that is in a list generated from solved and a list generated from the target position. This meet-in-the-middle approach can be very effective for many search spaces.
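To make the meet-in-the-middle idea concrete, here is a minimal sketch in C++ on a toy state space (integers mod 12 with moves of ±1 and ±4), not the cube itself; the names `neighbors` and `meetInTheMiddle` are invented for this illustration. We grow a ball of radius a around the start and a ball of radius b around the target until they intersect; the solution length is then a + b.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Toy move set: from state s we can add +1, -1, +4, or -4 (mod 12).
static std::vector<int> neighbors(int s) {
  return { (s + 1) % 12, (s + 11) % 12, (s + 4) % 12, (s + 8) % 12 };
}

// Returns the distance between start and target by double-ended search.
int meetInTheMiddle(int start, int target) {
  std::set<int> from{start}, to{target};   // balls grown from each end
  int depth = 0;                           // current a + b
  while (true) {
    for (int s : from)
      if (to.count(s)) return depth;       // the balls intersect: done
    // Expand the smaller ball by one move (a + b increases by one).
    std::set<int>& grow = (from.size() <= to.size()) ? from : to;
    std::set<int> next = grow;
    for (int s : grow)
      for (int n : neighbors(s)) next.insert(n);
    grow = std::move(next);
    ++depth;
  }
}
```

Because the toy move set is closed under inverses, the first depth at which the two balls intersect is exactly the distance, mirroring the a + b argument in the text.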

The major refinement performed is to do the search from solved only once, and store the results in a table called a pruning table or pattern database. This table can then be saved to disk and reused for many optimal solves, which amortizes the cost of generating a big table (doing a large search) over many solves. In addition, it allows us to spend some time compressing the table in various ways, such as reducing it by symmetry, in order to make more effective use of our resources.

If you ignore the cost of generating the pruning table (since that cost is amortized over many solves), optimally solving Rubik's cubes is an unusual algorithm that scales as the square of Moore's Law. Intuitively, empirically, and analytically, doubling the size of the pruning table will halve the CPU time required to perform a search. Further, the search is embarrassingly parallelizable, so speed increases with the number of cores. If we double the number of cores and double the size of our memory to store the pruning table, we will likely quadruple the speed of the algorithm.

The use of pruning tables in this fashion was originated by Korf (at least, in the literature, which is what counts); the nxopt solver is different from Korf's original solver only in detail, not in broad strokes. In both of these we use a pruning table to look up distance estimates for positions; these distance estimates must be admissible (that is, they need not be accurate, but they must only ever underestimate the distance to solved, and never overestimate it). Thus, the pruning table provides an admissible heuristic for iterated depth-first searches from the target position.
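The overall search loop can be sketched as a generic iterated depth-first search with a pluggable admissible heuristic. This is a toy C++ sketch on the same kind of ring of 12 states with moves of ±1 and ±4, not the cube; the `heuristic` function stands in for a pruning table lookup and, as the text requires, never overestimates the true distance.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Toy move set: from state s we can add +1, -1, +4, or -4 (mod 12).
static std::vector<int> moves(int s) {
  return { (s + 1) % 12, (s + 11) % 12, (s + 4) % 12, (s + 8) % 12 };
}

// Admissible heuristic for this toy space: ring distance divided by the
// largest step size (4), rounded up. Never overestimates the true distance.
static int heuristic(int s, int target) {
  int d = std::abs(s - target);
  d = std::min(d, 12 - d);
  return (d + 3) / 4;
}

// Depth-limited DFS; prunes any node whose heuristic exceeds the budget.
static bool dfs(int s, int target, int depth) {
  if (depth == 0) return s == target;
  if (heuristic(s, target) > depth) return false;  // admissible prune
  for (int n : moves(s))
    if (dfs(n, target, depth - 1)) return true;
  return false;
}

// Iterated deepening: try depth 0, 1, 2, ... until a solution is found.
int idaStar(int start, int target) {
  for (int depth = 0; ; ++depth)
    if (dfs(start, target, depth)) return depth;
}
```

Because the heuristic is admissible, the first depth at which `dfs` succeeds is the optimal distance, which is exactly why the pruning table entries must never overestimate.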

The nxopt solver uses a number of techniques to improve its performance. First, it supports twelve different sizes of pruning tables, from 20MB to 4TB, so we can select the one appropriate for a given computer. We use two bits per pruning table entry, to represent that the estimated distance is less than or equal to, exactly one more than, exactly two more than, or at least three more than a table-specific base value. Since an estimate of "less than or equal to" a given base value does not by itself yield a useful admissible bound, we also have a smaller pruning table with 4-bit entries that provides better admissible estimates for when the distance value is not greater than the base value. This smaller pruning table is interleaved with the larger pruning table so the entries for a position in the larger and the smaller tables always reside in the same cache line.

Further, our pruning tables are based on cosets of a subgroup of the cube group; this subgroup always includes all U and D face rotations. This not only allows us to calculate a larger bound than the pruning table entry itself supplies, but also allows us to reduce the branching factor in the search itself. We organize the subgroups so we can use 16-way symmetry to reduce the size of the pruning table but still get six lookups in the pruning table for a given position, increasing the chance of that search node being pruned. As we solve, we consider for each node both the position itself and its inverse position, selecting the one that has fewer possible moves based on the results from our pruning table.

Our solver supports both half-turn and quarter-turn metrics. It does not take advantage of any symmetry in the target position, nor does it parallelize a specific target position. It does parallelize the search across multiple positions.

On a two-node, 16-physical-core, 256GB machine this program can solve 33 random positions per second in the half-turn metric.

Subgroups

Pruning tables are based on subgroups of the cube group. Each pruning table entry corresponds to one coset with respect to the subgroup. The size of the pruning table is thus the number of cosets, which is the size of the whole cube group divided by the size of the subgroup. We separate consideration of corners and edges in constructing our subgroups and their cosets, which means that the description of our subgroups does not guarantee parity match between the corners and the edges. Thus, the number of cosets multiplied by the size of the subgroup will be twice the size of the cube group; other than this, there is no impact on the program.

The subgroups we use are not those generated by a subset of the moves of the cube, as some famous subgroups (like the Kociemba subgroup) are. Instead, they are more general subgroups described by generators that may not be physically realizable on the cube. These generators are separated by their impact on corners and edges, and then within each of these separations, by their impact on permutations and orientations.

The separation of the subgroup into permutations and orientations introduces significant restrictions on the orientations we can use as generators. We identify a subset of the cubies whose orientations we track, and we must ensure that the subgroup never permits a cubie whose orientation we track to permute with a cubie whose orientation we do not track. Thus, the permutation component of the subgroup must divide the cubies into orbits for which we track orientations and orbits for which we do not (although either of these can be empty).

For a subgroup H, the coset represented by a particular pruning table entry is H c, where c is any member of the coset. That is, if c_1 and c_2 are both positions in that coset, then H c_1 is the same as H c_2, which is that coset. (Here operations are performed left-to-right, as is conventional in group theory, rather than right-to-left.) The pruning table entry is the length of the shortest coset representative, or equivalently, the minimum distance to solved of any of the positions in the coset. This value is also known as the coset distance. Since every position in the coset has a distance of at least this value, the entries in our pruning table form an admissible heuristic.

We always choose a cubie orientation convention such that no moves in the subgroup affect the orientation of the solved position; this allows us to use the orientation of a given position directly, orthogonally to its permutation, to determine the relevant coset. Further, we perform symmetry reduction by the 16 symmetries around the vertical axis (the one running through the up and down faces), so we select an orientation convention that is preserved (for the solved state) by the reflections and rotations in that symmetry group.

This program supports twelve distinct pruning table sizes, each generated by a particular subgroup. Each subgroup contains the same generators that affect only the corners (but not the edges). The corner permutation generators for the subgroup permute the top corners arbitrarily and, separately, the bottom corners arbitrarily (but do not permit switching any top corners with any bottom corners). We also track the orientation of all corners. The corner contribution to the number of cosets is thus (8 choose 4) * 3^7 or 153,090 before symmetry reduction. After symmetry reduction, we find a total of 9930 distinct symmetry-reduced classes. Our total symmetry reduction from a memory perspective is thus 15.4; we could get closer to 16 by using additional information from the edges, but this is already 96.3% effective and keeps the code simple.

So no matter what subgroup we use for the edges, the size of any pruning table, as determined by the corner cubies, will always be a multiple of 9930. This 9930 is always used as the most significant factor in the pruning table index; each pruning table has exactly 9930 rows, where the index within each row is calculated from the edge information.

It may appear we are not using sufficient information on corner permutations, since we only encode which cubies are in the top layer and which are in the bottom in the index to our pruning tables. However, we perform a pruning table lookup in all three axes, effectively telling us which corner cubies are in the left layer and which are in the right, and then which corner cubies are in the front and which are in the back; when this information is combined, we see that in total we have full information on the location of each corner cubie. Since each lookup also encodes full orientation information for the corner cubies, there is only a single corner position that will resolve as solved in all three pruning tables.

Some cosets will be encoded redundantly in the pruning table. If a particular corner position and orientation is itself invariant under some of the 16 symmetries we reduce the pruning table by, then each coset in that portion of the pruning table will be encoded redundantly. We could eliminate this redundancy by examining some edge cubie information to further break the symmetries, but this adds complexity while saving only a very small amount of memory, so we do not do this. Care must be taken when generating the pruning tables to ensure the redundant cosets are fully and correctly populated, however.

Edges

The pruning tables are distinguished only by which edge generators we permit in the subgroup definitions. There are twelve subgroups supported by this program, generated by four distinct options for edge permutation generators and three distinct options for edge orientation generators. In all twelve cases, the subgroup contains no permutation generators that move edges into or out of the middle layer. (The middle edges are those not in the up or down faces.) Every coset thus has the same four edges in the middle four positions, and which four edges these are contributes a factor of (12 choose 4) or 495 to every pruning table size.

This factor of 495 is always used as the least significant component of the pruning table index. We exploit the fact that 495 is close to 512 to combine a somewhat smaller pruning table into our main pruning table and gain performance efficiency. Every entry in our big pruning table is two bits, so normally 495 entries would take 990 bits, or about 124 bytes. We split the 495 entries into 8 groups of 62 entries (the last entry in the last group not being used). Normally 62 entries at two bits each would require 15.5 bytes; we allocate 16 bytes for each group (this is smaller than a cache line) and use the extra four bits to encode the minimum actual value of the 62 entries.

The reason we need to do this is that two bits per entry is insufficient to encode the actual integral value of that entry. Instead, we associate with each pruning table a base value, and we encode in the two bits one of four possible values:

  0. the coset distance is less than or equal to the base value
  1. the coset distance is one plus the base value
  2. the coset distance is two plus the base value
  3. the coset distance is three or more plus the base value

For entries encoded as 1, 2, and 3, we get a value out of the pruning table entry that is likely fairly accurate. But when we see an encoding of 0, we cannot assume the actual coset distance is greater than 0 without violating our requirement of admissibility, so we get no useful pruning from this value. So we encode the minimum actual coset distance across the 62 entries in the group using the extra four-bit field, effectively embedding a pruning table 62 times smaller, with corresponding entries in the same cache line, using only about 3.2% extra space.
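The layout described above can be sketched as a decode routine. This is a hypothetical bit layout (62 two-bit codes in the low 124 bits, group minimum in the top nibble of the last byte); nxopt's actual bit ordering may differ, and `Group` and `decodeEntry` are names invented here.

```cpp
#include <cstdint>

// One 16-byte group: 62 two-bit codes (124 bits) plus a 4-bit field holding
// the minimum actual coset distance over the group's 62 entries.
struct Group { uint8_t bytes[16]; };

// Returns an admissible lower bound on the coset distance for one entry.
int decodeEntry(const Group& g, int slot /* 0..61 */, int base) {
  int bitpos = 2 * slot;
  int code = (g.bytes[bitpos >> 3] >> (bitpos & 7)) & 3;
  if (code != 0) return base + code;  // codes 1..3: base+1, base+2, base+3+
  // Code 0 means "<= base", which alone gives no useful admissible bound,
  // so fall back to the group minimum stored in the top nibble of byte 15
  // (bits 124..127, just past the 124 bits of two-bit codes).
  return g.bytes[15] >> 4;
}
```

Both reads touch the same 16-byte group, which is why an entry and its fallback always share a cache line.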

So far we have a factor of 9930 times 495 or 4,915,350 for our pruning table. We turn our attention now to the four different ways we extend our subgroup for edge permutations.

The first, called EP1, adds generators that arbitrarily permute edges in the middle layer, and separately generators that arbitrarily permute edges not in the middle layer. This leaves no additional permutation information available to distinguish cosets, so its contribution to the coset index is exactly 1.

The next, called EP2, adds only generators that arbitrarily permute edges not in the middle layer. This means the coset index includes information on the permutation of the four edges in the middle layer, which adds a factor of 4! or 24 to our coset index.

EP3 includes generators that permute edges in the top layer, generators that permute edges in the bottom layer, and generators that permute edges in the middle layer. Thus, the coset index contains information about which four of the eight remaining edges are in the top layer, and this adds a factor of (8 choose 4) or 70 to our coset index.

EP4 includes generators that permute edges in the top layer, and generators that permute edges in the bottom layer. Thus the coset index combines information from which edges are in the top layer with permutation information from the middle layer, contributing a factor of 4! * (8 choose 4) which is 1680 to our coset index.

Separately, we permit some edge orientation generators in our subgroup. There are three options. The first, EO1, permits reorientation of any top or bottom edges, so our coset index adds a factor of 2^4 or 16 representing the orientation of the middle edges. The second, EO2, instead permits reorientation of only the middle-layer edges, so our coset index adds a factor of 2^8 or 256 representing the orientation of the top and bottom edges. The final, EO3, permits no reorientation of any edges, so our coset index adds a factor of 2^11, or 2048 (not 2^12 because the orientation of one edge is forced by parity considerations).

Overall, then, we support pruning tables with 9930 * {1, 24, 70, 1680} * {16, 256, 2048} * 495 total entries. At 16 bytes per 62 entries, this yields the memory requirements given in the following table:

|     | EO1    | EO2    | EO3    |
|-----|--------|--------|--------|
| EP1 | 20.3MB | 325MB  | 2.60GB |
| EP2 | 487MB  | 7.79GB | 62.3GB |
| EP3 | 1.42GB | 22.7GB | 182GB  |
| EP4 | 34.1GB | 546GB  | 4.36TB |

All the choices we have made for subgroup selection are straightforward; there are additional workable subgroup selections that would expand the set of pruning tables that can be constructed. We may explore some of these in the future.

Search

We enhance a normal iterated depth-first search (IDFS) in a number of ways derived from the way we construct our pruning tables. To explain these, we first discuss some basic group theory.

The distance of a position is the minimum number of moves (in a particular metric) that solves or generates that position. Just as each move has an inverse move, every position has an inverse position (this is part of the definition of a group). The distance of a position is identical to that of its inverse, and also identical to that of all of its symmetry-equivalent positions.

Our pruning tables reduce by symmetry only along one axis. This means we can take a single position, reorient it so some other axis becomes the preferred (vertical) axis, calculate what coset that reoriented position belongs to, and get another probe of the pruning table. Since there are three distinct axes, we can probe our single pruning table three times for a given position, and take the maximum of the three values as our heuristic estimate. This substantially improves the effectiveness of the pruning table, more than making up for the time required for the additional probes.

We can do more than the three probes by using the inverse of the position at the current search node. Calculating an inverse of a cube position can be done quickly, and since this inverse position has the same true depth, we can exploit it to obtain three additional probes for that position, gaining further effectiveness.

The real secret sauce, though, is in combining the previous ideas with the fact that the pruning table we are using is generated from a subgroup that includes all up and down moves. Let's say we are at position p at some search node, and the pruning table tells us that the distance of that position is at least x. This means the shortest coset representative c such that p c' is in H (where ' means inverse) has length at least x. Now, if c starts with any move of the up or down face, both of which are in H, then c' ends with the inverse of that move, and we can find a shorter c' just by trimming that move off the end, which contradicts our assumption. So a pruning entry of x for a position p indicates not only that the minimum length of a solving sequence for p is at least x, but in addition, that no solving sequence of length x ends in a move of the up or down face.

Let us assume for the moment that we look up all three reorientations of a position p, and all three lookups return the same value x. The first lookup, with the orientation unchanged, tells us that any solving sequence of length x cannot end in a move of U or D faces. The second lookup, with the front-to-back axis remapped to up-down, tells us that any solving sequence of length x cannot end in a move of F or B faces. Finally, the third lookup, with the left-right axis remapped to up-down, tells us that any solving sequence of length x cannot end in a move of L or R faces. We have exhausted all possible last moves, and thus can conclude that no solving sequence of length x exists; the minimum solving sequence must be of length x+1.

The chance that all three pruning table lookups return the same value may be small, but we can still exploit the same information in a different way. Let us say the UD forward lookup returned x-1, while the FB and LR forward lookups both returned x. Similarly, the UD and FB inverse lookups returned x-1, and the LR inverse lookup returned x. Let's say the permitted depth at this node (from our iterated depth-first search) was exactly x. We know any solving sequence of length x cannot end with a move of the F, B, L, or R faces (from the two forward lookups of x), and we also know that any solving sequence of length x cannot start with a move of the L or R faces (from the inverse LR lookup). Thus, for the next move of the solving sequence, four of the six possible faces can be twisted. However, if we invert the position (and thus reverse all solving sequences), only two of the six possible faces are allowed for twisting; this is a smaller branching factor. Thus, by searching from the inverse position, we can decrease the overall branching factor and significantly speed up our search.
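The translation from the six probes to allowed first moves can be sketched as bit masks. This is a sketch of the idea, not nxopt's actual code; `Masks` and `allowedFirstMoves` are names invented here.

```cpp
// Axis order: 0 = UD, 1 = FB, 2 = LR. A forward lookup equal to the
// remaining depth forbids that axis's faces as the LAST move of the forward
// solution; an inverse lookup equal to the remaining depth forbids them as
// the FIRST move. Forbidden last moves of the forward solution become
// forbidden first moves of the inverse search, so we can pick whichever
// direction branches less.
struct Masks { int forward; int inverse; };  // bit a set = axis a may go first

Masks allowedFirstMoves(const int fwd[3], const int inv[3], int depth) {
  Masks m{7, 7};  // all three axes allowed initially
  for (int a = 0; a < 3; ++a) {
    if (inv[a] == depth) m.forward &= ~(1 << a);  // can't start on axis a
    if (fwd[a] == depth) m.inverse &= ~(1 << a);  // can't end = inverse start
  }
  return m;
}
```

For instance, at depth 10 with forward lookups {9, 10, 10} and inverse lookups {9, 9, 10} for the UD, FB, LR axes, the forward search may begin with any U, D, F, or B move, while the inverse search may begin only with U or D moves, so we would search the inverse.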

Another optimization we perform is to do at most five, rather than six, lookups per node. Let us say that from node n_1 to node n_2 we did a twist of the F face. The FB inverse lookup for n_1 is precisely the same coset as the FB inverse lookup for n_2, because the reorientation of FB to the vertical axis moves F to either the up or down face, and any twist of these faces is in the subgroup. The inverse operation puts that move at the front of the sequence, where it does not change the coset. So we only need to do at most five lookups per node; we can always inherit one value from the parent node. Many nodes backtrack immediately; we actually average somewhere around 2.5 lookups per node.

For the quarter-turn case, we also take overall cube parity into consideration. So if a pruning table lookup says the distance is at least 10, and our current node position has odd parity, we can reinterpret that lookup to indicate the true distance is at least 11.
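This parity adjustment can be expressed as a one-liner (a sketch; `parityAdjust` is a name invented here). Every quarter turn flips the permutation parity of the cube, so a position's quarter-turn distance always has the same parity as the position itself.

```cpp
// bound: admissible lower bound from a pruning table lookup.
// positionParity: 0 if the position's permutation is even, 1 if odd.
// If the bound's parity disagrees with the position's parity, the true
// distance cannot equal the bound, so bound + 1 is still admissible.
int parityAdjust(int bound, int positionParity) {
  return (bound % 2 == positionParity) ? bound : bound + 1;
}
```

This matches the example in the text: a lookup of at least 10 on an odd-parity position becomes a bound of at least 11.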

Base

The true coset distances have a specific distribution, with most of the values clustered around a couple of values. Since our pruning table only uses two bits per entry, we want to choose our base value carefully to provide the most effective pruning. A good start is to just choose one less than the most common value as our base, but ultimately, empirical runs of random positions against possible base values are the best way to select a base. Every node we prune eliminates somewhere between 3 and 15 subnodes (depending on metric and actual values seen), so there is a substantial payoff for even a small improvement in pruning effectiveness.

Average depths

The following table gives the base value and the average depths returned from our pruning tables. The averages are given both for a single lookup and for a combined lookup of six entries, including the (x,x,x) -> (x+1) optimization. All lookups include the probe of the larger pruning table, followed by the fallback to the smaller pruning table if the code read from the larger pruning table was 0. Due to our branching-reduction trick, we prune more of the search than other optimal solving programs that have the same average depth. We were only able to compute these values for ten of the twelve sizes, lacking access to a machine with enough memory for the two largest sizes. We show these values for both the half-turn and quarter-turn metrics.

| table   | size   | metric | base | average1 | average6 |
|---------|--------|--------|------|----------|----------|
| EO1,EP1 | 20.3MB | h      | 7    | 8.90627  | 9.84321  |
| EO1,EP1 | 20.3MB | q      | 9    | 9.88396  | 10.9947  |
| EO2,EP1 | 325MB  | h      | 8    | 9.88681  | 10.8274  |
| EO2,EP1 | 325MB  | q      | 10   | 11.0091  | 12.0452  |
| EO1,EP2 | 487MB  | h      | 9    | 9.92374  | 10.9551  |
| EO1,EP2 | 487MB  | q      | 10   | 11.492   | 12.5162  |
| EO1,EP3 | 1.42GB | h      | 9    | 10.2745  | 11.1472  |
| EO1,EP3 | 1.42GB | q      | 10   | 11.7077  | 12.6858  |
| EO3,EP1 | 2.60GB | h      | 9    | 10.6748  | 11.5986  |
| EO3,EP1 | 2.60GB | q      | 11   | 11.8365  | 12.9383  |
| EO2,EP2 | 7.79GB | h      | 9    | 10.9391  | 11.8728  |
| EO2,EP2 | 7.79GB | q      | 11   | 12.4016  | 13.3269  |
| EO2,EP3 | 22.7GB | h      | 10   | 11.3619  | 12.2277  |
| EO2,EP3 | 22.7GB | q      | 12   | 12.8229  | 13.9476  |
| EO1,EP4 | 34.1GB | h      | 10   | 11.4555  | 12.3198  |
| EO1,EP4 | 34.1GB | q      | 12   | 13.0704  | 14.0768  |
| EO3,EP2 | 62.3GB | h      | 10   | 11.79    | 12.7223  |
| EO3,EP2 | 62.3GB | q      | 12   | 13.3901  | 14.3404  |
| EO3,EP3 | 182GB  | h      | 11   | 12.1112  | 13.0352  |
| EO3,EP3 | 182GB  | q      | 12   | 13.8659  | 14.8389  |
| EO2,EP4 | 546GB  | -      | -    | -        | -        |
| EO3,EP4 | 4.36TB | -      | -    | -        | -        |

(Note: most of this document will be migrating into the .web file in the near future.)