In rock climbing, routes are given difficulty scores that represent how hard it is for a climber to get to the top. The difficulty score assigned to any given route assumes that the climber chooses the optimal sequence. However, the optimal sequence is not provided apriori. A climber could always choose, either deliberately or accidentally, to perform a suboptimal sequence (maybe they thought jumping past holds would be more fun!). This renders the problem of accurately predicting the difficulty of climbing routes challenging; a dataset of climbs associated with a difficulty score that does not have an associated sequence is, in some ways, incomplete.
Previous efforts to predict the difficulty of climbs from the sequence of moves do so by injecting human intuition with data preprocessing. They devise an algorithm, without any machine learning, to determine a plausible sequence using entirely human inputs. However, this methodology leads to predictions ultimately limited by the quality of the injected intuition. If determining the optimal sequence were easy, then this would be a non-issue. But finding the optimal sequence for climbs can be tricky, even for more experienced climbers, let alone distilling that knowledge into a deterministic formula. Thus, finding a way to optimize the sequence in a way not limited by the quality of the injected intuition stands to both be an interesting problem and enable more accurate predictions of climbs' difficulties.
Here, I show that using a dataset containing only routes and their difficulty scores it is possible to simultaneously extract reasonable optimal sequences and use them to predict the difficulty of climbs. With a test set accuracy of 49.1% and a
Each climbing route, called a 'problem', is defined by 1 or 2 starting holds, 1 or 2 finishing holds, and an arbitrary number of holds in between. The climber must start with two hands on the starting holds and work their way to the finish hold.
I approach the problem by defining a Graph Neural Network (GNN), where the individual problems are represented as graphs. Each unique combination of left-hand and right-hand positions defines a node; the node values for any given problem are static. The edge values on the graph represent the difficulty of moving from one position, or node, to the next. The only human intuition injected into the algorithm is that a move consists of moving one hand at a time: at any step, the climber can move either their left or right hand, connecting them to a new position. The difficulty of each move is not known and no human intuition is provided in its calculation.
A move is characterized by an 18 element vector. The position of the climber's lefthand,
Each hold on the wall is characterized by a two-element tuple index by its position and if it is being held by a left or right hand. The first value of the tuple represents the difficulty of the hold and the second represents the hold's angle of rotation. The next 8 elements in the move vector correspond to the characteristics of the holds found at the positions of
The difficulty of a move is calculated from the 18-element move vector with a feedforward NN. The model for the difficulty of a move is initialized randomly and learned with the following procedure:
- Use the model to calculate the edge values on all the problem graphs.
- Determine the optimal sequence for all the problems using the Dijkstra algorithm
- Calculate the difficulty of each climb, which is defined as the sum of the difficulty of the moves for the determined optimal sequence.
- Calculate the MSE loss of the predicted difficulty minus the true linearized difficulty of the climb (defined in more detail below).
- Use the loss to update the move difficulty model with backpropagation.
- Repeat steps 3-5 for a fixed number of epochs.
- Repeat steps 1-6 until convergence is observed.
Because the model is initialized randomly, the initially determined optimal sequences are also random, and therefore completely incorrect. Since the optimal sequence gets updated as the model is trained, it converges to sequences that are logically consistent with the assigned grades of the climbs, effectively learning how to climb.
The difficulty of any given climbing problem is represented by a number. The exact number and how it is presented is different in the U.S. than in Europe, but, ultimately, both provide similar information. A linear increase in the number characterizing the difficulty of a climb does not correspond to a linear increase in its difficulty: it corresponds to an exponential increase. Climbers have a heuristic formula for determining the difficulty of a climb from the difficulty of its component sequences. If you create a new climb that consists of linking together two climbs of difficulty