Table of Contents
MuJoCo MPC (MJPC) is a tool for the design and experimentation of predictive control algorithms. Such algorithms solve planning problems of the form:
where the goal is to optimize state,
Behaviors are designed by specifying a per-timestep cost
The cost is a sum of
- A nonnegative scalar weight
$w \ge 0$ that determines the relative importance of this term. - A norm function
$\rm n(\cdot)$ taking a vector and returning a non-negative scalar that achieves its minimum at$\bf 0$ . - The residual
$\bf r$ is a vector of elements that is "small when the task is solved". Residuals are implemented as custom MuJoCo sensors.
We additionally provide a convenient family of exponential transformations,
The mapping,
- It is defined and smooth for any
$R$ - At
$R=0$ , it reduces to the identity$\rho(l; 0) = l$ - 0 is a fixed point of the function:
$\rho(0; R) = 0$ - It is nonnegative:
$\rho(l; R) \geq 0$ if$l \geq 0$ - It is strictly monotonic:
$\rho(l; R) \gt \rho(q; R)$ if$l > q$ - The value of
$\rho(l; R)$ has the same unit as$l$
Many planners (e.g., Gradient Descent and iLQG) utilize derivative information in order to optimize a policy.
Gradients are computed exactly:
Hessians are approximated:
via Gauss-Newton.
Derivatives of
The residual Jacobians, i.e.,
Likewise, we also provide derivaties for the risk (in)sensitive cost functions.
Gradients are computed exactly:
Hessians are approximated:
as before, via Gauss-Newton.
In order to create a new task, at least two files are needed:
- An MJCF file that describes the simulated model and the task parameters.
- A C++ file that implements the computation of the residual.
It's considered good practice to separate the physics model and task description into two XML files, and include the model file into the task file.
Configuration parameters for the Planner
and Agent
are specified using MJCF custom numeric elements.
<custom>
<numeric
name="[setting_name]"
data="[setting_value(s)...]"
/>
</custom>
Values in brackets should be replaced by the designer.
setting_name
: String specifying setting namesetting_value(s)
: Real(s) specifying setting value(s)
Agent
settings can be specified by prepending agent_
for corresponding class members.
Planner
settings can similarly be specified by prepending the corresponding optimizer name, (e.g., sampling_
, cross_entropy_
, gradient_
, ilqg_
).
It is also possible to create GUI elements for parameters that are passed to the residual function. These are specified by the prefix residual_
, when the suffix will be the display name of the slider:
<custom>
<numeric
name="residual_[name]"
data="[value] [lower_bound] [upper_bound]"
/>
</custom>
value
: Real number for the initial value of the parameter.lower_bound
: Real specifying lower bound of GUI slider.upper_bound
: Real specifying upper bound of GUI slider.
Prefixes residual_list_
and residual_select_
are (both) specified in order to generate a selection of buttons (0) or drop-down list (1):
<custom>
<numeric
name="residual_list_[name]"
data="[Item (0)]|[Item (1)]|[Item (3)]"
name="residual_select_[name]"
data="0" or "1"
/>
</custom>
As mentioned above, the cost is a sum of terms, each computed as a (scalar) norm of a (vector) residual. Each term is defined as a user sensor. The sensor values constitute the residual vector (implemented by the residual function, see below). The norm for each term is defined by the user
attribute of the respective user sensor, according to the following format:
<sensor>
<user
name="[term_name]"
dim="[residual_dimension]"
user="
[norm_type]
[weight]
[weight_lower_bound]
[weight_upper_bound]
[norm_parameters...]"
/>
</sensor>
-
term_name
: String describing term name (to appear in GUI). -
residual_dimension
: number of elements in residual. -
norm_type
: Integer corresponding to the NormType enum, see norms. -
weight
: Real number specifying weight value of the cost. -
weight_lower_bound
,weight_upper_bound
: Real numbers specifying lower and upper bounds on$\text{weight}$ , used for configuring the GUI slider. -
parameters
: Optional real numbers specifying$\text{norm}$ -specific parameters, see norm implementation.
The user sensors defining the residuals should be declared first. Therefore, all sensors should be defined in the task file. After the cost terms are specified, more sensors can be defined to specify the task, e.g., when the residual implementation relies on reading a sensor value. If a sensor named trace%i
exists, the GUI will use it to draw the traces of the tentative and nominal trajectories.
Computing the residual vectors is specified via a C++ function with the following interface:
void Residual(const mjModel* model,
const mjData* data,
double* residual) const
model
: the task'smjModel
. Markedconst
, because this function should not changemjModel
values.data
: the task'smjData
. Markedconst
, because this function should not change mjData values.residual
: The function's output. Each element in the double array should be set by this function, corresponding to the ordering and sizes provided by the user sensors in the task XML. See example below.
NOTE: User sensors corresponding to cost terms should be specified before any other sensors.
As an example, consider these snippets specifying the Swimmer task:
<sensor>
<user name="Control" dim="5" user="0 0.1 0 0.1" />
<user name="Distance" dim="2" user="2 30 0 100 0.04" />
<framepos name="trace0" objtype="geom" objname="nose"/>
<framepos name="nose" objtype="geom" objname="nose"/>
<framepos name="target" objtype="body" objname="target"/>
</sensor>
void Swimmer::ResidualFn::Residual(const mjModel* model,
const mjData* data,
double* residual) const {
// initialize counter
int counter = 0;
// "Control"
mju_copy(residual, data->ctrl, model->nu);
counter += model->nu; // increment counter
// "Distance"
double* target = mjpc::SensorByName(model, data, "target");
double* nose = mjpc::SensorByName(model, data, "nose");
mju_sub(residual + counter, nose, target, 2);
counter += 2;
// test residual counter (recommended, optional)
CheckSensorDim(model, counter);
}
The swimmer's cost has two terms:
- The first user sensor declares a quadratic norm (index 0) on a vector of 5 entries with weight 0.1. The first line of the
Residual
function copies the control into this memory space. - The second user sensor declares an L2 norm (index 2) with weight 30 and one parameters: 0.04. The
Residual
function computes the XY coordinates of the target's position relative to the nose frame, and writes these two numbers to the remaining 2 entries of the residual vector.
The repository includes additional example tasks:
- Humanoid Stand | Walk | Tracking
- Swimmer
- Walker
- Cart-Pole
- Acrobot
- Particle
- Quadruped Hill | Flat
- In-Hand Manipulation
- Quadrotor
- Panda Arm Manipulation
The task
class supports transitions for subgoal-reaching tasks. The transition function is called at every step, and is meant to allow the task specification to adapt to the agent's behavior (e.g., moving the target whenever the agent reaches it). This function should test whether a transition condition is fulfilled, and at that event it should mutate the task specification. It has the following interface:
void TransitionLocked(mjModel* model, mjData* data)
model
: the task'smjModel
. A transition could mutate fields ofmjModel
.data
: the task'smjData
. A transition should mutate fields ofmjData
, see below.
Because we only sync the mjData
state between the GUI and the agent's planner, tasks that need transitions should use mocap
fields or userdata
to specify goals. For code examples, see the Transition
functions in these example tasks: Swimmer (relocating the target), Quadruped (iterating along a fixed set of targets) and Hand (recovering when the cube is dropped).
Additionally, custom labeled buttons can be added to the GUI by specifying a string of labels delimited with a pipe character: |
. For example:
<custom>
<text name="task_transition" data="Label (1)|Label (2)|Label(3)" />
</custom>
The purpose of Planner
is to find improved policies using numerical optimization.
This library includes multiple planners that use different techniques to perform this search:
- Predictive Sampling
- random search
- derivative free
- spline representation for controls
- Cross Entropy Method
- all properties of Predictive Sampling
- refits a nominal policy to mean of elite samples instead of using the best
- Gradient Descent
- requires gradients
- spline representation for controls
- Iterative Linear Quadratic Gausian (iLQG)
- requires gradients and Hessians
- direct representation for controls
Rollouts are performed after setting both the simulation and initialization states.
This includes:
- configuration:
data->qpos
- velocity:
data->qvel
- acceleration:
data->qacc
This includes:
- mocap:
data->mocap
- userdata:
data->userdata
- time:
data->time