Correctly compute plan variance in cross-entropy planner.
Commit 734bda2 accidentally changed the variance computation from
```
sum((candidate_policy[idx] - avg)**2 for idx in elites) / (len(elites) - 1)
```

to:
```
sum((candidate_policy[0] - avg)**2 for idx in elites) / (len(elites) - 1)
```

I can't visually see a difference in performance on an in-hand reorientation task.
Fixes #350.
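
The difference between the two expressions can be checked with a minimal standalone sketch. This is not the planner's code: each elite is reduced to a single scalar parameter, and the names `Mean`, `SampleVariance`, and `BuggyVariance` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean of the elite values (illustrative helper, not part of mjpc).
double Mean(const std::vector<double>& elites) {
  double sum = 0.0;
  for (double v : elites) sum += v;
  return sum / static_cast<double>(elites.size());
}

// Intended computation: every elite contributes its own squared deviation.
double SampleVariance(const std::vector<double>& elites) {
  assert(elites.size() >= 2);  // the (n - 1) divisor needs at least two elites
  const double avg = Mean(elites);
  double sum_sq = 0.0;
  for (std::size_t idx = 0; idx < elites.size(); idx++) {
    const double diff = elites[idx] - avg;
    sum_sq += diff * diff;
  }
  return sum_sq / static_cast<double>(elites.size() - 1);
}

// Regressed computation: the deviation of elite 0 is summed once per elite.
double BuggyVariance(const std::vector<double>& elites) {
  assert(elites.size() >= 2);
  const double avg = Mean(elites);
  double sum_sq = 0.0;
  for (std::size_t idx = 0; idx < elites.size(); idx++) {
    const double diff = elites[0] - avg;  // always index 0, ignoring idx
    sum_sq += diff * diff;
  }
  return sum_sq / static_cast<double>(elites.size() - 1);
}
```

For the elites `{1, 2, 3}` the intended expression gives 1.0 and the regressed one gives 1.5. For `{2, 1, 3}`, where elite 0 happens to equal the mean, the regressed expression collapses to 0 even though the samples still have spread.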

PiperOrigin-RevId: 696463663
Change-Id: I8f13e426d883cb01565d8f051ab29bf8a9a67471
nimrod-gileadi authored and copybara-github committed Nov 14, 2024
1 parent 49783b8 commit 868f357
Showing 1 changed file with 13 additions and 9 deletions.
mjpc/planners/cross_entropy/planner.cc
@@ -227,12 +227,13 @@ void CrossEntropyPlanner::OptimizePolicy(int horizon, ThreadPool& pool) {
   for (int i = 0; i < n_elite; i++) {
     // ordered trajectory index
     int idx = trajectory_order[i];
+    const TimeSpline& elite_plan = candidate_policy[idx].plan;

     // add parameters
-    for (int i = 0; i < num_spline_points; i++) {
-      TimeSpline::Node n = candidate_policy[idx].plan.NodeAt(i);
+    for (int t = 0; t < num_spline_points; t++) {
+      TimeSpline::ConstNode n = elite_plan.NodeAt(t);
       for (int j = 0; j < model->nu; j++) {
-        parameters_scratch[i * model->nu + j] += n.values()[j];
+        parameters_scratch[t * model->nu + j] += n.values()[j];
       }
     }

@@ -247,12 +248,15 @@ void CrossEntropyPlanner::OptimizePolicy(int horizon, ThreadPool& pool) {

   // loop over elites to compute variance
   std::fill(variance.begin(), variance.end(), 0.0);  // reset variance to zero
-  for (int t = 0; t < num_spline_points; t++) {
-    TimeSpline::Node n = candidate_policy[trajectory_order[0]].plan.NodeAt(t);
-    for (int j = 0; j < model->nu; j++) {
-      // average
-      double p_avg = parameters_scratch[t * model->nu + j];
-      for (int i = 0; i < n_elite; i++) {
+  for (int i = 0; i < n_elite; i++) {
+    int idx = trajectory_order[i];
+    const TimeSpline& elite_plan = candidate_policy[idx].plan;
+    for (int t = 0; t < num_spline_points; t++) {
+      TimeSpline::ConstNode n = elite_plan.NodeAt(t);
+      for (int j = 0; j < model->nu; j++) {
+        // average
+        double p_avg = parameters_scratch[t * model->nu + j];
+
         // candidate parameter
         double pi = n.values()[j];
         double diff = pi - p_avg;
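
The loop order after this change (elites outermost, then spline points, then controls) can be sketched outside the planner roughly as follows. This is an illustration under stated assumptions, not the code in `planner.cc`: `FlatPlan` and `EliteVariance` are hypothetical names, a plain `std::vector<double>` stands in for `TimeSpline`, the per-parameter mean is passed in already computed, and the `n - 1` divisor is taken from the commit message because the hunk is cut off before the normalization step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a plan: num_spline_points * nu values, with
// control j at spline point t stored at index t * nu + j.
using FlatPlan = std::vector<double>;

// Per-parameter sample variance across elites, given the per-parameter mean.
std::vector<double> EliteVariance(const std::vector<FlatPlan>& elites,
                                  const std::vector<double>& mean,
                                  int num_spline_points, int nu) {
  assert(elites.size() >= 2);  // the (n - 1) divisor needs at least two elites
  std::vector<double> variance(mean.size(), 0.0);

  // Outer loop over elites: each elite's own plan is read exactly once.
  for (const FlatPlan& elite_plan : elites) {
    for (int t = 0; t < num_spline_points; t++) {
      for (int j = 0; j < nu; j++) {
        const std::size_t k = static_cast<std::size_t>(t) * nu + j;
        const double diff = elite_plan[k] - mean[k];
        variance[k] += diff * diff;  // accumulate squared deviation
      }
    }
  }

  // Normalize the accumulated sums (divisor assumed from the commit message).
  for (double& v : variance) v /= static_cast<double>(elites.size() - 1);
  return variance;
}
```

With three elites `{1, 10}`, `{2, 20}`, `{3, 30}`, a mean of `{2, 20}`, two spline points, and `nu = 1`, this yields a variance of `{1, 100}`. Putting the elite loop outermost means the plan reference is fetched once per elite, not once per spline point, which is where the earlier version picked up `trajectory_order[0]` for every elite.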
