
Shape mismatch in MJX's put_data function #2141

Open
2 tasks done
rustam-e opened this issue Oct 15, 2024 · 4 comments
Labels
bug Something isn't working

Comments


rustam-e commented Oct 15, 2024

Intro

Hi!

I am a masters student at Vrije Universiteit Amsterdam, I use MuJoCo for my research on Modular Robotics.

My setup

Currently using MuJoCo. I am trying to convert Revolve2 (https://github.com/ci-group/revolve2), an existing evolutionary-computing framework for modular robotics, from MuJoCo to MJX 3.2.3.

What's happening? What did you expect?

I am following the provided tutorial at https://colab.research.google.com/github/google-deepmind/mujoco/blob/main/mjx/tutorial.ipynb#scrollTo=Jtz7j1PDOnw5 and converting currently working code that calls mujoco.mj_step so that it runs using MJX.

I get an error related to efc_J:

Steps for reproduction

  1. checkout https://github.com/rustam-e/revolve2/tree/develop
  2. dependency installation - https://ci-group.github.io/revolve2/installation/index.html
  3. run the experiment with python examples/4_example_experiment_setups/4c_robot_bodybrain_ea-mjx/main.py

The only differences from the functional examples/4_example_experiment_setups/4c_robot_bodybrain_ea/main.py can be found in rustam-e/revolve2@30bfc7f

Minimal model for reproduction

The issue is consistent across all of the experiments we ran.

Code required for reproduction

Running any of the examples after changing

from revolve2.simulators.mujoco_simulator import LocalSimulator

to

from revolve2.simulators.mjx_simulator import LocalSimulator

consistently throws the same error:

ValueError: could not broadcast input array from shape (0,10) into shape (4,10)

The only difference across examples is the second dimension.
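For context, the error class itself is easy to reproduce with NumPy alone. The shapes below are stand-ins, not the actual MJX arrays, but the failure mode is the same: an in-place copy of an empty constraint Jacobian into a pre-sized buffer.

```python
import numpy as np

# Hypothetical stand-ins for the MJX buffers: a pre-sized destination
# (4 constraint rows) and an empty source (efc_J never instantiated).
dst = np.zeros((4, 10))
src = np.zeros((0, 10))

try:
    dst[:] = src  # in-place copy requires broadcast-compatible shapes
except ValueError as e:
    print(e)  # could not broadcast input array from shape (0,10) into shape (4,10)
```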

Confirmations

@rustam-e rustam-e added the bug Something isn't working label Oct 15, 2024
@rustam-e rustam-e changed the title Shape mismatch in put_data function Shape mismatch in MJX's put_data function Oct 15, 2024
@rustam-e
Author

The issue seems to be due to the fact that MuJoCo did not instantiate the efc_J property of the model; running mj_step once before the put_data call seems to have resolved the issue.

@yuvaltassa yuvaltassa reopened this Oct 17, 2024
@yuvaltassa
Collaborator

It's good that you found the solution, but put_data should either handle this situation (being called before any arena allocations) gracefully or, if that's not possible, give the user an informative error.

That, I think, is a bug. WDYT @erikfrey ?

@rustam-e
Author

@yuvaltassa we finally completed the research project we were working on: benchmarking MJX vs. MuJoCo vs. a combined resource-allocation approach with MuJoCo and MJX running in parallel on a single machine. I thought I might share it in case you're interested, and I was hoping for your opinion on the methodology and on whether you feel the experiment was fairly conducted.

🔍 Key Observations:
✅ CPU dominance: In most cases, CPU-only simulations were the fastest.
✅ Parallelism overhead: At low variant counts (the number of evolutionary-algorithm variants being evolved), the overhead of running CPU and GPU together negated the performance benefits.
✅ Scalability potential: At higher variant counts, the hybrid CPU + GPU strategy approached or even hinted at surpassing single-hardware performance.
✅ Cluster computing > GPU acceleration? Our results suggest that for Mujoco simulations, a more effective acceleration approach might be parallelizing across multi-core CPU clusters rather than relying solely on GPUs.
These findings emphasize the importance of strategic workload distribution rather than blindly leveraging GPUs for acceleration. Future research could explore how to refine hybrid execution models for large-scale EC problems.
📄 Read the full study here: https://arxiv.org/abs/2502.11129

@erikfrey
Collaborator

erikfrey commented Feb 19, 2025

Hi @rustam-e,

Two things you should do for your MJX benchmarking:

  1. Use jax.lax.scan or jax.lax.fori_loop with length=n_steps and some reasonable unroll. So something like:

def unroll(d, _):
  d = step(mjx_model, d)
  return d, None

unroll_fn = jax.jit(lambda d: jax.lax.scan(unroll, d, None, length=n_steps, unroll=4)[0])

# burn in
datas = unroll_fn(datas)

# measure
t = time.perf_counter()
...

  2. To get real perf timings, you must call block_until_ready() on something to ensure the GPU operation is complete; otherwise you are timing how quickly you can enqueue work to the GPU, not how quickly it runs. So something like:

datas.qpos.block_until_ready()

elapsed = time.perf_counter() - t
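Putting the two points together, a self-contained timing harness might look like the sketch below. It uses pure JAX with a toy step function standing in for mjx.step, so the pattern is runnable without MuJoCo; the names (n_steps, the batch size) are illustrative.

```python
import time
import jax
import jax.numpy as jnp

def step(x, _):
    # Toy stand-in for mjx.step: any jitted per-step computation.
    return x * 0.99 + 0.01, None

n_steps = 1000
unroll_fn = jax.jit(lambda x: jax.lax.scan(step, x, None, length=n_steps, unroll=4)[0])

x = jnp.ones((4096,))
x = unroll_fn(x)             # burn-in: triggers compilation
x.block_until_ready()

t = time.perf_counter()
x = unroll_fn(x)
x.block_until_ready()        # wait for device work to finish before stopping the clock
elapsed = time.perf_counter() - t
print(f"{n_steps} steps in {elapsed:.4f}s")
```

Without the block_until_ready() call, the measured time would only cover dispatching the work to the device, not executing it.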

Re: the bug, yes, agreed that you shouldn't need to call mj_step per se. I'll try to repro this, but a simpler repro case would help quite a bit. Do you have something that's just a few lines?
