
Shape mismatch in MJX's put_data function #2141

Open
2 tasks done
rustam-e opened this issue Oct 15, 2024 · 4 comments
Labels
bug Something isn't working

Comments


rustam-e commented Oct 15, 2024

Intro

Hi!

I am a masters student at Vrije Universiteit Amsterdam, I use MuJoCo for my research on Modular Robotics.

My setup

Currently using MuJoCo. I am trying to convert Revolve2 (https://github.com/ci-group/revolve2), an existing evolutionary-computing framework for modular robotics, from MuJoCo to MJX 3.2.3.

What's happening? What did you expect?

I am following the provided tutorial at https://colab.research.google.com/github/google-deepmind/mujoco/blob/main/mjx/tutorial.ipynb#scrollTo=Jtz7j1PDOnw5 and converting currently working code that calls mujoco.mj_step so that it runs using MJX.

I get an error related to efc_J:

Steps for reproduction

  1. checkout https://github.com/rustam-e/revolve2/tree/develop
  2. dependency installation - https://ci-group.github.io/revolve2/installation/index.html
  3. run the experiment with python examples/4_example_experiment_setups/4c_robot_bodybrain_ea-mjx/main.py

The only differences from the functional examples/4_example_experiment_setups/4c_robot_bodybrain_ea/main.py can be found in rustam-e/revolve2@30bfc7f

Minimal model for reproduction

The issue is consistent across all of the experiments we ran.

Code required for reproduction

Running any of the examples after changing

from revolve2.simulators.mujoco_simulator import LocalSimulator

to

from revolve2.simulators.mjx_simulator import LocalSimulator

consistently throws the same error:

ValueError: could not broadcast input array from shape (0,10) into shape (4,10)

The only difference across examples is the second dimension.
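For context, the error class itself is easy to reproduce with NumPy alone. The shapes below are stand-ins, not the actual MJX arrays, but the failure mode is the same: an in-place copy of an empty constraint Jacobian into a pre-sized buffer.

```python
import numpy as np

# Hypothetical stand-ins for the MJX buffers: a pre-sized destination
# (4 constraint rows) and an empty source (efc_J never instantiated).
dst = np.zeros((4, 10))
src = np.zeros((0, 10))

try:
    dst[:] = src  # in-place copy requires broadcast-compatible shapes
except ValueError as e:
    print(e)  # could not broadcast input array from shape (0,10) into shape (4,10)
```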

Confirmations

@rustam-e rustam-e added the bug Something isn't working label Oct 15, 2024
@rustam-e rustam-e changed the title Shape mismatch in put_data function Shape mismatch in MJX's put_data function Oct 15, 2024
@rustam-e
Author

The issue seems to be due to the fact that MuJoCo did not instantiate the efc_J property of the model; running mj_step once before the put_data call seems to have resolved the issue.

@yuvaltassa yuvaltassa reopened this Oct 17, 2024
@yuvaltassa
Collaborator

It's good that you found the solution, but put_data should either handle this situation (being called before any arena allocations) gracefully or, if that's not possible, give the user an informative error.

That, I think, is a bug. WDYT @erikfrey ?

@rustam-e
Author

@yuvaltassa we finally completed the research project we were working on: benchmarking MJX vs. MuJoCo vs. a combined resource-allocation approach with MuJoCo and MJX running in parallel on a single machine. I thought I might share it in case you're interested, and I was hoping for your opinion on the methodology and on whether you feel the experiment was fairly conducted.

🔍 Key Observations:
✅ CPU dominance: In most cases, CPU-only simulations were the fastest.
✅ Parallelism overhead: At low variant counts (the number of evolutionary-algorithm variants being evolved), the overhead of running CPU and GPU together negated the performance benefits.
✅ Scalability potential: At higher variant counts, the hybrid CPU + GPU strategy approached or even hinted at surpassing single-hardware performance.
✅ Cluster computing > GPU acceleration? Our results suggest that for Mujoco simulations, a more effective acceleration approach might be parallelizing across multi-core CPU clusters rather than relying solely on GPUs.
These findings emphasize the importance of strategic workload distribution rather than blindly leveraging GPUs for acceleration. Future research could explore how to refine hybrid execution models for large-scale EC problems.
📄 Read the full study here: https://arxiv.org/abs/2502.11129

@erikfrey
Collaborator

erikfrey commented Feb 19, 2025

Hi @rustam-e,

Two things you should do for your MJX benchmarking:

  1. Use jax.lax.scan or jax.lax.fori_loop with length=n_steps and some reasonable unroll. So something like:

def unroll(d, _):
  d = step(mjx_model, d)
  return d, None

unroll_fn = jax.jit(lambda d: jax.lax.scan(unroll, d, None, length=n_steps, unroll=4)[0])

# burn in
datas = unroll_fn(datas)

# measure
t = time.perf_counter()
...

  2. To get real perf timings, you must call block_until_ready() on something to ensure the GPU operation is complete; otherwise you are timing how quickly you can enqueue work to the GPU, not how quickly it runs. So something like:

datas.qpos.block_until_ready()

elapsed = time.perf_counter() - t
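Putting the two points together, a self-contained timing harness might look like the sketch below. It uses pure JAX with a toy step function standing in for mjx.step, so the pattern is runnable without MuJoCo; the names (n_steps, the batch size) are illustrative.

```python
import time
import jax
import jax.numpy as jnp

def step(x, _):
    # Toy stand-in for mjx.step: any jitted per-step computation.
    return x * 0.99 + 0.01, None

n_steps = 1000
unroll_fn = jax.jit(lambda x: jax.lax.scan(step, x, None, length=n_steps, unroll=4)[0])

x = jnp.ones((4096,))
x = unroll_fn(x)             # burn-in: triggers compilation
x.block_until_ready()

t = time.perf_counter()
x = unroll_fn(x)
x.block_until_ready()        # wait for device work to finish before stopping the clock
elapsed = time.perf_counter() - t
print(f"{n_steps} steps in {elapsed:.4f}s")
```

Without the block_until_ready() call, the measured time would only cover dispatching the work to the device, not executing it.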

Re: the bug, yes, agreed that you shouldn't need to call mj_step per se. I'll try to repro this, but a simpler repro case would help quite a bit. Do you have something that's just a few lines?
