[Feature] state_dict hooks compatibility in from_module and to_module #596

Closed
vmoens wants to merge 6 commits from from_module-state_dict

Conversation

vmoens
Contributor

@vmoens vmoens commented Dec 11, 2023

Optionally calls the state_dict hooks in state_dict and load_state_dict.
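To illustrate what "optionally calling the hooks" can mean, here is a minimal, library-free sketch. It is not the code from this PR and does not use the real `torch.nn.Module` or TensorDict APIs: `ToyModule`, `register_state_dict_hook`, `collect_state` and the `use_hooks` flag are all hypothetical names chosen for the example. The idea shown is only that a state-collection routine can walk a module tree and, depending on a flag, run the hooks registered on each submodule.

```python
from typing import Callable, Dict, List

# Hypothetical hook signature: (module, state, prefix) -> None.
Hook = Callable[["ToyModule", Dict[str, float], str], None]


class ToyModule:
    """Toy stand-in for a module with parameters, children and hooks."""

    def __init__(self, **params: float) -> None:
        self.params: Dict[str, float] = dict(params)
        self.children: Dict[str, "ToyModule"] = {}
        self.state_dict_hooks: List[Hook] = []

    def register_state_dict_hook(self, hook: Hook) -> None:
        # Hypothetical registration method, named for illustration only.
        self.state_dict_hooks.append(hook)


def collect_state(
    module: ToyModule, prefix: str = "", use_hooks: bool = True
) -> Dict[str, float]:
    """Flatten the module tree into a dict, optionally running hooks."""
    state: Dict[str, float] = {}
    _collect(module, state, prefix, use_hooks)
    return state


def _collect(
    module: ToyModule, state: Dict[str, float], prefix: str, use_hooks: bool
) -> None:
    # Own parameters first, then children, then this module's hooks.
    for name, value in module.params.items():
        state[prefix + name] = value
    for child_name, child in module.children.items():
        _collect(child, state, prefix + child_name + ".", use_hooks)
    if use_hooks:
        for hook in module.state_dict_hooks:
            hook(module, state, prefix)


def add_version(module: ToyModule, state: Dict[str, float], prefix: str) -> None:
    # Example hook: adds an extra entry under the submodule's prefix.
    state[prefix + "version"] = 1.0


root = ToyModule(weight=1.0)
root.children["head"] = ToyModule(bias=0.5)
root.children["head"].register_state_dict_hook(add_version)

with_hooks = collect_state(root, use_hooks=True)
without_hooks = collect_state(root, use_hooks=False)
```

With the flag on, the result contains the extra `head.version` entry written by the hook; with the flag off, only the raw parameters are returned.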

cc @fegin

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 11, 2023

github-actions bot commented Dec 11, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 120. Improved: $\large\color{#35bf28}4$. Worsened: $\large\color{#d91a1a}16$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1400ms 17.3453μs 57.6525 KOps/s 62.4285 KOps/s $\textbf{\color{#d91a1a}-7.65\%}$
test_plain_set_stack_nested 0.2517ms 0.1421ms 7.0353 KOps/s 7.0234 KOps/s $\color{#35bf28}+0.17\%$
test_plain_set_nested_inplace 55.7840μs 19.8769μs 50.3096 KOps/s 53.8207 KOps/s $\textbf{\color{#d91a1a}-6.52\%}$
test_plain_set_stack_nested_inplace 0.3250ms 0.1759ms 5.6844 KOps/s 5.5473 KOps/s $\color{#35bf28}+2.47\%$
test_items 15.9090μs 2.4462μs 408.7928 KOps/s 412.8885 KOps/s $\color{#d91a1a}-0.99\%$
test_items_nested 0.5542ms 0.2711ms 3.6893 KOps/s 3.6919 KOps/s $\color{#d91a1a}-0.07\%$
test_items_nested_locked 0.6440ms 0.2709ms 3.6919 KOps/s 3.6828 KOps/s $\color{#35bf28}+0.25\%$
test_items_nested_leaf 0.2125ms 0.1670ms 5.9889 KOps/s 6.0391 KOps/s $\color{#d91a1a}-0.83\%$
test_items_stack_nested 2.1524ms 1.3680ms 730.9749 Ops/s 751.7134 Ops/s $\color{#d91a1a}-2.76\%$
test_items_stack_nested_leaf 1.9284ms 1.2150ms 823.0260 Ops/s 835.2675 Ops/s $\color{#d91a1a}-1.47\%$
test_items_stack_nested_locked 1.1229ms 0.8823ms 1.1335 KOps/s 1.1398 KOps/s $\color{#d91a1a}-0.55\%$
test_keys 19.2750μs 4.2274μs 236.5531 KOps/s 257.5496 KOps/s $\textbf{\color{#d91a1a}-8.15\%}$
test_keys_nested 53.4843ms 0.1573ms 6.3577 KOps/s 6.7540 KOps/s $\textbf{\color{#d91a1a}-5.87\%}$
test_keys_nested_locked 0.2641ms 0.1468ms 6.8113 KOps/s 6.7202 KOps/s $\color{#35bf28}+1.36\%$
test_keys_nested_leaf 0.2131ms 0.1297ms 7.7086 KOps/s 7.7116 KOps/s $\color{#d91a1a}-0.04\%$
test_keys_stack_nested 1.5192ms 1.3025ms 767.7499 Ops/s 778.2348 Ops/s $\color{#d91a1a}-1.35\%$
test_keys_stack_nested_leaf 2.0958ms 1.2980ms 770.3907 Ops/s 776.1664 Ops/s $\color{#d91a1a}-0.74\%$
test_keys_stack_nested_locked 3.6089ms 0.8248ms 1.2124 KOps/s 1.2283 KOps/s $\color{#d91a1a}-1.30\%$
test_values 6.1418μs 1.1369μs 879.5606 KOps/s 857.1922 KOps/s $\color{#35bf28}+2.61\%$
test_values_nested 93.6640μs 51.9106μs 19.2639 KOps/s 18.1565 KOps/s $\textbf{\color{#35bf28}+6.10\%}$
test_values_nested_locked 0.1056ms 52.2033μs 19.1559 KOps/s 19.1455 KOps/s $\color{#35bf28}+0.05\%$
test_values_nested_leaf 4.0486ms 46.7328μs 21.3982 KOps/s 21.4655 KOps/s $\color{#d91a1a}-0.31\%$
test_values_stack_nested 1.2799ms 1.0701ms 934.4855 Ops/s 953.4445 Ops/s $\color{#d91a1a}-1.99\%$
test_values_stack_nested_leaf 1.8489ms 1.0555ms 947.4106 Ops/s 970.2931 Ops/s $\color{#d91a1a}-2.36\%$
test_values_stack_nested_locked 0.7331ms 0.6208ms 1.6108 KOps/s 1.6508 KOps/s $\color{#d91a1a}-2.42\%$
test_membership 19.1560μs 1.3388μs 746.9485 KOps/s 760.9406 KOps/s $\color{#d91a1a}-1.84\%$
test_membership_nested 22.6120μs 2.9299μs 341.3133 KOps/s 348.2275 KOps/s $\color{#d91a1a}-1.99\%$
test_membership_nested_leaf 23.9650μs 2.9392μs 340.2243 KOps/s 339.1227 KOps/s $\color{#35bf28}+0.32\%$
test_membership_stacked_nested 30.9580μs 11.9142μs 83.9334 KOps/s 85.1624 KOps/s $\color{#d91a1a}-1.44\%$
test_membership_stacked_nested_leaf 38.0310μs 11.9277μs 83.8385 KOps/s 83.9895 KOps/s $\color{#d91a1a}-0.18\%$
test_membership_nested_last 39.3040μs 6.0050μs 166.5276 KOps/s 164.3708 KOps/s $\color{#35bf28}+1.31\%$
test_membership_nested_leaf_last 29.0040μs 5.9513μs 168.0296 KOps/s 166.8085 KOps/s $\color{#35bf28}+0.73\%$
test_membership_stacked_nested_last 0.3597ms 0.1674ms 5.9721 KOps/s 5.9828 KOps/s $\color{#d91a1a}-0.18\%$
test_membership_stacked_nested_leaf_last 49.1120μs 13.9794μs 71.5338 KOps/s 71.5025 KOps/s $\color{#35bf28}+0.04\%$
test_nested_getleaf 32.2200μs 10.5228μs 95.0316 KOps/s 94.4209 KOps/s $\color{#35bf28}+0.65\%$
test_nested_get 30.3960μs 10.0318μs 99.6834 KOps/s 98.7462 KOps/s $\color{#35bf28}+0.95\%$
test_stacked_getleaf 0.6318ms 0.3981ms 2.5119 KOps/s 2.4488 KOps/s $\color{#35bf28}+2.58\%$
test_stacked_get 0.5772ms 0.3628ms 2.7564 KOps/s 2.6682 KOps/s $\color{#35bf28}+3.30\%$
test_nested_getitemleaf 29.9560μs 10.6096μs 94.2545 KOps/s 93.3900 KOps/s $\color{#35bf28}+0.93\%$
test_nested_getitem 37.2600μs 10.0129μs 99.8713 KOps/s 98.5275 KOps/s $\color{#35bf28}+1.36\%$
test_stacked_getitemleaf 0.8669ms 0.4006ms 2.4965 KOps/s 2.4252 KOps/s $\color{#35bf28}+2.94\%$
test_stacked_getitem 0.5456ms 0.3638ms 2.7485 KOps/s 2.6573 KOps/s $\color{#35bf28}+3.43\%$
test_lock_nested 1.2383ms 0.4084ms 2.4485 KOps/s 2.4072 KOps/s $\color{#35bf28}+1.71\%$
test_lock_stack_nested 70.7318ms 6.2733ms 159.4069 Ops/s 155.0296 Ops/s $\color{#35bf28}+2.82\%$
test_unlock_nested 62.2035ms 0.4775ms 2.0944 KOps/s 2.3793 KOps/s $\textbf{\color{#d91a1a}-11.97\%}$
test_unlock_stack_nested 71.5023ms 5.9403ms 168.3425 Ops/s 161.9871 Ops/s $\color{#35bf28}+3.92\%$
test_flatten_speed 0.5772ms 0.3647ms 2.7423 KOps/s 2.6978 KOps/s $\color{#35bf28}+1.65\%$
test_unflatten_speed 0.6817ms 0.4583ms 2.1818 KOps/s 2.1707 KOps/s $\color{#35bf28}+0.51\%$
test_common_ops 5.1225ms 0.6914ms 1.4463 KOps/s 1.5180 KOps/s $\color{#d91a1a}-4.73\%$
test_creation 27.2710μs 2.0194μs 495.1909 KOps/s 503.8599 KOps/s $\color{#d91a1a}-1.72\%$
test_creation_empty 34.4440μs 10.8500μs 92.1656 KOps/s 114.9754 KOps/s $\textbf{\color{#d91a1a}-19.84\%}$
test_creation_nested_1 37.4600μs 13.5740μs 73.6704 KOps/s 86.2351 KOps/s $\textbf{\color{#d91a1a}-14.57\%}$
test_creation_nested_2 51.1550μs 16.7592μs 59.6686 KOps/s 67.9146 KOps/s $\textbf{\color{#d91a1a}-12.14\%}$
test_clone 95.7890μs 12.1685μs 82.1795 KOps/s 80.5660 KOps/s $\color{#35bf28}+2.00\%$
test_getitem[int] 35.8660μs 11.6806μs 85.6124 KOps/s 83.8435 KOps/s $\color{#35bf28}+2.11\%$
test_getitem[slice_int] 93.0640μs 23.2096μs 43.0857 KOps/s 41.5850 KOps/s $\color{#35bf28}+3.61\%$
test_getitem[range] 0.1225ms 41.8944μs 23.8695 KOps/s 23.0152 KOps/s $\color{#35bf28}+3.71\%$
test_getitem[tuple] 44.9130μs 19.1502μs 52.2188 KOps/s 51.7980 KOps/s $\color{#35bf28}+0.81\%$
test_getitem[list] 80.2200μs 37.0535μs 26.9880 KOps/s 25.4109 KOps/s $\textbf{\color{#35bf28}+6.21\%}$
test_setitem_dim[int] 58.0380μs 29.3076μs 34.1208 KOps/s 34.6682 KOps/s $\color{#d91a1a}-1.58\%$
test_setitem_dim[slice_int] 91.9210μs 55.2657μs 18.0944 KOps/s 17.8909 KOps/s $\color{#35bf28}+1.14\%$
test_setitem_dim[range] 0.1326ms 74.0322μs 13.5076 KOps/s 13.4722 KOps/s $\color{#35bf28}+0.26\%$
test_setitem_dim[tuple] 76.7320μs 44.0915μs 22.6801 KOps/s 22.7287 KOps/s $\color{#d91a1a}-0.21\%$
test_setitem 0.1666ms 18.5095μs 54.0264 KOps/s 55.2890 KOps/s $\color{#d91a1a}-2.28\%$
test_set 0.1754ms 18.3880μs 54.3832 KOps/s 57.0777 KOps/s $\color{#d91a1a}-4.72\%$
test_set_shared 2.1736ms 0.1350ms 7.4092 KOps/s 7.2492 KOps/s $\color{#35bf28}+2.21\%$
test_update 99.1240μs 21.4208μs 46.6836 KOps/s 50.7753 KOps/s $\textbf{\color{#d91a1a}-8.06\%}$
test_update_nested 97.1610μs 28.5029μs 35.0841 KOps/s 36.8657 KOps/s $\color{#d91a1a}-4.83\%$
test_set_nested 0.1553ms 19.9718μs 50.0706 KOps/s 52.2562 KOps/s $\color{#d91a1a}-4.18\%$
test_set_nested_new 0.1193ms 24.5688μs 40.7020 KOps/s 42.9142 KOps/s $\textbf{\color{#d91a1a}-5.15\%}$
test_select 97.4620μs 46.7336μs 21.3979 KOps/s 21.0495 KOps/s $\color{#35bf28}+1.66\%$
test_unbind_speed 0.6271ms 0.3364ms 2.9727 KOps/s 2.9549 KOps/s $\color{#35bf28}+0.60\%$
test_unbind_speed_stack0 63.9612ms 4.1792ms 239.2800 Ops/s 237.2108 Ops/s $\color{#35bf28}+0.87\%$
test_unbind_speed_stack1 1.4893μs 0.6218μs 1.6083 MOps/s 1.5217 MOps/s $\textbf{\color{#35bf28}+5.69\%}$
test_split 59.6358ms 1.6481ms 606.7616 Ops/s 594.7048 Ops/s $\color{#35bf28}+2.03\%$
test_chunk 3.2130ms 1.5723ms 636.0058 Ops/s 606.7421 Ops/s $\color{#35bf28}+4.82\%$
test_creation[device0] 0.1800ms 96.7006μs 10.3412 KOps/s 9.9764 KOps/s $\color{#35bf28}+3.66\%$
test_creation_from_tensor 4.7488ms 77.7598μs 12.8601 KOps/s 12.2693 KOps/s $\color{#35bf28}+4.82\%$
test_add_one[memmap_tensor0] 0.2559ms 5.2152μs 191.7486 KOps/s 188.0493 KOps/s $\color{#35bf28}+1.97\%$
test_contiguous[memmap_tensor0] 17.7040μs 0.6380μs 1.5674 MOps/s 1.5648 MOps/s $\color{#35bf28}+0.16\%$
test_stack[memmap_tensor0] 49.5620μs 3.5411μs 282.3992 KOps/s 286.4341 KOps/s $\color{#d91a1a}-1.41\%$
test_memmaptd_index 0.3695ms 0.1968ms 5.0822 KOps/s 5.0323 KOps/s $\color{#35bf28}+0.99\%$
test_memmaptd_index_astensor 0.4529ms 0.2572ms 3.8886 KOps/s 3.8272 KOps/s $\color{#35bf28}+1.60\%$
test_memmaptd_index_op 0.7627ms 0.5507ms 1.8157 KOps/s 1.9234 KOps/s $\textbf{\color{#d91a1a}-5.60\%}$
test_serialize_model 0.1017s 96.8312ms 10.3273 Ops/s 9.2541 Ops/s $\textbf{\color{#35bf28}+11.60\%}$
test_serialize_model_pickle 0.4507s 0.3757s 2.6618 Ops/s 2.5923 Ops/s $\color{#35bf28}+2.68\%$
test_serialize_weights 0.1572s 0.1032s 9.6855 Ops/s 9.4085 Ops/s $\color{#35bf28}+2.94\%$
test_serialize_weights_returnearly 0.1822s 0.1280s 7.8148 Ops/s 7.6669 Ops/s $\color{#35bf28}+1.93\%$
test_serialize_weights_pickle 0.6961s 0.4971s 2.0118 Ops/s 2.3615 Ops/s $\textbf{\color{#d91a1a}-14.80\%}$
test_serialize_weights_filesystem 0.1476s 94.4118ms 10.5919 Ops/s 10.6792 Ops/s $\color{#d91a1a}-0.82\%$
test_serialize_model_filesystem 0.1498s 94.7292ms 10.5564 Ops/s 11.1123 Ops/s $\textbf{\color{#d91a1a}-5.00\%}$
test_reshape_pytree 57.0960μs 23.0467μs 43.3902 KOps/s 42.4144 KOps/s $\color{#35bf28}+2.30\%$
test_reshape_td 80.1990μs 30.0731μs 33.2523 KOps/s 32.4092 KOps/s $\color{#35bf28}+2.60\%$
test_view_pytree 54.0610μs 22.9079μs 43.6530 KOps/s 42.7319 KOps/s $\color{#35bf28}+2.16\%$
test_view_td 33.5420μs 4.8386μs 206.6697 KOps/s 198.7917 KOps/s $\color{#35bf28}+3.96\%$
test_unbind_pytree 67.7770μs 26.4390μs 37.8229 KOps/s 37.7172 KOps/s $\color{#35bf28}+0.28\%$
test_unbind_td 99.0240μs 53.5419μs 18.6770 KOps/s 18.1711 KOps/s $\color{#35bf28}+2.78\%$
test_split_pytree 54.7920μs 26.0595μs 38.3737 KOps/s 38.0002 KOps/s $\color{#35bf28}+0.98\%$
test_split_td 0.5481ms 42.5521μs 23.5006 KOps/s 23.0724 KOps/s $\color{#35bf28}+1.86\%$
test_add_pytree 83.9070μs 32.5334μs 30.7376 KOps/s 30.7930 KOps/s $\color{#d91a1a}-0.18\%$
test_add_td 0.1072ms 50.6934μs 19.7265 KOps/s 20.6325 KOps/s $\color{#d91a1a}-4.39\%$
test_distributed 0.1736ms 97.1816μs 10.2900 KOps/s 9.9042 KOps/s $\color{#35bf28}+3.90\%$
test_tdmodule 0.7531ms 22.5298μs 44.3856 KOps/s 46.7229 KOps/s $\textbf{\color{#d91a1a}-5.00\%}$
test_tdmodule_dispatch 0.1822ms 39.8000μs 25.1256 KOps/s 25.6425 KOps/s $\color{#d91a1a}-2.02\%$
test_tdseq 0.1167ms 25.9130μs 38.5907 KOps/s 41.1978 KOps/s $\textbf{\color{#d91a1a}-6.33\%}$
test_tdseq_dispatch 0.1392ms 45.3395μs 22.0558 KOps/s 23.2249 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$
test_instantiation_functorch 1.5111ms 1.2753ms 784.1383 Ops/s 767.4028 Ops/s $\color{#35bf28}+2.18\%$
test_instantiation_td 1.5200ms 0.9943ms 1.0057 KOps/s 984.7598 Ops/s $\color{#35bf28}+2.13\%$
test_exec_functorch 0.2868ms 0.1559ms 6.4158 KOps/s 6.2516 KOps/s $\color{#35bf28}+2.63\%$
test_exec_functional_call 0.2894ms 0.1448ms 6.9055 KOps/s 6.8039 KOps/s $\color{#35bf28}+1.49\%$
test_exec_td 0.2712ms 0.1417ms 7.0554 KOps/s 6.9648 KOps/s $\color{#35bf28}+1.30\%$
test_exec_td_decorator 0.7888ms 0.1766ms 5.6612 KOps/s 5.5024 KOps/s $\color{#35bf28}+2.89\%$
test_vmap_mlp_speed[True-True] 1.1994ms 0.8795ms 1.1370 KOps/s 1.1135 KOps/s $\color{#35bf28}+2.11\%$
test_vmap_mlp_speed[True-False] 0.9215ms 0.4683ms 2.1352 KOps/s 2.1330 KOps/s $\color{#35bf28}+0.11\%$
test_vmap_mlp_speed[False-True] 1.0474ms 0.7680ms 1.3022 KOps/s 1.2800 KOps/s $\color{#35bf28}+1.73\%$
test_vmap_mlp_speed[False-False] 0.6313ms 0.3814ms 2.6218 KOps/s 2.5896 KOps/s $\color{#35bf28}+1.24\%$
test_vmap_mlp_speed_decorator[True-True] 3.0529ms 2.4278ms 411.9023 Ops/s 409.7337 Ops/s $\color{#35bf28}+0.53\%$
test_vmap_mlp_speed_decorator[True-False] 0.9152ms 0.5173ms 1.9333 KOps/s 1.9154 KOps/s $\color{#35bf28}+0.93\%$
test_vmap_mlp_speed_decorator[False-True] 2.8347ms 1.9720ms 507.1080 Ops/s 508.8530 Ops/s $\color{#d91a1a}-0.34\%$
test_vmap_mlp_speed_decorator[False-False] 0.6914ms 0.3967ms 2.5211 KOps/s 2.4707 KOps/s $\color{#35bf28}+2.04\%$
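
The Change column in the table above appears to be the relative difference between the "Ops" and "Ops on Repo HEAD" columns, and the bolded rows (roughly those beyond ±5%) match the Improved and Worsened counts in the summary line. The sketch below reproduces that arithmetic on three CPU rows. The function names and the 5% threshold are assumptions inferred from the numbers, not taken from the benchmark bot's source, and both inputs must be expressed in the same unit (for example both in KOps/s).

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on Repo HEAD), all in KOps/s, copied from the CPU table.
rows = [
    ("test_plain_set_nested", 57.6525, 62.4285),
    ("test_values_nested", 19.2639, 18.1565),
    ("test_items", 408.7928, 412.8885),
]

results = {
    name: (round(relative_change(ops, head), 2), classify(relative_change(ops, head)))
    for name, ops, head in rows
}

for name, (pct, label) in results.items():
    print(f"{name}: {pct:+.2f}% ({label})")
```

Run on these rows, the computed percentages agree with the Change values reported by the bot, and only the first two rows cross the assumed threshold.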


github-actions bot commented Dec 11, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 128. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}2$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1372ms 13.5620μs 73.7355 KOps/s 71.0700 KOps/s $\color{#35bf28}+3.75\%$
test_plain_set_stack_nested 0.1547ms 0.1184ms 8.4445 KOps/s 8.4665 KOps/s $\color{#d91a1a}-0.26\%$
test_plain_set_nested_inplace 41.3800μs 14.7983μs 67.5756 KOps/s 65.3531 KOps/s $\color{#35bf28}+3.40\%$
test_plain_set_stack_nested_inplace 0.1894ms 0.1446ms 6.9158 KOps/s 6.8981 KOps/s $\color{#35bf28}+0.26\%$
test_items 24.4700μs 4.6759μs 213.8627 KOps/s 208.9761 KOps/s $\color{#35bf28}+2.34\%$
test_items_nested 0.4090ms 0.3382ms 2.9572 KOps/s 2.9467 KOps/s $\color{#35bf28}+0.35\%$
test_items_nested_locked 0.4118ms 0.3394ms 2.9462 KOps/s 2.9118 KOps/s $\color{#35bf28}+1.18\%$
test_items_nested_leaf 0.2810ms 0.1990ms 5.0264 KOps/s 4.9840 KOps/s $\color{#35bf28}+0.85\%$
test_items_stack_nested 1.4323ms 1.2988ms 769.9445 Ops/s 770.4830 Ops/s $\color{#d91a1a}-0.07\%$
test_items_stack_nested_leaf 1.2235ms 1.1288ms 885.8827 Ops/s 877.2682 Ops/s $\color{#35bf28}+0.98\%$
test_items_stack_nested_locked 1.0425ms 0.8950ms 1.1173 KOps/s 1.1006 KOps/s $\color{#35bf28}+1.52\%$
test_keys 18.4500μs 4.6023μs 217.2825 KOps/s 216.0096 KOps/s $\color{#35bf28}+0.59\%$
test_keys_nested 0.7904ms 94.6765μs 10.5623 KOps/s 10.6740 KOps/s $\color{#d91a1a}-1.05\%$
test_keys_nested_locked 0.1351ms 94.5496μs 10.5765 KOps/s 10.7692 KOps/s $\color{#d91a1a}-1.79\%$
test_keys_nested_leaf 0.1804ms 78.1864μs 12.7900 KOps/s 12.9408 KOps/s $\color{#d91a1a}-1.17\%$
test_keys_stack_nested 1.1881ms 1.1391ms 877.8924 Ops/s 884.3274 Ops/s $\color{#d91a1a}-0.73\%$
test_keys_stack_nested_leaf 1.2472ms 1.1165ms 895.6563 Ops/s 892.1104 Ops/s $\color{#35bf28}+0.40\%$
test_keys_stack_nested_locked 0.8254ms 0.7175ms 1.3937 KOps/s 1.3866 KOps/s $\color{#35bf28}+0.52\%$
test_values 13.2537μs 1.8843μs 530.6959 KOps/s 523.8714 KOps/s $\color{#35bf28}+1.30\%$
test_values_nested 75.7310μs 44.9285μs 22.2576 KOps/s 21.9666 KOps/s $\color{#35bf28}+1.32\%$
test_values_nested_locked 75.5420μs 47.0740μs 21.2432 KOps/s 20.9648 KOps/s $\color{#35bf28}+1.33\%$
test_values_nested_leaf 64.8810μs 39.1641μs 25.5336 KOps/s 25.3813 KOps/s $\color{#35bf28}+0.60\%$
test_values_stack_nested 1.0526ms 0.9434ms 1.0599 KOps/s 1.0378 KOps/s $\color{#35bf28}+2.14\%$
test_values_stack_nested_leaf 1.0577ms 0.9317ms 1.0733 KOps/s 1.0656 KOps/s $\color{#35bf28}+0.73\%$
test_values_stack_nested_locked 0.6998ms 0.5722ms 1.7476 KOps/s 1.7342 KOps/s $\color{#35bf28}+0.77\%$
test_membership 3.6860μs 0.9394μs 1.0646 MOps/s 1.0673 MOps/s $\color{#d91a1a}-0.25\%$
test_membership_nested 32.6100μs 2.3063μs 433.5911 KOps/s 433.8712 KOps/s $\color{#d91a1a}-0.06\%$
test_membership_nested_leaf 16.2600μs 2.2283μs 448.7656 KOps/s 446.4456 KOps/s $\color{#35bf28}+0.52\%$
test_membership_stacked_nested 32.5800μs 11.2142μs 89.1730 KOps/s 90.3751 KOps/s $\color{#d91a1a}-1.33\%$
test_membership_stacked_nested_leaf 30.6710μs 11.3287μs 88.2717 KOps/s 91.8350 KOps/s $\color{#d91a1a}-3.88\%$
test_membership_nested_last 39.9510μs 4.7251μs 211.6352 KOps/s 210.3031 KOps/s $\color{#35bf28}+0.63\%$
test_membership_nested_leaf_last 18.7410μs 4.7398μs 210.9787 KOps/s 208.3336 KOps/s $\color{#35bf28}+1.27\%$
test_membership_stacked_nested_last 0.1774ms 0.1373ms 7.2813 KOps/s 7.3688 KOps/s $\color{#d91a1a}-1.19\%$
test_membership_stacked_nested_leaf_last 51.1210μs 13.3490μs 74.9118 KOps/s 78.7540 KOps/s $\color{#d91a1a}-4.88\%$
test_nested_getleaf 38.5910μs 8.4056μs 118.9686 KOps/s 119.5909 KOps/s $\color{#d91a1a}-0.52\%$
test_nested_get 24.8700μs 7.9422μs 125.9090 KOps/s 126.1635 KOps/s $\color{#d91a1a}-0.20\%$
test_stacked_getleaf 0.4353ms 0.3217ms 3.1086 KOps/s 3.1497 KOps/s $\color{#d91a1a}-1.30\%$
test_stacked_get 0.3394ms 0.2899ms 3.4489 KOps/s 3.5417 KOps/s $\color{#d91a1a}-2.62\%$
test_nested_getitemleaf 36.6900μs 8.4657μs 118.1242 KOps/s 118.4880 KOps/s $\color{#d91a1a}-0.31\%$
test_nested_getitem 31.4700μs 8.0128μs 124.8004 KOps/s 125.0007 KOps/s $\color{#d91a1a}-0.16\%$
test_stacked_getitemleaf 0.3977ms 0.3240ms 3.0868 KOps/s 3.1446 KOps/s $\color{#d91a1a}-1.84\%$
test_stacked_getitem 0.3415ms 0.2915ms 3.4304 KOps/s 3.5210 KOps/s $\color{#d91a1a}-2.57\%$
test_lock_nested 4.5991ms 0.4250ms 2.3532 KOps/s 2.3822 KOps/s $\color{#d91a1a}-1.22\%$
test_lock_stack_nested 92.4297ms 6.7673ms 147.7683 Ops/s 149.7002 Ops/s $\color{#d91a1a}-1.29\%$
test_unlock_nested 0.8865ms 0.4145ms 2.4127 KOps/s 2.4081 KOps/s $\color{#35bf28}+0.19\%$
test_unlock_stack_nested 89.1323ms 7.0499ms 141.8461 Ops/s 141.2035 Ops/s $\color{#35bf28}+0.46\%$
test_flatten_speed 0.7903ms 0.2620ms 3.8169 KOps/s 3.8528 KOps/s $\color{#d91a1a}-0.93\%$
test_unflatten_speed 0.4292ms 0.3525ms 2.8371 KOps/s 2.7980 KOps/s $\color{#35bf28}+1.40\%$
test_common_ops 1.0261ms 0.5780ms 1.7300 KOps/s 1.6579 KOps/s $\color{#35bf28}+4.35\%$
test_creation 17.5500μs 1.6032μs 623.7384 KOps/s 618.6956 KOps/s $\color{#35bf28}+0.82\%$
test_creation_empty 21.9000μs 7.9206μs 126.2525 KOps/s 108.3920 KOps/s $\textbf{\color{#35bf28}+16.48\%}$
test_creation_nested_1 42.3910μs 9.8248μs 101.7831 KOps/s 89.6687 KOps/s $\textbf{\color{#35bf28}+13.51\%}$
test_creation_nested_2 31.7500μs 12.3112μs 81.2267 KOps/s 73.5771 KOps/s $\textbf{\color{#35bf28}+10.40\%}$
test_clone 0.1558ms 12.8589μs 77.7672 KOps/s 77.1915 KOps/s $\color{#35bf28}+0.75\%$
test_getitem[int] 25.6900μs 11.1683μs 89.5391 KOps/s 87.9813 KOps/s $\color{#35bf28}+1.77\%$
test_getitem[slice_int] 44.4810μs 21.7223μs 46.0357 KOps/s 45.3078 KOps/s $\color{#35bf28}+1.61\%$
test_getitem[range] 69.0910μs 37.7513μs 26.4892 KOps/s 27.3363 KOps/s $\color{#d91a1a}-3.10\%$
test_getitem[tuple] 55.6810μs 18.8605μs 53.0209 KOps/s 53.4351 KOps/s $\color{#d91a1a}-0.78\%$
test_getitem[list] 0.4048ms 35.0628μs 28.5203 KOps/s 29.3296 KOps/s $\color{#d91a1a}-2.76\%$
test_setitem_dim[int] 66.4710μs 27.7454μs 36.0421 KOps/s 35.9479 KOps/s $\color{#35bf28}+0.26\%$
test_setitem_dim[slice_int] 82.5810μs 49.0132μs 20.4027 KOps/s 21.0275 KOps/s $\color{#d91a1a}-2.97\%$
test_setitem_dim[range] 0.1087ms 65.1509μs 15.3490 KOps/s 15.9271 KOps/s $\color{#d91a1a}-3.63\%$
test_setitem_dim[tuple] 61.3710μs 41.7874μs 23.9306 KOps/s 23.9579 KOps/s $\color{#d91a1a}-0.11\%$
test_setitem 0.1372ms 17.3407μs 57.6678 KOps/s 55.0099 KOps/s $\color{#35bf28}+4.83\%$
test_set 0.1325ms 16.7488μs 59.7058 KOps/s 56.8552 KOps/s $\textbf{\color{#35bf28}+5.01\%}$
test_set_shared 2.8979ms 0.1026ms 9.7492 KOps/s 9.8526 KOps/s $\color{#d91a1a}-1.05\%$
test_update 0.1280ms 19.4351μs 51.4533 KOps/s 48.1437 KOps/s $\textbf{\color{#35bf28}+6.87\%}$
test_update_nested 0.1527ms 25.5954μs 39.0695 KOps/s 36.7373 KOps/s $\textbf{\color{#35bf28}+6.35\%}$
test_set_nested 0.1329ms 18.0311μs 55.4599 KOps/s 53.1385 KOps/s $\color{#35bf28}+4.37\%$
test_set_nested_new 0.1380ms 20.9831μs 47.6575 KOps/s 45.3712 KOps/s $\textbf{\color{#35bf28}+5.04\%}$
test_select 0.1565ms 41.8579μs 23.8903 KOps/s 23.4074 KOps/s $\color{#35bf28}+2.06\%$
test_to 74.6020μs 54.2570μs 18.4308 KOps/s 18.1902 KOps/s $\color{#35bf28}+1.32\%$
test_to_nonblocking 73.3010μs 34.8042μs 28.7322 KOps/s 28.8325 KOps/s $\color{#d91a1a}-0.35\%$
test_unbind_speed 0.3722ms 0.3311ms 3.0206 KOps/s 3.0242 KOps/s $\color{#d91a1a}-0.12\%$
test_unbind_speed_stack0 86.5820ms 4.1775ms 239.3772 Ops/s 256.5125 Ops/s $\textbf{\color{#d91a1a}-6.68\%}$
test_unbind_speed_stack1 1.5750μs 0.5388μs 1.8559 MOps/s 1.8863 MOps/s $\color{#d91a1a}-1.61\%$
test_split 1.8554ms 1.5727ms 635.8451 Ops/s 575.5000 Ops/s $\textbf{\color{#35bf28}+10.49\%}$
test_chunk 79.0481ms 1.7026ms 587.3235 Ops/s 584.0735 Ops/s $\color{#35bf28}+0.56\%$
test_creation[device0] 0.1417ms 72.6205μs 13.7702 KOps/s 13.8337 KOps/s $\color{#d91a1a}-0.46\%$
test_creation_from_tensor 0.1317ms 53.3616μs 18.7401 KOps/s 17.5757 KOps/s $\textbf{\color{#35bf28}+6.62\%}$
test_add_one[memmap_tensor0] 0.1472ms 7.0969μs 140.9062 KOps/s 140.7511 KOps/s $\color{#35bf28}+0.11\%$
test_contiguous[memmap_tensor0] 23.7800μs 0.6506μs 1.5370 MOps/s 1.5207 MOps/s $\color{#35bf28}+1.07\%$
test_stack[memmap_tensor0] 33.5810μs 4.5966μs 217.5522 KOps/s 224.1874 KOps/s $\color{#d91a1a}-2.96\%$
test_memmaptd_index 0.2710ms 0.2485ms 4.0249 KOps/s 4.0655 KOps/s $\color{#d91a1a}-1.00\%$
test_memmaptd_index_astensor 0.3311ms 0.3012ms 3.3197 KOps/s 3.2936 KOps/s $\color{#35bf28}+0.79\%$
test_memmaptd_index_op 0.7806ms 0.5868ms 1.7042 KOps/s 1.6495 KOps/s $\color{#35bf28}+3.31\%$
test_serialize_model 0.1701s 99.1106ms 10.0897 Ops/s 9.6108 Ops/s $\color{#35bf28}+4.98\%$
test_serialize_model_pickle 1.3487s 1.2365s 0.8088 Ops/s 0.8056 Ops/s $\color{#35bf28}+0.40\%$
test_serialize_weights 0.1696s 96.1028ms 10.4055 Ops/s 9.7487 Ops/s $\textbf{\color{#35bf28}+6.74\%}$
test_serialize_weights_returnearly 0.2726s 79.0735ms 12.6465 Ops/s 14.7876 Ops/s $\textbf{\color{#d91a1a}-14.48\%}$
test_serialize_weights_pickle 1.3531s 1.2382s 0.8077 Ops/s 0.8082 Ops/s $\color{#d91a1a}-0.07\%$
test_reshape_pytree 57.2110μs 24.6835μs 40.5129 KOps/s 40.9625 KOps/s $\color{#d91a1a}-1.10\%$
test_reshape_td 58.9410μs 29.0580μs 34.4139 KOps/s 35.5274 KOps/s $\color{#d91a1a}-3.13\%$
test_view_pytree 54.6410μs 24.2635μs 41.2141 KOps/s 42.3294 KOps/s $\color{#d91a1a}-2.63\%$
test_view_td 21.4110μs 4.1009μs 243.8491 KOps/s 245.0510 KOps/s $\color{#d91a1a}-0.49\%$
test_unbind_pytree 52.7810μs 29.8786μs 33.4688 KOps/s 33.4730 KOps/s $\color{#d91a1a}-0.01\%$
test_unbind_td 83.7810μs 51.8030μs 19.3039 KOps/s 18.0397 KOps/s $\textbf{\color{#35bf28}+7.01\%}$
test_split_pytree 51.9610μs 28.2703μs 35.3729 KOps/s 34.5890 KOps/s $\color{#35bf28}+2.27\%$
test_split_td 0.7514ms 40.3405μs 24.7890 KOps/s 24.3731 KOps/s $\color{#35bf28}+1.71\%$
test_add_pytree 60.4610μs 36.3874μs 27.4820 KOps/s 27.8288 KOps/s $\color{#d91a1a}-1.25\%$
test_add_td 96.3610μs 47.4363μs 21.0809 KOps/s 20.5102 KOps/s $\color{#35bf28}+2.78\%$
test_distributed 3.9383ms 73.2619μs 13.6497 KOps/s 13.4299 KOps/s $\color{#35bf28}+1.64\%$
test_tdmodule 36.7400μs 17.2074μs 58.1144 KOps/s 53.5232 KOps/s $\textbf{\color{#35bf28}+8.58\%}$
test_tdmodule_dispatch 0.2483ms 33.5821μs 29.7777 KOps/s 28.7829 KOps/s $\color{#35bf28}+3.46\%$
test_tdseq 39.9410μs 20.7328μs 48.2327 KOps/s 46.1799 KOps/s $\color{#35bf28}+4.45\%$
test_tdseq_dispatch 53.1210μs 36.2253μs 27.6050 KOps/s 26.4358 KOps/s $\color{#35bf28}+4.42\%$
test_instantiation_functorch 1.7707ms 1.6760ms 596.6663 Ops/s 599.9548 Ops/s $\color{#d91a1a}-0.55\%$
test_instantiation_td 1.7637ms 1.1931ms 838.1392 Ops/s 865.1082 Ops/s $\color{#d91a1a}-3.12\%$
test_exec_functorch 0.1953ms 0.1558ms 6.4183 KOps/s 6.3371 KOps/s $\color{#35bf28}+1.28\%$
test_exec_functional_call 0.1854ms 0.1578ms 6.3356 KOps/s 6.3427 KOps/s $\color{#d91a1a}-0.11\%$
test_exec_td 0.2165ms 0.1474ms 6.7820 KOps/s 6.7759 KOps/s $\color{#35bf28}+0.09\%$
test_exec_td_decorator 0.7590ms 0.1889ms 5.2941 KOps/s 5.3562 KOps/s $\color{#d91a1a}-1.16\%$
test_vmap_mlp_speed[True-True] 1.2023ms 1.1157ms 896.3102 Ops/s 899.5287 Ops/s $\color{#d91a1a}-0.36\%$
test_vmap_mlp_speed[True-False] 0.7271ms 0.6647ms 1.5044 KOps/s 1.5109 KOps/s $\color{#d91a1a}-0.43\%$
test_vmap_mlp_speed[False-True] 1.1615ms 1.0275ms 973.2353 Ops/s 978.4973 Ops/s $\color{#d91a1a}-0.54\%$
test_vmap_mlp_speed[False-False] 0.6537ms 0.5919ms 1.6896 KOps/s 1.6981 KOps/s $\color{#d91a1a}-0.50\%$
test_vmap_mlp_speed_decorator[True-True] 3.2408ms 2.5343ms 394.5879 Ops/s 391.1642 Ops/s $\color{#35bf28}+0.88\%$
test_vmap_mlp_speed_decorator[True-False] 1.0136ms 0.7109ms 1.4067 KOps/s 1.3960 KOps/s $\color{#35bf28}+0.77\%$
test_vmap_mlp_speed_decorator[False-True] 2.5364ms 2.1120ms 473.4815 Ops/s 466.8251 Ops/s $\color{#35bf28}+1.43\%$
test_vmap_mlp_speed_decorator[False-False] 1.1096ms 0.6106ms 1.6378 KOps/s 1.6428 KOps/s $\color{#d91a1a}-0.30\%$
test_vmap_transformer_speed[True-True] 12.6607ms 12.5604ms 79.6150 Ops/s 79.8489 Ops/s $\color{#d91a1a}-0.29\%$
test_vmap_transformer_speed[True-False] 8.2966ms 8.2350ms 121.4329 Ops/s 121.6993 Ops/s $\color{#d91a1a}-0.22\%$
test_vmap_transformer_speed[False-True] 12.4867ms 12.4337ms 80.4263 Ops/s 80.9636 Ops/s $\color{#d91a1a}-0.66\%$
test_vmap_transformer_speed[False-False] 8.1948ms 8.1601ms 122.5480 Ops/s 122.8381 Ops/s $\color{#d91a1a}-0.24\%$
test_vmap_transformer_speed_decorator[True-True] 0.1665s 83.0680ms 12.0383 Ops/s 11.9058 Ops/s $\color{#35bf28}+1.11\%$
test_vmap_transformer_speed_decorator[True-False] 21.5018ms 19.7528ms 50.6258 Ops/s 50.7519 Ops/s $\color{#d91a1a}-0.25\%$
test_vmap_transformer_speed_decorator[False-True] 69.9475ms 68.9055ms 14.5126 Ops/s 14.4450 Ops/s $\color{#35bf28}+0.47\%$
test_vmap_transformer_speed_decorator[False-False] 21.1530ms 19.4443ms 51.4289 Ops/s 47.0814 Ops/s $\textbf{\color{#35bf28}+9.23\%}$


fegin commented Dec 12, 2023

@vmoens The PR looks good to me.

I'm thinking of writing a DCP storage plugin with TensorDict to see how it works. But we will need TensorDict to support (1) DTensor and (2) optimizer state_dict.

@vmoens vmoens added the enhancement New feature or request label Dec 13, 2023
Contributor Author

vmoens commented Dec 14, 2023

@fegin I made the from_module method a little more recursive.

Let me look into DTensor and optimizer state_dict compatibility.

@vmoens vmoens closed this Jan 16, 2024
@vmoens vmoens deleted the from_module-state_dict branch October 21, 2024 14:02
3 participants