[Feature] Improve in-place ops for TensorDictParams #609

Merged Jan 5, 2024 (2 commits)

@vmoens (Contributor) commented Jan 5, 2024

Makes it possible to call in-place ops such as zero_ on TensorDictParams, and makes these ops act on the underlying data.

Also clarifies the docstring of clone.
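
As background for why in-place ops on parameters need to target the data, here is a minimal sketch in plain PyTorch. It is not the PR's implementation and does not use TensorDictParams; the plain dict and the helper `zero_params_` are hypothetical stand-ins for illustration only.

```python
import torch
from torch import nn


def zero_params_(params):
    """Zero every parameter in place by writing to its underlying data.

    Hypothetical helper: `params` is a plain dict of nn.Parameter, used here
    as a stand-in for a parameter container.
    """
    for p in params.values():
        # Writing through .data bypasses autograd's in-place check on leaves.
        p.data.zero_()
    return params


params = {
    "weight": nn.Parameter(torch.ones(2, 3)),
    "bias": nn.Parameter(torch.ones(3)),
}

# Calling an in-place op directly on a leaf tensor that requires grad
# raises a RuntimeError in PyTorch.
try:
    params["weight"].zero_()
    direct_inplace_failed = False
except RuntimeError:
    direct_inplace_failed = True

# Acting on the data succeeds, and the entries remain trainable parameters.
zero_params_(params)
```

The same effect can be had by wrapping the call in `torch.no_grad()`; writing to `.data` is used here only because it mirrors the "active on the data" wording of the description.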

@facebook-github-bot added the CLA Signed label Jan 5, 2024
@vmoens added the documentation and enhancement labels Jan 5, 2024
@vmoens marked this pull request as ready for review January 5, 2024 09:30

github-actions bot commented Jan 5, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 120. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}19$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 42.3090μs | 18.1905μs | 54.9737 KOps/s | 62.4993 KOps/s | $\textbf{\color{#d91a1a}-12.04\%}$ |
| test_plain_set_stack_nested | 0.2590ms | 0.1492ms | 6.7032 KOps/s | 7.1848 KOps/s | $\textbf{\color{#d91a1a}-6.70\%}$ |
| test_plain_set_nested_inplace | 52.2080μs | 20.3831μs | 49.0602 KOps/s | 54.6697 KOps/s | $\textbf{\color{#d91a1a}-10.26\%}$ |
| test_plain_set_stack_nested_inplace | 0.4092ms | 0.1813ms | 5.5150 KOps/s | 5.7352 KOps/s | $\color{#d91a1a}-3.84\%$ |
| test_items | 15.1690μs | 2.4919μs | 401.2944 KOps/s | 418.5972 KOps/s | $\color{#d91a1a}-4.13\%$ |
| test_items_nested | 0.4887ms | 0.2750ms | 3.6364 KOps/s | 3.5686 KOps/s | $\color{#35bf28}+1.90\%$ |
| test_items_nested_locked | 1.3204ms | 0.2731ms | 3.6621 KOps/s | 3.6904 KOps/s | $\color{#d91a1a}-0.77\%$ |
| test_items_nested_leaf | 0.3158ms | 0.1702ms | 5.8750 KOps/s | 5.9793 KOps/s | $\color{#d91a1a}-1.74\%$ |
| test_items_stack_nested | 1.4579ms | 1.3084ms | 764.3202 Ops/s | 754.0726 Ops/s | $\color{#35bf28}+1.36\%$ |
| test_items_stack_nested_leaf | 2.9020ms | 1.2155ms | 822.7313 Ops/s | 841.1068 Ops/s | $\color{#d91a1a}-2.18\%$ |
| test_items_stack_nested_locked | 2.5872ms | 0.7857ms | 1.2727 KOps/s | 1.3084 KOps/s | $\color{#d91a1a}-2.73\%$ |
| test_keys | 18.0040μs | 3.8381μs | 260.5423 KOps/s | 256.6417 KOps/s | $\color{#35bf28}+1.52\%$ |
| test_keys_nested | 51.1257ms | 0.1583ms | 6.3151 KOps/s | 6.7481 KOps/s | $\textbf{\color{#d91a1a}-6.42\%}$ |
| test_keys_nested_locked | 0.2089ms | 0.1490ms | 6.7106 KOps/s | 6.8679 KOps/s | $\color{#d91a1a}-2.29\%$ |
| test_keys_nested_leaf | 0.2138ms | 0.1309ms | 7.6401 KOps/s | 7.7841 KOps/s | $\color{#d91a1a}-1.85\%$ |
| test_keys_stack_nested | 1.8744ms | 1.2811ms | 780.5736 Ops/s | 774.5607 Ops/s | $\color{#35bf28}+0.78\%$ |
| test_keys_stack_nested_leaf | 2.2178ms | 1.2983ms | 770.2204 Ops/s | 775.2572 Ops/s | $\color{#d91a1a}-0.65\%$ |
| test_keys_stack_nested_locked | 1.5124ms | 0.7061ms | 1.4162 KOps/s | 1.4391 KOps/s | $\color{#d91a1a}-1.59\%$ |
| test_values | 18.4804μs | 1.1977μs | 834.9163 KOps/s | 872.9466 KOps/s | $\color{#d91a1a}-4.36\%$ |
| test_values_nested | 0.1015ms | 52.3755μs | 19.0929 KOps/s | 18.9936 KOps/s | $\color{#35bf28}+0.52\%$ |
| test_values_nested_locked | 0.1128ms | 53.2488μs | 18.7798 KOps/s | 19.0841 KOps/s | $\color{#d91a1a}-1.59\%$ |
| test_values_nested_leaf | 0.1304ms | 47.6791μs | 20.9736 KOps/s | 21.4182 KOps/s | $\color{#d91a1a}-2.08\%$ |
| test_values_stack_nested | 1.3036ms | 1.0481ms | 954.1339 Ops/s | 943.0064 Ops/s | $\color{#35bf28}+1.18\%$ |
| test_values_stack_nested_leaf | 1.1517ms | 1.0328ms | 968.2056 Ops/s | 959.0429 Ops/s | $\color{#35bf28}+0.96\%$ |
| test_values_stack_nested_locked | 1.0696ms | 0.5124ms | 1.9514 KOps/s | 1.9235 KOps/s | $\color{#35bf28}+1.45\%$ |
| test_membership | 45.7060μs | 1.3495μs | 741.0013 KOps/s | 722.6454 KOps/s | $\color{#35bf28}+2.54\%$ |
| test_membership_nested | 37.9210μs | 2.8544μs | 350.3415 KOps/s | 345.2012 KOps/s | $\color{#35bf28}+1.49\%$ |
| test_membership_nested_leaf | 39.4040μs | 2.8575μs | 349.9614 KOps/s | 342.0553 KOps/s | $\color{#35bf28}+2.31\%$ |
| test_membership_stacked_nested | 33.6130μs | 11.8409μs | 84.4527 KOps/s | 83.4259 KOps/s | $\color{#35bf28}+1.23\%$ |
| test_membership_stacked_nested_leaf | 57.3170μs | 11.8118μs | 84.6609 KOps/s | 83.0149 KOps/s | $\color{#35bf28}+1.98\%$ |
| test_membership_nested_last | 34.7450μs | 6.0139μs | 166.2827 KOps/s | 165.3360 KOps/s | $\color{#35bf28}+0.57\%$ |
| test_membership_nested_leaf_last | 39.5840μs | 6.0205μs | 166.0982 KOps/s | 165.0833 KOps/s | $\color{#35bf28}+0.61\%$ |
| test_membership_stacked_nested_last | 0.2423ms | 0.1702ms | 5.8770 KOps/s | 5.9270 KOps/s | $\color{#d91a1a}-0.84\%$ |
| test_membership_stacked_nested_leaf_last | 61.0540μs | 14.0313μs | 71.2692 KOps/s | 71.7494 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_nested_getleaf | 43.8920μs | 10.8682μs | 92.0116 KOps/s | 93.7826 KOps/s | $\color{#d91a1a}-1.89\%$ |
| test_nested_get | 64.4610μs | 10.1740μs | 98.2894 KOps/s | 98.9209 KOps/s | $\color{#d91a1a}-0.64\%$ |
| test_stacked_getleaf | 0.5667ms | 0.4704ms | 2.1257 KOps/s | 2.1348 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_stacked_get | 0.5214ms | 0.4400ms | 2.2729 KOps/s | 2.2772 KOps/s | $\color{#d91a1a}-0.19\%$ |
| test_nested_getitemleaf | 49.4820μs | 10.8966μs | 91.7719 KOps/s | 93.2136 KOps/s | $\color{#d91a1a}-1.55\%$ |
| test_nested_getitem | 51.8760μs | 10.5013μs | 95.2260 KOps/s | 99.9236 KOps/s | $\color{#d91a1a}-4.70\%$ |
| test_stacked_getitemleaf | 0.5725ms | 0.4745ms | 2.1077 KOps/s | 2.1448 KOps/s | $\color{#d91a1a}-1.73\%$ |
| test_stacked_getitem | 0.9013ms | 0.4428ms | 2.2584 KOps/s | 2.2881 KOps/s | $\color{#d91a1a}-1.30\%$ |
| test_lock_nested | 1.2755ms | 0.4085ms | 2.4481 KOps/s | 2.4205 KOps/s | $\color{#35bf28}+1.14\%$ |
| test_lock_stack_nested | 80.4807ms | 6.5717ms | 152.1678 Ops/s | 150.8129 Ops/s | $\color{#35bf28}+0.90\%$ |
| test_unlock_nested | 70.6863ms | 0.4861ms | 2.0571 KOps/s | 2.3696 KOps/s | $\textbf{\color{#d91a1a}-13.19\%}$ |
| test_unlock_stack_nested | 78.8660ms | 6.2167ms | 160.8565 Ops/s | 159.2687 Ops/s | $\color{#35bf28}+1.00\%$ |
| test_flatten_speed | 0.6174ms | 0.3722ms | 2.6864 KOps/s | 2.7165 KOps/s | $\color{#d91a1a}-1.11\%$ |
| test_unflatten_speed | 0.5729ms | 0.4615ms | 2.1667 KOps/s | 2.2164 KOps/s | $\color{#d91a1a}-2.24\%$ |
| test_common_ops | 3.9583ms | 0.7284ms | 1.3729 KOps/s | 1.4743 KOps/s | $\textbf{\color{#d91a1a}-6.88\%}$ |
| test_creation | 41.4970μs | 1.9784μs | 505.4486 KOps/s | 509.5153 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_creation_empty | 50.2240μs | 11.2953μs | 88.5325 KOps/s | 113.9033 KOps/s | $\textbf{\color{#d91a1a}-22.27\%}$ |
| test_creation_nested_1 | 36.2580μs | 14.2518μs | 70.1665 KOps/s | 85.5375 KOps/s | $\textbf{\color{#d91a1a}-17.97\%}$ |
| test_creation_nested_2 | 47.3990μs | 19.5923μs | 51.0403 KOps/s | 59.2686 KOps/s | $\textbf{\color{#d91a1a}-13.88\%}$ |
| test_clone | 0.1276ms | 12.5188μs | 79.8799 KOps/s | 81.4405 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_getitem[int] | 28.6340μs | 11.9848μs | 83.4390 KOps/s | 81.8896 KOps/s | $\color{#35bf28}+1.89\%$ |
| test_getitem[slice_int] | 62.7670μs | 23.8825μs | 41.8717 KOps/s | 42.0625 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_getitem[range] | 86.8220μs | 41.4917μs | 24.1012 KOps/s | 23.5844 KOps/s | $\color{#35bf28}+2.19\%$ |
| test_getitem[tuple] | 66.9840μs | 19.2966μs | 51.8225 KOps/s | 51.4350 KOps/s | $\color{#35bf28}+0.75\%$ |
| test_getitem[list] | 0.4734ms | 37.5939μs | 26.6000 KOps/s | 26.0530 KOps/s | $\color{#35bf28}+2.10\%$ |
| test_setitem_dim[int] | 56.9060μs | 31.3505μs | 31.8974 KOps/s | 33.5831 KOps/s | $\textbf{\color{#d91a1a}-5.02\%}$ |
| test_setitem_dim[slice_int] | 0.1001ms | 58.3785μs | 17.1296 KOps/s | 18.3034 KOps/s | $\textbf{\color{#d91a1a}-6.41\%}$ |
| test_setitem_dim[range] | 0.1163ms | 74.9601μs | 13.3404 KOps/s | 13.5617 KOps/s | $\color{#d91a1a}-1.63\%$ |
| test_setitem_dim[tuple] | 0.1135ms | 48.2785μs | 20.7132 KOps/s | 22.4239 KOps/s | $\textbf{\color{#d91a1a}-7.63\%}$ |
| test_setitem | 0.1070ms | 18.9078μs | 52.8881 KOps/s | 55.6456 KOps/s | $\color{#d91a1a}-4.96\%$ |
| test_set | 0.1280ms | 18.5772μs | 53.8293 KOps/s | 58.5519 KOps/s | $\textbf{\color{#d91a1a}-8.07\%}$ |
| test_set_shared | 3.3934ms | 0.1401ms | 7.1360 KOps/s | 7.1682 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_update | 0.1330ms | 22.1788μs | 45.0880 KOps/s | 50.2584 KOps/s | $\textbf{\color{#d91a1a}-10.29\%}$ |
| test_update_nested | 0.1335ms | 29.4533μs | 33.9520 KOps/s | 36.8342 KOps/s | $\textbf{\color{#d91a1a}-7.82\%}$ |
| test_set_nested | 99.4350μs | 20.2283μs | 49.4356 KOps/s | 51.8548 KOps/s | $\color{#d91a1a}-4.67\%$ |
| test_set_nested_new | 0.1056ms | 24.4732μs | 40.8610 KOps/s | 42.0636 KOps/s | $\color{#d91a1a}-2.86\%$ |
| test_select | 0.2587ms | 48.8578μs | 20.4675 KOps/s | 20.8481 KOps/s | $\color{#d91a1a}-1.83\%$ |
| test_unbind_speed | 0.4227ms | 0.3438ms | 2.9090 KOps/s | 2.9311 KOps/s | $\color{#d91a1a}-0.75\%$ |
| test_unbind_speed_stack0 | 71.4436ms | 4.4092ms | 226.7978 Ops/s | 218.6774 Ops/s | $\color{#35bf28}+3.71\%$ |
| test_unbind_speed_stack1 | 5.3600μs | 0.6775μs | 1.4761 MOps/s | 1.5236 MOps/s | $\color{#d91a1a}-3.12\%$ |
| test_split | 2.4354ms | 1.5718ms | 636.2084 Ops/s | 641.3992 Ops/s | $\color{#d91a1a}-0.81\%$ |
| test_chunk | 64.3525ms | 1.6693ms | 599.0574 Ops/s | 603.0233 Ops/s | $\color{#d91a1a}-0.66\%$ |
| test_creation[device0] | 0.4871ms | 0.2968ms | 3.3696 KOps/s | 3.4038 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_creation_from_tensor | 3.8111ms | 0.3325ms | 3.0071 KOps/s | 3.0176 KOps/s | $\color{#d91a1a}-0.35\%$ |
| test_add_one[memmap_tensor0] | 75.2910μs | 24.9755μs | 40.0392 KOps/s | 39.3092 KOps/s | $\color{#35bf28}+1.86\%$ |
| test_contiguous[memmap_tensor0] | 33.8030μs | 5.8468μs | 171.0331 KOps/s | 171.8132 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_stack[memmap_tensor0] | 64.8520μs | 19.1012μs | 52.3526 KOps/s | 51.5778 KOps/s | $\color{#35bf28}+1.50\%$ |
| test_memmaptd_index | 0.2803ms | 0.2005ms | 4.9868 KOps/s | 4.9698 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_memmaptd_index_astensor | 0.3581ms | 0.2602ms | 3.8431 KOps/s | 3.9003 KOps/s | $\color{#d91a1a}-1.47\%$ |
| test_memmaptd_index_op | 1.0887ms | 0.5560ms | 1.7986 KOps/s | 1.9128 KOps/s | $\textbf{\color{#d91a1a}-5.97\%}$ |
| test_serialize_model | 0.1011s | 97.3997ms | 10.2670 Ops/s | 8.5916 Ops/s | $\textbf{\color{#35bf28}+19.50\%}$ |
| test_serialize_model_filesystem | 0.1654s | 97.3606ms | 10.2711 Ops/s | 10.2256 Ops/s | $\color{#35bf28}+0.44\%$ |
| test_serialize_model_pickle | 0.4513s | 0.3790s | 2.6384 Ops/s | 2.5621 Ops/s | $\color{#35bf28}+2.98\%$ |
| test_serialize_weights | 0.1670s | 0.1054s | 9.4833 Ops/s | 9.2846 Ops/s | $\color{#35bf28}+2.14\%$ |
| test_serialize_weights_filesystem | 94.6981ms | 91.2399ms | 10.9601 Ops/s | 10.2069 Ops/s | $\textbf{\color{#35bf28}+7.38\%}$ |
| test_serialize_weights_returnearly | 0.1280s | 0.1200s | 8.3331 Ops/s | 8.0825 Ops/s | $\color{#35bf28}+3.10\%$ |
| test_serialize_weights_pickle | 1.1081s | 0.6578s | 1.5202 Ops/s | 2.0208 Ops/s | $\textbf{\color{#d91a1a}-24.77\%}$ |
| test_reshape_pytree | 59.9520μs | 23.2655μs | 42.9820 KOps/s | 43.0918 KOps/s | $\color{#d91a1a}-0.25\%$ |
| test_reshape_td | 75.0900μs | 30.1714μs | 33.1440 KOps/s | 32.5672 KOps/s | $\color{#35bf28}+1.77\%$ |
| test_view_pytree | 75.8520μs | 23.4797μs | 42.5899 KOps/s | 43.4425 KOps/s | $\color{#d91a1a}-1.96\%$ |
| test_view_td | 23.3530μs | 4.9837μs | 200.6532 KOps/s | 207.2732 KOps/s | $\color{#d91a1a}-3.19\%$ |
| test_unbind_pytree | 61.8650μs | 27.0364μs | 36.9871 KOps/s | 38.1410 KOps/s | $\color{#d91a1a}-3.03\%$ |
| test_unbind_td | 0.3353ms | 56.5329μs | 17.6888 KOps/s | 17.9865 KOps/s | $\color{#d91a1a}-1.65\%$ |
| test_split_pytree | 63.8090μs | 26.4504μs | 37.8066 KOps/s | 38.3258 KOps/s | $\color{#d91a1a}-1.35\%$ |
| test_split_td | 0.5249ms | 42.9124μs | 23.3033 KOps/s | 22.2633 KOps/s | $\color{#35bf28}+4.67\%$ |
| test_add_pytree | 88.0440μs | 31.9926μs | 31.2573 KOps/s | 30.9523 KOps/s | $\color{#35bf28}+0.99\%$ |
| test_add_td | 0.1100ms | 47.7179μs | 20.9565 KOps/s | 21.9335 KOps/s | $\color{#d91a1a}-4.45\%$ |
| test_distributed | 50.2330μs | 6.0480μs | 165.3447 KOps/s | 163.8089 KOps/s | $\color{#35bf28}+0.94\%$ |
| test_tdmodule | 0.8662ms | 24.2807μs | 41.1849 KOps/s | 44.5800 KOps/s | $\textbf{\color{#d91a1a}-7.62\%}$ |
| test_tdmodule_dispatch | 0.2146ms | 43.3645μs | 23.0604 KOps/s | 24.9185 KOps/s | $\textbf{\color{#d91a1a}-7.46\%}$ |
| test_tdseq | 43.4810μs | 25.7443μs | 38.8436 KOps/s | 38.4698 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_tdseq_dispatch | 0.1361ms | 46.8383μs | 21.3500 KOps/s | 22.0463 KOps/s | $\color{#d91a1a}-3.16\%$ |
| test_instantiation_functorch | 1.9845ms | 1.2820ms | 780.0107 Ops/s | 775.5775 Ops/s | $\color{#35bf28}+0.57\%$ |
| test_instantiation_td | 1.4660ms | 0.9902ms | 1.0099 KOps/s | 918.5274 Ops/s | $\textbf{\color{#35bf28}+9.95\%}$ |
| test_exec_functorch | 0.2400ms | 0.1548ms | 6.4601 KOps/s | 6.4958 KOps/s | $\color{#d91a1a}-0.55\%$ |
| test_exec_functional_call | 0.4136ms | 0.1484ms | 6.7406 KOps/s | 6.9589 KOps/s | $\color{#d91a1a}-3.14\%$ |
| test_exec_td | 0.2463ms | 0.1433ms | 6.9773 KOps/s | 7.0452 KOps/s | $\color{#d91a1a}-0.96\%$ |
| test_exec_td_decorator | 0.7115ms | 0.1761ms | 5.6786 KOps/s | 5.8494 KOps/s | $\color{#d91a1a}-2.92\%$ |
| test_vmap_mlp_speed[True-True] | 1.1769ms | 0.9003ms | 1.1108 KOps/s | 1.1061 KOps/s | $\color{#35bf28}+0.42\%$ |
| test_vmap_mlp_speed[True-False] | 0.7587ms | 0.4806ms | 2.0809 KOps/s | 2.1005 KOps/s | $\color{#d91a1a}-0.93\%$ |
| test_vmap_mlp_speed[False-True] | 1.1277ms | 0.7764ms | 1.2881 KOps/s | 1.2720 KOps/s | $\color{#35bf28}+1.26\%$ |
| test_vmap_mlp_speed[False-False] | 0.6438ms | 0.3893ms | 2.5684 KOps/s | 2.5685 KOps/s | $-0.00\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 2.4356ms | 1.7952ms | 557.0397 Ops/s | 557.0783 Ops/s | $-0.01\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8931ms | 0.5291ms | 1.8899 KOps/s | 1.9161 KOps/s | $\color{#d91a1a}-1.37\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 1.9823ms | 1.5090ms | 662.6740 Ops/s | 663.4706 Ops/s | $\color{#d91a1a}-0.12\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.6465ms | 0.4027ms | 2.4835 KOps/s | 2.4801 KOps/s | $\color{#35bf28}+0.14\%$ |
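
The Change column is consistent with the relative difference between Ops and Ops on Repo HEAD, and the bold entries (which match the Improved and Worsened counts in the summary) all have a magnitude of at least 5%. The sketch below assumes both of those readings; the function names and the 5% threshold are inferred from the table, not taken from the benchmark tooling.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative change of `ops` versus `ops_head`, in percent.

    Both values must be expressed in the same unit (e.g. both in KOps/s).
    """
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from three rows of the table.
rows = {
    "test_plain_set_nested": (54.9737, 62.4993),
    "test_serialize_model": (10.2670, 8.5916),
    "test_setitem": (52.8881, 55.6456),
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Rows that mix units, such as test_instantiation_td (KOps/s against Ops/s), need converting to a common unit before applying the formula.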

@vmoens merged commit da449cf into main Jan 5, 2024
44 of 47 checks passed
@vmoens deleted the params-refactor branch January 5, 2024 10:38