[Feature] Improve in-place ops for TensorDictParams #609
Merged
facebook-github-bot added the CLA Signed label on Jan 5, 2024
vmoens added the documentation and enhancement labels on Jan 5, 2024
Name | Max | Mean | Ops | Ops on Repo HEAD | Change
---|---|---|---|---|---
test_plain_set_nested | 42.3090μs | 18.1905μs | 54.9737 KOps/s | 62.4993 KOps/s | |
test_plain_set_stack_nested | 0.2590ms | 0.1492ms | 6.7032 KOps/s | 7.1848 KOps/s | |
test_plain_set_nested_inplace | 52.2080μs | 20.3831μs | 49.0602 KOps/s | 54.6697 KOps/s | |
test_plain_set_stack_nested_inplace | 0.4092ms | 0.1813ms | 5.5150 KOps/s | 5.7352 KOps/s | |
test_items | 15.1690μs | 2.4919μs | 401.2944 KOps/s | 418.5972 KOps/s | |
test_items_nested | 0.4887ms | 0.2750ms | 3.6364 KOps/s | 3.5686 KOps/s | |
test_items_nested_locked | 1.3204ms | 0.2731ms | 3.6621 KOps/s | 3.6904 KOps/s | |
test_items_nested_leaf | 0.3158ms | 0.1702ms | 5.8750 KOps/s | 5.9793 KOps/s | |
test_items_stack_nested | 1.4579ms | 1.3084ms | 764.3202 Ops/s | 754.0726 Ops/s | |
test_items_stack_nested_leaf | 2.9020ms | 1.2155ms | 822.7313 Ops/s | 841.1068 Ops/s | |
test_items_stack_nested_locked | 2.5872ms | 0.7857ms | 1.2727 KOps/s | 1.3084 KOps/s | |
test_keys | 18.0040μs | 3.8381μs | 260.5423 KOps/s | 256.6417 KOps/s | |
test_keys_nested | 51.1257ms | 0.1583ms | 6.3151 KOps/s | 6.7481 KOps/s | |
test_keys_nested_locked | 0.2089ms | 0.1490ms | 6.7106 KOps/s | 6.8679 KOps/s | |
test_keys_nested_leaf | 0.2138ms | 0.1309ms | 7.6401 KOps/s | 7.7841 KOps/s | |
test_keys_stack_nested | 1.8744ms | 1.2811ms | 780.5736 Ops/s | 774.5607 Ops/s | |
test_keys_stack_nested_leaf | 2.2178ms | 1.2983ms | 770.2204 Ops/s | 775.2572 Ops/s | |
test_keys_stack_nested_locked | 1.5124ms | 0.7061ms | 1.4162 KOps/s | 1.4391 KOps/s | |
test_values | 18.4804μs | 1.1977μs | 834.9163 KOps/s | 872.9466 KOps/s | |
test_values_nested | 0.1015ms | 52.3755μs | 19.0929 KOps/s | 18.9936 KOps/s | |
test_values_nested_locked | 0.1128ms | 53.2488μs | 18.7798 KOps/s | 19.0841 KOps/s | |
test_values_nested_leaf | 0.1304ms | 47.6791μs | 20.9736 KOps/s | 21.4182 KOps/s | |
test_values_stack_nested | 1.3036ms | 1.0481ms | 954.1339 Ops/s | 943.0064 Ops/s | |
test_values_stack_nested_leaf | 1.1517ms | 1.0328ms | 968.2056 Ops/s | 959.0429 Ops/s | |
test_values_stack_nested_locked | 1.0696ms | 0.5124ms | 1.9514 KOps/s | 1.9235 KOps/s | |
test_membership | 45.7060μs | 1.3495μs | 741.0013 KOps/s | 722.6454 KOps/s | |
test_membership_nested | 37.9210μs | 2.8544μs | 350.3415 KOps/s | 345.2012 KOps/s | |
test_membership_nested_leaf | 39.4040μs | 2.8575μs | 349.9614 KOps/s | 342.0553 KOps/s | |
test_membership_stacked_nested | 33.6130μs | 11.8409μs | 84.4527 KOps/s | 83.4259 KOps/s | |
test_membership_stacked_nested_leaf | 57.3170μs | 11.8118μs | 84.6609 KOps/s | 83.0149 KOps/s | |
test_membership_nested_last | 34.7450μs | 6.0139μs | 166.2827 KOps/s | 165.3360 KOps/s | |
test_membership_nested_leaf_last | 39.5840μs | 6.0205μs | 166.0982 KOps/s | 165.0833 KOps/s | |
test_membership_stacked_nested_last | 0.2423ms | 0.1702ms | 5.8770 KOps/s | 5.9270 KOps/s | |
test_membership_stacked_nested_leaf_last | 61.0540μs | 14.0313μs | 71.2692 KOps/s | 71.7494 KOps/s | |
test_nested_getleaf | 43.8920μs | 10.8682μs | 92.0116 KOps/s | 93.7826 KOps/s | |
test_nested_get | 64.4610μs | 10.1740μs | 98.2894 KOps/s | 98.9209 KOps/s | |
test_stacked_getleaf | 0.5667ms | 0.4704ms | 2.1257 KOps/s | 2.1348 KOps/s | |
test_stacked_get | 0.5214ms | 0.4400ms | 2.2729 KOps/s | 2.2772 KOps/s | |
test_nested_getitemleaf | 49.4820μs | 10.8966μs | 91.7719 KOps/s | 93.2136 KOps/s | |
test_nested_getitem | 51.8760μs | 10.5013μs | 95.2260 KOps/s | 99.9236 KOps/s | |
test_stacked_getitemleaf | 0.5725ms | 0.4745ms | 2.1077 KOps/s | 2.1448 KOps/s | |
test_stacked_getitem | 0.9013ms | 0.4428ms | 2.2584 KOps/s | 2.2881 KOps/s | |
test_lock_nested | 1.2755ms | 0.4085ms | 2.4481 KOps/s | 2.4205 KOps/s | |
test_lock_stack_nested | 80.4807ms | 6.5717ms | 152.1678 Ops/s | 150.8129 Ops/s | |
test_unlock_nested | 70.6863ms | 0.4861ms | 2.0571 KOps/s | 2.3696 KOps/s | |
test_unlock_stack_nested | 78.8660ms | 6.2167ms | 160.8565 Ops/s | 159.2687 Ops/s | |
test_flatten_speed | 0.6174ms | 0.3722ms | 2.6864 KOps/s | 2.7165 KOps/s | |
test_unflatten_speed | 0.5729ms | 0.4615ms | 2.1667 KOps/s | 2.2164 KOps/s | |
test_common_ops | 3.9583ms | 0.7284ms | 1.3729 KOps/s | 1.4743 KOps/s | |
test_creation | 41.4970μs | 1.9784μs | 505.4486 KOps/s | 509.5153 KOps/s | |
test_creation_empty | 50.2240μs | 11.2953μs | 88.5325 KOps/s | 113.9033 KOps/s | |
test_creation_nested_1 | 36.2580μs | 14.2518μs | 70.1665 KOps/s | 85.5375 KOps/s | |
test_creation_nested_2 | 47.3990μs | 19.5923μs | 51.0403 KOps/s | 59.2686 KOps/s | |
test_clone | 0.1276ms | 12.5188μs | 79.8799 KOps/s | 81.4405 KOps/s | |
test_getitem[int] | 28.6340μs | 11.9848μs | 83.4390 KOps/s | 81.8896 KOps/s | |
test_getitem[slice_int] | 62.7670μs | 23.8825μs | 41.8717 KOps/s | 42.0625 KOps/s | |
test_getitem[range] | 86.8220μs | 41.4917μs | 24.1012 KOps/s | 23.5844 KOps/s | |
test_getitem[tuple] | 66.9840μs | 19.2966μs | 51.8225 KOps/s | 51.4350 KOps/s | |
test_getitem[list] | 0.4734ms | 37.5939μs | 26.6000 KOps/s | 26.0530 KOps/s | |
test_setitem_dim[int] | 56.9060μs | 31.3505μs | 31.8974 KOps/s | 33.5831 KOps/s | |
test_setitem_dim[slice_int] | 0.1001ms | 58.3785μs | 17.1296 KOps/s | 18.3034 KOps/s | |
test_setitem_dim[range] | 0.1163ms | 74.9601μs | 13.3404 KOps/s | 13.5617 KOps/s | |
test_setitem_dim[tuple] | 0.1135ms | 48.2785μs | 20.7132 KOps/s | 22.4239 KOps/s | |
test_setitem | 0.1070ms | 18.9078μs | 52.8881 KOps/s | 55.6456 KOps/s | |
test_set | 0.1280ms | 18.5772μs | 53.8293 KOps/s | 58.5519 KOps/s | |
test_set_shared | 3.3934ms | 0.1401ms | 7.1360 KOps/s | 7.1682 KOps/s | |
test_update | 0.1330ms | 22.1788μs | 45.0880 KOps/s | 50.2584 KOps/s | |
test_update_nested | 0.1335ms | 29.4533μs | 33.9520 KOps/s | 36.8342 KOps/s | |
test_set_nested | 99.4350μs | 20.2283μs | 49.4356 KOps/s | 51.8548 KOps/s | |
test_set_nested_new | 0.1056ms | 24.4732μs | 40.8610 KOps/s | 42.0636 KOps/s | |
test_select | 0.2587ms | 48.8578μs | 20.4675 KOps/s | 20.8481 KOps/s | |
test_unbind_speed | 0.4227ms | 0.3438ms | 2.9090 KOps/s | 2.9311 KOps/s | |
test_unbind_speed_stack0 | 71.4436ms | 4.4092ms | 226.7978 Ops/s | 218.6774 Ops/s | |
test_unbind_speed_stack1 | 5.3600μs | 0.6775μs | 1.4761 MOps/s | 1.5236 MOps/s | |
test_split | 2.4354ms | 1.5718ms | 636.2084 Ops/s | 641.3992 Ops/s | |
test_chunk | 64.3525ms | 1.6693ms | 599.0574 Ops/s | 603.0233 Ops/s | |
test_creation[device0] | 0.4871ms | 0.2968ms | 3.3696 KOps/s | 3.4038 KOps/s | |
test_creation_from_tensor | 3.8111ms | 0.3325ms | 3.0071 KOps/s | 3.0176 KOps/s | |
test_add_one[memmap_tensor0] | 75.2910μs | 24.9755μs | 40.0392 KOps/s | 39.3092 KOps/s | |
test_contiguous[memmap_tensor0] | 33.8030μs | 5.8468μs | 171.0331 KOps/s | 171.8132 KOps/s | |
test_stack[memmap_tensor0] | 64.8520μs | 19.1012μs | 52.3526 KOps/s | 51.5778 KOps/s | |
test_memmaptd_index | 0.2803ms | 0.2005ms | 4.9868 KOps/s | 4.9698 KOps/s | |
test_memmaptd_index_astensor | 0.3581ms | 0.2602ms | 3.8431 KOps/s | 3.9003 KOps/s | |
test_memmaptd_index_op | 1.0887ms | 0.5560ms | 1.7986 KOps/s | 1.9128 KOps/s | |
test_serialize_model | 0.1011s | 97.3997ms | 10.2670 Ops/s | 8.5916 Ops/s | |
test_serialize_model_filesystem | 0.1654s | 97.3606ms | 10.2711 Ops/s | 10.2256 Ops/s | |
test_serialize_model_pickle | 0.4513s | 0.3790s | 2.6384 Ops/s | 2.5621 Ops/s | |
test_serialize_weights | 0.1670s | 0.1054s | 9.4833 Ops/s | 9.2846 Ops/s | |
test_serialize_weights_filesystem | 94.6981ms | 91.2399ms | 10.9601 Ops/s | 10.2069 Ops/s | |
test_serialize_weights_returnearly | 0.1280s | 0.1200s | 8.3331 Ops/s | 8.0825 Ops/s | |
test_serialize_weights_pickle | 1.1081s | 0.6578s | 1.5202 Ops/s | 2.0208 Ops/s | |
test_reshape_pytree | 59.9520μs | 23.2655μs | 42.9820 KOps/s | 43.0918 KOps/s | |
test_reshape_td | 75.0900μs | 30.1714μs | 33.1440 KOps/s | 32.5672 KOps/s | |
test_view_pytree | 75.8520μs | 23.4797μs | 42.5899 KOps/s | 43.4425 KOps/s | |
test_view_td | 23.3530μs | 4.9837μs | 200.6532 KOps/s | 207.2732 KOps/s | |
test_unbind_pytree | 61.8650μs | 27.0364μs | 36.9871 KOps/s | 38.1410 KOps/s | |
test_unbind_td | 0.3353ms | 56.5329μs | 17.6888 KOps/s | 17.9865 KOps/s | |
test_split_pytree | 63.8090μs | 26.4504μs | 37.8066 KOps/s | 38.3258 KOps/s | |
test_split_td | 0.5249ms | 42.9124μs | 23.3033 KOps/s | 22.2633 KOps/s | |
test_add_pytree | 88.0440μs | 31.9926μs | 31.2573 KOps/s | 30.9523 KOps/s | |
test_add_td | 0.1100ms | 47.7179μs | 20.9565 KOps/s | 21.9335 KOps/s | |
test_distributed | 50.2330μs | 6.0480μs | 165.3447 KOps/s | 163.8089 KOps/s | |
test_tdmodule | 0.8662ms | 24.2807μs | 41.1849 KOps/s | 44.5800 KOps/s | |
test_tdmodule_dispatch | 0.2146ms | 43.3645μs | 23.0604 KOps/s | 24.9185 KOps/s | |
test_tdseq | 43.4810μs | 25.7443μs | 38.8436 KOps/s | 38.4698 KOps/s | |
test_tdseq_dispatch | 0.1361ms | 46.8383μs | 21.3500 KOps/s | 22.0463 KOps/s | |
test_instantiation_functorch | 1.9845ms | 1.2820ms | 780.0107 Ops/s | 775.5775 Ops/s | |
test_instantiation_td | 1.4660ms | 0.9902ms | 1.0099 KOps/s | 918.5274 Ops/s | |
test_exec_functorch | 0.2400ms | 0.1548ms | 6.4601 KOps/s | 6.4958 KOps/s | |
test_exec_functional_call | 0.4136ms | 0.1484ms | 6.7406 KOps/s | 6.9589 KOps/s | |
test_exec_td | 0.2463ms | 0.1433ms | 6.9773 KOps/s | 7.0452 KOps/s | |
test_exec_td_decorator | 0.7115ms | 0.1761ms | 5.6786 KOps/s | 5.8494 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.1769ms | 0.9003ms | 1.1108 KOps/s | 1.1061 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7587ms | 0.4806ms | 2.0809 KOps/s | 2.1005 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.1277ms | 0.7764ms | 1.2881 KOps/s | 1.2720 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6438ms | 0.3893ms | 2.5684 KOps/s | 2.5685 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 2.4356ms | 1.7952ms | 557.0397 Ops/s | 557.0783 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8931ms | 0.5291ms | 1.8899 KOps/s | 1.9161 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 1.9823ms | 1.5090ms | 662.6740 Ops/s | 663.4706 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.6465ms | 0.4027ms | 2.4835 KOps/s | 2.4801 KOps/s | |
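The Change column is empty in this copy of the table. Assuming it is meant to hold the relative difference in throughput between the Ops and Ops on Repo HEAD columns (an assumption, not stated on the page), a minimal sketch for filling it in is shown here. Note that the two columns sometimes use different units (for example `1.0099 KOps/s` vs `918.5274 Ops/s`), so the values are normalized to plain operations per second first; `UNIT_SCALE`, `parse_ops` and `relative_change` are hypothetical helpers.

```python
# Scale factors for the throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '54.9737 KOps/s' to plain operations per second."""
    value, unit = cell.strip().split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Percent change of `ops` relative to `ops_head` (positive means faster)."""
    new = parse_ops(ops)
    old = parse_ops(ops_head)
    return 100.0 * (new - old) / old


# Two rows taken from the table: (Ops, Ops on Repo HEAD).
rows = {
    "test_plain_set_nested": ("54.9737 KOps/s", "62.4993 KOps/s"),
    "test_instantiation_td": ("1.0099 KOps/s", "918.5274 Ops/s"),
}
for name, (ops, head) in rows.items():
    print(f"{name}: {relative_change(ops, head):+.2f}%")
```

This prints `-12.04%` for `test_plain_set_nested` and `+9.95%` for `test_instantiation_td`, the second one only being computable correctly because the mixed units are normalized before comparing.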
Makes it possible to call `zero_` and similar in-place ops on `TensorDictParams`, and makes these ops act on the underlying data. Also clarifies the docstring of `clone`.
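The description does not show how the in-place ops are routed to the data. As background only, and not the PR's implementation, the plain-PyTorch sketch below shows why it matters, assuming the leaves held by a `TensorDictParams` are `nn.Parameter` tensors that require gradients: autograd rejects an in-place op such as `zero_` applied directly to such a leaf, whereas the same op applied to `.data` (or run under `torch.no_grad()`) modifies the parameter's storage in place.

```python
import torch
from torch import nn

# A leaf tensor that requires grad, standing in for one parameter leaf.
p = nn.Parameter(torch.ones(3))

# With grad mode enabled, autograd refuses in-place ops on such a leaf.
try:
    p.zero_()
    direct_inplace_ok = True
except RuntimeError:
    direct_inplace_ok = False

# Running the op on .data bypasses autograd and mutates the same storage;
# p is still an nn.Parameter that requires grad.
p.data.zero_()

# Disabling grad tracking explicitly is an equivalent alternative.
with torch.no_grad():
    p.add_(1.0)

print(direct_inplace_ok, p.detach())
```

Either route leaves the parameter object itself in place, so any module or container that references it sees the updated values.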