[BugFix] Remove shared/memmap inheritance from clone / select / exclude #624

Merged: 4 commits merged on Jan 17, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Jan 17, 2024

Description

@shagunsodhani I may benefit from a bit of feedback on this.

Context

When a tensordict is placed in shared memory or memory-mapped, it is locked. We do this to prevent people from writing to it in the hope that these changes will be reflected in another process (which won't be the case). Locking the tensordict ensures that you must first unlock it (and hence lose the is_shared() attribute) before writing.
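
To make the locking contract concrete, here is a minimal pure-Python sketch. `ToyTD` and its methods are hypothetical stand-ins that only mirror the behavior described above (sharing implies locking, unlocking drops the shared flag); this is not tensordict's implementation.

```python
class ToyTD:
    """Hypothetical stand-in for a tensordict; not the real implementation."""

    def __init__(self, data):
        self._data = dict(data)
        self._is_shared = False
        self._is_locked = False

    def share_memory_(self):
        # Sharing implies locking: new writes would not reach other processes.
        self._is_shared = True
        self._is_locked = True
        return self

    def unlock_(self):
        # Unlocking gives up the guarantee that the content is still shared.
        self._is_locked = False
        self._is_shared = False
        return self

    def is_shared(self):
        return self._is_shared

    def set(self, key, value):
        if self._is_locked:
            raise RuntimeError("cannot modify a locked ToyTD; call unlock_() first")
        self._data[key] = value
        return self


td = ToyTD({"a": 1}).share_memory_()

# Writing to the shared (hence locked) object is refused.
try:
    td.set("b", 2)
    write_blocked = False
except RuntimeError:
    write_blocked = True

# After unlocking, the write goes through, but the shared flag is gone.
td.unlock_().set("b", 2)
```

The point of the sketch is only that the write path checks the lock, so a shared object can never be silently modified.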

For some operations (typically all the ops that don't change the data_ptr() ) I thought it was cool to keep the is_shared() attribute since we can be sure that the content is still shared. That means that coming from a shared / memmap tensordict all these ops would return a shared and locked tensordict:

td.view(-1)
td.transpose(0, 1)
td.select("key")
td.exclude("key")
td.clone(recurse=False) # clones the tree structure but not the tensors
td.flatten_keys()
td.unflatten_keys()

Problem

In the past we just copied the private _is_memmap and _is_shared attributes but not the lock: that meant that you ended up with a shared but unlocked TD (which is bad!). I solved this in #621.
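
A small sketch of the inconsistent state, under assumptions: `Flags`, `inherit_old`, `inherit_with_lock` and `is_consistent` are hypothetical names, and `inherit_with_lock` is only my reading of what "solved in #621" means (the lock travels with the storage flags).

```python
from dataclasses import dataclass


@dataclass
class Flags:
    """Hypothetical bundle of the three attributes discussed here."""
    is_shared: bool = False
    is_memmap: bool = False
    is_locked: bool = False


def inherit_old(src: Flags) -> Flags:
    # Old behavior as described above: storage flags copied, lock forgotten.
    return Flags(is_shared=src.is_shared, is_memmap=src.is_memmap)


def inherit_with_lock(src: Flags) -> Flags:
    # Assumed shape of the fix: the lock is copied along with the storage flags.
    return Flags(src.is_shared, src.is_memmap, src.is_locked)


def is_consistent(flags: Flags) -> bool:
    # Invariant: anything shared or memory-mapped must also be locked.
    return flags.is_locked or not (flags.is_shared or flags.is_memmap)


source = Flags(is_shared=True, is_locked=True)
old_result = inherit_old(source)
fixed_result = inherit_with_lock(source)
```

Here `old_result` is shared but unlocked, which is the state the invariant check rejects.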

Unfortunately that breaks a lot of stuff in torchrl: after a clone, select or exclude, you usually want to modify the tensordict you get back, and a locked result prevents that.

So the plan now will be:

  • For shape operations (unbind, view, transpose, permute, squeeze, unsqueeze) you keep the lock and the shared/memmap attribute because we assume that these ops are primarily there for you to present your data in a different format
  • Key-based operations (clone(False), select, exclude, flatten_keys, unflatten_keys) do not propagate the shared and lock attribute
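
The two bullets can be read as a lookup from op name to flag propagation. The sketch below is a hypothetical illustration of that policy (the op names come from the bullets; `output_flags` is not a tensordict function).

```python
# Op names are taken from the two bullets above; everything else is illustrative.
SHAPE_OPS = frozenset({"unbind", "view", "transpose", "permute", "squeeze", "unsqueeze"})
KEY_OPS = frozenset({"clone(False)", "select", "exclude", "flatten_keys", "unflatten_keys"})


def output_flags(op, *, is_shared, is_memmap, is_locked):
    """Return the (is_shared, is_memmap, is_locked) flags of the op's result."""
    if op in SHAPE_OPS:
        # Shape ops only re-present the data: keep lock and storage flags.
        return is_shared, is_memmap, is_locked
    if op in KEY_OPS:
        # Key-based ops return something the user likely wants to edit.
        return False, False, False
    raise ValueError(f"no policy defined for {op!r}")


source = dict(is_shared=True, is_memmap=False, is_locked=True)
view_flags = output_flags("view", **source)
select_flags = output_flags("select", **source)
```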

Thoughts?

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 17, 2024

github-actions bot commented Jan 17, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 124. Improved: 6. Worsened: 10.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 42.2890μs 16.9386μs 59.0369 KOps/s 59.6602 KOps/s $\color{#d91a1a}-1.04\%$
test_plain_set_stack_nested 0.1859ms 0.1405ms 7.1163 KOps/s 6.9729 KOps/s $\color{#35bf28}+2.06\%$
test_plain_set_nested_inplace 42.9310μs 18.9778μs 52.6931 KOps/s 51.6330 KOps/s $\color{#35bf28}+2.05\%$
test_plain_set_stack_nested_inplace 0.2962ms 0.1742ms 5.7398 KOps/s 5.6393 KOps/s $\color{#35bf28}+1.78\%$
test_items 14.0360μs 2.5062μs 399.0073 KOps/s 407.1817 KOps/s $\color{#d91a1a}-2.01\%$
test_items_nested 1.0618ms 0.2726ms 3.6689 KOps/s 3.6607 KOps/s $\color{#35bf28}+0.22\%$
test_items_nested_locked 0.3292ms 0.2720ms 3.6760 KOps/s 3.6400 KOps/s $\color{#35bf28}+0.99\%$
test_items_nested_leaf 0.5716ms 0.1695ms 5.8982 KOps/s 5.9558 KOps/s $\color{#d91a1a}-0.97\%$
test_items_stack_nested 1.6044ms 1.3337ms 749.8019 Ops/s 741.7511 Ops/s $\color{#35bf28}+1.09\%$
test_items_stack_nested_leaf 1.3147ms 1.1994ms 833.7642 Ops/s 806.6262 Ops/s $\color{#35bf28}+3.36\%$
test_items_stack_nested_locked 0.9955ms 0.8694ms 1.1502 KOps/s 1.1151 KOps/s $\color{#35bf28}+3.14\%$
test_keys 20.0080μs 3.8631μs 258.8607 KOps/s 256.9797 KOps/s $\color{#35bf28}+0.73\%$
test_keys_nested 48.2249ms 0.1588ms 6.2961 KOps/s 6.6597 KOps/s $\textbf{\color{#d91a1a}-5.46\%}$
test_keys_nested_locked 0.2577ms 0.1561ms 6.4053 KOps/s 6.4663 KOps/s $\color{#d91a1a}-0.94\%$
test_keys_nested_leaf 0.2366ms 0.1310ms 7.6307 KOps/s 7.6512 KOps/s $\color{#d91a1a}-0.27\%$
test_keys_stack_nested 1.4084ms 1.2843ms 778.6379 Ops/s 771.5054 Ops/s $\color{#35bf28}+0.92\%$
test_keys_stack_nested_leaf 1.4993ms 1.2745ms 784.6350 Ops/s 773.0753 Ops/s $\color{#35bf28}+1.50\%$
test_keys_stack_nested_locked 1.0147ms 0.8101ms 1.2344 KOps/s 1.1906 KOps/s $\color{#35bf28}+3.68\%$
test_values 7.3587μs 1.2240μs 817.0106 KOps/s 909.7908 KOps/s $\textbf{\color{#d91a1a}-10.20\%}$
test_values_nested 97.6520μs 51.1912μs 19.5346 KOps/s 19.1978 KOps/s $\color{#35bf28}+1.75\%$
test_values_nested_locked 99.8370μs 51.7165μs 19.3362 KOps/s 19.4083 KOps/s $\color{#d91a1a}-0.37\%$
test_values_nested_leaf 0.1342ms 45.4476μs 22.0034 KOps/s 21.6535 KOps/s $\color{#35bf28}+1.62\%$
test_values_stack_nested 1.2796ms 1.0487ms 953.5922 Ops/s 954.7465 Ops/s $\color{#d91a1a}-0.12\%$
test_values_stack_nested_leaf 1.6131ms 1.0383ms 963.1018 Ops/s 923.4548 Ops/s $\color{#35bf28}+4.29\%$
test_values_stack_nested_locked 1.0163ms 0.6182ms 1.6176 KOps/s 1.5886 KOps/s $\color{#35bf28}+1.83\%$
test_membership 13.6060μs 1.3050μs 766.2579 KOps/s 714.9944 KOps/s $\textbf{\color{#35bf28}+7.17\%}$
test_membership_nested 41.7280μs 3.4475μs 290.0659 KOps/s 288.6202 KOps/s $\color{#35bf28}+0.50\%$
test_membership_nested_leaf 26.8500μs 3.4470μs 290.1094 KOps/s 296.1731 KOps/s $\color{#d91a1a}-2.05\%$
test_membership_stacked_nested 48.7910μs 12.0983μs 82.6561 KOps/s 82.2248 KOps/s $\color{#35bf28}+0.52\%$
test_membership_stacked_nested_leaf 35.5070μs 12.1105μs 82.5727 KOps/s 80.5293 KOps/s $\color{#35bf28}+2.54\%$
test_membership_nested_last 36.8190μs 6.6402μs 150.5981 KOps/s 151.4362 KOps/s $\color{#d91a1a}-0.55\%$
test_membership_nested_leaf_last 29.4350μs 6.6871μs 149.5410 KOps/s 151.1608 KOps/s $\color{#d91a1a}-1.07\%$
test_membership_stacked_nested_last 0.3954ms 0.1727ms 5.7897 KOps/s 5.5188 KOps/s $\color{#35bf28}+4.91\%$
test_membership_stacked_nested_leaf_last 53.7710μs 14.3136μs 69.8635 KOps/s 69.5821 KOps/s $\color{#35bf28}+0.40\%$
test_nested_getleaf 48.6010μs 10.3813μs 96.3268 KOps/s 91.7751 KOps/s $\color{#35bf28}+4.96\%$
test_nested_get 30.5870μs 9.7941μs 102.1025 KOps/s 97.3025 KOps/s $\color{#35bf28}+4.93\%$
test_stacked_getleaf 0.6597ms 0.4003ms 2.4981 KOps/s 2.5207 KOps/s $\color{#d91a1a}-0.90\%$
test_stacked_get 0.6643ms 0.3703ms 2.7002 KOps/s 2.7132 KOps/s $\color{#d91a1a}-0.48\%$
test_nested_getitemleaf 36.8790μs 10.4062μs 96.0964 KOps/s 93.4833 KOps/s $\color{#35bf28}+2.80\%$
test_nested_getitem 35.6370μs 9.8850μs 101.1629 KOps/s 98.7431 KOps/s $\color{#35bf28}+2.45\%$
test_stacked_getitemleaf 0.7243ms 0.4019ms 2.4880 KOps/s 2.4836 KOps/s $\color{#35bf28}+0.18\%$
test_stacked_getitem 0.5587ms 0.3696ms 2.7057 KOps/s 2.7300 KOps/s $\color{#d91a1a}-0.89\%$
test_lock_nested 1.2829ms 0.3902ms 2.5625 KOps/s 2.5178 KOps/s $\color{#35bf28}+1.77\%$
test_lock_stack_nested 81.4792ms 6.5024ms 153.7889 Ops/s 154.4270 Ops/s $\color{#d91a1a}-0.41\%$
test_unlock_nested 70.4953ms 0.4633ms 2.1584 KOps/s 2.5495 KOps/s $\textbf{\color{#d91a1a}-15.34\%}$
test_unlock_stack_nested 81.5934ms 6.1058ms 163.7794 Ops/s 166.1743 Ops/s $\color{#d91a1a}-1.44\%$
test_flatten_speed 0.6276ms 0.3641ms 2.7463 KOps/s 2.6964 KOps/s $\color{#35bf28}+1.85\%$
test_unflatten_speed 0.6529ms 0.4504ms 2.2203 KOps/s 2.1842 KOps/s $\color{#35bf28}+1.65\%$
test_common_ops 3.9323ms 0.6699ms 1.4928 KOps/s 1.4685 KOps/s $\color{#35bf28}+1.66\%$
test_creation 15.3390μs 1.9279μs 518.6997 KOps/s 544.1994 KOps/s $\color{#d91a1a}-4.69\%$
test_creation_empty 45.4650μs 9.9300μs 100.7052 KOps/s 95.1942 KOps/s $\textbf{\color{#35bf28}+5.79\%}$
test_creation_nested_1 31.0090μs 12.4779μs 80.1417 KOps/s 77.9646 KOps/s $\color{#35bf28}+2.79\%$
test_creation_nested_2 0.1079ms 15.8204μs 63.2095 KOps/s 62.1888 KOps/s $\color{#35bf28}+1.64\%$
test_clone 0.2255ms 14.0684μs 71.0812 KOps/s 77.3347 KOps/s $\textbf{\color{#d91a1a}-8.09\%}$
test_getitem[int] 33.9640μs 11.0734μs 90.3063 KOps/s 89.6644 KOps/s $\color{#35bf28}+0.72\%$
test_getitem[slice_int] 61.6250μs 22.0697μs 45.3111 KOps/s 44.7815 KOps/s $\color{#35bf28}+1.18\%$
test_getitem[range] 85.2390μs 39.6212μs 25.2390 KOps/s 24.4706 KOps/s $\color{#35bf28}+3.14\%$
test_getitem[tuple] 55.4040μs 18.3067μs 54.6248 KOps/s 54.9633 KOps/s $\color{#d91a1a}-0.62\%$
test_getitem[list] 78.3770μs 35.2188μs 28.3939 KOps/s 26.8868 KOps/s $\textbf{\color{#35bf28}+5.61\%}$
test_setitem_dim[int] 60.6040μs 30.9644μs 32.2952 KOps/s 32.6556 KOps/s $\color{#d91a1a}-1.10\%$
test_setitem_dim[slice_int] 0.1144ms 55.9193μs 17.8829 KOps/s 17.5232 KOps/s $\color{#35bf28}+2.05\%$
test_setitem_dim[range] 0.1372ms 74.7204μs 13.3832 KOps/s 13.4004 KOps/s $\color{#d91a1a}-0.13\%$
test_setitem_dim[tuple] 87.2740μs 44.7652μs 22.3388 KOps/s 22.1361 KOps/s $\color{#35bf28}+0.92\%$
test_setitem 0.2559ms 19.8637μs 50.3432 KOps/s 50.6162 KOps/s $\color{#d91a1a}-0.54\%$
test_set 0.1619ms 19.0798μs 52.4116 KOps/s 51.7975 KOps/s $\color{#35bf28}+1.19\%$
test_set_shared 3.2159ms 0.1393ms 7.1811 KOps/s 7.0870 KOps/s $\color{#35bf28}+1.33\%$
test_update 0.2345ms 21.4769μs 46.5618 KOps/s 44.8736 KOps/s $\color{#35bf28}+3.76\%$
test_update_nested 0.2384ms 29.2373μs 34.2028 KOps/s 32.9524 KOps/s $\color{#35bf28}+3.79\%$
test_set_nested 0.1630ms 21.1267μs 47.3334 KOps/s 47.3968 KOps/s $\color{#d91a1a}-0.13\%$
test_set_nested_new 0.1506ms 25.5473μs 39.1431 KOps/s 39.8361 KOps/s $\color{#d91a1a}-1.74\%$
test_select 0.2761ms 38.2821μs 26.1218 KOps/s 25.9240 KOps/s $\color{#35bf28}+0.76\%$
test_select_nested 0.1107ms 57.8653μs 17.2815 KOps/s 16.3649 KOps/s $\textbf{\color{#35bf28}+5.60\%}$
test_exclude_nested 0.2787ms 0.1080ms 9.2588 KOps/s 8.8417 KOps/s $\color{#35bf28}+4.72\%$
test_empty[True] 0.5275ms 0.3192ms 3.1333 KOps/s 3.0846 KOps/s $\color{#35bf28}+1.58\%$
test_empty[False] 7.6704μs 1.0531μs 949.5375 KOps/s 986.6617 KOps/s $\color{#d91a1a}-3.76\%$
test_unbind_speed 0.5437ms 0.3226ms 3.0995 KOps/s 3.1585 KOps/s $\color{#d91a1a}-1.87\%$
test_unbind_speed_stack0 75.1477ms 3.9142ms 255.4789 Ops/s 224.6859 Ops/s $\textbf{\color{#35bf28}+13.70\%}$
test_unbind_speed_stack1 2.3104μs 0.6320μs 1.5822 MOps/s 1.5745 MOps/s $\color{#35bf28}+0.49\%$
test_split 72.4722ms 1.6060ms 622.6782 Ops/s 684.9593 Ops/s $\textbf{\color{#d91a1a}-9.09\%}$
test_chunk 70.0045ms 1.5794ms 633.1364 Ops/s 646.5099 Ops/s $\color{#d91a1a}-2.07\%$
test_creation[device0] 4.0314ms 0.1048ms 9.5418 KOps/s 9.8661 KOps/s $\color{#d91a1a}-3.29\%$
test_creation_from_tensor 0.2363ms 84.0051μs 11.9040 KOps/s 12.0808 KOps/s $\color{#d91a1a}-1.46\%$
test_add_one[memmap_tensor0] 0.7008ms 5.4741μs 182.6793 KOps/s 194.2736 KOps/s $\textbf{\color{#d91a1a}-5.97\%}$
test_contiguous[memmap_tensor0] 15.9700μs 0.6373μs 1.5691 MOps/s 1.5382 MOps/s $\color{#35bf28}+2.01\%$
test_stack[memmap_tensor0] 80.5410μs 3.6435μs 274.4636 KOps/s 286.6242 KOps/s $\color{#d91a1a}-4.24\%$
test_memmaptd_index 0.9572ms 0.2204ms 4.5373 KOps/s 4.5790 KOps/s $\color{#d91a1a}-0.91\%$
test_memmaptd_index_astensor 0.8331ms 0.2865ms 3.4908 KOps/s 3.6242 KOps/s $\color{#d91a1a}-3.68\%$
test_memmaptd_index_op 0.9333ms 0.5643ms 1.7722 KOps/s 1.7512 KOps/s $\color{#35bf28}+1.20\%$
test_serialize_model 0.1679s 0.1118s 8.9443 Ops/s 9.8357 Ops/s $\textbf{\color{#d91a1a}-9.06\%}$
test_serialize_model_pickle 0.4482s 0.3773s 2.6505 Ops/s 2.5956 Ops/s $\color{#35bf28}+2.12\%$
test_serialize_weights 0.1008s 98.1991ms 10.1834 Ops/s 9.2937 Ops/s $\textbf{\color{#35bf28}+9.57\%}$
test_serialize_weights_returnearly 0.1935s 0.1322s 7.5670 Ops/s 7.4707 Ops/s $\color{#35bf28}+1.29\%$
test_serialize_weights_pickle 0.6441s 0.5165s 1.9361 Ops/s 2.3574 Ops/s $\textbf{\color{#d91a1a}-17.87\%}$
test_serialize_weights_filesystem 0.1019s 92.0737ms 10.8609 Ops/s 10.5353 Ops/s $\color{#35bf28}+3.09\%$
test_serialize_model_filesystem 0.1636s 0.1001s 9.9919 Ops/s 11.0915 Ops/s $\textbf{\color{#d91a1a}-9.91\%}$
test_reshape_pytree 54.8030μs 22.9998μs 43.4786 KOps/s 43.0717 KOps/s $\color{#35bf28}+0.94\%$
test_reshape_td 67.5770μs 29.7799μs 33.5797 KOps/s 33.0899 KOps/s $\color{#35bf28}+1.48\%$
test_view_pytree 89.3450μs 22.9332μs 43.6049 KOps/s 43.1154 KOps/s $\color{#35bf28}+1.14\%$
test_view_td 23.7550μs 4.8695μs 205.3591 KOps/s 205.1644 KOps/s $\color{#35bf28}+0.09\%$
test_unbind_pytree 74.9000μs 26.7364μs 37.4022 KOps/s 37.4725 KOps/s $\color{#d91a1a}-0.19\%$
test_unbind_td 0.1068ms 49.6731μs 20.1316 KOps/s 19.9238 KOps/s $\color{#35bf28}+1.04\%$
test_split_pytree 72.5050μs 26.1610μs 38.2249 KOps/s 37.8936 KOps/s $\color{#35bf28}+0.87\%$
test_split_td 0.5874ms 41.0676μs 24.3501 KOps/s 24.4809 KOps/s $\color{#d91a1a}-0.53\%$
test_add_pytree 73.0470μs 32.4915μs 30.7773 KOps/s 30.8714 KOps/s $\color{#d91a1a}-0.31\%$
test_add_td 0.1219ms 50.5245μs 19.7924 KOps/s 19.4473 KOps/s $\color{#35bf28}+1.77\%$
test_distributed 0.2264ms 97.5997μs 10.2459 KOps/s 9.8100 KOps/s $\color{#35bf28}+4.44\%$
test_tdmodule 0.7514ms 23.2421μs 43.0253 KOps/s 43.4493 KOps/s $\color{#d91a1a}-0.98\%$
test_tdmodule_dispatch 0.2097ms 40.0978μs 24.9390 KOps/s 24.5050 KOps/s $\color{#35bf28}+1.77\%$
test_tdseq 54.9630μs 25.4597μs 39.2777 KOps/s 39.6679 KOps/s $\color{#d91a1a}-0.98\%$
test_tdseq_dispatch 0.1507ms 45.0776μs 22.1839 KOps/s 22.5768 KOps/s $\color{#d91a1a}-1.74\%$
test_instantiation_functorch 1.7439ms 1.2864ms 777.3358 Ops/s 769.0395 Ops/s $\color{#35bf28}+1.08\%$
test_instantiation_td 1.6259ms 1.0046ms 995.4550 Ops/s 994.8229 Ops/s $\color{#35bf28}+0.06\%$
test_exec_functorch 0.2908ms 0.1583ms 6.3158 KOps/s 6.3262 KOps/s $\color{#d91a1a}-0.16\%$
test_exec_functional_call 0.2807ms 0.1470ms 6.8030 KOps/s 6.9104 KOps/s $\color{#d91a1a}-1.55\%$
test_exec_td 0.2209ms 0.1418ms 7.0528 KOps/s 6.9258 KOps/s $\color{#35bf28}+1.83\%$
test_exec_td_decorator 0.9016ms 0.1756ms 5.6948 KOps/s 5.5813 KOps/s $\color{#35bf28}+2.03\%$
test_vmap_mlp_speed[True-True] 1.7351ms 0.9171ms 1.0904 KOps/s 1.0956 KOps/s $\color{#d91a1a}-0.48\%$
test_vmap_mlp_speed[True-False] 0.8289ms 0.4817ms 2.0759 KOps/s 2.0810 KOps/s $\color{#d91a1a}-0.25\%$
test_vmap_mlp_speed[False-True] 1.0559ms 0.7925ms 1.2618 KOps/s 1.2575 KOps/s $\color{#35bf28}+0.34\%$
test_vmap_mlp_speed[False-False] 0.6922ms 0.3920ms 2.5508 KOps/s 2.5532 KOps/s $\color{#d91a1a}-0.09\%$
test_vmap_mlp_speed_decorator[True-True] 3.0883ms 2.4047ms 415.8560 Ops/s 405.7896 Ops/s $\color{#35bf28}+2.48\%$
test_vmap_mlp_speed_decorator[True-False] 1.0524ms 0.5291ms 1.8901 KOps/s 1.8875 KOps/s $\color{#35bf28}+0.14\%$
test_vmap_mlp_speed_decorator[False-True] 2.6140ms 1.9652ms 508.8506 Ops/s 497.8718 Ops/s $\color{#35bf28}+2.21\%$
test_vmap_mlp_speed_decorator[False-False] 70.7597ms 0.4328ms 2.3107 KOps/s 2.4870 KOps/s $\textbf{\color{#d91a1a}-7.09\%}$

@vmoens vmoens added the bug Something isn't working label Jan 17, 2024

github-actions bot commented Jan 17, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 132. Improved: 6. Worsened: 17.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1455ms 14.3114μs 69.8742 KOps/s 74.2509 KOps/s $\textbf{\color{#d91a1a}-5.89\%}$
test_plain_set_stack_nested 0.2618ms 0.1211ms 8.2573 KOps/s 8.4109 KOps/s $\color{#d91a1a}-1.83\%$
test_plain_set_nested_inplace 0.1467ms 15.7448μs 63.5132 KOps/s 67.5397 KOps/s $\textbf{\color{#d91a1a}-5.96\%}$
test_plain_set_stack_nested_inplace 0.2879ms 0.1485ms 6.7344 KOps/s 6.7936 KOps/s $\color{#d91a1a}-0.87\%$
test_items 0.1192ms 4.8356μs 206.7997 KOps/s 209.7806 KOps/s $\color{#d91a1a}-1.42\%$
test_items_nested 0.4625ms 0.3418ms 2.9254 KOps/s 2.9301 KOps/s $\color{#d91a1a}-0.16\%$
test_items_nested_locked 0.3871ms 0.3449ms 2.8991 KOps/s 2.9110 KOps/s $\color{#d91a1a}-0.41\%$
test_items_nested_leaf 0.7181ms 0.2024ms 4.9411 KOps/s 4.9454 KOps/s $\color{#d91a1a}-0.09\%$
test_items_stack_nested 1.3993ms 1.3224ms 756.2146 Ops/s 758.0649 Ops/s $\color{#d91a1a}-0.24\%$
test_items_stack_nested_leaf 1.2036ms 1.1524ms 867.7376 Ops/s 878.0033 Ops/s $\color{#d91a1a}-1.17\%$
test_items_stack_nested_locked 0.9797ms 0.9215ms 1.0851 KOps/s 1.0834 KOps/s $\color{#35bf28}+0.16\%$
test_keys 23.4600μs 4.6085μs 216.9926 KOps/s 218.1781 KOps/s $\color{#d91a1a}-0.54\%$
test_keys_nested 1.5706ms 95.7549μs 10.4433 KOps/s 10.5486 KOps/s $\color{#d91a1a}-1.00\%$
test_keys_nested_locked 0.1247ms 98.8597μs 10.1153 KOps/s 10.1640 KOps/s $\color{#d91a1a}-0.48\%$
test_keys_nested_leaf 0.1861ms 79.0062μs 12.6572 KOps/s 12.6946 KOps/s $\color{#d91a1a}-0.29\%$
test_keys_stack_nested 1.2593ms 1.1691ms 855.3567 Ops/s 855.5136 Ops/s $\color{#d91a1a}-0.02\%$
test_keys_stack_nested_leaf 1.3777ms 1.1536ms 866.8177 Ops/s 864.2538 Ops/s $\color{#35bf28}+0.30\%$
test_keys_stack_nested_locked 0.8502ms 0.7618ms 1.3128 KOps/s 1.3573 KOps/s $\color{#d91a1a}-3.28\%$
test_values 7.4500μs 1.9021μs 525.7354 KOps/s 532.4510 KOps/s $\color{#d91a1a}-1.26\%$
test_values_nested 65.1610μs 45.7644μs 21.8510 KOps/s 22.0624 KOps/s $\color{#d91a1a}-0.96\%$
test_values_nested_locked 71.2110μs 48.1040μs 20.7883 KOps/s 21.0466 KOps/s $\color{#d91a1a}-1.23\%$
test_values_nested_leaf 93.9920μs 40.0274μs 24.9829 KOps/s 25.1810 KOps/s $\color{#d91a1a}-0.79\%$
test_values_stack_nested 1.1951ms 0.9659ms 1.0353 KOps/s 1.0359 KOps/s $\color{#d91a1a}-0.06\%$
test_values_stack_nested_leaf 1.0053ms 0.9536ms 1.0486 KOps/s 1.0470 KOps/s $\color{#35bf28}+0.16\%$
test_values_stack_nested_locked 0.6566ms 0.5914ms 1.6909 KOps/s 1.7122 KOps/s $\color{#d91a1a}-1.24\%$
test_membership 23.8100μs 1.1073μs 903.1064 KOps/s 1.0428 MOps/s $\textbf{\color{#d91a1a}-13.39\%}$
test_membership_nested 51.8410μs 2.9406μs 340.0667 KOps/s 338.9692 KOps/s $\color{#35bf28}+0.32\%$
test_membership_nested_leaf 20.2100μs 2.9532μs 338.6152 KOps/s 338.6660 KOps/s $\color{#d91a1a}-0.02\%$
test_membership_stacked_nested 33.4000μs 11.2527μs 88.8673 KOps/s 89.1557 KOps/s $\color{#d91a1a}-0.32\%$
test_membership_stacked_nested_leaf 31.9900μs 11.2872μs 88.5960 KOps/s 88.6289 KOps/s $\color{#d91a1a}-0.04\%$
test_membership_nested_last 23.1000μs 5.4096μs 184.8549 KOps/s 184.6922 KOps/s $\color{#35bf28}+0.09\%$
test_membership_nested_leaf_last 23.9700μs 5.4351μs 183.9890 KOps/s 185.4654 KOps/s $\color{#d91a1a}-0.80\%$
test_membership_stacked_nested_last 0.1726ms 0.1442ms 6.9352 KOps/s 6.9684 KOps/s $\color{#d91a1a}-0.48\%$
test_membership_stacked_nested_leaf_last 62.4810μs 13.2765μs 75.3210 KOps/s 75.9614 KOps/s $\color{#d91a1a}-0.84\%$
test_nested_getleaf 30.6100μs 8.4019μs 119.0207 KOps/s 118.5097 KOps/s $\color{#35bf28}+0.43\%$
test_nested_get 35.2600μs 7.9921μs 125.1240 KOps/s 124.6804 KOps/s $\color{#35bf28}+0.36\%$
test_stacked_getleaf 1.1712ms 0.3211ms 3.1144 KOps/s 3.1078 KOps/s $\color{#35bf28}+0.21\%$
test_stacked_get 0.3198ms 0.2845ms 3.5145 KOps/s 3.4739 KOps/s $\color{#35bf28}+1.17\%$
test_nested_getitemleaf 22.9200μs 8.4476μs 118.3773 KOps/s 118.4795 KOps/s $\color{#d91a1a}-0.09\%$
test_nested_getitem 22.7210μs 8.0070μs 124.8908 KOps/s 124.7536 KOps/s $\color{#35bf28}+0.11\%$
test_stacked_getitemleaf 0.3771ms 0.3229ms 3.0966 KOps/s 3.1000 KOps/s $\color{#d91a1a}-0.11\%$
test_stacked_getitem 0.3409ms 0.2909ms 3.4381 KOps/s 3.4284 KOps/s $\color{#35bf28}+0.28\%$
test_lock_nested 7.1087ms 0.4240ms 2.3586 KOps/s 2.4483 KOps/s $\color{#d91a1a}-3.67\%$
test_lock_stack_nested 83.4369ms 6.5117ms 153.5708 Ops/s 156.4724 Ops/s $\color{#d91a1a}-1.85\%$
test_unlock_nested 0.8096ms 0.4108ms 2.4340 KOps/s 2.4709 KOps/s $\color{#d91a1a}-1.49\%$
test_unlock_stack_nested 83.7354ms 6.8483ms 146.0220 Ops/s 145.4811 Ops/s $\color{#35bf28}+0.37\%$
test_flatten_speed 76.4320ms 0.2860ms 3.4968 KOps/s 3.7809 KOps/s $\textbf{\color{#d91a1a}-7.51\%}$
test_unflatten_speed 0.4087ms 0.3638ms 2.7489 KOps/s 2.7562 KOps/s $\color{#d91a1a}-0.26\%$
test_common_ops 1.0938ms 0.6360ms 1.5723 KOps/s 1.6591 KOps/s $\textbf{\color{#d91a1a}-5.24\%}$
test_creation 13.9900μs 1.5908μs 628.6097 KOps/s 629.1395 KOps/s $\color{#d91a1a}-0.08\%$
test_creation_empty 24.3700μs 9.6024μs 104.1401 KOps/s 123.1066 KOps/s $\textbf{\color{#d91a1a}-15.41\%}$
test_creation_nested_1 30.0210μs 11.3914μs 87.7856 KOps/s 100.9130 KOps/s $\textbf{\color{#d91a1a}-13.01\%}$
test_creation_nested_2 37.6710μs 13.8790μs 72.0511 KOps/s 80.7371 KOps/s $\textbf{\color{#d91a1a}-10.76\%}$
test_clone 42.7610μs 13.8477μs 72.2143 KOps/s 73.9101 KOps/s $\color{#d91a1a}-2.29\%$
test_getitem[int] 52.2110μs 11.2019μs 89.2704 KOps/s 90.2336 KOps/s $\color{#d91a1a}-1.07\%$
test_getitem[slice_int] 43.9110μs 21.9539μs 45.5500 KOps/s 44.9223 KOps/s $\color{#35bf28}+1.40\%$
test_getitem[range] 66.4810μs 37.6578μs 26.5549 KOps/s 26.9676 KOps/s $\color{#d91a1a}-1.53\%$
test_getitem[tuple] 37.0010μs 19.5947μs 51.0341 KOps/s 51.2907 KOps/s $\color{#d91a1a}-0.50\%$
test_getitem[list] 59.6010μs 34.1074μs 29.3192 KOps/s 28.7976 KOps/s $\color{#35bf28}+1.81\%$
test_setitem_dim[int] 47.8010μs 28.6115μs 34.9509 KOps/s 36.4315 KOps/s $\color{#d91a1a}-4.06\%$
test_setitem_dim[slice_int] 69.0410μs 51.1806μs 19.5386 KOps/s 19.9754 KOps/s $\color{#d91a1a}-2.19\%$
test_setitem_dim[range] 0.1027ms 64.4552μs 15.5147 KOps/s 15.7422 KOps/s $\color{#d91a1a}-1.45\%$
test_setitem_dim[tuple] 72.4810μs 44.3011μs 22.5728 KOps/s 23.1271 KOps/s $\color{#d91a1a}-2.40\%$
test_setitem 0.1210ms 19.4449μs 51.4274 KOps/s 53.5379 KOps/s $\color{#d91a1a}-3.94\%$
test_set 0.1162ms 18.9621μs 52.7367 KOps/s 54.9180 KOps/s $\color{#d91a1a}-3.97\%$
test_set_shared 2.7272ms 0.1035ms 9.6578 KOps/s 9.5917 KOps/s $\color{#35bf28}+0.69\%$
test_update 0.1118ms 22.0207μs 45.4119 KOps/s 49.1886 KOps/s $\textbf{\color{#d91a1a}-7.68\%}$
test_update_nested 0.1389ms 28.8037μs 34.7178 KOps/s 36.6568 KOps/s $\textbf{\color{#d91a1a}-5.29\%}$
test_set_nested 0.1086ms 20.3345μs 49.1776 KOps/s 51.4352 KOps/s $\color{#d91a1a}-4.39\%$
test_set_nested_new 0.1461ms 23.0911μs 43.3067 KOps/s 43.8457 KOps/s $\color{#d91a1a}-1.23\%$
test_select 60.5610μs 36.1147μs 27.6896 KOps/s 27.5858 KOps/s $\color{#35bf28}+0.38\%$
test_select_nested 70.6220μs 53.9351μs 18.5408 KOps/s 17.9740 KOps/s $\color{#35bf28}+3.15\%$
test_exclude_nested 0.1331ms 0.1110ms 9.0064 KOps/s 9.1260 KOps/s $\color{#d91a1a}-1.31\%$
test_empty[True] 0.3470ms 0.3250ms 3.0767 KOps/s 3.1274 KOps/s $\color{#d91a1a}-1.62\%$
test_empty[False] 7.0771μs 0.8628μs 1.1590 MOps/s 1.1598 MOps/s $\color{#d91a1a}-0.07\%$
test_to 73.3220μs 53.4790μs 18.6989 KOps/s 18.8983 KOps/s $\color{#d91a1a}-1.05\%$
test_to_nonblocking 0.1819ms 33.4945μs 29.8556 KOps/s 30.1714 KOps/s $\color{#d91a1a}-1.05\%$
test_unbind_speed 0.3588ms 0.3317ms 3.0145 KOps/s 3.0765 KOps/s $\color{#d91a1a}-2.02\%$
test_unbind_speed_stack0 79.1580ms 3.9088ms 255.8318 Ops/s 266.5742 Ops/s $\color{#d91a1a}-4.03\%$
test_unbind_speed_stack1 1.6216μs 0.5648μs 1.7706 MOps/s 1.8645 MOps/s $\textbf{\color{#d91a1a}-5.04\%}$
test_split 76.0209ms 1.6636ms 601.1196 Ops/s 596.7511 Ops/s $\color{#35bf28}+0.73\%$
test_chunk 75.2368ms 1.6464ms 607.3954 Ops/s 642.4269 Ops/s $\textbf{\color{#d91a1a}-5.45\%}$
test_creation[device0] 0.1409ms 73.1134μs 13.6774 KOps/s 12.5668 KOps/s $\textbf{\color{#35bf28}+8.84\%}$
test_creation_from_tensor 0.1869ms 53.3730μs 18.7360 KOps/s 17.0657 KOps/s $\textbf{\color{#35bf28}+9.79\%}$
test_add_one[memmap_tensor0] 72.9110μs 7.2352μs 138.2133 KOps/s 137.3295 KOps/s $\color{#35bf28}+0.64\%$
test_contiguous[memmap_tensor0] 13.0420μs 0.6485μs 1.5420 MOps/s 1.5065 MOps/s $\color{#35bf28}+2.35\%$
test_stack[memmap_tensor0] 29.5410μs 4.5838μs 218.1599 KOps/s 215.6404 KOps/s $\color{#35bf28}+1.17\%$
test_memmaptd_index 1.1439ms 0.2621ms 3.8155 KOps/s 3.7234 KOps/s $\color{#35bf28}+2.47\%$
test_memmaptd_index_astensor 0.6342ms 0.3204ms 3.1212 KOps/s 3.0587 KOps/s $\color{#35bf28}+2.04\%$
test_memmaptd_index_op 0.9440ms 0.6357ms 1.5732 KOps/s 1.6127 KOps/s $\color{#d91a1a}-2.45\%$
test_serialize_model 0.1713s 98.1766ms 10.1857 Ops/s 10.5638 Ops/s $\color{#d91a1a}-3.58\%$
test_serialize_model_pickle 1.3524s 1.2357s 0.8092 Ops/s 0.8075 Ops/s $\color{#35bf28}+0.21\%$
test_serialize_weights 0.1666s 95.7195ms 10.4472 Ops/s 9.7269 Ops/s $\textbf{\color{#35bf28}+7.40\%}$
test_serialize_weights_returnearly 0.2437s 72.7801ms 13.7400 Ops/s 12.7676 Ops/s $\textbf{\color{#35bf28}+7.62\%}$
test_serialize_weights_pickle 1.3499s 1.2366s 0.8087 Ops/s 0.8081 Ops/s $\color{#35bf28}+0.07\%$
test_reshape_pytree 54.4010μs 24.7216μs 40.4505 KOps/s 40.2897 KOps/s $\color{#35bf28}+0.40\%$
test_reshape_td 69.5310μs 29.3283μs 34.0968 KOps/s 33.7475 KOps/s $\color{#35bf28}+1.03\%$
test_view_pytree 49.8910μs 24.4770μs 40.8547 KOps/s 40.3201 KOps/s $\color{#35bf28}+1.33\%$
test_view_td 29.2510μs 4.2396μs 235.8695 KOps/s 235.2203 KOps/s $\color{#35bf28}+0.28\%$
test_unbind_pytree 0.1468ms 31.1125μs 32.1414 KOps/s 32.2104 KOps/s $\color{#d91a1a}-0.21\%$
test_unbind_td 0.1412ms 51.8813μs 19.2748 KOps/s 19.4172 KOps/s $\color{#d91a1a}-0.73\%$
test_split_pytree 48.9010μs 28.9207μs 34.5773 KOps/s 34.8089 KOps/s $\color{#d91a1a}-0.67\%$
test_split_td 0.7214ms 40.5487μs 24.6617 KOps/s 24.7573 KOps/s $\color{#d91a1a}-0.39\%$
test_add_pytree 61.2810μs 37.6843μs 26.5363 KOps/s 26.4829 KOps/s $\color{#35bf28}+0.20\%$
test_add_td 78.3820μs 51.9838μs 19.2368 KOps/s 19.3492 KOps/s $\color{#d91a1a}-0.58\%$
test_distributed 0.2221ms 70.7222μs 14.1398 KOps/s 9.6632 KOps/s $\textbf{\color{#35bf28}+46.33\%}$
test_tdmodule 0.1041ms 18.6882μs 53.5098 KOps/s 57.3867 KOps/s $\textbf{\color{#d91a1a}-6.76\%}$
test_tdmodule_dispatch 0.1340ms 35.0506μs 28.5302 KOps/s 29.3954 KOps/s $\color{#d91a1a}-2.94\%$
test_tdseq 44.8000μs 21.7094μs 46.0630 KOps/s 48.7115 KOps/s $\textbf{\color{#d91a1a}-5.44\%}$
test_tdseq_dispatch 54.7100μs 38.7468μs 25.8086 KOps/s 27.1922 KOps/s $\textbf{\color{#d91a1a}-5.09\%}$
test_instantiation_functorch 1.8125ms 1.6762ms 596.6030 Ops/s 592.6178 Ops/s $\color{#35bf28}+0.67\%$
test_instantiation_td 1.7368ms 1.1709ms 854.0749 Ops/s 849.8057 Ops/s $\color{#35bf28}+0.50\%$
test_exec_functorch 0.2845ms 0.1612ms 6.2043 KOps/s 6.1968 KOps/s $\color{#35bf28}+0.12\%$
test_exec_functional_call 0.2035ms 0.1612ms 6.2033 KOps/s 6.2694 KOps/s $\color{#d91a1a}-1.05\%$
test_exec_td 0.1825ms 0.1509ms 6.6285 KOps/s 6.5515 KOps/s $\color{#35bf28}+1.18\%$
test_exec_td_decorator 0.9101ms 0.1914ms 5.2258 KOps/s 5.2029 KOps/s $\color{#35bf28}+0.44\%$
test_vmap_mlp_speed[True-True] 1.4015ms 1.1224ms 890.9360 Ops/s 894.8465 Ops/s $\color{#d91a1a}-0.44\%$
test_vmap_mlp_speed[True-False] 0.7467ms 0.6782ms 1.4745 KOps/s 1.5016 KOps/s $\color{#d91a1a}-1.80\%$
test_vmap_mlp_speed[False-True] 1.0701ms 1.0296ms 971.2909 Ops/s 970.9900 Ops/s $\color{#35bf28}+0.03\%$
test_vmap_mlp_speed[False-False] 0.6437ms 0.5971ms 1.6748 KOps/s 1.6783 KOps/s $\color{#d91a1a}-0.21\%$
test_vmap_mlp_speed_decorator[True-True] 3.2392ms 2.5494ms 392.2545 Ops/s 397.5071 Ops/s $\color{#d91a1a}-1.32\%$
test_vmap_mlp_speed_decorator[True-False] 1.1103ms 0.7198ms 1.3892 KOps/s 1.3268 KOps/s $\color{#35bf28}+4.71\%$
test_vmap_mlp_speed_decorator[False-True] 2.5897ms 2.1471ms 465.7398 Ops/s 462.2565 Ops/s $\color{#35bf28}+0.75\%$
test_vmap_mlp_speed_decorator[False-False] 0.9057ms 0.6179ms 1.6183 KOps/s 1.6000 KOps/s $\color{#35bf28}+1.14\%$
test_vmap_transformer_speed[True-True] 13.0929ms 12.5757ms 79.5185 Ops/s 79.9598 Ops/s $\color{#d91a1a}-0.55\%$
test_vmap_transformer_speed[True-False] 8.4284ms 8.2730ms 120.8750 Ops/s 121.1317 Ops/s $\color{#d91a1a}-0.21\%$
test_vmap_transformer_speed[False-True] 12.9421ms 12.4510ms 80.3152 Ops/s 80.6125 Ops/s $\color{#d91a1a}-0.37\%$
test_vmap_transformer_speed[False-False] 8.4369ms 8.2005ms 121.9444 Ops/s 122.4640 Ops/s $\color{#d91a1a}-0.42\%$
test_vmap_transformer_speed_decorator[True-True] 0.1681s 83.0502ms 12.0409 Ops/s 13.2058 Ops/s $\textbf{\color{#d91a1a}-8.82\%}$
test_vmap_transformer_speed_decorator[True-False] 21.5738ms 20.1129ms 49.7194 Ops/s 50.7109 Ops/s $\color{#d91a1a}-1.96\%$
test_vmap_transformer_speed_decorator[False-True] 71.8196ms 69.3777ms 14.4139 Ops/s 13.4165 Ops/s $\textbf{\color{#35bf28}+7.43\%}$
test_vmap_transformer_speed_decorator[False-False] 0.1162s 21.3953ms 46.7391 Ops/s 51.7405 Ops/s $\textbf{\color{#d91a1a}-9.67\%}$

@vmoens vmoens merged commit e50f614 into main Jan 17, 2024
43 of 44 checks passed
@vmoens vmoens deleted the remove-shared-inheritance branch January 17, 2024 17:03
@vmoens
Contributor Author

vmoens commented Jan 17, 2024

I merged to fix torchrl's CI but happy to revert / edit if you feel changes are needed

@shagunsodhani
Contributor

shagunsodhani commented Jan 24, 2024

For some operations (typically all the ops that don't change the data_ptr() ) I thought it was cool to keep the is_shared() attribute since we can be sure that the content is still shared. That means that coming from a shared / memmap tensordict all these ops would return a shared and locked tensordict:

Makes sense!

So the plan now will be:

  • For shape operations (unbind, view, transpose, permute, squeeze, unsqueeze) you keep the lock and the shared/memmap attribute because we assume that these ops are primarily there for you to present your data in a different format
  • Key-based operations (clone(False), select, exclude, flatten_keys, unflatten_keys) do not propagate the shared and lock attribute

Thoughts?

This seems to be roughly in line with the idea that if the input and output of the ops share storage (e.g. with unbind or transpose), we propagate the lock and shared/memmap attributes, while in other cases we do not. This is very reasonable and should not "surprise" the user. Some things to look out for:

  1. select is a view-like operator in both PyTorch and TD, but in the current proposal the behavior is different. I think the rationale that "these ops are primarily there for you to present your data in a different format" is a subjective choice, whereas relying on the idea that we propagate the lock and shared/memmap attributes as long as the storage remains the same is more objective and consistent.

  2. Some ops like reshape do not have a consistent behavior (in terms of copying the underlying data) in PT, and I assume the behavior is the same in TD. So the behavior of propagating locks etc. would be a bit inconsistent (when looked at at the level of the op) but would still be consistent with the idea of tying locks etc. to storage.
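
As an analogy for the storage-based criterion, here is a sketch using only standard-library buffers rather than tensors. `shares_storage` is a hypothetical helper; the assumption is that "same storage" can be checked by comparing the object that backs each view, much like comparing storage for tensors.

```python
from array import array


def shares_storage(src: memoryview, out: memoryview) -> bool:
    # Two memoryviews share storage when the same underlying object backs both.
    return out.obj is src.obj


buf = array("d", [0.0, 1.0, 2.0, 3.0])
src = memoryview(buf)

strided = src[::2]                               # view-like: no data copied
copied = memoryview(array("d", src.tolist()))    # copy: fresh storage

# Writes through the view reach the original buffer; writes to the copy do not.
strided[0] = 9.0
copied[1] = -1.0
```

Under this rule the flags would follow `shares_storage(src, out)`: kept for `strided`, dropped for `copied`.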

@vmoens
Contributor Author

vmoens commented Jan 24, 2024

Good points.
For reshape I guess that we could take the stance that it doesn't keep locks and storage attributes because we can't cheaply guarantee that it should. We would state it in black and white in the docs. In a way it makes sense that this would only be the case for view: use view if you want to keep them, reshape otherwise. If view breaks, it's because you can't keep your storage anyway.
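
A sketch of the "view keeps, reshape does not promise" stance, again as a standard-library analogy rather than torch code. `view_flat` and `reshape_flat` are hypothetical names; `memoryview.cast` plays the role of view because it never copies and refuses non-contiguous input.

```python
from array import array


def view_flat(mv: memoryview) -> memoryview:
    # view-like: never copies, raises TypeError if the layout is not contiguous.
    return mv.cast("B")


def reshape_flat(mv: memoryview):
    # reshape-like: returns a view when possible, otherwise silently copies.
    # The second element says whether the original storage was kept.
    try:
        return mv.cast("B"), True
    except TypeError:
        return memoryview(mv.tobytes()), False


buf = array("d", [0.0, 1.0, 2.0, 3.0])
src = memoryview(buf)

flat_same, kept_same = reshape_flat(src)         # contiguous: storage kept
flat_copy, kept_copy = reshape_flat(src[::2])    # strided: falls back to a copy

try:
    view_flat(src[::2])
    view_failed = False
except TypeError:
    view_failed = True
```

Because the caller of `reshape_flat` cannot know in advance which branch ran, dropping the lock and storage attributes for reshape is the cheap, predictable choice, while `view_flat` either keeps the storage or fails loudly.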

@shagunsodhani
Contributor

Makes sense!
