[Refactor] Do not lock nested tensordict in tensordictparams #568

Merged (1 commit) on Nov 23, 2023

@vmoens (Contributor) commented Nov 23, 2023

No description provided.

@facebook-github-bot added the CLA Signed label Nov 23, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 113. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}2$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 40.0050μs 15.9244μs 62.7966 KOps/s 63.0367 KOps/s $\color{#d91a1a}-0.38\%$
test_plain_set_stack_nested 0.1675ms 0.1426ms 7.0131 KOps/s 6.9884 KOps/s $\color{#35bf28}+0.35\%$
test_plain_set_nested_inplace 57.7170μs 18.9363μs 52.8087 KOps/s 53.0630 KOps/s $\color{#d91a1a}-0.48\%$
test_plain_set_stack_nested_inplace 0.2448ms 0.1743ms 5.7387 KOps/s 5.7514 KOps/s $\color{#d91a1a}-0.22\%$
test_items 42.5090μs 2.4065μs 415.5338 KOps/s 411.8525 KOps/s $\color{#35bf28}+0.89\%$
test_items_nested 0.3361ms 0.2678ms 3.7347 KOps/s 3.5929 KOps/s $\color{#35bf28}+3.94\%$
test_items_nested_locked 0.9547ms 0.2690ms 3.7175 KOps/s 3.5966 KOps/s $\color{#35bf28}+3.36\%$
test_items_nested_leaf 0.5416ms 0.1629ms 6.1376 KOps/s 5.8212 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_items_stack_nested 1.6929ms 1.4899ms 671.1668 Ops/s 660.9284 Ops/s $\color{#35bf28}+1.55\%$
test_items_stack_nested_leaf 1.9010ms 1.3525ms 739.3613 Ops/s 725.0763 Ops/s $\color{#35bf28}+1.97\%$
test_items_stack_nested_locked 0.8825ms 0.7776ms 1.2860 KOps/s 1.2818 KOps/s $\color{#35bf28}+0.33\%$
test_keys 36.0960μs 3.8143μs 262.1701 KOps/s 256.1569 KOps/s $\color{#35bf28}+2.35\%$
test_keys_nested 1.4358ms 0.1399ms 7.1471 KOps/s 6.6276 KOps/s $\textbf{\color{#35bf28}+7.84\%}$
test_keys_nested_locked 0.2891ms 0.1389ms 7.1995 KOps/s 6.9618 KOps/s $\color{#35bf28}+3.42\%$
test_keys_nested_leaf 0.3254ms 0.1381ms 7.2430 KOps/s 7.0779 KOps/s $\color{#35bf28}+2.33\%$
test_keys_stack_nested 1.5311ms 1.4135ms 707.4647 Ops/s 705.2550 Ops/s $\color{#35bf28}+0.31\%$
test_keys_stack_nested_leaf 1.6560ms 1.4081ms 710.1869 Ops/s 686.0620 Ops/s $\color{#35bf28}+3.52\%$
test_keys_stack_nested_locked 1.1144ms 0.6858ms 1.4580 KOps/s 1.4558 KOps/s $\color{#35bf28}+0.15\%$
test_values 9.0446μs 1.1385μs 878.3692 KOps/s 851.4370 KOps/s $\color{#35bf28}+3.16\%$
test_values_nested 0.1036ms 49.5427μs 20.1846 KOps/s 19.9174 KOps/s $\color{#35bf28}+1.34\%$
test_values_nested_locked 0.1132ms 50.1495μs 19.9404 KOps/s 20.1439 KOps/s $\color{#d91a1a}-1.01\%$
test_values_nested_leaf 55.8430μs 43.9338μs 22.7615 KOps/s 22.3897 KOps/s $\color{#35bf28}+1.66\%$
test_values_stack_nested 1.8787ms 1.2135ms 824.0790 Ops/s 827.1758 Ops/s $\color{#d91a1a}-0.37\%$
test_values_stack_nested_leaf 1.4094ms 1.1842ms 844.4219 Ops/s 837.7800 Ops/s $\color{#35bf28}+0.79\%$
test_values_stack_nested_locked 0.6680ms 0.5151ms 1.9416 KOps/s 1.9333 KOps/s $\color{#35bf28}+0.43\%$
test_membership 37.0690μs 1.3303μs 751.7115 KOps/s 711.4569 KOps/s $\textbf{\color{#35bf28}+5.66\%}$
test_membership_nested 34.6950μs 2.7473μs 363.9922 KOps/s 354.2563 KOps/s $\color{#35bf28}+2.75\%$
test_membership_nested_leaf 37.3290μs 2.8000μs 357.1387 KOps/s 341.5235 KOps/s $\color{#35bf28}+4.57\%$
test_membership_stacked_nested 36.1370μs 11.6931μs 85.5204 KOps/s 84.4675 KOps/s $\color{#35bf28}+1.25\%$
test_membership_stacked_nested_leaf 36.4880μs 11.6907μs 85.5381 KOps/s 84.1954 KOps/s $\color{#35bf28}+1.59\%$
test_membership_nested_last 49.6120μs 5.8323μs 171.4575 KOps/s 168.2973 KOps/s $\color{#35bf28}+1.88\%$
test_membership_nested_leaf_last 25.8080μs 5.8317μs 171.4779 KOps/s 168.4408 KOps/s $\color{#35bf28}+1.80\%$
test_membership_stacked_nested_last 0.2883ms 0.1686ms 5.9312 KOps/s 5.7457 KOps/s $\color{#35bf28}+3.23\%$
test_membership_stacked_nested_leaf_last 61.8760μs 13.7211μs 72.8804 KOps/s 70.6713 KOps/s $\color{#35bf28}+3.13\%$
test_nested_getleaf 47.9590μs 10.6399μs 93.9855 KOps/s 92.8837 KOps/s $\color{#35bf28}+1.19\%$
test_nested_get 41.4270μs 10.0195μs 99.8049 KOps/s 95.8957 KOps/s $\color{#35bf28}+4.08\%$
test_stacked_getleaf 0.7293ms 0.6388ms 1.5654 KOps/s 1.5547 KOps/s $\color{#35bf28}+0.69\%$
test_stacked_get 0.7185ms 0.6106ms 1.6378 KOps/s 1.6257 KOps/s $\color{#35bf28}+0.75\%$
test_nested_getitemleaf 40.6460μs 10.7117μs 93.3555 KOps/s 90.7342 KOps/s $\color{#35bf28}+2.89\%$
test_nested_getitem 34.4340μs 10.1307μs 98.7101 KOps/s 96.3448 KOps/s $\color{#35bf28}+2.46\%$
test_stacked_getitemleaf 0.7462ms 0.6471ms 1.5453 KOps/s 1.5430 KOps/s $\color{#35bf28}+0.15\%$
test_stacked_getitem 1.0200ms 0.6160ms 1.6234 KOps/s 1.6121 KOps/s $\color{#35bf28}+0.70\%$
test_lock_nested 56.3297ms 0.5451ms 1.8344 KOps/s 1.9918 KOps/s $\textbf{\color{#d91a1a}-7.90\%}$
test_lock_stack_nested 83.7546ms 8.4687ms 118.0818 Ops/s 115.1687 Ops/s $\color{#35bf28}+2.53\%$
test_unlock_nested 63.6959ms 0.5117ms 1.9543 KOps/s 1.9480 KOps/s $\color{#35bf28}+0.33\%$
test_unlock_stack_nested 88.4758ms 8.2969ms 120.5274 Ops/s 202.5210 Ops/s $\textbf{\color{#d91a1a}-40.49\%}$
test_flatten_speed 0.5503ms 0.2616ms 3.8230 KOps/s 3.6713 KOps/s $\color{#35bf28}+4.13\%$
test_unflatten_speed 0.5365ms 0.4534ms 2.2055 KOps/s 2.1243 KOps/s $\color{#35bf28}+3.82\%$
test_common_ops 1.4429ms 0.6926ms 1.4438 KOps/s 1.4569 KOps/s $\color{#d91a1a}-0.90\%$
test_creation 26.3790μs 2.3756μs 420.9486 KOps/s 414.7504 KOps/s $\color{#35bf28}+1.49\%$
test_creation_empty 31.6890μs 8.1049μs 123.3818 KOps/s 121.9541 KOps/s $\color{#35bf28}+1.17\%$
test_creation_nested_1 42.2990μs 11.4247μs 87.5297 KOps/s 86.4083 KOps/s $\color{#35bf28}+1.30\%$
test_creation_nested_2 72.6350μs 14.8311μs 67.4260 KOps/s 66.3569 KOps/s $\color{#35bf28}+1.61\%$
test_clone 0.1332ms 13.5511μs 73.7947 KOps/s 73.3979 KOps/s $\color{#35bf28}+0.54\%$
test_getitem[int] 35.0960μs 12.9217μs 77.3892 KOps/s 75.1027 KOps/s $\color{#35bf28}+3.04\%$
test_getitem[slice_int] 58.5280μs 25.1771μs 39.7187 KOps/s 39.8893 KOps/s $\color{#d91a1a}-0.43\%$
test_getitem[range] 0.1127ms 45.4808μs 21.9873 KOps/s 22.7968 KOps/s $\color{#d91a1a}-3.55\%$
test_getitem[tuple] 67.5950μs 20.6819μs 48.3515 KOps/s 49.0743 KOps/s $\color{#d91a1a}-1.47\%$
test_getitem[list] 0.2614ms 40.1045μs 24.9349 KOps/s 24.7836 KOps/s $\color{#35bf28}+0.61\%$
test_setitem_dim[int] 46.2360μs 28.6347μs 34.9227 KOps/s 34.5140 KOps/s $\color{#35bf28}+1.18\%$
test_setitem_dim[slice_int] 83.9250μs 54.4544μs 18.3640 KOps/s 18.9325 KOps/s $\color{#d91a1a}-3.00\%$
test_setitem_dim[range] 0.1140ms 72.8399μs 13.7287 KOps/s 13.8222 KOps/s $\color{#d91a1a}-0.68\%$
test_setitem_dim[tuple] 64.1190μs 42.4734μs 23.5442 KOps/s 23.6975 KOps/s $\color{#d91a1a}-0.65\%$
test_setitem 0.1127ms 18.5739μs 53.8389 KOps/s 53.7651 KOps/s $\color{#35bf28}+0.14\%$
test_set 0.1176ms 17.9732μs 55.6385 KOps/s 56.4821 KOps/s $\color{#d91a1a}-1.49\%$
test_set_shared 1.9705ms 0.1406ms 7.1103 KOps/s 7.0802 KOps/s $\color{#35bf28}+0.42\%$
test_update 0.1291ms 23.4911μs 42.5693 KOps/s 43.4803 KOps/s $\color{#d91a1a}-2.10\%$
test_update_nested 0.1873ms 34.6919μs 28.8252 KOps/s 29.6144 KOps/s $\color{#d91a1a}-2.67\%$
test_set_nested 0.1459ms 19.7963μs 50.5145 KOps/s 51.5642 KOps/s $\color{#d91a1a}-2.04\%$
test_set_nested_new 0.1935ms 25.3603μs 39.4318 KOps/s 40.7924 KOps/s $\color{#d91a1a}-3.34\%$
test_select 0.2308ms 50.6555μs 19.7412 KOps/s 20.1528 KOps/s $\color{#d91a1a}-2.04\%$
test_unbind_speed 0.4449ms 0.3745ms 2.6702 KOps/s 2.6548 KOps/s $\color{#35bf28}+0.58\%$
test_unbind_speed_stack0 63.2437ms 5.5047ms 181.6641 Ops/s 174.3435 Ops/s $\color{#35bf28}+4.20\%$
test_unbind_speed_stack1 2.7731μs 0.6273μs 1.5942 MOps/s 1.5762 MOps/s $\color{#35bf28}+1.14\%$
test_split 1.7431ms 1.6515ms 605.4988 Ops/s 597.2233 Ops/s $\color{#35bf28}+1.39\%$
test_chunk 59.6495ms 1.7626ms 567.3506 Ops/s 563.2275 Ops/s $\color{#35bf28}+0.73\%$
test_creation[device0] 0.6622ms 0.3025ms 3.3061 KOps/s 3.3128 KOps/s $\color{#d91a1a}-0.20\%$
test_creation_from_tensor 3.3686ms 0.3315ms 3.0164 KOps/s 3.0268 KOps/s $\color{#d91a1a}-0.34\%$
test_add_one[memmap_tensor0] 0.2819ms 25.7776μs 38.7933 KOps/s 39.0954 KOps/s $\color{#d91a1a}-0.77\%$
test_contiguous[memmap_tensor0] 36.8680μs 5.6179μs 178.0039 KOps/s 178.1108 KOps/s $\color{#d91a1a}-0.06\%$
test_stack[memmap_tensor0] 85.7590μs 19.5417μs 51.1727 KOps/s 52.4546 KOps/s $\color{#d91a1a}-2.44\%$
test_memmaptd_index 0.2975ms 0.1943ms 5.1455 KOps/s 5.2409 KOps/s $\color{#d91a1a}-1.82\%$
test_memmaptd_index_astensor 0.4047ms 0.2490ms 4.0163 KOps/s 4.0166 KOps/s $-0.01\%$
test_memmaptd_index_op 0.6573ms 0.5017ms 1.9934 KOps/s 2.0359 KOps/s $\color{#d91a1a}-2.09\%$
test_reshape_pytree 53.9100μs 23.4968μs 42.5589 KOps/s 42.8455 KOps/s $\color{#d91a1a}-0.67\%$
test_reshape_td 63.9690μs 31.8919μs 31.3559 KOps/s 30.7138 KOps/s $\color{#35bf28}+2.09\%$
test_view_pytree 72.2840μs 23.4434μs 42.6560 KOps/s 42.0797 KOps/s $\color{#35bf28}+1.37\%$
test_view_td 18.6950μs 4.9342μs 202.6675 KOps/s 197.6579 KOps/s $\color{#35bf28}+2.53\%$
test_unbind_pytree 78.1650μs 26.6176μs 37.5691 KOps/s 36.5925 KOps/s $\color{#35bf28}+2.67\%$
test_unbind_td 0.1140ms 60.0184μs 16.6616 KOps/s 16.7120 KOps/s $\color{#d91a1a}-0.30\%$
test_split_pytree 68.7680μs 26.2155μs 38.1454 KOps/s 36.6119 KOps/s $\color{#35bf28}+4.19\%$
test_split_td 0.1092ms 46.5178μs 21.4972 KOps/s 21.4644 KOps/s $\color{#35bf28}+0.15\%$
test_add_pytree 89.1860μs 32.8027μs 30.4853 KOps/s 31.0775 KOps/s $\color{#d91a1a}-1.91\%$
test_add_td 0.1151ms 44.4118μs 22.5166 KOps/s 22.5512 KOps/s $\color{#d91a1a}-0.15\%$
test_distributed 19.2950μs 5.8823μs 170.0022 KOps/s 166.5514 KOps/s $\color{#35bf28}+2.07\%$
test_tdmodule 0.1988ms 21.1243μs 47.3389 KOps/s 46.3259 KOps/s $\color{#35bf28}+2.19\%$
test_tdmodule_dispatch 0.2253ms 41.5626μs 24.0601 KOps/s 25.3080 KOps/s $\color{#d91a1a}-4.93\%$
test_tdseq 0.1161ms 23.8544μs 41.9210 KOps/s 41.3045 KOps/s $\color{#35bf28}+1.49\%$
test_tdseq_dispatch 0.1377ms 41.7801μs 23.9349 KOps/s 23.3197 KOps/s $\color{#35bf28}+2.64\%$
test_instantiation_functorch 1.7740ms 1.3063ms 765.5014 Ops/s 765.7777 Ops/s $\color{#d91a1a}-0.04\%$
test_instantiation_td 1.5271ms 1.0182ms 982.1584 Ops/s 979.8093 Ops/s $\color{#35bf28}+0.24\%$
test_exec_functorch 0.2403ms 0.1608ms 6.2171 KOps/s 6.2232 KOps/s $\color{#d91a1a}-0.10\%$
test_exec_functional_call 0.2334ms 0.1513ms 6.6110 KOps/s 6.7004 KOps/s $\color{#d91a1a}-1.34\%$
test_exec_td 0.2302ms 0.1471ms 6.7970 KOps/s 6.8942 KOps/s $\color{#d91a1a}-1.41\%$
test_exec_td_decorator 0.8663ms 0.2191ms 4.5638 KOps/s 4.5223 KOps/s $\color{#35bf28}+0.92\%$
test_vmap_mlp_speed[True-True] 1.5245ms 0.8800ms 1.1364 KOps/s 1.1041 KOps/s $\color{#35bf28}+2.92\%$
test_vmap_mlp_speed[True-False] 0.5703ms 0.4615ms 2.1666 KOps/s 2.1098 KOps/s $\color{#35bf28}+2.69\%$
test_vmap_mlp_speed[False-True] 1.1317ms 0.7667ms 1.3043 KOps/s 1.2835 KOps/s $\color{#35bf28}+1.62\%$
test_vmap_mlp_speed[False-False] 0.7094ms 0.3864ms 2.5880 KOps/s 2.5511 KOps/s $\color{#35bf28}+1.44\%$
test_vmap_mlp_speed_decorator[True-True] 2.2968ms 1.5406ms 649.1187 Ops/s 626.2486 Ops/s $\color{#35bf28}+3.65\%$
test_vmap_mlp_speed_decorator[True-False] 1.0499ms 0.5438ms 1.8387 KOps/s 1.7926 KOps/s $\color{#35bf28}+2.57\%$
test_vmap_mlp_speed_decorator[False-True] 2.6379ms 1.3342ms 749.4961 Ops/s 739.8525 Ops/s $\color{#35bf28}+1.30\%$
test_vmap_mlp_speed_decorator[False-False] 0.8824ms 0.4245ms 2.3555 KOps/s 2.3142 KOps/s $\color{#35bf28}+1.78\%$

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}2$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.5435ms 12.6473μs 79.0685 KOps/s 80.1944 KOps/s $\color{#d91a1a}-1.40\%$
test_plain_set_stack_nested 0.1441ms 0.1144ms 8.7441 KOps/s 8.3695 KOps/s $\color{#35bf28}+4.48\%$
test_plain_set_nested_inplace 36.0210μs 15.1812μs 65.8710 KOps/s 67.2036 KOps/s $\color{#d91a1a}-1.98\%$
test_plain_set_stack_nested_inplace 0.1828ms 0.1404ms 7.1200 KOps/s 7.1320 KOps/s $\color{#d91a1a}-0.17\%$
test_items 27.1700μs 4.6925μs 213.1067 KOps/s 212.8910 KOps/s $\color{#35bf28}+0.10\%$
test_items_nested 0.3943ms 0.3395ms 2.9455 KOps/s 2.9644 KOps/s $\color{#d91a1a}-0.64\%$
test_items_nested_locked 0.4030ms 0.3392ms 2.9483 KOps/s 2.9582 KOps/s $\color{#d91a1a}-0.34\%$
test_items_nested_leaf 0.2323ms 0.1995ms 5.0127 KOps/s 4.9752 KOps/s $\color{#35bf28}+0.75\%$
test_items_stack_nested 1.5434ms 1.5008ms 666.2957 Ops/s 671.2362 Ops/s $\color{#d91a1a}-0.74\%$
test_items_stack_nested_leaf 1.4077ms 1.3310ms 751.3155 Ops/s 753.2235 Ops/s $\color{#d91a1a}-0.25\%$
test_items_stack_nested_locked 0.8554ms 0.8122ms 1.2312 KOps/s 1.2457 KOps/s $\color{#d91a1a}-1.16\%$
test_keys 25.3910μs 4.5978μs 217.4941 KOps/s 217.8915 KOps/s $\color{#d91a1a}-0.18\%$
test_keys_nested 0.5550ms 90.1951μs 11.0871 KOps/s 11.0543 KOps/s $\color{#35bf28}+0.30\%$
test_keys_nested_locked 0.1128ms 89.5587μs 11.1659 KOps/s 11.1573 KOps/s $\color{#35bf28}+0.08\%$
test_keys_nested_leaf 41.9452ms 85.9749μs 11.6313 KOps/s 12.1578 KOps/s $\color{#d91a1a}-4.33\%$
test_keys_stack_nested 1.3995ms 1.3185ms 758.4358 Ops/s 765.0570 Ops/s $\color{#d91a1a}-0.87\%$
test_keys_stack_nested_leaf 1.3788ms 1.3076ms 764.7498 Ops/s 777.8798 Ops/s $\color{#d91a1a}-1.69\%$
test_keys_stack_nested_locked 0.6508ms 0.6113ms 1.6357 KOps/s 1.6455 KOps/s $\color{#d91a1a}-0.60\%$
test_values 10.3603μs 1.8826μs 531.1672 KOps/s 526.5318 KOps/s $\color{#35bf28}+0.88\%$
test_values_nested 67.7010μs 42.9041μs 23.3078 KOps/s 23.0561 KOps/s $\color{#35bf28}+1.09\%$
test_values_nested_locked 71.2210μs 43.0800μs 23.2126 KOps/s 22.9734 KOps/s $\color{#35bf28}+1.04\%$
test_values_nested_leaf 58.2110μs 37.0239μs 27.0096 KOps/s 26.5015 KOps/s $\color{#35bf28}+1.92\%$
test_values_stack_nested 1.2041ms 1.1486ms 870.6021 Ops/s 887.6482 Ops/s $\color{#d91a1a}-1.92\%$
test_values_stack_nested_leaf 1.1943ms 1.1398ms 877.3747 Ops/s 894.7329 Ops/s $\color{#d91a1a}-1.94\%$
test_values_stack_nested_locked 0.5321ms 0.4864ms 2.0557 KOps/s 2.0944 KOps/s $\color{#d91a1a}-1.85\%$
test_membership 3.8380μs 0.9377μs 1.0664 MOps/s 1.0558 MOps/s $\color{#35bf28}+1.00\%$
test_membership_nested 12.6600μs 2.1242μs 470.7621 KOps/s 473.5142 KOps/s $\color{#d91a1a}-0.58\%$
test_membership_nested_leaf 16.4255μs 2.1201μs 471.6676 KOps/s 474.1116 KOps/s $\color{#d91a1a}-0.52\%$
test_membership_stacked_nested 45.4510μs 10.8854μs 91.8663 KOps/s 91.7100 KOps/s $\color{#35bf28}+0.17\%$
test_membership_stacked_nested_leaf 32.0000μs 11.0217μs 90.7301 KOps/s 92.5343 KOps/s $\color{#d91a1a}-1.95\%$
test_membership_nested_last 21.8200μs 4.6507μs 215.0234 KOps/s 216.8344 KOps/s $\color{#d91a1a}-0.84\%$
test_membership_nested_leaf_last 18.5910μs 4.6521μs 214.9551 KOps/s 216.8677 KOps/s $\color{#d91a1a}-0.88\%$
test_membership_stacked_nested_last 0.1667ms 0.1354ms 7.3860 KOps/s 7.4627 KOps/s $\color{#d91a1a}-1.03\%$
test_membership_stacked_nested_leaf_last 39.5300μs 12.8673μs 77.7165 KOps/s 77.5081 KOps/s $\color{#35bf28}+0.27\%$
test_nested_getleaf 23.0300μs 8.4114μs 118.8857 KOps/s 118.9537 KOps/s $\color{#d91a1a}-0.06\%$
test_nested_get 30.6200μs 7.9208μs 126.2496 KOps/s 126.4914 KOps/s $\color{#d91a1a}-0.19\%$
test_stacked_getleaf 0.6452ms 0.5753ms 1.7381 KOps/s 1.7097 KOps/s $\color{#35bf28}+1.66\%$
test_stacked_get 0.6099ms 0.5433ms 1.8406 KOps/s 1.8007 KOps/s $\color{#35bf28}+2.21\%$
test_nested_getitemleaf 29.3510μs 8.4250μs 118.6941 KOps/s 118.3509 KOps/s $\color{#35bf28}+0.29\%$
test_nested_getitem 35.7710μs 7.9802μs 125.3108 KOps/s 125.0035 KOps/s $\color{#35bf28}+0.25\%$
test_stacked_getitemleaf 0.6511ms 0.5826ms 1.7164 KOps/s 1.7066 KOps/s $\color{#35bf28}+0.58\%$
test_stacked_getitem 0.6329ms 0.5473ms 1.8272 KOps/s 1.8182 KOps/s $\color{#35bf28}+0.49\%$
test_lock_nested 4.1121ms 0.4630ms 2.1597 KOps/s 2.1933 KOps/s $\color{#d91a1a}-1.53\%$
test_lock_stack_nested 67.6040ms 6.5954ms 151.6203 Ops/s 151.9242 Ops/s $\color{#d91a1a}-0.20\%$
test_unlock_nested 1.2828ms 0.4367ms 2.2900 KOps/s 2.0313 KOps/s $\textbf{\color{#35bf28}+12.74\%}$
test_unlock_stack_nested 63.5270ms 7.3099ms 136.8013 Ops/s 138.3265 Ops/s $\color{#d91a1a}-1.10\%$
test_flatten_speed 0.5329ms 0.1863ms 5.3686 KOps/s 5.3807 KOps/s $\color{#d91a1a}-0.22\%$
test_unflatten_speed 0.4184ms 0.3680ms 2.7178 KOps/s 2.7929 KOps/s $\color{#d91a1a}-2.69\%$
test_common_ops 1.0227ms 0.6193ms 1.6147 KOps/s 1.6894 KOps/s $\color{#d91a1a}-4.42\%$
test_creation 17.2500μs 1.9951μs 501.2387 KOps/s 525.9114 KOps/s $\color{#d91a1a}-4.69\%$
test_creation_empty 39.4300μs 7.0719μs 141.4053 KOps/s 151.8950 KOps/s $\textbf{\color{#d91a1a}-6.91\%}$
test_creation_nested_1 26.8310μs 9.4980μs 105.2858 KOps/s 112.3336 KOps/s $\textbf{\color{#d91a1a}-6.27\%}$
test_creation_nested_2 29.9000μs 12.0878μs 82.7284 KOps/s 87.0580 KOps/s $\color{#d91a1a}-4.97\%$
test_clone 0.1061ms 14.1991μs 70.4271 KOps/s 71.9844 KOps/s $\color{#d91a1a}-2.16\%$
test_getitem[int] 27.5000μs 12.0112μs 83.2554 KOps/s 83.9205 KOps/s $\color{#d91a1a}-0.79\%$
test_getitem[slice_int] 61.5410μs 23.9313μs 41.7863 KOps/s 43.5250 KOps/s $\color{#d91a1a}-3.99\%$
test_getitem[range] 68.9310μs 43.4432μs 23.0186 KOps/s 25.9756 KOps/s $\textbf{\color{#d91a1a}-11.38\%}$
test_getitem[tuple] 58.7210μs 20.5662μs 48.6235 KOps/s 50.5652 KOps/s $\color{#d91a1a}-3.84\%$
test_getitem[list] 0.3303ms 37.2509μs 26.8450 KOps/s 27.8329 KOps/s $\color{#d91a1a}-3.55\%$
test_setitem_dim[int] 58.5810μs 25.6903μs 38.9252 KOps/s 40.1553 KOps/s $\color{#d91a1a}-3.06\%$
test_setitem_dim[slice_int] 66.3920μs 46.0446μs 21.7181 KOps/s 22.6077 KOps/s $\color{#d91a1a}-3.93\%$
test_setitem_dim[range] 86.1420μs 67.0533μs 14.9135 KOps/s 16.6790 KOps/s $\textbf{\color{#d91a1a}-10.58\%}$
test_setitem_dim[tuple] 59.2310μs 40.5847μs 24.6398 KOps/s 26.5612 KOps/s $\textbf{\color{#d91a1a}-7.23\%}$
test_setitem 0.1261ms 18.3261μs 54.5671 KOps/s 56.5125 KOps/s $\color{#d91a1a}-3.44\%$
test_set 0.1186ms 17.9309μs 55.7697 KOps/s 59.1582 KOps/s $\textbf{\color{#d91a1a}-5.73\%}$
test_set_shared 1.0911ms 0.1017ms 9.8292 KOps/s 10.0849 KOps/s $\color{#d91a1a}-2.53\%$
test_update 0.4798ms 21.9252μs 45.6095 KOps/s 48.0990 KOps/s $\textbf{\color{#d91a1a}-5.18\%}$
test_update_nested 0.1171ms 32.5747μs 30.6986 KOps/s 32.3213 KOps/s $\textbf{\color{#d91a1a}-5.02\%}$
test_set_nested 0.1243ms 19.2394μs 51.9768 KOps/s 53.5875 KOps/s $\color{#d91a1a}-3.01\%$
test_set_nested_new 0.1105ms 23.5151μs 42.5258 KOps/s 43.9405 KOps/s $\color{#d91a1a}-3.22\%$
test_select 72.0510μs 46.4108μs 21.5467 KOps/s 22.0189 KOps/s $\color{#d91a1a}-2.14\%$
test_to 74.1520μs 52.2795μs 19.1280 KOps/s 19.0827 KOps/s $\color{#35bf28}+0.24\%$
test_to_nonblocking 67.5210μs 35.0337μs 28.5439 KOps/s 29.9832 KOps/s $\color{#d91a1a}-4.80\%$
test_unbind_speed 0.4037ms 0.3556ms 2.8121 KOps/s 2.8900 KOps/s $\color{#d91a1a}-2.70\%$
test_unbind_speed_stack0 60.4135ms 5.1907ms 192.6506 Ops/s 196.6943 Ops/s $\color{#d91a1a}-2.06\%$
test_unbind_speed_stack1 1.9340μs 0.5277μs 1.8950 MOps/s 1.9102 MOps/s $\color{#d91a1a}-0.80\%$
test_split 53.2579ms 1.7950ms 557.1136 Ops/s 559.9540 Ops/s $\color{#d91a1a}-0.51\%$
test_chunk 52.8635ms 1.7805ms 561.6350 Ops/s 565.6277 Ops/s $\color{#d91a1a}-0.71\%$
test_creation[device0] 0.4364ms 0.3128ms 3.1965 KOps/s 3.1980 KOps/s $\color{#d91a1a}-0.05\%$
test_creation[device1] 0.8182ms 0.3143ms 3.1812 KOps/s 3.1708 KOps/s $\color{#35bf28}+0.33\%$
test_creation_from_tensor 0.5793ms 0.3415ms 2.9282 KOps/s 2.6610 KOps/s $\textbf{\color{#35bf28}+10.04\%}$
test_add_one[memmap_tensor0] 67.0400μs 24.1241μs 41.4523 KOps/s 40.3248 KOps/s $\color{#35bf28}+2.80\%$
test_add_one[memmap_tensor1] 0.2071ms 73.9416μs 13.5242 KOps/s 13.4104 KOps/s $\color{#35bf28}+0.85\%$
test_contiguous[memmap_tensor0] 31.0210μs 5.9776μs 167.2907 KOps/s 162.7469 KOps/s $\color{#35bf28}+2.79\%$
test_contiguous[memmap_tensor1] 42.9010μs 21.9594μs 45.5385 KOps/s 44.7616 KOps/s $\color{#35bf28}+1.74\%$
test_stack[memmap_tensor0] 50.3210μs 20.0533μs 49.8671 KOps/s 50.2700 KOps/s $\color{#d91a1a}-0.80\%$
test_stack[memmap_tensor1] 0.1613ms 73.3449μs 13.6342 KOps/s 13.8482 KOps/s $\color{#d91a1a}-1.54\%$
test_memmaptd_index 0.2798ms 0.2234ms 4.4764 KOps/s 4.5072 KOps/s $\color{#d91a1a}-0.68\%$
test_memmaptd_index_astensor 0.3545ms 0.2752ms 3.6332 KOps/s 3.6163 KOps/s $\color{#35bf28}+0.47\%$
test_memmaptd_index_op 0.5863ms 0.5336ms 1.8741 KOps/s 1.9124 KOps/s $\color{#d91a1a}-2.00\%$
test_reshape_pytree 43.0710μs 21.0961μs 47.4020 KOps/s 47.8790 KOps/s $\color{#d91a1a}-1.00\%$
test_reshape_td 59.3510μs 30.0372μs 33.2920 KOps/s 33.4544 KOps/s $\color{#d91a1a}-0.49\%$
test_view_pytree 39.5810μs 20.6507μs 48.4245 KOps/s 48.3982 KOps/s $\color{#35bf28}+0.05\%$
test_view_td 17.4410μs 4.1112μs 243.2393 KOps/s 243.4925 KOps/s $\color{#d91a1a}-0.10\%$
test_unbind_pytree 50.1500μs 25.9299μs 38.5655 KOps/s 38.6605 KOps/s $\color{#d91a1a}-0.25\%$
test_unbind_td 82.2310μs 55.6533μs 17.9684 KOps/s 18.3060 KOps/s $\color{#d91a1a}-1.84\%$
test_split_pytree 45.3000μs 23.7894μs 42.0355 KOps/s 42.2336 KOps/s $\color{#d91a1a}-0.47\%$
test_split_td 66.8110μs 43.8409μs 22.8098 KOps/s 22.9275 KOps/s $\color{#d91a1a}-0.51\%$
test_add_pytree 63.0910μs 30.8642μs 32.4000 KOps/s 32.5997 KOps/s $\color{#d91a1a}-0.61\%$
test_add_td 68.9310μs 43.6345μs 22.9177 KOps/s 24.0199 KOps/s $\color{#d91a1a}-4.59\%$
test_distributed 24.6200μs 5.6841μs 175.9297 KOps/s 176.8317 KOps/s $\color{#d91a1a}-0.51\%$
test_tdmodule 46.5010μs 17.2380μs 58.0115 KOps/s 60.1468 KOps/s $\color{#d91a1a}-3.55\%$
test_tdmodule_dispatch 0.1890ms 33.4550μs 29.8909 KOps/s 31.2187 KOps/s $\color{#d91a1a}-4.25\%$
test_tdseq 43.0020μs 20.6211μs 48.4941 KOps/s 50.1566 KOps/s $\color{#d91a1a}-3.31\%$
test_tdseq_dispatch 52.7710μs 36.6020μs 27.3209 KOps/s 28.0842 KOps/s $\color{#d91a1a}-2.72\%$
test_instantiation_functorch 1.7550ms 1.6992ms 588.5075 Ops/s 593.8352 Ops/s $\color{#d91a1a}-0.90\%$
test_instantiation_td 1.8408ms 1.1818ms 846.2017 Ops/s 842.2222 Ops/s $\color{#35bf28}+0.47\%$
test_exec_functorch 0.1999ms 0.1572ms 6.3624 KOps/s 6.3367 KOps/s $\color{#35bf28}+0.41\%$
test_exec_functional_call 0.2147ms 0.1531ms 6.5314 KOps/s 6.4820 KOps/s $\color{#35bf28}+0.76\%$
test_exec_td 0.1733ms 0.1436ms 6.9638 KOps/s 6.9308 KOps/s $\color{#35bf28}+0.48\%$
test_exec_td_decorator 0.8494ms 0.2167ms 4.6152 KOps/s 4.6437 KOps/s $\color{#d91a1a}-0.61\%$
test_vmap_mlp_speed[True-True] 1.1377ms 1.0717ms 933.0822 Ops/s 940.8227 Ops/s $\color{#d91a1a}-0.82\%$
test_vmap_mlp_speed[True-False] 0.6858ms 0.6006ms 1.6649 KOps/s 1.6637 KOps/s $\color{#35bf28}+0.07\%$
test_vmap_mlp_speed[False-True] 1.0355ms 0.9796ms 1.0208 KOps/s 1.0304 KOps/s $\color{#d91a1a}-0.93\%$
test_vmap_mlp_speed[False-False] 0.6074ms 0.5305ms 1.8849 KOps/s 1.8900 KOps/s $\color{#d91a1a}-0.27\%$
test_vmap_mlp_speed_decorator[True-True] 2.6612ms 1.7805ms 561.6289 Ops/s 568.6086 Ops/s $\color{#d91a1a}-1.23\%$
test_vmap_mlp_speed_decorator[True-False] 1.0841ms 0.6856ms 1.4585 KOps/s 1.4949 KOps/s $\color{#d91a1a}-2.43\%$
test_vmap_mlp_speed_decorator[False-True] 2.0637ms 1.6050ms 623.0619 Ops/s 630.8822 Ops/s $\color{#d91a1a}-1.24\%$
test_vmap_mlp_speed_decorator[False-False] 0.9960ms 0.5693ms 1.7566 KOps/s 1.7418 KOps/s $\color{#35bf28}+0.85\%$
test_vmap_transformer_speed[True-True] 12.4932ms 12.3742ms 80.8136 Ops/s 81.2624 Ops/s $\color{#d91a1a}-0.55\%$
test_vmap_transformer_speed[True-False] 8.0526ms 7.9900ms 125.1565 Ops/s 125.0119 Ops/s $\color{#35bf28}+0.12\%$
test_vmap_transformer_speed[False-True] 12.4541ms 12.2659ms 81.5267 Ops/s 81.7358 Ops/s $\color{#d91a1a}-0.26\%$
test_vmap_transformer_speed[False-False] 8.0126ms 7.9213ms 126.2419 Ops/s 126.0078 Ops/s $\color{#35bf28}+0.19\%$
test_vmap_transformer_speed_decorator[True-True] 43.5257ms 42.3407ms 23.6179 Ops/s 23.7754 Ops/s $\color{#d91a1a}-0.66\%$
test_vmap_transformer_speed_decorator[True-False] 97.2996ms 21.2163ms 47.1336 Ops/s 47.1645 Ops/s $\color{#d91a1a}-0.07\%$
test_vmap_transformer_speed_decorator[False-True] 44.3420ms 41.9804ms 23.8206 Ops/s 24.0104 Ops/s $\color{#d91a1a}-0.79\%$
test_vmap_transformer_speed_decorator[False-False] 98.4400ms 20.7793ms 48.1248 Ops/s 48.0242 Ops/s $\color{#35bf28}+0.21\%$
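
For readers checking the tables by hand: the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column, and the bolded rows counted as Improved or Worsened all sit at roughly 5% or more in magnitude. Both the formula and the 5% cutoff are inferred from the numbers above, not taken from the benchmark bot's source, and the function names below are hypothetical. A minimal Python sketch under those assumptions:

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change of PR throughput relative to repo HEAD throughput."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the tables."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) copied from the CPU table; units cancel in the ratio.
cpu_rows = {
    "test_plain_set_nested": (62.7966, 63.0367),
    "test_keys_nested": (7.1471, 6.6276),
    "test_unlock_stack_nested": (120.5274, 202.5210),
}

for name, (ops_pr, ops_head) in cpu_rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

On the three sample rows this gives changes within rounding of the reported -0.38%, +7.84%, and -40.49%.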

@vmoens marked this pull request as ready for review November 23, 2023 10:08
@vmoens added the Refactor label Nov 23, 2023
@vmoens merged commit dc4eb6b into main Nov 23, 2023
45 checks passed
@vmoens deleted the unlock_params branch November 23, 2023 10:09