[Feature] Weakref for unlocking tds #595

Merged: vmoens merged 7 commits on Dec 11, 2023

@vmoens (Contributor) commented Dec 8, 2023

No description provided.
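The PR itself has no description. Purely as an illustration of the general Python pattern named in the title, and not as the tensordict implementation, the toy sketch below has a nested child record weak references (`weakref.ref`) to the parents that locked it. When one parent unlocks, the child stays locked only while another parent that locked it is still alive and has not unlocked; a parent that has been garbage-collected no longer blocks the unlock, as a strong reference would. `Node`, `_lock_parents` and this unlock rule are hypothetical names and assumptions made for the example.

```python
import weakref


class Node:
    """Hypothetical stand-in for a nested, lockable container (not tensordict code)."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.locked = False
        # Weak references to the parents that locked this node. Being weak,
        # they do not keep a discarded parent alive.
        self._lock_parents = []

    def lock(self):
        self.locked = True
        for child in self.children:
            child._lock_parents.append(weakref.ref(self))
            child.lock()

    def unlock(self):
        self.locked = False
        for child in self.children:
            child._release(parent=self)

    def _release(self, parent):
        # Keep only references to parents that still exist and are not the
        # parent currently unlocking.
        alive = []
        for ref in self._lock_parents:
            p = ref()
            if p is not None and p is not parent:
                alive.append(ref)
        self._lock_parents = alive
        # Stay locked while another live parent still holds a lock.
        if not alive:
            self.unlock()


# Usage: one child shared by two parents.
a, b, shared = Node("a"), Node("b"), Node("shared")
a.children.append(shared)
b.children.append(shared)
a.lock()
b.lock()
a.unlock()
print(shared.locked)  # True: b still holds a lock
del b                 # b is freed; only a dead weak reference to it remains
a.lock()
a.unlock()
print(shared.locked)  # False: no live locking parent is left
```

With strong references, the deleted parent `b` would stay reachable from `shared` and keep it locked indefinitely; weak references let the unlock check skip parents that no longer exist.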

@facebook-github-bot added the CLA Signed label Dec 8, 2023

github-actions bot commented Dec 9, 2023

⚠ Warning: Results of CPU Benchmark Tests

Total Benchmarks: 113. Improved: 31. Worsened: 3.

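Detailed results. Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change. Max and Mean are times per call, Ops is the throughput measured on this PR (1/Mean), and Change is the relative difference between Ops and Ops on Repo HEAD; bold Change values are the ones counted as improved or worsened.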
test_plain_set_nested 44.6330μs 15.6949μs 63.7148 KOps/s 62.7375 KOps/s $\color{#35bf28}+1.56\%$
test_plain_set_stack_nested 0.2640ms 0.1421ms 7.0374 KOps/s 6.9127 KOps/s $\color{#35bf28}+1.80\%$
test_plain_set_nested_inplace 52.4380μs 17.9004μs 55.8648 KOps/s 55.1978 KOps/s $\color{#35bf28}+1.21\%$
test_plain_set_stack_nested_inplace 0.3489ms 0.1778ms 5.6244 KOps/s 5.6387 KOps/s $\color{#d91a1a}-0.25\%$
test_items 33.5130μs 2.5984μs 384.8462 KOps/s 414.5071 KOps/s $\textbf{\color{#d91a1a}-7.16\%}$
test_items_nested 0.4807ms 0.2767ms 3.6137 KOps/s 3.7050 KOps/s $\color{#d91a1a}-2.46\%$
test_items_nested_locked 0.3487ms 0.2790ms 3.5841 KOps/s 3.7388 KOps/s $\color{#d91a1a}-4.14\%$
test_items_nested_leaf 0.5597ms 0.1691ms 5.9123 KOps/s 6.1599 KOps/s $\color{#d91a1a}-4.02\%$
test_items_stack_nested 1.9235ms 1.4859ms 672.9881 Ops/s 669.1528 Ops/s $\color{#35bf28}+0.57\%$
test_items_stack_nested_leaf 2.0437ms 1.3446ms 743.7139 Ops/s 734.3787 Ops/s $\color{#35bf28}+1.27\%$
test_items_stack_nested_locked 1.8011ms 0.7736ms 1.2926 KOps/s 1.3074 KOps/s $\color{#d91a1a}-1.13\%$
test_keys 14.4370μs 3.8559μs 259.3413 KOps/s 256.8981 KOps/s $\color{#35bf28}+0.95\%$
test_keys_nested 0.5172ms 0.1409ms 7.0987 KOps/s 6.6860 KOps/s $\textbf{\color{#35bf28}+6.17\%}$
test_keys_nested_locked 0.1927ms 0.1397ms 7.1573 KOps/s 7.1563 KOps/s $\color{#35bf28}+0.01\%$
test_keys_nested_leaf 0.2778ms 0.1402ms 7.1332 KOps/s 7.0850 KOps/s $\color{#35bf28}+0.68\%$
test_keys_stack_nested 2.1601ms 1.4214ms 703.5276 Ops/s 695.9553 Ops/s $\color{#35bf28}+1.09\%$
test_keys_stack_nested_leaf 2.1588ms 1.4188ms 704.8384 Ops/s 702.7179 Ops/s $\color{#35bf28}+0.30\%$
test_keys_stack_nested_locked 1.0866ms 0.6842ms 1.4615 KOps/s 1.4680 KOps/s $\color{#d91a1a}-0.44\%$
test_values 10.2108μs 1.1519μs 868.0985 KOps/s 867.5810 KOps/s $\color{#35bf28}+0.06\%$
test_values_nested 97.3110μs 49.4640μs 20.2167 KOps/s 20.1619 KOps/s $\color{#35bf28}+0.27\%$
test_values_nested_locked 0.1373ms 51.6213μs 19.3719 KOps/s 20.2466 KOps/s $\color{#d91a1a}-4.32\%$
test_values_nested_leaf 95.5280μs 44.0281μs 22.7128 KOps/s 22.3491 KOps/s $\color{#35bf28}+1.63\%$
test_values_stack_nested 2.4841ms 1.2180ms 820.9870 Ops/s 832.0677 Ops/s $\color{#d91a1a}-1.33\%$
test_values_stack_nested_leaf 2.4862ms 1.2010ms 832.6388 Ops/s 839.1190 Ops/s $\color{#d91a1a}-0.77\%$
test_values_stack_nested_locked 1.0461ms 0.5228ms 1.9128 KOps/s 1.9605 KOps/s $\color{#d91a1a}-2.43\%$
test_membership 22.3620μs 1.3626μs 733.9076 KOps/s 737.8256 KOps/s $\color{#d91a1a}-0.53\%$
test_membership_nested 26.6800μs 2.8507μs 350.7963 KOps/s 351.0434 KOps/s $\color{#d91a1a}-0.07\%$
test_membership_nested_leaf 39.3940μs 2.8595μs 349.7079 KOps/s 352.8535 KOps/s $\color{#d91a1a}-0.89\%$
test_membership_stacked_nested 52.1170μs 11.7096μs 85.4001 KOps/s 85.0173 KOps/s $\color{#35bf28}+0.45\%$
test_membership_stacked_nested_leaf 40.9570μs 11.6725μs 85.6714 KOps/s 84.6896 KOps/s $\color{#35bf28}+1.16\%$
test_membership_nested_last 18.7750μs 5.9497μs 168.0746 KOps/s 161.4871 KOps/s $\color{#35bf28}+4.08\%$
test_membership_nested_leaf_last 43.1900μs 6.0400μs 165.5636 KOps/s 167.3318 KOps/s $\color{#d91a1a}-1.06\%$
test_membership_stacked_nested_last 0.2362ms 0.1668ms 5.9965 KOps/s 5.9380 KOps/s $\color{#35bf28}+0.99\%$
test_membership_stacked_nested_leaf_last 38.0810μs 13.7370μs 72.7963 KOps/s 72.6581 KOps/s $\color{#35bf28}+0.19\%$
test_nested_getleaf 49.0120μs 10.8907μs 91.8212 KOps/s 94.2273 KOps/s $\color{#d91a1a}-2.55\%$
test_nested_get 29.7250μs 10.2917μs 97.1655 KOps/s 98.6443 KOps/s $\color{#d91a1a}-1.50\%$
test_stacked_getleaf 1.0957ms 0.6365ms 1.5712 KOps/s 1.5500 KOps/s $\color{#35bf28}+1.37\%$
test_stacked_get 0.7282ms 0.6024ms 1.6601 KOps/s 1.6163 KOps/s $\color{#35bf28}+2.71\%$
test_nested_getitemleaf 32.3500μs 11.0069μs 90.8523 KOps/s 93.4337 KOps/s $\color{#d91a1a}-2.76\%$
test_nested_getitem 36.9890μs 10.4809μs 95.4116 KOps/s 97.4099 KOps/s $\color{#d91a1a}-2.05\%$
test_stacked_getitemleaf 1.3457ms 0.6400ms 1.5625 KOps/s 1.5435 KOps/s $\color{#35bf28}+1.23\%$
test_stacked_getitem 0.7782ms 0.6052ms 1.6524 KOps/s 1.6164 KOps/s $\color{#35bf28}+2.23\%$
test_lock_nested 55.9243ms 0.4715ms 2.1207 KOps/s 1.7532 KOps/s $\textbf{\color{#35bf28}+20.96\%}$
test_lock_stack_nested 71.1813ms 6.4306ms 155.5076 Ops/s 194.7076 Ops/s $\textbf{\color{#d91a1a}-20.13\%}$
test_unlock_nested 1.0946ms 0.4282ms 2.3355 KOps/s 2.2183 KOps/s $\textbf{\color{#35bf28}+5.28\%}$
test_unlock_stack_nested 75.7449ms 6.1912ms 161.5185 Ops/s 141.9828 Ops/s $\textbf{\color{#35bf28}+13.76\%}$
test_flatten_speed 0.3474ms 0.2679ms 3.7333 KOps/s 3.7229 KOps/s $\color{#35bf28}+0.28\%$
test_unflatten_speed 0.7638ms 0.4549ms 2.1983 KOps/s 2.1530 KOps/s $\color{#35bf28}+2.10\%$
test_common_ops 1.2004ms 0.6457ms 1.5487 KOps/s 1.4891 KOps/s $\color{#35bf28}+4.00\%$
test_creation 17.6630μs 2.0157μs 496.0936 KOps/s 406.6412 KOps/s $\textbf{\color{#35bf28}+22.00\%}$
test_creation_empty 24.3350μs 7.5692μs 132.1141 KOps/s 121.0339 KOps/s $\textbf{\color{#35bf28}+9.15\%}$
test_creation_nested_1 30.5370μs 10.4050μs 96.1074 KOps/s 87.0459 KOps/s $\textbf{\color{#35bf28}+10.41\%}$
test_creation_nested_2 40.1750μs 14.0774μs 71.0358 KOps/s 65.3953 KOps/s $\textbf{\color{#35bf28}+8.63\%}$
test_clone 0.3264ms 12.2028μs 81.9485 KOps/s 72.8508 KOps/s $\textbf{\color{#35bf28}+12.49\%}$
test_getitem[int] 35.6160μs 12.2242μs 81.8049 KOps/s 75.1274 KOps/s $\textbf{\color{#35bf28}+8.89\%}$
test_getitem[slice_int] 69.2390μs 23.9282μs 41.7917 KOps/s 38.7669 KOps/s $\textbf{\color{#35bf28}+7.80\%}$
test_getitem[range] 95.0680μs 41.3224μs 24.2000 KOps/s 21.5938 KOps/s $\textbf{\color{#35bf28}+12.07\%}$
test_getitem[tuple] 53.2490μs 19.1933μs 52.1015 KOps/s 48.7479 KOps/s $\textbf{\color{#35bf28}+6.88\%}$
test_getitem[list] 93.9750μs 36.8019μs 27.1725 KOps/s 25.0990 KOps/s $\textbf{\color{#35bf28}+8.26\%}$
test_setitem_dim[int] 45.2440μs 27.6648μs 36.1470 KOps/s 35.5712 KOps/s $\color{#35bf28}+1.62\%$
test_setitem_dim[slice_int] 90.9600μs 52.2335μs 19.1448 KOps/s 18.7400 KOps/s $\color{#35bf28}+2.16\%$
test_setitem_dim[range] 0.1411ms 71.1494μs 14.0549 KOps/s 13.7763 KOps/s $\color{#35bf28}+2.02\%$
test_setitem_dim[tuple] 77.7550μs 40.8217μs 24.4968 KOps/s 23.7869 KOps/s $\color{#35bf28}+2.98\%$
test_setitem 0.3093ms 17.0114μs 58.7842 KOps/s 53.5859 KOps/s $\textbf{\color{#35bf28}+9.70\%}$
test_set 0.3457ms 16.1540μs 61.9042 KOps/s 55.9394 KOps/s $\textbf{\color{#35bf28}+10.66\%}$
test_set_shared 2.5650ms 0.1406ms 7.1144 KOps/s 6.9959 KOps/s $\color{#35bf28}+1.69\%$
test_update 0.3720ms 18.1153μs 55.2018 KOps/s 52.2696 KOps/s $\textbf{\color{#35bf28}+5.61\%}$
test_update_nested 0.1675ms 25.1964μs 39.6882 KOps/s 37.8148 KOps/s $\color{#35bf28}+4.95\%$
test_set_nested 0.2813ms 18.0567μs 55.3811 KOps/s 49.4111 KOps/s $\textbf{\color{#35bf28}+12.08\%}$
test_set_nested_new 0.3022ms 22.0835μs 45.2827 KOps/s 39.4164 KOps/s $\textbf{\color{#35bf28}+14.88\%}$
test_select 0.1078ms 45.3186μs 22.0660 KOps/s 19.7527 KOps/s $\textbf{\color{#35bf28}+11.71\%}$
test_unbind_speed 0.4500ms 0.3443ms 2.9046 KOps/s 2.6621 KOps/s $\textbf{\color{#35bf28}+9.11\%}$
test_unbind_speed_stack0 62.3335ms 4.1732ms 239.6271 Ops/s 219.2663 Ops/s $\textbf{\color{#35bf28}+9.29\%}$
test_unbind_speed_stack1 1.7813μs 0.6498μs 1.5390 MOps/s 1.6011 MOps/s $\color{#d91a1a}-3.88\%$
test_split 56.3081ms 1.6727ms 597.8299 Ops/s 552.5667 Ops/s $\textbf{\color{#35bf28}+8.19\%}$
test_chunk 60.2103ms 1.6613ms 601.9490 Ops/s 561.6808 Ops/s $\textbf{\color{#35bf28}+7.17\%}$
test_creation[device0] 3.5025ms 0.2939ms 3.4022 KOps/s 3.3705 KOps/s $\color{#35bf28}+0.94\%$
test_creation_from_tensor 54.6701ms 0.3513ms 2.8465 KOps/s 2.9894 KOps/s $\color{#d91a1a}-4.78\%$
test_add_one[memmap_tensor0] 70.7920μs 25.5057μs 39.2069 KOps/s 39.4725 KOps/s $\color{#d91a1a}-0.67\%$
test_contiguous[memmap_tensor0] 23.3940μs 5.6552μs 176.8285 KOps/s 170.0438 KOps/s $\color{#35bf28}+3.99\%$
test_stack[memmap_tensor0] 64.5500μs 18.7136μs 53.4372 KOps/s 50.4549 KOps/s $\textbf{\color{#35bf28}+5.91\%}$
test_memmaptd_index 0.4096ms 0.1996ms 5.0099 KOps/s 4.9349 KOps/s $\color{#35bf28}+1.52\%$
test_memmaptd_index_astensor 0.3333ms 0.2556ms 3.9125 KOps/s 3.7663 KOps/s $\color{#35bf28}+3.88\%$
test_memmaptd_index_op 1.0171ms 0.4916ms 2.0341 KOps/s 1.9759 KOps/s $\color{#35bf28}+2.94\%$
test_reshape_pytree 55.2430μs 22.9986μs 43.4809 KOps/s 43.4009 KOps/s $\color{#35bf28}+0.18\%$
test_reshape_td 67.1250μs 30.2667μs 33.0396 KOps/s 30.9197 KOps/s $\textbf{\color{#35bf28}+6.86\%}$
test_view_pytree 53.2600μs 22.7209μs 44.0123 KOps/s 42.7509 KOps/s $\color{#35bf28}+2.95\%$
test_view_td 21.1390μs 4.8715μs 205.2737 KOps/s 201.8139 KOps/s $\color{#35bf28}+1.71\%$
test_unbind_pytree 68.8590μs 26.5443μs 37.6728 KOps/s 37.4140 KOps/s $\color{#35bf28}+0.69\%$
test_unbind_td 0.1351ms 56.2104μs 17.7903 KOps/s 16.5611 KOps/s $\textbf{\color{#35bf28}+7.42\%}$
test_split_pytree 57.4070μs 26.1611μs 38.2247 KOps/s 37.6205 KOps/s $\color{#35bf28}+1.61\%$
test_split_td 85.9200μs 43.9035μs 22.7772 KOps/s 21.1466 KOps/s $\textbf{\color{#35bf28}+7.71\%}$
test_add_pytree 0.1027ms 31.9582μs 31.2909 KOps/s 30.5244 KOps/s $\color{#35bf28}+2.51\%$
test_add_td 0.1225ms 42.7492μs 23.3922 KOps/s 21.7109 KOps/s $\textbf{\color{#35bf28}+7.74\%}$
test_distributed 41.2270μs 5.9761μs 167.3325 KOps/s 159.1965 KOps/s $\textbf{\color{#35bf28}+5.11\%}$
test_tdmodule 1.7342ms 22.6303μs 44.1886 KOps/s 47.4603 KOps/s $\textbf{\color{#d91a1a}-6.89\%}$
test_tdmodule_dispatch 0.1876ms 38.4107μs 26.0344 KOps/s 25.1418 KOps/s $\color{#35bf28}+3.55\%$
test_tdseq 42.9300μs 24.2149μs 41.2968 KOps/s 40.6520 KOps/s $\color{#35bf28}+1.59\%$
test_tdseq_dispatch 0.1308ms 41.3597μs 24.1781 KOps/s 22.8769 KOps/s $\textbf{\color{#35bf28}+5.69\%}$
test_instantiation_functorch 1.4233ms 1.3106ms 763.0255 Ops/s 771.2395 Ops/s $\color{#d91a1a}-1.07\%$
test_instantiation_td 1.5688ms 1.0214ms 979.0323 Ops/s 969.5362 Ops/s $\color{#35bf28}+0.98\%$
test_exec_functorch 0.2370ms 0.1594ms 6.2737 KOps/s 6.2222 KOps/s $\color{#35bf28}+0.83\%$
test_exec_functional_call 0.2092ms 0.1473ms 6.7867 KOps/s 6.7478 KOps/s $\color{#35bf28}+0.58\%$
test_exec_td 0.2239ms 0.1425ms 7.0181 KOps/s 6.7405 KOps/s $\color{#35bf28}+4.12\%$
test_exec_td_decorator 1.0429ms 0.1744ms 5.7326 KOps/s 5.5939 KOps/s $\color{#35bf28}+2.48\%$
test_vmap_mlp_speed[True-True] 1.2892ms 0.9071ms 1.1024 KOps/s 1.0648 KOps/s $\color{#35bf28}+3.53\%$
test_vmap_mlp_speed[True-False] 0.9075ms 0.4686ms 2.1339 KOps/s 2.1052 KOps/s $\color{#35bf28}+1.36\%$
test_vmap_mlp_speed[False-True] 1.2041ms 0.7815ms 1.2796 KOps/s 1.2527 KOps/s $\color{#35bf28}+2.15\%$
test_vmap_mlp_speed[False-False] 0.6113ms 0.3857ms 2.5928 KOps/s 2.5281 KOps/s $\color{#35bf28}+2.56\%$
test_vmap_mlp_speed_decorator[True-True] 2.3594ms 1.7741ms 563.6644 Ops/s 556.1261 Ops/s $\color{#35bf28}+1.36\%$
test_vmap_mlp_speed_decorator[True-False] 0.9721ms 0.5107ms 1.9581 KOps/s 1.8846 KOps/s $\color{#35bf28}+3.90\%$
test_vmap_mlp_speed_decorator[False-True] 2.2575ms 1.4916ms 670.3991 Ops/s 657.2390 Ops/s $\color{#35bf28}+2.00\%$
test_vmap_mlp_speed_decorator[False-False] 0.7826ms 0.3969ms 2.5195 KOps/s 2.4528 KOps/s $\color{#35bf28}+2.72\%$
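Every row of the CPU table above follows two simple relationships: Ops is the reciprocal of Mean, and Change is Ops relative to Ops on Repo HEAD. These are inferred from the numbers rather than documented in the PR, so treat the short Python check below as a sketch; it recomputes both quantities for two rows copied from the table.

```python
# Values copied from two rows of the CPU table:
# name -> (Mean in seconds, Ops in calls/s, Ops on Repo HEAD in calls/s)
ROWS = {
    "test_creation": (2.0157e-6, 496.0936e3, 406.6412e3),
    "test_lock_stack_nested": (6.4306e-3, 155.5076, 194.7076),
}


def ops_from_mean(mean_s: float) -> float:
    """Throughput (calls per second) implied by a mean time per call."""
    return 1.0 / mean_s


def change_pct(ops: float, ops_head: float) -> float:
    """Relative change of throughput versus Repo HEAD, in percent."""
    return (ops / ops_head - 1.0) * 100.0


for name, (mean_s, ops, ops_head) in ROWS.items():
    print(f"{name}: {ops_from_mean(mean_s):.1f} calls/s from Mean, "
          f"Change {change_pct(ops, ops_head):+.2f}%")
```

The recomputed values match the table (+22.00% and -20.13%), and the same arithmetic applies to the GPU table.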


github-actions bot commented Dec 9, 2023

⚠ Warning: Results of GPU Benchmark Tests

Total Benchmarks: 127. Improved: 27. Worsened: 1.

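Detailed results. Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change. Max and Mean are times per call, Ops is the throughput measured on this PR (1/Mean), and Change is the relative difference between Ops and Ops on Repo HEAD; bold Change values are the ones counted as improved or worsened.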
test_plain_set_nested 83.7210μs 12.7621μs 78.3569 KOps/s 79.0512 KOps/s $\color{#d91a1a}-0.88\%$
test_plain_set_stack_nested 0.1560ms 0.1156ms 8.6526 KOps/s 8.3593 KOps/s $\color{#35bf28}+3.51\%$
test_plain_set_nested_inplace 34.7810μs 13.9954μs 71.4520 KOps/s 71.3493 KOps/s $\color{#35bf28}+0.14\%$
test_plain_set_stack_nested_inplace 0.1697ms 0.1409ms 7.0960 KOps/s 7.0245 KOps/s $\color{#35bf28}+1.02\%$
test_items 30.6010μs 4.6541μs 214.8644 KOps/s 209.9038 KOps/s $\color{#35bf28}+2.36\%$
test_items_nested 0.3657ms 0.3381ms 2.9581 KOps/s 2.9364 KOps/s $\color{#35bf28}+0.74\%$
test_items_nested_locked 0.4329ms 0.3417ms 2.9269 KOps/s 2.9146 KOps/s $\color{#35bf28}+0.42\%$
test_items_nested_leaf 0.2525ms 0.1982ms 5.0446 KOps/s 4.9906 KOps/s $\color{#35bf28}+1.08\%$
test_items_stack_nested 1.4995ms 1.4476ms 690.7909 Ops/s 685.3963 Ops/s $\color{#35bf28}+0.79\%$
test_items_stack_nested_leaf 1.3183ms 1.2691ms 787.9822 Ops/s 777.2556 Ops/s $\color{#35bf28}+1.38\%$
test_items_stack_nested_locked 2.3143ms 0.8125ms 1.2308 KOps/s 1.2379 KOps/s $\color{#d91a1a}-0.58\%$
test_keys 21.2410μs 4.6094μs 216.9457 KOps/s 214.7811 KOps/s $\color{#35bf28}+1.01\%$
test_keys_nested 0.4943ms 90.0926μs 11.0997 KOps/s 11.0436 KOps/s $\color{#35bf28}+0.51\%$
test_keys_nested_locked 0.1601ms 89.4548μs 11.1788 KOps/s 11.1478 KOps/s $\color{#35bf28}+0.28\%$
test_keys_nested_leaf 42.1992ms 87.5710μs 11.4193 KOps/s 12.2694 KOps/s $\textbf{\color{#d91a1a}-6.93\%}$
test_keys_stack_nested 1.3364ms 1.2578ms 795.0387 Ops/s 796.8540 Ops/s $\color{#d91a1a}-0.23\%$
test_keys_stack_nested_leaf 1.3343ms 1.2466ms 802.1824 Ops/s 799.9001 Ops/s $\color{#35bf28}+0.29\%$
test_keys_stack_nested_locked 0.7180ms 0.6085ms 1.6435 KOps/s 1.6309 KOps/s $\color{#35bf28}+0.77\%$
test_values 30.1040μs 1.8781μs 532.4393 KOps/s 531.6429 KOps/s $\color{#35bf28}+0.15\%$
test_values_nested 60.8400μs 42.5472μs 23.5033 KOps/s 23.0152 KOps/s $\color{#35bf28}+2.12\%$
test_values_nested_locked 97.8110μs 45.0755μs 22.1850 KOps/s 22.0072 KOps/s $\color{#35bf28}+0.81\%$
test_values_nested_leaf 58.5500μs 37.0522μs 26.9889 KOps/s 26.4622 KOps/s $\color{#35bf28}+1.99\%$
test_values_stack_nested 1.1571ms 1.0965ms 912.0059 Ops/s 897.8887 Ops/s $\color{#35bf28}+1.57\%$
test_values_stack_nested_leaf 1.1542ms 1.0849ms 921.7077 Ops/s 908.8319 Ops/s $\color{#35bf28}+1.42\%$
test_values_stack_nested_locked 0.5322ms 0.4809ms 2.0795 KOps/s 2.0480 KOps/s $\color{#35bf28}+1.54\%$
test_membership 5.1962μs 0.9448μs 1.0585 MOps/s 1.0590 MOps/s $\color{#d91a1a}-0.05\%$
test_membership_nested 20.3390μs 2.1715μs 460.5208 KOps/s 450.4476 KOps/s $\color{#35bf28}+2.24\%$
test_membership_nested_leaf 17.3305μs 2.0963μs 477.0325 KOps/s 472.5390 KOps/s $\color{#35bf28}+0.95\%$
test_membership_stacked_nested 32.2100μs 10.9410μs 91.3996 KOps/s 91.2098 KOps/s $\color{#35bf28}+0.21\%$
test_membership_stacked_nested_leaf 27.5200μs 10.9551μs 91.2819 KOps/s 91.4715 KOps/s $\color{#d91a1a}-0.21\%$
test_membership_nested_last 19.8190μs 4.6035μs 217.2263 KOps/s 215.4742 KOps/s $\color{#35bf28}+0.81\%$
test_membership_nested_leaf_last 26.5200μs 4.6650μs 214.3637 KOps/s 216.0180 KOps/s $\color{#d91a1a}-0.77\%$
test_membership_stacked_nested_last 0.2082ms 0.1326ms 7.5406 KOps/s 7.4226 KOps/s $\color{#35bf28}+1.59\%$
test_membership_stacked_nested_leaf_last 25.4410μs 12.7372μs 78.5100 KOps/s 78.8421 KOps/s $\color{#d91a1a}-0.42\%$
test_nested_getleaf 24.4000μs 8.3178μs 120.2242 KOps/s 119.3984 KOps/s $\color{#35bf28}+0.69\%$
test_nested_get 21.3500μs 7.8835μs 126.8473 KOps/s 125.9135 KOps/s $\color{#35bf28}+0.74\%$
test_stacked_getleaf 0.6088ms 0.5502ms 1.8176 KOps/s 1.7929 KOps/s $\color{#35bf28}+1.38\%$
test_stacked_get 0.5942ms 0.5182ms 1.9298 KOps/s 1.8975 KOps/s $\color{#35bf28}+1.70\%$
test_nested_getitemleaf 29.8800μs 8.4548μs 118.2766 KOps/s 118.5289 KOps/s $\color{#d91a1a}-0.21\%$
test_nested_getitem 20.9600μs 7.9955μs 125.0703 KOps/s 125.0805 KOps/s $-0.01\%$
test_stacked_getitemleaf 0.5949ms 0.5619ms 1.7797 KOps/s 1.7860 KOps/s $\color{#d91a1a}-0.35\%$
test_stacked_getitem 0.5698ms 0.5235ms 1.9101 KOps/s 1.9053 KOps/s $\color{#35bf28}+0.26\%$
test_lock_nested 1.5763ms 0.4128ms 2.4224 KOps/s 1.8121 KOps/s $\textbf{\color{#35bf28}+33.68\%}$
test_lock_stack_nested 62.4200ms 5.8397ms 171.2430 Ops/s 137.8077 Ops/s $\textbf{\color{#35bf28}+24.26\%}$
test_unlock_nested 0.9092ms 0.4096ms 2.4417 KOps/s 2.3400 KOps/s $\color{#35bf28}+4.34\%$
test_unlock_stack_nested 61.0779ms 5.9527ms 167.9915 Ops/s 162.7464 Ops/s $\color{#35bf28}+3.22\%$
test_flatten_speed 0.4697ms 0.1869ms 5.3514 KOps/s 5.3659 KOps/s $\color{#d91a1a}-0.27\%$
test_unflatten_speed 0.3773ms 0.3522ms 2.8394 KOps/s 2.7979 KOps/s $\color{#35bf28}+1.48\%$
test_common_ops 1.0376ms 0.5561ms 1.7983 KOps/s 1.6709 KOps/s $\textbf{\color{#35bf28}+7.62\%}$
test_creation 36.8600μs 1.6103μs 620.9941 KOps/s 472.6666 KOps/s $\textbf{\color{#35bf28}+31.38\%}$
test_creation_empty 30.2510μs 6.8770μs 145.4130 KOps/s 138.4280 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_creation_nested_1 62.1900μs 8.7545μs 114.2266 KOps/s 105.3313 KOps/s $\textbf{\color{#35bf28}+8.45\%}$
test_creation_nested_2 30.8710μs 11.4377μs 87.4301 KOps/s 83.4612 KOps/s $\color{#35bf28}+4.76\%$
test_clone 98.1220μs 12.9280μs 77.3517 KOps/s 70.6149 KOps/s $\textbf{\color{#35bf28}+9.54\%}$
test_getitem[int] 29.3000μs 10.9160μs 91.6085 KOps/s 81.8123 KOps/s $\textbf{\color{#35bf28}+11.97\%}$
test_getitem[slice_int] 37.0810μs 21.0090μs 47.5986 KOps/s 44.8231 KOps/s $\textbf{\color{#35bf28}+6.19\%}$
test_getitem[range] 0.3034ms 36.0542μs 27.7360 KOps/s 24.5839 KOps/s $\textbf{\color{#35bf28}+12.82\%}$
test_getitem[tuple] 35.6010μs 18.6207μs 53.7036 KOps/s 50.1122 KOps/s $\textbf{\color{#35bf28}+7.17\%}$
test_getitem[list] 0.3092ms 33.4237μs 29.9189 KOps/s 27.1753 KOps/s $\textbf{\color{#35bf28}+10.10\%}$
test_setitem_dim[int] 39.6200μs 24.2599μs 41.2203 KOps/s 39.9731 KOps/s $\color{#35bf28}+3.12\%$
test_setitem_dim[slice_int] 63.9510μs 43.0165μs 23.2469 KOps/s 22.8717 KOps/s $\color{#35bf28}+1.64\%$
test_setitem_dim[range] 85.9920μs 60.0378μs 16.6562 KOps/s 16.7510 KOps/s $\color{#d91a1a}-0.57\%$
test_setitem_dim[tuple] 51.9910μs 36.1757μs 27.6429 KOps/s 27.2786 KOps/s $\color{#35bf28}+1.34\%$
test_setitem 0.1197ms 16.5515μs 60.4176 KOps/s 55.9356 KOps/s $\textbf{\color{#35bf28}+8.01\%}$
test_set 0.1228ms 16.0256μs 62.4002 KOps/s 57.1606 KOps/s $\textbf{\color{#35bf28}+9.17\%}$
test_set_shared 3.0495ms 0.1022ms 9.7831 KOps/s 8.7164 KOps/s $\textbf{\color{#35bf28}+12.24\%}$
test_update 0.1194ms 18.0992μs 55.2511 KOps/s 53.7012 KOps/s $\color{#35bf28}+2.89\%$
test_update_nested 0.1045ms 23.7728μs 42.0649 KOps/s 39.3632 KOps/s $\textbf{\color{#35bf28}+6.86\%}$
test_set_nested 0.1093ms 17.2676μs 57.9121 KOps/s 53.1428 KOps/s $\textbf{\color{#35bf28}+8.97\%}$
test_set_nested_new 0.1031ms 20.4462μs 48.9088 KOps/s 44.3777 KOps/s $\textbf{\color{#35bf28}+10.21\%}$
test_select 69.4210μs 41.9010μs 23.8658 KOps/s 21.7470 KOps/s $\textbf{\color{#35bf28}+9.74\%}$
test_to 70.4810μs 51.1620μs 19.5458 KOps/s 19.0235 KOps/s $\color{#35bf28}+2.75\%$
test_to_nonblocking 65.8310μs 33.0393μs 30.2670 KOps/s 29.3425 KOps/s $\color{#35bf28}+3.15\%$
test_unbind_speed 0.3753ms 0.3225ms 3.1004 KOps/s 2.8058 KOps/s $\textbf{\color{#35bf28}+10.50\%}$
test_unbind_speed_stack0 60.4188ms 3.9403ms 253.7892 Ops/s 236.8110 Ops/s $\textbf{\color{#35bf28}+7.17\%}$
test_unbind_speed_stack1 1.3036μs 0.5208μs 1.9200 MOps/s 1.9167 MOps/s $\color{#35bf28}+0.17\%$
test_split 53.9667ms 1.6246ms 615.5377 Ops/s 578.6883 Ops/s $\textbf{\color{#35bf28}+6.37\%}$
test_chunk 53.6742ms 1.6076ms 622.0421 Ops/s 583.7389 Ops/s $\textbf{\color{#35bf28}+6.56\%}$
test_creation[device0] 0.5060ms 0.3100ms 3.2262 KOps/s 3.2370 KOps/s $\color{#d91a1a}-0.33\%$
test_creation[device1] 0.8045ms 0.3123ms 3.2025 KOps/s 3.2002 KOps/s $\color{#35bf28}+0.07\%$
test_creation_from_tensor 0.6975ms 0.3366ms 2.9706 KOps/s 2.9655 KOps/s $\color{#35bf28}+0.17\%$
test_add_one[memmap_tensor0] 0.1586ms 23.5327μs 42.4941 KOps/s 41.7519 KOps/s $\color{#35bf28}+1.78\%$
test_add_one[memmap_tensor1] 0.2017ms 71.5607μs 13.9741 KOps/s 13.9496 KOps/s $\color{#35bf28}+0.18\%$
test_contiguous[memmap_tensor0] 26.0200μs 5.9528μs 167.9890 KOps/s 165.3439 KOps/s $\color{#35bf28}+1.60\%$
test_contiguous[memmap_tensor1] 50.5500μs 21.1110μs 47.3686 KOps/s 46.1116 KOps/s $\color{#35bf28}+2.73\%$
test_stack[memmap_tensor0] 42.7600μs 19.2130μs 52.0480 KOps/s 50.0396 KOps/s $\color{#35bf28}+4.01\%$
test_stack[memmap_tensor1] 0.1048ms 70.7573μs 14.1328 KOps/s 14.0558 KOps/s $\color{#35bf28}+0.55\%$
test_memmaptd_index 0.2675ms 0.2349ms 4.2563 KOps/s 4.2313 KOps/s $\color{#35bf28}+0.59\%$
test_memmaptd_index_astensor 0.3203ms 0.2896ms 3.4530 KOps/s 3.4589 KOps/s $\color{#d91a1a}-0.17\%$
test_memmaptd_index_op 0.5961ms 0.5366ms 1.8635 KOps/s 1.8400 KOps/s $\color{#35bf28}+1.28\%$
test_reshape_pytree 37.9510μs 20.3916μs 49.0398 KOps/s 47.8656 KOps/s $\color{#35bf28}+2.45\%$
test_reshape_td 51.7910μs 28.9380μs 34.5567 KOps/s 32.3584 KOps/s $\textbf{\color{#35bf28}+6.79\%}$
test_view_pytree 87.1210μs 20.3197μs 49.2134 KOps/s 48.4753 KOps/s $\color{#35bf28}+1.52\%$
test_view_td 17.1490μs 4.0578μs 246.4413 KOps/s 244.7759 KOps/s $\color{#35bf28}+0.68\%$
test_unbind_pytree 48.6620μs 25.1931μs 39.6934 KOps/s 38.7756 KOps/s $\color{#35bf28}+2.37\%$
test_unbind_td 85.2520μs 51.3515μs 19.4736 KOps/s 17.7604 KOps/s $\textbf{\color{#35bf28}+9.65\%}$
test_split_pytree 37.9820μs 23.2735μs 42.9673 KOps/s 42.2550 KOps/s $\color{#35bf28}+1.69\%$
test_split_td 63.0610μs 38.6434μs 25.8776 KOps/s 23.0997 KOps/s $\textbf{\color{#35bf28}+12.03\%}$
test_add_pytree 47.9710μs 30.8703μs 32.3936 KOps/s 31.8887 KOps/s $\color{#35bf28}+1.58\%$
test_add_td 67.2310μs 40.8086μs 24.5046 KOps/s 22.8902 KOps/s $\textbf{\color{#35bf28}+7.05\%}$
test_distributed 17.5400μs 5.4476μs 183.5668 KOps/s 179.7491 KOps/s $\color{#35bf28}+2.12\%$
test_tdmodule 54.7110μs 16.6838μs 59.9384 KOps/s 59.8398 KOps/s $\color{#35bf28}+0.16\%$
test_tdmodule_dispatch 0.1373ms 31.7881μs 31.4583 KOps/s 30.2012 KOps/s $\color{#35bf28}+4.16\%$
test_tdseq 33.9410μs 19.7973μs 50.5119 KOps/s 50.0432 KOps/s $\color{#35bf28}+0.94\%$
test_tdseq_dispatch 57.8310μs 35.5318μs 28.1438 KOps/s 27.1659 KOps/s $\color{#35bf28}+3.60\%$
test_instantiation_functorch 1.7219ms 1.6666ms 600.0099 Ops/s 605.9721 Ops/s $\color{#d91a1a}-0.98\%$
test_instantiation_td 1.7218ms 1.1674ms 856.5825 Ops/s 844.6519 Ops/s $\color{#35bf28}+1.41\%$
test_exec_functorch 0.2032ms 0.1554ms 6.4343 KOps/s 6.4190 KOps/s $\color{#35bf28}+0.24\%$
test_exec_functional_call 0.1983ms 0.1473ms 6.7896 KOps/s 6.6548 KOps/s $\color{#35bf28}+2.03\%$
test_exec_td 0.1710ms 0.1399ms 7.1485 KOps/s 7.1394 KOps/s $\color{#35bf28}+0.13\%$
test_exec_td_decorator 0.5944ms 0.1766ms 5.6630 KOps/s 5.5151 KOps/s $\color{#35bf28}+2.68\%$
test_vmap_mlp_speed[True-True] 1.1014ms 1.0423ms 959.3900 Ops/s 939.5073 Ops/s $\color{#35bf28}+2.12\%$
test_vmap_mlp_speed[True-False] 0.7177ms 0.5936ms 1.6846 KOps/s 1.6403 KOps/s $\color{#35bf28}+2.70\%$
test_vmap_mlp_speed[False-True] 0.9906ms 0.9540ms 1.0483 KOps/s 1.0345 KOps/s $\color{#35bf28}+1.34\%$
test_vmap_mlp_speed[False-False] 0.5861ms 0.5225ms 1.9138 KOps/s 1.8752 KOps/s $\color{#35bf28}+2.06\%$
test_vmap_mlp_speed_decorator[True-True] 2.6130ms 1.9979ms 500.5159 Ops/s 495.4380 Ops/s $\color{#35bf28}+1.02\%$
test_vmap_mlp_speed_decorator[True-False] 1.0219ms 0.6318ms 1.5827 KOps/s 1.5583 KOps/s $\color{#35bf28}+1.56\%$
test_vmap_mlp_speed_decorator[False-True] 2.1407ms 1.7259ms 579.4146 Ops/s 563.6227 Ops/s $\color{#35bf28}+2.80\%$
test_vmap_mlp_speed_decorator[False-False] 0.8272ms 0.5396ms 1.8531 KOps/s 1.8137 KOps/s $\color{#35bf28}+2.17\%$
test_vmap_transformer_speed[True-True] 12.1691ms 12.1124ms 82.5597 Ops/s 81.8928 Ops/s $\color{#35bf28}+0.81\%$
test_vmap_transformer_speed[True-False] 7.9535ms 7.8991ms 126.5959 Ops/s 123.1632 Ops/s $\color{#35bf28}+2.79\%$
test_vmap_transformer_speed[False-True] 12.0650ms 12.0124ms 83.2476 Ops/s 82.7925 Ops/s $\color{#35bf28}+0.55\%$
test_vmap_transformer_speed[False-False] 7.8841ms 7.7991ms 128.2200 Ops/s 124.7879 Ops/s $\color{#35bf28}+2.75\%$
test_vmap_transformer_speed_decorator[True-True] 0.1353s 67.2646ms 14.8667 Ops/s 14.5026 Ops/s $\color{#35bf28}+2.51\%$
test_vmap_transformer_speed_decorator[True-False] 20.8393ms 19.1048ms 52.3428 Ops/s 50.4640 Ops/s $\color{#35bf28}+3.72\%$
test_vmap_transformer_speed_decorator[False-True] 57.8216ms 56.9590ms 17.5565 Ops/s 16.9333 Ops/s $\color{#35bf28}+3.68\%$
test_vmap_transformer_speed_decorator[False-False] 20.3741ms 18.6562ms 53.6014 Ops/s 51.9814 Ops/s $\color{#35bf28}+3.12\%$

@vmoens marked this pull request as ready for review December 11, 2023 10:31
@vmoens merged commit 8d585bf into main Dec 11, 2023
11 of 14 checks passed
@vmoens deleted the weakref-lock branch December 11, 2023 10:31
Labels: CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)

2 participants