[Feature] Robust to lazy_legacy set to false and context managers for reshape ops #634

Merged: 13 commits, Jan 25, 2024

@vmoens (Contributor) commented Jan 24, 2024

All ops will soon become non-lazy by default (planned for v0.3).
I expect this to be BC-breaking for some users, but most should not be affected. It should also make life much easier: many .contiguous() calls will become unnecessary.
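
To illustrate why non-lazy ops remove the need for `.contiguous()`, here is a plain-Python analogy. It is not tensordict code: `LazyTranspose` and `transpose` are hypothetical names invented for this sketch, and nested lists stand in for tensors. The lazy variant returns a deferred wrapper that has to be materialized explicitly, while the non-lazy variant returns the result directly.

```python
class LazyTranspose:
    """Deferred transpose: records the input, computes nothing until asked."""

    def __init__(self, rows):
        self._rows = rows

    def contiguous(self):
        # Materialize the result only when the caller requests it.
        return [list(col) for col in zip(*self._rows)]


def transpose(rows, lazy):
    """Return a deferred wrapper if lazy, else the transposed rows."""
    if lazy:
        return LazyTranspose(rows)
    return [list(col) for col in zip(*rows)]


data = [[1, 2, 3], [4, 5, 6]]

# Lazy behavior: an extra materialization call is needed.
lazy_result = transpose(data, lazy=True).contiguous()

# Non-lazy behavior: the result is usable right away.
eager_result = transpose(data, lazy=False)
```

Both paths produce the same values; the only difference in this sketch is whether the caller has to remember the extra call.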

The plan is to keep LazyStackedTensorDict, as it is a useful abstraction for carrying heterogeneous data structures, or for cases where one does not want to stack all the tensors of a data source.
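
The distinction between a lazy stack and a dense stack can be sketched in plain Python. This is only an illustration under simplifying assumptions: `LazyStack` and `dense_stack` are hypothetical names, dicts of lists stand in for tensordicts, and the real LazyStackedTensorDict is considerably richer. The point is that a lazy stack can hold items whose entries have different sizes, whereas a dense stack requires them to match.

```python
class LazyStack:
    """Keeps the stacked items side by side instead of merging them."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        # Indexing along the stack dimension returns the original item.
        return self.items[index]

    def get(self, key):
        # Entries are gathered on demand and may have different lengths.
        return [item[key] for item in self.items]


def dense_stack(items):
    """Merge items into one dict; requires identical keys and entry lengths."""
    keys = set(items[0])
    lengths = {key: len(items[0][key]) for key in keys}
    for item in items[1:]:
        if set(item) != keys:
            raise ValueError("items have different keys")
        for key in keys:
            if len(item[key]) != lengths[key]:
                raise ValueError(f"entry {key!r} has inconsistent lengths")
    return {key: [item[key] for item in items] for key in sorted(keys)}


long_episode = {"obs": [1, 2, 3]}
short_episode = {"obs": [4, 5]}
same_size_episode = {"obs": [7, 8, 9]}

# Heterogeneous data: only the lazy stack can represent it.
lazy = LazyStack([long_episode, short_episode])
ragged_obs = lazy.get("obs")

# Homogeneous data: the dense stack works.
dense = dense_stack([long_episode, same_size_episode])
```

Calling `dense_stack([long_episode, short_episode])` raises a `ValueError` in this sketch, which is the kind of situation where a lazy stack remains useful.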

In this PR, I set lazy_legacy to False and check that all the tests pass. The plan is to reset it to True before merging, so that we are sure this PR does not break everything in torchrl, for instance.

The plan for torch.stack is to consult the lazy_legacy environment variable for one release:

  • If None, the user has not said whether she wants the legacy lazy stack. We raise a warning asking her to make this explicit, either with the decorator or with the environment variable.
  • If True, we use the previous API.
  • If False, we use the dense stack.
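
The three cases above can be sketched as a small resolver. This is a minimal illustration, not the library's implementation: the variable name `LAZY_LEGACY_OP`, the accepted spellings of true and false, the warning category, and the fallback to the legacy behavior when the variable is unset are all assumptions made for this sketch, and `resolve_lazy_legacy` and `stack_mode` are hypothetical helper names.

```python
import os
import warnings

# Assumed environment variable name; check the library for the actual one.
ENV_VAR = "LAZY_LEGACY_OP"

_TRUE_VALUES = {"1", "true"}
_FALSE_VALUES = {"0", "false"}


def resolve_lazy_legacy(environ=None):
    """Return True for the legacy lazy stack, False for the dense stack."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR)
    if raw is None:
        # Unset: the user has not chosen, so ask for an explicit choice.
        warnings.warn(
            f"{ENV_VAR} is not set; choose lazy or dense stacking explicitly.",
            category=FutureWarning,
            stacklevel=2,
        )
        # Assumption: keep the previous behavior during the transition.
        return True
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"cannot interpret {ENV_VAR}={raw!r} as a boolean")


def stack_mode(environ=None):
    """Name the stacking strategy selected by the environment."""
    return "lazy" if resolve_lazy_legacy(environ) else "dense"
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to test, for example `stack_mode({"LAZY_LEGACY_OP": "0"})` selects the dense stack.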

@facebook-github-bot added the "CLA Signed" label Jan 24, 2024
@vmoens added the "enhancement" label Jan 24, 2024
@vmoens changed the title from "[Feature] Robust to lazy_legacy set to false" to "[Feature] Robust to lazy_legacy set to false and context managers for reshape ops" Jan 24, 2024
github-actions bot commented Jan 24, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 124. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}4$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 31.1980μs 16.8970μs 59.1820 KOps/s 56.7609 KOps/s $\color{#35bf28}+4.27\%$
test_plain_set_stack_nested 0.2873ms 0.1450ms 6.8967 KOps/s 6.6405 KOps/s $\color{#35bf28}+3.86\%$
test_plain_set_nested_inplace 74.6390μs 19.8458μs 50.3886 KOps/s 49.8573 KOps/s $\color{#35bf28}+1.07\%$
test_plain_set_stack_nested_inplace 0.3378ms 0.1819ms 5.4966 KOps/s 5.4755 KOps/s $\color{#35bf28}+0.39\%$
test_items 39.0630μs 2.5017μs 399.7218 KOps/s 404.2763 KOps/s $\color{#d91a1a}-1.13\%$
test_items_nested 0.4726ms 0.2726ms 3.6685 KOps/s 3.7348 KOps/s $\color{#d91a1a}-1.77\%$
test_items_nested_locked 1.3260ms 0.2754ms 3.6310 KOps/s 3.6948 KOps/s $\color{#d91a1a}-1.73\%$
test_items_nested_leaf 0.3087ms 0.1691ms 5.9136 KOps/s 6.0574 KOps/s $\color{#d91a1a}-2.37\%$
test_items_stack_nested 1.6208ms 1.3573ms 736.7794 Ops/s 757.6026 Ops/s $\color{#d91a1a}-2.75\%$
test_items_stack_nested_leaf 1.6526ms 1.2282ms 814.1909 Ops/s 847.2262 Ops/s $\color{#d91a1a}-3.90\%$
test_items_stack_nested_locked 1.1679ms 0.9002ms 1.1108 KOps/s 1.1449 KOps/s $\color{#d91a1a}-2.97\%$
test_keys 39.5530μs 3.8554μs 259.3785 KOps/s 253.2860 KOps/s $\color{#35bf28}+2.41\%$
test_keys_nested 60.2289ms 0.1585ms 6.3087 KOps/s 6.6199 KOps/s $\color{#d91a1a}-4.70\%$
test_keys_nested_locked 0.2583ms 0.1515ms 6.5991 KOps/s 6.4914 KOps/s $\color{#35bf28}+1.66\%$
test_keys_nested_leaf 0.2999ms 0.1302ms 7.6793 KOps/s 7.5586 KOps/s $\color{#35bf28}+1.60\%$
test_keys_stack_nested 1.7501ms 1.2952ms 772.0596 Ops/s 792.1747 Ops/s $\color{#d91a1a}-2.54\%$
test_keys_stack_nested_leaf 1.5721ms 1.3001ms 769.2001 Ops/s 794.3950 Ops/s $\color{#d91a1a}-3.17\%$
test_keys_stack_nested_locked 1.0972ms 0.8231ms 1.2149 KOps/s 1.2288 KOps/s $\color{#d91a1a}-1.13\%$
test_values 8.5678μs 1.1793μs 847.9437 KOps/s 871.6537 KOps/s $\color{#d91a1a}-2.72\%$
test_values_nested 0.1115ms 51.8978μs 19.2686 KOps/s 19.3935 KOps/s $\color{#d91a1a}-0.64\%$
test_values_nested_locked 0.1537ms 51.9222μs 19.2596 KOps/s 19.3255 KOps/s $\color{#d91a1a}-0.34\%$
test_values_nested_leaf 96.6980μs 46.1667μs 21.6606 KOps/s 21.7876 KOps/s $\color{#d91a1a}-0.58\%$
test_values_stack_nested 1.3007ms 1.0455ms 956.4685 Ops/s 973.8104 Ops/s $\color{#d91a1a}-1.78\%$
test_values_stack_nested_leaf 1.3111ms 1.0308ms 970.1172 Ops/s 952.5286 Ops/s $\color{#35bf28}+1.85\%$
test_values_stack_nested_locked 1.0738ms 0.6145ms 1.6274 KOps/s 1.6139 KOps/s $\color{#35bf28}+0.84\%$
test_membership 15.9600μs 1.3453μs 743.3146 KOps/s 723.4350 KOps/s $\color{#35bf28}+2.75\%$
test_membership_nested 23.4030μs 3.5393μs 282.5431 KOps/s 285.9554 KOps/s $\color{#d91a1a}-1.19\%$
test_membership_nested_leaf 46.8870μs 3.5528μs 281.4668 KOps/s 291.1438 KOps/s $\color{#d91a1a}-3.32\%$
test_membership_stacked_nested 55.1930μs 11.7959μs 84.7751 KOps/s 86.3721 KOps/s $\color{#d91a1a}-1.85\%$
test_membership_stacked_nested_leaf 75.0790μs 11.8049μs 84.7108 KOps/s 85.7202 KOps/s $\color{#d91a1a}-1.18\%$
test_membership_nested_last 64.0190μs 6.8051μs 146.9476 KOps/s 150.8164 KOps/s $\color{#d91a1a}-2.57\%$
test_membership_nested_leaf_last 32.0200μs 6.7877μs 147.3257 KOps/s 150.5584 KOps/s $\color{#d91a1a}-2.15\%$
test_membership_stacked_nested_last 0.2984ms 0.1773ms 5.6405 KOps/s 5.5933 KOps/s $\color{#35bf28}+0.84\%$
test_membership_stacked_nested_leaf_last 65.1110μs 13.8645μs 72.1266 KOps/s 72.5560 KOps/s $\color{#d91a1a}-0.59\%$
test_nested_getleaf 41.5970μs 10.5484μs 94.8012 KOps/s 91.7091 KOps/s $\color{#35bf28}+3.37\%$
test_nested_get 34.9250μs 10.0197μs 99.8038 KOps/s 94.8423 KOps/s $\textbf{\color{#35bf28}+5.23\%}$
test_stacked_getleaf 0.7701ms 0.3948ms 2.5328 KOps/s 2.4903 KOps/s $\color{#35bf28}+1.71\%$
test_stacked_get 0.4596ms 0.3616ms 2.7657 KOps/s 2.7227 KOps/s $\color{#35bf28}+1.58\%$
test_nested_getitemleaf 53.3990μs 12.0311μs 83.1176 KOps/s 79.6608 KOps/s $\color{#35bf28}+4.34\%$
test_nested_getitem 35.9370μs 11.5891μs 86.2883 KOps/s 83.5899 KOps/s $\color{#35bf28}+3.23\%$
test_stacked_getitemleaf 0.5010ms 0.3975ms 2.5157 KOps/s 2.4933 KOps/s $\color{#35bf28}+0.90\%$
test_stacked_getitem 0.6647ms 0.3678ms 2.7188 KOps/s 2.6896 KOps/s $\color{#35bf28}+1.09\%$
test_lock_nested 0.8614ms 0.3418ms 2.9260 KOps/s 2.8751 KOps/s $\color{#35bf28}+1.77\%$
test_lock_stack_nested 96.3494ms 6.2762ms 159.3321 Ops/s 160.1247 Ops/s $\color{#d91a1a}-0.49\%$
test_unlock_nested 1.0878ms 0.3479ms 2.8746 KOps/s 2.3845 KOps/s $\textbf{\color{#35bf28}+20.56\%}$
test_unlock_stack_nested 99.9662ms 6.3416ms 157.6896 Ops/s 153.2252 Ops/s $\color{#35bf28}+2.91\%$
test_flatten_speed 1.7280ms 0.3653ms 2.7375 KOps/s 2.6874 KOps/s $\color{#35bf28}+1.86\%$
test_unflatten_speed 0.9359ms 0.4681ms 2.1362 KOps/s 2.1169 KOps/s $\color{#35bf28}+0.91\%$
test_common_ops 3.7391ms 0.6981ms 1.4325 KOps/s 1.3865 KOps/s $\color{#35bf28}+3.31\%$
test_creation 16.7420μs 1.8484μs 541.0035 KOps/s 511.3186 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_creation_empty 31.5090μs 10.1736μs 98.2937 KOps/s 93.2626 KOps/s $\textbf{\color{#35bf28}+5.39\%}$
test_creation_nested_1 51.5560μs 12.8968μs 77.5385 KOps/s 74.3629 KOps/s $\color{#35bf28}+4.27\%$
test_creation_nested_2 41.2370μs 16.0139μs 62.4458 KOps/s 60.6085 KOps/s $\color{#35bf28}+3.03\%$
test_clone 0.1293ms 13.1480μs 76.0569 KOps/s 76.0677 KOps/s $\color{#d91a1a}-0.01\%$
test_getitem[int] 43.7120μs 10.9273μs 91.5138 KOps/s 87.4933 KOps/s $\color{#35bf28}+4.60\%$
test_getitem[slice_int] 73.2470μs 22.3648μs 44.7132 KOps/s 43.3079 KOps/s $\color{#35bf28}+3.24\%$
test_getitem[range] 0.2593ms 42.6977μs 23.4205 KOps/s 23.3559 KOps/s $\color{#35bf28}+0.28\%$
test_getitem[tuple] 55.3930μs 17.9256μs 55.7861 KOps/s 54.1860 KOps/s $\color{#35bf28}+2.95\%$
test_getitem[list] 0.2728ms 37.4585μs 26.6962 KOps/s 26.4048 KOps/s $\color{#35bf28}+1.10\%$
test_setitem_dim[int] 0.1023ms 31.0600μs 32.1957 KOps/s 32.2782 KOps/s $\color{#d91a1a}-0.26\%$
test_setitem_dim[slice_int] 0.1132ms 58.1238μs 17.2047 KOps/s 17.5436 KOps/s $\color{#d91a1a}-1.93\%$
test_setitem_dim[range] 0.1411ms 76.2716μs 13.1110 KOps/s 13.0195 KOps/s $\color{#35bf28}+0.70\%$
test_setitem_dim[tuple] 90.0390μs 44.8220μs 22.3105 KOps/s 20.9754 KOps/s $\textbf{\color{#35bf28}+6.36\%}$
test_setitem 0.1461ms 19.4024μs 51.5399 KOps/s 49.4559 KOps/s $\color{#35bf28}+4.21\%$
test_set 0.1942ms 18.6582μs 53.5959 KOps/s 50.6630 KOps/s $\textbf{\color{#35bf28}+5.79\%}$
test_set_shared 2.1547ms 0.1471ms 6.7998 KOps/s 6.7999 KOps/s $-0.00\%$
test_update 0.1377ms 21.4446μs 46.6317 KOps/s 44.2365 KOps/s $\textbf{\color{#35bf28}+5.41\%}$
test_update_nested 0.2042ms 28.8170μs 34.7017 KOps/s 33.2592 KOps/s $\color{#35bf28}+4.34\%$
test_set_nested 0.1478ms 20.4796μs 48.8290 KOps/s 46.8943 KOps/s $\color{#35bf28}+4.13\%$
test_set_nested_new 0.2056ms 24.3493μs 41.0690 KOps/s 40.1797 KOps/s $\color{#35bf28}+2.21\%$
test_select 0.1799ms 37.4507μs 26.7018 KOps/s 26.2041 KOps/s $\color{#35bf28}+1.90\%$
test_select_nested 0.1235ms 57.4250μs 17.4140 KOps/s 17.1551 KOps/s $\color{#35bf28}+1.51\%$
test_exclude_nested 0.2147ms 0.1075ms 9.2998 KOps/s 9.2201 KOps/s $\color{#35bf28}+0.87\%$
test_empty[True] 0.4802ms 0.3218ms 3.1075 KOps/s 3.0932 KOps/s $\color{#35bf28}+0.46\%$
test_empty[False] 8.0108μs 1.0307μs 970.2021 KOps/s 972.3781 KOps/s $\color{#d91a1a}-0.22\%$
test_unbind_speed 0.3771ms 0.2453ms 4.0766 KOps/s 4.0697 KOps/s $\color{#35bf28}+0.17\%$
test_unbind_speed_stack0 84.8449ms 3.3996ms 294.1512 Ops/s 292.7253 Ops/s $\color{#35bf28}+0.49\%$
test_unbind_speed_stack1 22.9430μs 2.0106μs 497.3567 KOps/s 515.6001 KOps/s $\color{#d91a1a}-3.54\%$
test_split 77.7187ms 1.6422ms 608.9278 Ops/s 589.7991 Ops/s $\color{#35bf28}+3.24\%$
test_chunk 0.1045s 1.6395ms 609.9350 Ops/s 617.6075 Ops/s $\color{#d91a1a}-1.24\%$
test_creation[device0] 3.6584ms 0.1060ms 9.4302 KOps/s 9.4975 KOps/s $\color{#d91a1a}-0.71\%$
test_creation_from_tensor 0.2507ms 83.8240μs 11.9298 KOps/s 11.9819 KOps/s $\color{#d91a1a}-0.43\%$
test_add_one[memmap_tensor0] 0.6054ms 5.3538μs 186.7849 KOps/s 182.7295 KOps/s $\color{#35bf28}+2.22\%$
test_contiguous[memmap_tensor0] 8.0050μs 0.6411μs 1.5598 MOps/s 1.5579 MOps/s $\color{#35bf28}+0.13\%$
test_stack[memmap_tensor0] 0.1620ms 3.6425μs 274.5345 KOps/s 274.4170 KOps/s $\color{#35bf28}+0.04\%$
test_memmaptd_index 1.2162ms 0.2221ms 4.5026 KOps/s 4.4714 KOps/s $\color{#35bf28}+0.70\%$
test_memmaptd_index_astensor 0.6714ms 0.2816ms 3.5518 KOps/s 3.5143 KOps/s $\color{#35bf28}+1.07\%$
test_memmaptd_index_op 0.8908ms 0.5675ms 1.7622 KOps/s 1.7077 KOps/s $\color{#35bf28}+3.19\%$
test_serialize_model 0.1888s 0.1127s 8.8770 Ops/s 8.7665 Ops/s $\color{#35bf28}+1.26\%$
test_serialize_model_pickle 0.4500s 0.3778s 2.6472 Ops/s 2.6109 Ops/s $\color{#35bf28}+1.39\%$
test_serialize_weights 0.1845s 0.1115s 8.9720 Ops/s 8.8701 Ops/s $\color{#35bf28}+1.15\%$
test_serialize_weights_returnearly 0.3224s 0.1551s 6.4471 Ops/s 8.0272 Ops/s $\textbf{\color{#d91a1a}-19.69\%}$
test_serialize_weights_pickle 0.8979s 0.6177s 1.6189 Ops/s 2.5744 Ops/s $\textbf{\color{#d91a1a}-37.12\%}$
test_serialize_weights_filesystem 0.1034s 94.3361ms 10.6004 Ops/s 10.1109 Ops/s $\color{#35bf28}+4.84\%$
test_serialize_model_filesystem 0.1808s 0.1036s 9.6549 Ops/s 9.3314 Ops/s $\color{#35bf28}+3.47\%$
test_reshape_pytree 78.9870μs 22.9130μs 43.6434 KOps/s 44.0655 KOps/s $\color{#d91a1a}-0.96\%$
test_reshape_td 77.6850μs 29.7244μs 33.6424 KOps/s 34.4624 KOps/s $\color{#d91a1a}-2.38\%$
test_view_pytree 90.9090μs 22.7611μs 43.9345 KOps/s 44.1057 KOps/s $\color{#d91a1a}-0.39\%$
test_view_td 92.9196ms 12.0704μs 82.8473 KOps/s 206.3805 KOps/s $\textbf{\color{#d91a1a}-59.86\%}$
test_unbind_pytree 59.3600μs 25.8909μs 38.6236 KOps/s 37.9758 KOps/s $\color{#35bf28}+1.71\%$
test_unbind_td 0.1268ms 34.7215μs 28.8006 KOps/s 28.3749 KOps/s $\color{#35bf28}+1.50\%$
test_split_pytree 83.2350μs 25.8320μs 38.7117 KOps/s 38.7217 KOps/s $\color{#d91a1a}-0.03\%$
test_split_td 0.1354ms 40.2334μs 24.8550 KOps/s 24.0348 KOps/s $\color{#35bf28}+3.41\%$
test_add_pytree 0.1152ms 31.7344μs 31.5116 KOps/s 31.4110 KOps/s $\color{#35bf28}+0.32\%$
test_add_td 0.1172ms 49.0615μs 20.3826 KOps/s 20.1332 KOps/s $\color{#35bf28}+1.24\%$
test_distributed 0.3415ms 0.1004ms 9.9634 KOps/s 9.7168 KOps/s $\color{#35bf28}+2.54\%$
test_tdmodule 0.2275ms 22.2541μs 44.9355 KOps/s 43.8722 KOps/s $\color{#35bf28}+2.42\%$
test_tdmodule_dispatch 0.2148ms 44.1638μs 22.6430 KOps/s 22.6723 KOps/s $\color{#d91a1a}-0.13\%$
test_tdseq 66.4340μs 25.5696μs 39.1089 KOps/s 38.4527 KOps/s $\color{#35bf28}+1.71\%$
test_tdseq_dispatch 0.1639ms 48.0628μs 20.8061 KOps/s 20.3407 KOps/s $\color{#35bf28}+2.29\%$
test_instantiation_functorch 2.0506ms 1.3072ms 764.9911 Ops/s 773.7957 Ops/s $\color{#d91a1a}-1.14\%$
test_instantiation_td 1.9145ms 1.0059ms 994.0989 Ops/s 994.5674 Ops/s $\color{#d91a1a}-0.05\%$
test_exec_functorch 0.2893ms 0.1578ms 6.3381 KOps/s 6.2240 KOps/s $\color{#35bf28}+1.83\%$
test_exec_functional_call 0.3720ms 0.1480ms 6.7581 KOps/s 6.7387 KOps/s $\color{#35bf28}+0.29\%$
test_exec_td 0.2312ms 0.1423ms 7.0280 KOps/s 6.8768 KOps/s $\color{#35bf28}+2.20\%$
test_exec_td_decorator 1.0975ms 0.1783ms 5.6089 KOps/s 5.4616 KOps/s $\color{#35bf28}+2.70\%$
test_vmap_mlp_speed[True-True] 1.1775ms 0.9068ms 1.1028 KOps/s 1.1164 KOps/s $\color{#d91a1a}-1.22\%$
test_vmap_mlp_speed[True-False] 0.8149ms 0.4797ms 2.0848 KOps/s 2.0750 KOps/s $\color{#35bf28}+0.47\%$
test_vmap_mlp_speed[False-True] 1.1517ms 0.7934ms 1.2604 KOps/s 1.3116 KOps/s $\color{#d91a1a}-3.90\%$
test_vmap_mlp_speed[False-False] 0.7000ms 0.3880ms 2.5774 KOps/s 2.5468 KOps/s $\color{#35bf28}+1.20\%$
test_vmap_mlp_speed_decorator[True-True] 3.5917ms 2.3785ms 420.4299 Ops/s 431.2044 Ops/s $\color{#d91a1a}-2.50\%$
test_vmap_mlp_speed_decorator[True-False] 0.9591ms 0.5348ms 1.8699 KOps/s 1.8599 KOps/s $\color{#35bf28}+0.53\%$
test_vmap_mlp_speed_decorator[False-True] 3.0284ms 1.9143ms 522.3898 Ops/s 519.5748 Ops/s $\color{#35bf28}+0.54\%$
test_vmap_mlp_speed_decorator[False-False] 0.1121s 0.4528ms 2.2086 KOps/s 2.4216 KOps/s $\textbf{\color{#d91a1a}-8.79\%}$
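
For reading the table, the following sketch shows how the columns appear to relate. The formulas are inferred from the reported numbers rather than taken from the benchmark tooling, and `ops_per_second` and `percent_change` are hypothetical helper names: Ops looks like the reciprocal of Mean, and Change looks like the relative difference between Ops and Ops on Repo HEAD.

```python
def ops_per_second(mean_seconds):
    """Throughput implied by a mean duration, in operations per second."""
    return 1.0 / mean_seconds


def percent_change(ops, ops_on_head):
    """Relative change of this PR's throughput against the repo HEAD."""
    return (ops - ops_on_head) / ops_on_head * 100.0


# test_plain_set_nested (CPU): Mean 16.8970 us, 59.1820 vs 56.7609 KOps/s
plain_set_kops = ops_per_second(16.8970e-6) / 1e3
plain_set_change = percent_change(59.1820, 56.7609)

# test_view_td (CPU): 82.8473 vs 206.3805 KOps/s
view_td_change = percent_change(82.8473, 206.3805)

print(f"{plain_set_kops:.2f} KOps/s, {plain_set_change:+.2f}%")
print(f"{view_td_change:+.2f}%")
```

Under these assumed formulas the computed values agree with the Ops and Change columns reported for those two rows.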

github-actions bot commented Jan 24, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 132. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}4$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 63.1198ms 16.5707μs 60.3476 KOps/s 73.6924 KOps/s $\textbf{\color{#d91a1a}-18.11\%}$
test_plain_set_stack_nested 0.1630ms 0.1181ms 8.4676 KOps/s 8.3845 KOps/s $\color{#35bf28}+0.99\%$
test_plain_set_nested_inplace 48.3500μs 14.8734μs 67.2342 KOps/s 67.0859 KOps/s $\color{#35bf28}+0.22\%$
test_plain_set_stack_nested_inplace 0.2667ms 0.1461ms 6.8442 KOps/s 6.7392 KOps/s $\color{#35bf28}+1.56\%$
test_items 30.9310μs 4.7876μs 208.8731 KOps/s 209.7709 KOps/s $\color{#d91a1a}-0.43\%$
test_items_nested 0.4315ms 0.3409ms 2.9335 KOps/s 2.9529 KOps/s $\color{#d91a1a}-0.66\%$
test_items_nested_locked 0.4205ms 0.3446ms 2.9018 KOps/s 2.9212 KOps/s $\color{#d91a1a}-0.66\%$
test_items_nested_leaf 0.2408ms 0.2017ms 4.9580 KOps/s 4.9917 KOps/s $\color{#d91a1a}-0.67\%$
test_items_stack_nested 1.4273ms 1.3051ms 766.2119 Ops/s 776.3303 Ops/s $\color{#d91a1a}-1.30\%$
test_items_stack_nested_leaf 1.2616ms 1.1392ms 877.8327 Ops/s 882.3034 Ops/s $\color{#d91a1a}-0.51\%$
test_items_stack_nested_locked 1.9467ms 0.8895ms 1.1242 KOps/s 1.1416 KOps/s $\color{#d91a1a}-1.52\%$
test_keys 46.6610μs 4.5802μs 218.3295 KOps/s 218.8466 KOps/s $\color{#d91a1a}-0.24\%$
test_keys_nested 0.5789ms 94.3991μs 10.5933 KOps/s 10.5688 KOps/s $\color{#35bf28}+0.23\%$
test_keys_nested_locked 0.1661ms 99.5571μs 10.0445 KOps/s 10.2508 KOps/s $\color{#d91a1a}-2.01\%$
test_keys_nested_leaf 0.1998ms 77.8415μs 12.8466 KOps/s 12.8393 KOps/s $\color{#35bf28}+0.06\%$
test_keys_stack_nested 1.2130ms 1.1236ms 890.0358 Ops/s 883.3763 Ops/s $\color{#35bf28}+0.75\%$
test_keys_stack_nested_leaf 1.1593ms 1.1095ms 901.3401 Ops/s 883.5407 Ops/s $\color{#35bf28}+2.01\%$
test_keys_stack_nested_locked 0.7726ms 0.7105ms 1.4074 KOps/s 1.4156 KOps/s $\color{#d91a1a}-0.57\%$
test_values 10.7170μs 1.8890μs 529.3863 KOps/s 527.0881 KOps/s $\color{#35bf28}+0.44\%$
test_values_nested 82.5420μs 44.8711μs 22.2861 KOps/s 22.1974 KOps/s $\color{#35bf28}+0.40\%$
test_values_nested_locked 77.1410μs 46.9885μs 21.2818 KOps/s 21.2508 KOps/s $\color{#35bf28}+0.15\%$
test_values_nested_leaf 64.9010μs 39.2474μs 25.4794 KOps/s 25.2687 KOps/s $\color{#35bf28}+0.83\%$
test_values_stack_nested 1.0998ms 0.9340ms 1.0707 KOps/s 1.0638 KOps/s $\color{#35bf28}+0.65\%$
test_values_stack_nested_leaf 1.0217ms 0.9481ms 1.0548 KOps/s 1.0644 KOps/s $\color{#d91a1a}-0.90\%$
test_values_stack_nested_locked 0.7497ms 0.5630ms 1.7763 KOps/s 1.7823 KOps/s $\color{#d91a1a}-0.34\%$
test_membership 5.3902μs 0.9515μs 1.0509 MOps/s 924.8893 KOps/s $\textbf{\color{#35bf28}+13.63\%}$
test_membership_nested 31.8300μs 2.9343μs 340.7919 KOps/s 343.6826 KOps/s $\color{#d91a1a}-0.84\%$
test_membership_nested_leaf 36.5010μs 2.9252μs 341.8568 KOps/s 345.2472 KOps/s $\color{#d91a1a}-0.98\%$
test_membership_stacked_nested 0.2136ms 11.4452μs 87.3732 KOps/s 88.0263 KOps/s $\color{#d91a1a}-0.74\%$
test_membership_stacked_nested_leaf 0.1092ms 11.4445μs 87.3783 KOps/s 87.6316 KOps/s $\color{#d91a1a}-0.29\%$
test_membership_nested_last 30.7810μs 5.3512μs 186.8739 KOps/s 186.2732 KOps/s $\color{#35bf28}+0.32\%$
test_membership_nested_leaf_last 37.5610μs 5.3172μs 188.0684 KOps/s 187.5942 KOps/s $\color{#35bf28}+0.25\%$
test_membership_stacked_nested_last 0.1960ms 0.1552ms 6.4453 KOps/s 6.3917 KOps/s $\color{#35bf28}+0.84\%$
test_membership_stacked_nested_leaf_last 37.7810μs 13.3030μs 75.1713 KOps/s 75.9778 KOps/s $\color{#d91a1a}-1.06\%$
test_nested_getleaf 34.3310μs 8.4050μs 118.9770 KOps/s 118.9860 KOps/s $-0.01\%$
test_nested_get 35.6610μs 7.9140μs 126.3588 KOps/s 126.2931 KOps/s $\color{#35bf28}+0.05\%$
test_stacked_getleaf 0.3717ms 0.3312ms 3.0198 KOps/s 3.0534 KOps/s $\color{#d91a1a}-1.10\%$
test_stacked_get 0.3381ms 0.2922ms 3.4223 KOps/s 3.3940 KOps/s $\color{#35bf28}+0.83\%$
test_nested_getitemleaf 92.6420μs 9.7831μs 102.2172 KOps/s 102.0406 KOps/s $\color{#35bf28}+0.17\%$
test_nested_getitem 45.2300μs 9.3103μs 107.4084 KOps/s 106.8863 KOps/s $\color{#35bf28}+0.49\%$
test_stacked_getitemleaf 0.3964ms 0.3322ms 3.0103 KOps/s 3.0337 KOps/s $\color{#d91a1a}-0.77\%$
test_stacked_getitem 0.3373ms 0.3006ms 3.3261 KOps/s 3.3375 KOps/s $\color{#d91a1a}-0.34\%$
test_lock_nested 0.8067ms 0.3483ms 2.8711 KOps/s 2.9002 KOps/s $\color{#d91a1a}-1.00\%$
test_lock_stack_nested 90.5102ms 6.3063ms 158.5723 Ops/s 159.3859 Ops/s $\color{#d91a1a}-0.51\%$
test_unlock_nested 86.2669ms 0.4318ms 2.3157 KOps/s 2.3458 KOps/s $\color{#d91a1a}-1.28\%$
test_unlock_stack_nested 90.3924ms 6.3845ms 156.6281 Ops/s 158.2681 Ops/s $\color{#d91a1a}-1.04\%$
test_flatten_speed 0.6486ms 0.2615ms 3.8238 KOps/s 3.7875 KOps/s $\color{#35bf28}+0.96\%$
test_unflatten_speed 0.4035ms 0.3592ms 2.7843 KOps/s 2.7537 KOps/s $\color{#35bf28}+1.11\%$
test_common_ops 1.0517ms 0.5927ms 1.6872 KOps/s 1.7323 KOps/s $\color{#d91a1a}-2.61\%$
test_creation 45.5810μs 1.5555μs 642.8700 KOps/s 647.4263 KOps/s $\color{#d91a1a}-0.70\%$
test_creation_empty 27.5100μs 8.2288μs 121.5248 KOps/s 123.3079 KOps/s $\color{#d91a1a}-1.45\%$
test_creation_nested_1 45.2810μs 9.9301μs 100.7040 KOps/s 101.1327 KOps/s $\color{#d91a1a}-0.42\%$
test_creation_nested_2 35.4200μs 12.3403μs 81.0351 KOps/s 81.8396 KOps/s $\color{#d91a1a}-0.98\%$
test_clone 82.9310μs 13.6118μs 73.4656 KOps/s 74.8148 KOps/s $\color{#d91a1a}-1.80\%$
test_getitem[int] 43.8510μs 10.5118μs 95.1312 KOps/s 95.3816 KOps/s $\color{#d91a1a}-0.26\%$
test_getitem[slice_int] 47.1710μs 20.1582μs 49.6076 KOps/s 48.8041 KOps/s $\color{#35bf28}+1.65\%$
test_getitem[range] 0.1515ms 35.3505μs 28.2881 KOps/s 28.5866 KOps/s $\color{#d91a1a}-1.04\%$
test_getitem[tuple] 62.2710μs 18.2131μs 54.9056 KOps/s 54.4739 KOps/s $\color{#35bf28}+0.79\%$
test_getitem[list] 0.1730ms 31.9813μs 31.2683 KOps/s 32.0131 KOps/s $\color{#d91a1a}-2.33\%$
test_setitem_dim[int] 0.1533ms 25.7352μs 38.8572 KOps/s 36.9770 KOps/s $\textbf{\color{#35bf28}+5.08\%}$
test_setitem_dim[slice_int] 71.0710μs 45.6561μs 21.9029 KOps/s 21.6948 KOps/s $\color{#35bf28}+0.96\%$
test_setitem_dim[range] 83.8910μs 58.4405μs 17.1114 KOps/s 16.4174 KOps/s $\color{#35bf28}+4.23\%$
test_setitem_dim[tuple] 65.2310μs 40.0421μs 24.9737 KOps/s 24.2618 KOps/s $\color{#35bf28}+2.93\%$
test_setitem 65.2210μs 18.1367μs 55.1369 KOps/s 54.9165 KOps/s $\color{#35bf28}+0.40\%$
test_set 68.1010μs 17.7229μs 56.4242 KOps/s 56.5483 KOps/s $\color{#d91a1a}-0.22\%$
test_set_shared 2.8645ms 0.1019ms 9.8172 KOps/s 9.2530 KOps/s $\textbf{\color{#35bf28}+6.10\%}$
test_update 77.9120μs 20.3988μs 49.0225 KOps/s 50.1076 KOps/s $\color{#d91a1a}-2.17\%$
test_update_nested 72.7020μs 26.7708μs 37.3542 KOps/s 37.5501 KOps/s $\color{#d91a1a}-0.52\%$
test_set_nested 73.0510μs 18.6605μs 53.5891 KOps/s 53.3405 KOps/s $\color{#35bf28}+0.47\%$
test_set_nested_new 0.1507ms 21.9829μs 45.4899 KOps/s 44.8353 KOps/s $\color{#35bf28}+1.46\%$
test_select 0.1549ms 34.6094μs 28.8939 KOps/s 29.0932 KOps/s $\color{#d91a1a}-0.68\%$
test_select_nested 83.5110μs 53.4305μs 18.7159 KOps/s 18.8529 KOps/s $\color{#d91a1a}-0.73\%$
test_exclude_nested 0.1523ms 0.1080ms 9.2630 KOps/s 9.4041 KOps/s $\color{#d91a1a}-1.50\%$
test_empty[True] 0.3979ms 0.3189ms 3.1354 KOps/s 3.1203 KOps/s $\color{#35bf28}+0.48\%$
test_empty[False] 2.8971μs 0.8534μs 1.1718 MOps/s 1.1653 MOps/s $\color{#35bf28}+0.56\%$
test_to 70.5010μs 50.5188μs 19.7946 KOps/s 19.2431 KOps/s $\color{#35bf28}+2.87\%$
test_to_nonblocking 0.1775ms 32.0314μs 31.2194 KOps/s 30.5073 KOps/s $\color{#35bf28}+2.33\%$
test_unbind_speed 0.3539ms 0.2594ms 3.8550 KOps/s 3.8277 KOps/s $\color{#35bf28}+0.71\%$
test_unbind_speed_stack0 88.8827ms 3.7073ms 269.7395 Ops/s 269.0517 Ops/s $\color{#35bf28}+0.26\%$
test_unbind_speed_stack1 20.9200μs 1.7993μs 555.7570 KOps/s 559.3722 KOps/s $\color{#d91a1a}-0.65\%$
test_split 2.1643ms 1.5258ms 655.3886 Ops/s 586.3826 Ops/s $\textbf{\color{#35bf28}+11.77\%}$
test_chunk 82.4656ms 1.6555ms 604.0583 Ops/s 611.9235 Ops/s $\color{#d91a1a}-1.29\%$
test_creation[device0] 0.2182ms 70.0396μs 14.2776 KOps/s 14.3688 KOps/s $\color{#d91a1a}-0.63\%$
test_creation_from_tensor 0.2318ms 53.9095μs 18.5496 KOps/s 18.1711 KOps/s $\color{#35bf28}+2.08\%$
test_add_one[memmap_tensor0] 0.2323ms 6.2598μs 159.7491 KOps/s 159.9731 KOps/s $\color{#d91a1a}-0.14\%$
test_contiguous[memmap_tensor0] 11.9200μs 0.6439μs 1.5530 MOps/s 1.6080 MOps/s $\color{#d91a1a}-3.42\%$
test_stack[memmap_tensor0] 52.8810μs 4.3416μs 230.3280 KOps/s 233.0558 KOps/s $\color{#d91a1a}-1.17\%$
test_memmaptd_index 1.0615ms 0.2590ms 3.8616 KOps/s 3.8381 KOps/s $\color{#35bf28}+0.61\%$
test_memmaptd_index_astensor 0.5732ms 0.3151ms 3.1736 KOps/s 3.1652 KOps/s $\color{#35bf28}+0.27\%$
test_memmaptd_index_op 0.9178ms 0.5826ms 1.7165 KOps/s 1.7058 KOps/s $\color{#35bf28}+0.62\%$
test_serialize_model 0.1715s 97.6774ms 10.2378 Ops/s 9.7012 Ops/s $\textbf{\color{#35bf28}+5.53\%}$
test_serialize_model_pickle 1.3491s 1.2358s 0.8092 Ops/s 0.8084 Ops/s $\color{#35bf28}+0.10\%$
test_serialize_weights 0.1731s 96.2534ms 10.3892 Ops/s 10.1526 Ops/s $\color{#35bf28}+2.33\%$
test_serialize_weights_returnearly 0.2796s 82.8811ms 12.0655 Ops/s 14.1696 Ops/s $\textbf{\color{#d91a1a}-14.85\%}$
test_serialize_weights_pickle 1.3551s 1.2369s 0.8084 Ops/s 0.8025 Ops/s $\color{#35bf28}+0.74\%$
test_reshape_pytree 57.3210μs 24.3702μs 41.0337 KOps/s 40.9014 KOps/s $\color{#35bf28}+0.32\%$
test_reshape_td 0.1727ms 28.6792μs 34.8685 KOps/s 35.2958 KOps/s $\color{#d91a1a}-1.21\%$
test_view_pytree 0.1465ms 24.0822μs 41.5245 KOps/s 42.2855 KOps/s $\color{#d91a1a}-1.80\%$
test_view_td 0.5082ms 6.7057μs 149.1276 KOps/s 233.0941 KOps/s $\textbf{\color{#d91a1a}-36.02\%}$
test_unbind_pytree 68.5110μs 29.7736μs 33.5868 KOps/s 33.5378 KOps/s $\color{#35bf28}+0.15\%$
test_unbind_td 0.3155ms 39.6207μs 25.2393 KOps/s 24.1321 KOps/s $\color{#35bf28}+4.59\%$
test_split_pytree 62.9910μs 27.6777μs 36.1302 KOps/s 35.5294 KOps/s $\color{#35bf28}+1.69\%$
test_split_td 0.1817ms 38.2237μs 26.1618 KOps/s 26.2763 KOps/s $\color{#d91a1a}-0.44\%$
test_add_pytree 0.1482ms 34.4325μs 29.0423 KOps/s 29.3697 KOps/s $\color{#d91a1a}-1.11\%$
test_add_td 84.0220μs 44.6760μs 22.3834 KOps/s 21.1279 KOps/s $\textbf{\color{#35bf28}+5.94\%}$
test_distributed 0.2094ms 69.1120μs 14.4693 KOps/s 13.5714 KOps/s $\textbf{\color{#35bf28}+6.62\%}$
test_tdmodule 37.6610μs 17.4688μs 57.2449 KOps/s 57.1544 KOps/s $\color{#35bf28}+0.16\%$
test_tdmodule_dispatch 0.2096ms 36.1061μs 27.6961 KOps/s 27.6691 KOps/s $\color{#35bf28}+0.10\%$
test_tdseq 0.1356ms 20.3192μs 49.2145 KOps/s 48.9279 KOps/s $\color{#35bf28}+0.59\%$
test_tdseq_dispatch 67.4720μs 38.4089μs 26.0356 KOps/s 25.7441 KOps/s $\color{#35bf28}+1.13\%$
test_instantiation_functorch 1.8052ms 1.6502ms 605.9815 Ops/s 608.6008 Ops/s $\color{#d91a1a}-0.43\%$
test_instantiation_td 1.6708ms 1.1430ms 874.9251 Ops/s 868.7172 Ops/s $\color{#35bf28}+0.71\%$
test_exec_functorch 0.2136ms 0.1561ms 6.4075 KOps/s 6.4281 KOps/s $\color{#d91a1a}-0.32\%$
test_exec_functional_call 0.2268ms 0.1511ms 6.6188 KOps/s 6.5880 KOps/s $\color{#35bf28}+0.47\%$
test_exec_td 0.2384ms 0.1455ms 6.8741 KOps/s 7.0015 KOps/s $\color{#d91a1a}-1.82\%$
test_exec_td_decorator 0.1149s 0.2096ms 4.7711 KOps/s 5.4386 KOps/s $\textbf{\color{#d91a1a}-12.27\%}$
test_vmap_mlp_speed[True-True] 1.2841ms 0.9987ms 1.0013 KOps/s 963.2093 Ops/s $\color{#35bf28}+3.96\%$
test_vmap_mlp_speed[True-False] 0.7155ms 0.5735ms 1.7436 KOps/s 1.6892 KOps/s $\color{#35bf28}+3.22\%$
test_vmap_mlp_speed[False-True] 1.0473ms 0.9103ms 1.0985 KOps/s 1.0557 KOps/s $\color{#35bf28}+4.05\%$
test_vmap_mlp_speed[False-False] 0.6560ms 0.5019ms 1.9926 KOps/s 1.9632 KOps/s $\color{#35bf28}+1.50\%$
test_vmap_mlp_speed_decorator[True-True] 2.8769ms 2.2657ms 441.3618 Ops/s 436.2224 Ops/s $\color{#35bf28}+1.18\%$
test_vmap_mlp_speed_decorator[True-False] 1.0580ms 0.6166ms 1.6218 KOps/s 1.5884 KOps/s $\color{#35bf28}+2.10\%$
test_vmap_mlp_speed_decorator[False-True] 2.3326ms 1.8946ms 527.8186 Ops/s 527.0352 Ops/s $\color{#35bf28}+0.15\%$
test_vmap_mlp_speed_decorator[False-False] 0.9009ms 0.5203ms 1.9220 KOps/s 1.9158 KOps/s $\color{#35bf28}+0.32\%$
test_vmap_transformer_speed[True-True] 11.9208ms 11.7518ms 85.0937 Ops/s 85.0095 Ops/s $\color{#35bf28}+0.10\%$
test_vmap_transformer_speed[True-False] 7.7924ms 7.6843ms 130.1347 Ops/s 130.0908 Ops/s $\color{#35bf28}+0.03\%$
test_vmap_transformer_speed[False-True] 11.8732ms 11.6624ms 85.7458 Ops/s 85.2202 Ops/s $\color{#35bf28}+0.62\%$
test_vmap_transformer_speed[False-False] 7.7854ms 7.5950ms 131.6654 Ops/s 130.8730 Ops/s $\color{#35bf28}+0.61\%$
test_vmap_transformer_speed_decorator[True-True] 71.5844ms 70.8657ms 14.1112 Ops/s 12.8763 Ops/s $\textbf{\color{#35bf28}+9.59\%}$
test_vmap_transformer_speed_decorator[True-False] 20.0229ms 18.4275ms 54.2667 Ops/s 53.7577 Ops/s $\color{#35bf28}+0.95\%$
test_vmap_transformer_speed_decorator[False-True] 65.2971ms 63.7827ms 15.6782 Ops/s 15.5960 Ops/s $\color{#35bf28}+0.53\%$
test_vmap_transformer_speed_decorator[False-False] 19.6511ms 18.0549ms 55.3868 Ops/s 50.0764 Ops/s $\textbf{\color{#35bf28}+10.60\%}$

@vmoens merged commit b1d814b into main Jan 25, 2024
44 of 45 checks passed
@vmoens deleted the lazy-legacy-false branch January 25, 2024 20:48