[Performance] Faster __init__ #576

Merged
vmoens merged 6 commits into main from faster-init
Nov 24, 2023

@vmoens (Contributor) commented Nov 24, 2023

No description provided.

@facebook-github-bot added the CLA Signed label on Nov 24, 2023

github-actions bot commented Nov 24, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 113. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}11$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 27.7520μs 16.2458μs 61.5545 KOps/s 62.5788 KOps/s $\color{#d91a1a}-1.64\%$
test_plain_set_stack_nested 0.1848ms 0.1436ms 6.9655 KOps/s 6.8446 KOps/s $\color{#35bf28}+1.77\%$
test_plain_set_nested_inplace 45.3550μs 19.4324μs 51.4605 KOps/s 52.1931 KOps/s $\color{#d91a1a}-1.40\%$
test_plain_set_stack_nested_inplace 0.3943ms 0.1740ms 5.7460 KOps/s 5.4785 KOps/s $\color{#35bf28}+4.88\%$
test_items 18.8560μs 2.4049μs 415.8092 KOps/s 414.6935 KOps/s $\color{#35bf28}+0.27\%$
test_items_nested 0.3481ms 0.2653ms 3.7686 KOps/s 3.7741 KOps/s $\color{#d91a1a}-0.14\%$
test_items_nested_locked 1.3824ms 0.2696ms 3.7091 KOps/s 3.7807 KOps/s $\color{#d91a1a}-1.90\%$
test_items_nested_leaf 1.0040ms 0.1631ms 6.1315 KOps/s 6.0925 KOps/s $\color{#35bf28}+0.64\%$
test_items_stack_nested 1.5997ms 1.4748ms 678.0480 Ops/s 672.2472 Ops/s $\color{#35bf28}+0.86\%$
test_items_stack_nested_leaf 1.7665ms 1.3485ms 741.5715 Ops/s 723.4473 Ops/s $\color{#35bf28}+2.51\%$
test_items_stack_nested_locked 1.7869ms 0.7610ms 1.3140 KOps/s 1.3149 KOps/s $\color{#d91a1a}-0.06\%$
test_keys 30.6970μs 3.8994μs 256.4518 KOps/s 260.5925 KOps/s $\color{#d91a1a}-1.59\%$
test_keys_nested 0.5954ms 0.1421ms 7.0349 KOps/s 6.5799 KOps/s $\textbf{\color{#35bf28}+6.92\%}$
test_keys_nested_locked 0.2964ms 0.1415ms 7.0654 KOps/s 7.0693 KOps/s $\color{#d91a1a}-0.05\%$
test_keys_nested_leaf 0.3833ms 0.1479ms 6.7609 KOps/s 7.0076 KOps/s $\color{#d91a1a}-3.52\%$
test_keys_stack_nested 2.0087ms 1.4114ms 708.5112 Ops/s 711.7721 Ops/s $\color{#d91a1a}-0.46\%$
test_keys_stack_nested_leaf 1.7851ms 1.4080ms 710.2409 Ops/s 708.3654 Ops/s $\color{#35bf28}+0.26\%$
test_keys_stack_nested_locked 1.2292ms 0.6855ms 1.4589 KOps/s 1.4595 KOps/s $\color{#d91a1a}-0.04\%$
test_values 33.4075μs 1.1874μs 842.2078 KOps/s 889.4031 KOps/s $\textbf{\color{#d91a1a}-5.31\%}$
test_values_nested 96.6610μs 49.3832μs 20.2498 KOps/s 20.1497 KOps/s $\color{#35bf28}+0.50\%$
test_values_nested_locked 0.3318ms 51.5222μs 19.4091 KOps/s 20.0422 KOps/s $\color{#d91a1a}-3.16\%$
test_values_nested_leaf 67.2660μs 44.4710μs 22.4866 KOps/s 22.4800 KOps/s $\color{#35bf28}+0.03\%$
test_values_stack_nested 1.8409ms 1.1855ms 843.5166 Ops/s 837.0549 Ops/s $\color{#35bf28}+0.77\%$
test_values_stack_nested_leaf 1.9137ms 1.1842ms 844.4246 Ops/s 837.3378 Ops/s $\color{#35bf28}+0.85\%$
test_values_stack_nested_locked 0.9341ms 0.5095ms 1.9626 KOps/s 1.9479 KOps/s $\color{#35bf28}+0.76\%$
test_membership 16.8110μs 1.3583μs 736.2321 KOps/s 740.2884 KOps/s $\color{#d91a1a}-0.55\%$
test_membership_nested 22.1920μs 2.7834μs 359.2725 KOps/s 362.1514 KOps/s $\color{#d91a1a}-0.79\%$
test_membership_nested_leaf 0.1351ms 2.8111μs 355.7349 KOps/s 360.2995 KOps/s $\color{#d91a1a}-1.27\%$
test_membership_stacked_nested 52.4090μs 11.6193μs 86.0635 KOps/s 82.8484 KOps/s $\color{#35bf28}+3.88\%$
test_membership_stacked_nested_leaf 0.2140ms 12.3572μs 80.9245 KOps/s 80.2100 KOps/s $\color{#35bf28}+0.89\%$
test_membership_nested_last 34.8250μs 5.8962μs 169.6018 KOps/s 170.7603 KOps/s $\color{#d91a1a}-0.68\%$
test_membership_nested_leaf_last 38.0610μs 5.8948μs 169.6413 KOps/s 170.0306 KOps/s $\color{#d91a1a}-0.23\%$
test_membership_stacked_nested_last 0.3416ms 0.1667ms 5.9999 KOps/s 5.9220 KOps/s $\color{#35bf28}+1.32\%$
test_membership_stacked_nested_leaf_last 45.0240μs 13.7228μs 72.8715 KOps/s 72.1076 KOps/s $\color{#35bf28}+1.06\%$
test_nested_getleaf 64.0600μs 10.7466μs 93.0524 KOps/s 93.9317 KOps/s $\color{#d91a1a}-0.94\%$
test_nested_get 34.7150μs 10.2538μs 97.5244 KOps/s 99.7611 KOps/s $\color{#d91a1a}-2.24\%$
test_stacked_getleaf 1.2662ms 0.6348ms 1.5754 KOps/s 1.5680 KOps/s $\color{#35bf28}+0.47\%$
test_stacked_get 1.3376ms 0.6071ms 1.6471 KOps/s 1.6436 KOps/s $\color{#35bf28}+0.22\%$
test_nested_getitemleaf 35.2460μs 10.6422μs 93.9658 KOps/s 95.0440 KOps/s $\color{#d91a1a}-1.13\%$
test_nested_getitem 32.8520μs 10.2238μs 97.8107 KOps/s 100.1010 KOps/s $\color{#d91a1a}-2.29\%$
test_stacked_getitemleaf 0.7521ms 0.6367ms 1.5705 KOps/s 1.5706 KOps/s $-0.00\%$
test_stacked_getitem 1.0664ms 0.6078ms 1.6453 KOps/s 1.6439 KOps/s $\color{#35bf28}+0.08\%$
test_lock_nested 74.5251ms 0.6541ms 1.5288 KOps/s 2.0493 KOps/s $\textbf{\color{#d91a1a}-25.40\%}$
test_lock_stack_nested 10.1125ms 5.2267ms 191.3243 Ops/s 121.1546 Ops/s $\textbf{\color{#35bf28}+57.92\%}$
test_unlock_nested 0.9741ms 0.4444ms 2.2501 KOps/s 1.9886 KOps/s $\textbf{\color{#35bf28}+13.15\%}$
test_unlock_stack_nested 78.6142ms 7.3497ms 136.0592 Ops/s 207.8619 Ops/s $\textbf{\color{#d91a1a}-34.54\%}$
test_flatten_speed 0.5721ms 0.2694ms 3.7117 KOps/s 3.7070 KOps/s $\color{#35bf28}+0.13\%$
test_unflatten_speed 0.7851ms 0.4644ms 2.1534 KOps/s 2.1495 KOps/s $\color{#35bf28}+0.18\%$
test_common_ops 1.4001ms 0.6835ms 1.4630 KOps/s 1.5145 KOps/s $\color{#d91a1a}-3.40\%$
test_creation 32.2300μs 2.4873μs 402.0502 KOps/s 415.1656 KOps/s $\color{#d91a1a}-3.16\%$
test_creation_empty 29.8660μs 8.6523μs 115.5763 KOps/s 124.2870 KOps/s $\textbf{\color{#d91a1a}-7.01\%}$
test_creation_nested_1 51.3260μs 11.9300μs 83.8223 KOps/s 87.1964 KOps/s $\color{#d91a1a}-3.87\%$
test_creation_nested_2 54.4820μs 15.6936μs 63.7200 KOps/s 67.7525 KOps/s $\textbf{\color{#d91a1a}-5.95\%}$
test_clone 0.1601ms 13.1904μs 75.8127 KOps/s 77.4640 KOps/s $\color{#d91a1a}-2.13\%$
test_getitem[int] 39.6040μs 13.0808μs 76.4480 KOps/s 78.2743 KOps/s $\color{#d91a1a}-2.33\%$
test_getitem[slice_int] 0.1361ms 24.9276μs 40.1162 KOps/s 40.9009 KOps/s $\color{#d91a1a}-1.92\%$
test_getitem[range] 90.3800μs 44.1557μs 22.6472 KOps/s 22.6648 KOps/s $\color{#d91a1a}-0.08\%$
test_getitem[tuple] 86.0250μs 20.2078μs 49.4858 KOps/s 50.3881 KOps/s $\color{#d91a1a}-1.79\%$
test_getitem[list] 0.2348ms 41.5139μs 24.0883 KOps/s 25.7723 KOps/s $\textbf{\color{#d91a1a}-6.53\%}$
test_setitem_dim[int] 52.2980μs 29.2742μs 34.1598 KOps/s 34.3747 KOps/s $\color{#d91a1a}-0.63\%$
test_setitem_dim[slice_int] 89.5680μs 54.1818μs 18.4564 KOps/s 18.4422 KOps/s $\color{#35bf28}+0.08\%$
test_setitem_dim[range] 0.1091ms 75.6596μs 13.2171 KOps/s 13.4658 KOps/s $\color{#d91a1a}-1.85\%$
test_setitem_dim[tuple] 0.1016ms 44.2084μs 22.6201 KOps/s 23.4924 KOps/s $\color{#d91a1a}-3.71\%$
test_setitem 0.1378ms 18.4501μs 54.2003 KOps/s 56.1873 KOps/s $\color{#d91a1a}-3.54\%$
test_set 0.1762ms 18.0075μs 55.5323 KOps/s 59.0231 KOps/s $\textbf{\color{#d91a1a}-5.91\%}$
test_set_shared 3.0737ms 0.1444ms 6.9242 KOps/s 7.1269 KOps/s $\color{#d91a1a}-2.84\%$
test_update 0.1105ms 19.6894μs 50.7887 KOps/s 54.2388 KOps/s $\textbf{\color{#d91a1a}-6.36\%}$
test_update_nested 0.1712ms 26.9812μs 37.0629 KOps/s 36.6188 KOps/s $\color{#35bf28}+1.21\%$
test_set_nested 0.1491ms 19.7660μs 50.5919 KOps/s 52.7011 KOps/s $\color{#d91a1a}-4.00\%$
test_set_nested_new 0.1363ms 24.9007μs 40.1595 KOps/s 41.3988 KOps/s $\color{#d91a1a}-2.99\%$
test_select 0.1300ms 51.6239μs 19.3709 KOps/s 20.5125 KOps/s $\textbf{\color{#d91a1a}-5.57\%}$
test_unbind_speed 0.7463ms 0.3676ms 2.7201 KOps/s 2.7043 KOps/s $\color{#35bf28}+0.58\%$
test_unbind_speed_stack0 71.9756ms 4.5562ms 219.4795 Ops/s 182.9332 Ops/s $\textbf{\color{#35bf28}+19.98\%}$
test_unbind_speed_stack1 2.5392μs 0.6346μs 1.5757 MOps/s 1.5536 MOps/s $\color{#35bf28}+1.42\%$
test_split 63.6809ms 1.7865ms 559.7444 Ops/s 602.9797 Ops/s $\textbf{\color{#d91a1a}-7.17\%}$
test_chunk 63.2697ms 1.7439ms 573.4436 Ops/s 609.9920 Ops/s $\textbf{\color{#d91a1a}-5.99\%}$
test_creation[device0] 3.3918ms 0.3043ms 3.2861 KOps/s 3.2336 KOps/s $\color{#35bf28}+1.62\%$
test_creation_from_tensor 2.6679ms 0.3346ms 2.9882 KOps/s 2.8869 KOps/s $\color{#35bf28}+3.51\%$
test_add_one[memmap_tensor0] 82.2450μs 24.9082μs 40.1475 KOps/s 40.0588 KOps/s $\color{#35bf28}+0.22\%$
test_contiguous[memmap_tensor0] 30.0570μs 5.7966μs 172.5155 KOps/s 175.5795 KOps/s $\color{#d91a1a}-1.75\%$
test_stack[memmap_tensor0] 0.1220ms 19.6190μs 50.9710 KOps/s 53.5790 KOps/s $\color{#d91a1a}-4.87\%$
test_memmaptd_index 0.7641ms 0.3975ms 2.5157 KOps/s 2.5072 KOps/s $\color{#35bf28}+0.34\%$
test_memmaptd_index_astensor 0.5957ms 0.4582ms 2.1825 KOps/s 2.1730 KOps/s $\color{#35bf28}+0.43\%$
test_memmaptd_index_op 1.3118ms 0.7047ms 1.4191 KOps/s 1.4184 KOps/s $\color{#35bf28}+0.05\%$
test_reshape_pytree 84.4280μs 22.9536μs 43.5662 KOps/s 42.7644 KOps/s $\color{#35bf28}+1.87\%$
test_reshape_td 0.1365ms 31.2604μs 31.9893 KOps/s 31.7479 KOps/s $\color{#35bf28}+0.76\%$
test_view_pytree 0.4868ms 23.0631μs 43.3592 KOps/s 42.5491 KOps/s $\color{#35bf28}+1.90\%$
test_view_td 31.8300μs 4.9991μs 200.0362 KOps/s 208.0765 KOps/s $\color{#d91a1a}-3.86\%$
test_unbind_pytree 63.2390μs 26.6742μs 37.4894 KOps/s 37.3904 KOps/s $\color{#35bf28}+0.26\%$
test_unbind_td 0.1282ms 59.3548μs 16.8478 KOps/s 17.3361 KOps/s $\color{#d91a1a}-2.82\%$
test_split_pytree 57.8990μs 26.3568μs 37.9408 KOps/s 37.6941 KOps/s $\color{#35bf28}+0.65\%$
test_split_td 0.1012ms 45.8028μs 21.8327 KOps/s 21.5452 KOps/s $\color{#35bf28}+1.33\%$
test_add_pytree 0.1000ms 32.1701μs 31.0847 KOps/s 31.0532 KOps/s $\color{#35bf28}+0.10\%$
test_add_td 98.8250μs 44.3499μs 22.5480 KOps/s 22.2060 KOps/s $\color{#35bf28}+1.54\%$
test_distributed 19.4970μs 5.9635μs 167.6875 KOps/s 168.3150 KOps/s $\color{#d91a1a}-0.37\%$
test_tdmodule 0.8908ms 22.4675μs 44.5088 KOps/s 43.9878 KOps/s $\color{#35bf28}+1.18\%$
test_tdmodule_dispatch 0.2205ms 39.7878μs 25.1333 KOps/s 26.1277 KOps/s $\color{#d91a1a}-3.81\%$
test_tdseq 0.1463ms 24.3614μs 41.0486 KOps/s 42.4128 KOps/s $\color{#d91a1a}-3.22\%$
test_tdseq_dispatch 0.1409ms 43.8875μs 22.7855 KOps/s 23.9345 KOps/s $\color{#d91a1a}-4.80\%$
test_instantiation_functorch 2.0252ms 1.2922ms 773.8501 Ops/s 784.4766 Ops/s $\color{#d91a1a}-1.35\%$
test_instantiation_td 1.6295ms 1.0200ms 980.3782 Ops/s 1.0038 KOps/s $\color{#d91a1a}-2.33\%$
test_exec_functorch 0.2441ms 0.1583ms 6.3176 KOps/s 5.9378 KOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_exec_functional_call 0.3370ms 0.1474ms 6.7825 KOps/s 6.7606 KOps/s $\color{#35bf28}+0.32\%$
test_exec_td 0.2539ms 0.1408ms 7.1024 KOps/s 7.0019 KOps/s $\color{#35bf28}+1.44\%$
test_exec_td_decorator 0.9416ms 0.1784ms 5.6061 KOps/s 5.1160 KOps/s $\textbf{\color{#35bf28}+9.58\%}$
test_vmap_mlp_speed[True-True] 1.4479ms 0.9099ms 1.0990 KOps/s 1.1091 KOps/s $\color{#d91a1a}-0.91\%$
test_vmap_mlp_speed[True-False] 0.7671ms 0.4812ms 2.0781 KOps/s 2.1177 KOps/s $\color{#d91a1a}-1.87\%$
test_vmap_mlp_speed[False-True] 1.1420ms 0.7839ms 1.2756 KOps/s 1.2789 KOps/s $\color{#d91a1a}-0.26\%$
test_vmap_mlp_speed[False-False] 0.7990ms 0.3978ms 2.5139 KOps/s 2.5727 KOps/s $\color{#d91a1a}-2.29\%$
test_vmap_mlp_speed_decorator[True-True] 2.6138ms 1.7838ms 560.5947 Ops/s 554.8152 Ops/s $\color{#35bf28}+1.04\%$
test_vmap_mlp_speed_decorator[True-False] 1.1221ms 0.5216ms 1.9172 KOps/s 1.9107 KOps/s $\color{#35bf28}+0.34\%$
test_vmap_mlp_speed_decorator[False-True] 2.0247ms 1.4786ms 676.3009 Ops/s 662.3262 Ops/s $\color{#35bf28}+2.11\%$
test_vmap_mlp_speed_decorator[False-False] 0.8460ms 0.4012ms 2.4923 KOps/s 2.4462 KOps/s $\color{#35bf28}+1.89\%$

github-actions bot commented Nov 24, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}10$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.6879ms 12.7184μs 78.6263 KOps/s 79.4180 KOps/s $\color{#d91a1a}-1.00\%$
test_plain_set_stack_nested 0.1724ms 0.1137ms 8.7958 KOps/s 8.4791 KOps/s $\color{#35bf28}+3.74\%$
test_plain_set_nested_inplace 29.0010μs 15.1387μs 66.0559 KOps/s 66.3733 KOps/s $\color{#d91a1a}-0.48\%$
test_plain_set_stack_nested_inplace 0.1738ms 0.1391ms 7.1885 KOps/s 7.1587 KOps/s $\color{#35bf28}+0.42\%$
test_items 24.5500μs 4.6473μs 215.1772 KOps/s 214.3780 KOps/s $\color{#35bf28}+0.37\%$
test_items_nested 0.3572ms 0.3370ms 2.9672 KOps/s 2.9648 KOps/s $\color{#35bf28}+0.08\%$
test_items_nested_locked 0.3771ms 0.3386ms 2.9536 KOps/s 2.9742 KOps/s $\color{#d91a1a}-0.69\%$
test_items_nested_leaf 0.2228ms 0.1969ms 5.0790 KOps/s 5.0372 KOps/s $\color{#35bf28}+0.83\%$
test_items_stack_nested 1.5117ms 1.4582ms 685.7841 Ops/s 677.8419 Ops/s $\color{#35bf28}+1.17\%$
test_items_stack_nested_leaf 1.3326ms 1.2868ms 777.1007 Ops/s 768.9885 Ops/s $\color{#35bf28}+1.05\%$
test_items_stack_nested_locked 1.8376ms 0.8140ms 1.2284 KOps/s 1.2508 KOps/s $\color{#d91a1a}-1.79\%$
test_keys 34.9300μs 4.5454μs 220.0046 KOps/s 219.5543 KOps/s $\color{#35bf28}+0.21\%$
test_keys_nested 0.5242ms 89.5896μs 11.1620 KOps/s 11.1007 KOps/s $\color{#35bf28}+0.55\%$
test_keys_nested_locked 0.1110ms 89.4184μs 11.1834 KOps/s 11.1126 KOps/s $\color{#35bf28}+0.64\%$
test_keys_nested_leaf 43.1561ms 86.5948μs 11.5480 KOps/s 12.3092 KOps/s $\textbf{\color{#d91a1a}-6.18\%}$
test_keys_stack_nested 1.3226ms 1.2662ms 789.7845 Ops/s 781.7098 Ops/s $\color{#35bf28}+1.03\%$
test_keys_stack_nested_leaf 1.2990ms 1.2593ms 794.1049 Ops/s 791.3633 Ops/s $\color{#35bf28}+0.35\%$
test_keys_stack_nested_locked 0.6622ms 0.6136ms 1.6297 KOps/s 1.6790 KOps/s $\color{#d91a1a}-2.94\%$
test_values 9.3837μs 1.8690μs 535.0328 KOps/s 515.3924 KOps/s $\color{#35bf28}+3.81\%$
test_values_nested 67.7520μs 42.5153μs 23.5210 KOps/s 23.4008 KOps/s $\color{#35bf28}+0.51\%$
test_values_nested_locked 73.1310μs 45.0099μs 22.2173 KOps/s 23.3180 KOps/s $\color{#d91a1a}-4.72\%$
test_values_nested_leaf 61.7320μs 37.0103μs 27.0195 KOps/s 26.8366 KOps/s $\color{#35bf28}+0.68\%$
test_values_stack_nested 1.1693ms 1.1239ms 889.7382 Ops/s 897.4386 Ops/s $\color{#d91a1a}-0.86\%$
test_values_stack_nested_leaf 1.3023ms 1.1131ms 898.4022 Ops/s 910.1991 Ops/s $\color{#d91a1a}-1.30\%$
test_values_stack_nested_locked 0.5311ms 0.4887ms 2.0461 KOps/s 2.0994 KOps/s $\color{#d91a1a}-2.54\%$
test_membership 6.6060μs 0.9188μs 1.0884 MOps/s 1.0515 MOps/s $\color{#35bf28}+3.51\%$
test_membership_nested 17.4510μs 2.2498μs 444.4746 KOps/s 450.5587 KOps/s $\color{#d91a1a}-1.35\%$
test_membership_nested_leaf 16.4505μs 2.1305μs 469.3814 KOps/s 468.5226 KOps/s $\color{#35bf28}+0.18\%$
test_membership_stacked_nested 35.6700μs 11.1026μs 90.0691 KOps/s 92.5335 KOps/s $\color{#d91a1a}-2.66\%$
test_membership_stacked_nested_leaf 55.5800μs 10.9957μs 90.9446 KOps/s 92.3534 KOps/s $\color{#d91a1a}-1.53\%$
test_membership_nested_last 17.6000μs 4.6247μs 216.2302 KOps/s 217.4369 KOps/s $\color{#d91a1a}-0.55\%$
test_membership_nested_leaf_last 21.5310μs 4.6345μs 215.7719 KOps/s 216.2116 KOps/s $\color{#d91a1a}-0.20\%$
test_membership_stacked_nested_last 0.1848ms 0.1340ms 7.4604 KOps/s 7.4725 KOps/s $\color{#d91a1a}-0.16\%$
test_membership_stacked_nested_leaf_last 26.1510μs 12.9977μs 76.9369 KOps/s 77.6306 KOps/s $\color{#d91a1a}-0.89\%$
test_nested_getleaf 33.5210μs 8.4495μs 118.3498 KOps/s 119.7695 KOps/s $\color{#d91a1a}-1.19\%$
test_nested_get 30.1210μs 7.9425μs 125.9045 KOps/s 125.6443 KOps/s $\color{#35bf28}+0.21\%$
test_stacked_getleaf 0.6401ms 0.5576ms 1.7933 KOps/s 1.7779 KOps/s $\color{#35bf28}+0.87\%$
test_stacked_get 0.6140ms 0.5289ms 1.8906 KOps/s 1.8570 KOps/s $\color{#35bf28}+1.81\%$
test_nested_getitemleaf 29.2110μs 8.4768μs 117.9697 KOps/s 118.2978 KOps/s $\color{#d91a1a}-0.28\%$
test_nested_getitem 29.6500μs 7.9774μs 125.3543 KOps/s 125.8414 KOps/s $\color{#d91a1a}-0.39\%$
test_stacked_getitemleaf 0.6326ms 0.5682ms 1.7599 KOps/s 1.7752 KOps/s $\color{#d91a1a}-0.86\%$
test_stacked_getitem 0.5684ms 0.5378ms 1.8595 KOps/s 1.8810 KOps/s $\color{#d91a1a}-1.15\%$
test_lock_nested 3.1552ms 0.5490ms 1.8214 KOps/s 2.2302 KOps/s $\textbf{\color{#d91a1a}-18.33\%}$
test_lock_stack_nested 80.3645ms 7.1062ms 140.7213 Ops/s 152.0633 Ops/s $\textbf{\color{#d91a1a}-7.46\%}$
test_unlock_nested 2.3505ms 0.4241ms 2.3580 KOps/s 2.0651 KOps/s $\textbf{\color{#35bf28}+14.18\%}$
test_unlock_stack_nested 66.1733ms 6.1468ms 162.6867 Ops/s 138.6734 Ops/s $\textbf{\color{#35bf28}+17.32\%}$
test_flatten_speed 0.2256ms 0.1867ms 5.3555 KOps/s 5.3728 KOps/s $\color{#d91a1a}-0.32\%$
test_unflatten_speed 0.3928ms 0.3637ms 2.7491 KOps/s 2.7551 KOps/s $\color{#d91a1a}-0.22\%$
test_common_ops 1.1094ms 0.5894ms 1.6965 KOps/s 1.7166 KOps/s $\color{#d91a1a}-1.17\%$
test_creation 31.6000μs 2.0941μs 477.5239 KOps/s 519.0920 KOps/s $\textbf{\color{#d91a1a}-8.01\%}$
test_creation_empty 37.1300μs 7.2860μs 137.2500 KOps/s 143.1736 KOps/s $\color{#d91a1a}-4.14\%$
test_creation_nested_1 29.7200μs 9.6585μs 103.5362 KOps/s 106.1102 KOps/s $\color{#d91a1a}-2.43\%$
test_creation_nested_2 41.8300μs 12.3710μs 80.8341 KOps/s 83.9276 KOps/s $\color{#d91a1a}-3.69\%$
test_clone 81.7420μs 14.1076μs 70.8840 KOps/s 72.2927 KOps/s $\color{#d91a1a}-1.95\%$
test_getitem[int] 29.7810μs 12.1551μs 82.2697 KOps/s 84.7239 KOps/s $\color{#d91a1a}-2.90\%$
test_getitem[slice_int] 70.4810μs 24.4006μs 40.9827 KOps/s 43.7938 KOps/s $\textbf{\color{#d91a1a}-6.42\%}$
test_getitem[range] 0.2341ms 40.2054μs 24.8723 KOps/s 27.7200 KOps/s $\textbf{\color{#d91a1a}-10.27\%}$
test_getitem[tuple] 42.6600μs 20.3169μs 49.2201 KOps/s 53.0606 KOps/s $\textbf{\color{#d91a1a}-7.24\%}$
test_getitem[list] 0.2654ms 34.1106μs 29.3164 KOps/s 29.2232 KOps/s $\color{#35bf28}+0.32\%$
test_setitem_dim[int] 43.5800μs 26.3220μs 37.9910 KOps/s 39.9268 KOps/s $\color{#d91a1a}-4.85\%$
test_setitem_dim[slice_int] 63.5100μs 44.5592μs 22.4421 KOps/s 22.5297 KOps/s $\color{#d91a1a}-0.39\%$
test_setitem_dim[range] 78.6420μs 59.9423μs 16.6827 KOps/s 16.3380 KOps/s $\color{#35bf28}+2.11\%$
test_setitem_dim[tuple] 55.2210μs 37.3510μs 26.7730 KOps/s 26.2396 KOps/s $\color{#35bf28}+2.03\%$
test_setitem 93.3520μs 17.7533μs 56.3275 KOps/s 56.6182 KOps/s $\color{#d91a1a}-0.51\%$
test_set 89.3610μs 17.1413μs 58.3388 KOps/s 58.9610 KOps/s $\color{#d91a1a}-1.06\%$
test_set_shared 2.9695ms 0.1009ms 9.9066 KOps/s 10.1046 KOps/s $\color{#d91a1a}-1.96\%$
test_update 77.8410μs 18.6330μs 53.6683 KOps/s 54.0508 KOps/s $\color{#d91a1a}-0.71\%$
test_update_nested 0.1070ms 25.4467μs 39.2978 KOps/s 40.5131 KOps/s $\color{#d91a1a}-3.00\%$
test_set_nested 91.1420μs 18.6137μs 53.7238 KOps/s 55.5278 KOps/s $\color{#d91a1a}-3.25\%$
test_set_nested_new 77.4110μs 22.7350μs 43.9851 KOps/s 45.5653 KOps/s $\color{#d91a1a}-3.47\%$
test_select 0.1099ms 45.9232μs 21.7755 KOps/s 23.1037 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_to 71.6510μs 51.4066μs 19.4528 KOps/s 18.8834 KOps/s $\color{#35bf28}+3.02\%$
test_to_nonblocking 67.0810μs 32.7775μs 30.5087 KOps/s 30.2559 KOps/s $\color{#35bf28}+0.84\%$
test_unbind_speed 0.3917ms 0.3556ms 2.8124 KOps/s 2.9233 KOps/s $\color{#d91a1a}-3.79\%$
test_unbind_speed_stack0 62.0583ms 4.2923ms 232.9766 Ops/s 197.8716 Ops/s $\textbf{\color{#35bf28}+17.74\%}$
test_unbind_speed_stack1 1.9365μs 0.5209μs 1.9198 MOps/s 1.8858 MOps/s $\color{#35bf28}+1.80\%$
test_split 52.9582ms 1.7692ms 565.2403 Ops/s 564.5365 Ops/s $\color{#35bf28}+0.12\%$
test_chunk 52.7456ms 1.7466ms 572.5351 Ops/s 573.1644 Ops/s $\color{#d91a1a}-0.11\%$
test_creation[device0] 0.5319ms 0.3078ms 3.2485 KOps/s 3.2576 KOps/s $\color{#d91a1a}-0.28\%$
test_creation[device1] 0.8392ms 0.3120ms 3.2049 KOps/s 3.2103 KOps/s $\color{#d91a1a}-0.17\%$
test_creation_from_tensor 56.6480ms 0.3636ms 2.7504 KOps/s 2.9731 KOps/s $\textbf{\color{#d91a1a}-7.49\%}$
test_add_one[memmap_tensor0] 61.1310μs 22.2941μs 44.8550 KOps/s 42.8934 KOps/s $\color{#35bf28}+4.57\%$
test_add_one[memmap_tensor1] 0.2041ms 69.9699μs 14.2919 KOps/s 13.8670 KOps/s $\color{#35bf28}+3.06\%$
test_contiguous[memmap_tensor0] 36.4810μs 5.6948μs 175.5985 KOps/s 174.8387 KOps/s $\color{#35bf28}+0.43\%$
test_contiguous[memmap_tensor1] 47.5110μs 20.8428μs 47.9782 KOps/s 45.4825 KOps/s $\textbf{\color{#35bf28}+5.49\%}$
test_stack[memmap_tensor0] 36.9020μs 18.7849μs 53.2344 KOps/s 53.3996 KOps/s $\color{#d91a1a}-0.31\%$
test_stack[memmap_tensor1] 0.1533ms 70.9613μs 14.0922 KOps/s 14.0654 KOps/s $\color{#35bf28}+0.19\%$
test_memmaptd_index 0.4603ms 0.4115ms 2.4303 KOps/s 2.3883 KOps/s $\color{#35bf28}+1.76\%$
test_memmaptd_index_astensor 0.5258ms 0.4689ms 2.1326 KOps/s 2.0888 KOps/s $\color{#35bf28}+2.10\%$
test_memmaptd_index_op 0.7801ms 0.7146ms 1.3995 KOps/s 1.3784 KOps/s $\color{#35bf28}+1.53\%$
test_reshape_pytree 36.5010μs 20.5431μs 48.6782 KOps/s 48.6298 KOps/s $\color{#35bf28}+0.10\%$
test_reshape_td 59.4000μs 29.2373μs 34.2028 KOps/s 34.9743 KOps/s $\color{#d91a1a}-2.21\%$
test_view_pytree 45.7310μs 20.4657μs 48.8622 KOps/s 49.0798 KOps/s $\color{#d91a1a}-0.44\%$
test_view_td 19.2400μs 4.0409μs 247.4708 KOps/s 249.1835 KOps/s $\color{#d91a1a}-0.69\%$
test_unbind_pytree 52.3620μs 25.2214μs 39.6488 KOps/s 39.5259 KOps/s $\color{#35bf28}+0.31\%$
test_unbind_td 80.6910μs 55.1321μs 18.1383 KOps/s 18.5430 KOps/s $\color{#d91a1a}-2.18\%$
test_split_pytree 38.9420μs 23.2892μs 42.9384 KOps/s 42.4578 KOps/s $\color{#35bf28}+1.13\%$
test_split_td 70.5220μs 42.9314μs 23.2930 KOps/s 23.9377 KOps/s $\color{#d91a1a}-2.69\%$
test_add_pytree 77.1310μs 30.4831μs 32.8051 KOps/s 32.9500 KOps/s $\color{#d91a1a}-0.44\%$
test_add_td 74.7710μs 41.5073μs 24.0922 KOps/s 24.1054 KOps/s $\color{#d91a1a}-0.06\%$
test_distributed 18.6200μs 5.4551μs 183.3132 KOps/s 187.9182 KOps/s $\color{#d91a1a}-2.45\%$
test_tdmodule 90.5620μs 16.7608μs 59.6631 KOps/s 60.3500 KOps/s $\color{#d91a1a}-1.14\%$
test_tdmodule_dispatch 0.2176ms 33.0225μs 30.2824 KOps/s 31.2168 KOps/s $\color{#d91a1a}-2.99\%$
test_tdseq 42.3400μs 20.0394μs 49.9018 KOps/s 50.1430 KOps/s $\color{#d91a1a}-0.48\%$
test_tdseq_dispatch 55.5800μs 36.4281μs 27.4514 KOps/s 28.4315 KOps/s $\color{#d91a1a}-3.45\%$
test_instantiation_functorch 1.8756ms 1.6608ms 602.1100 Ops/s 599.2819 Ops/s $\color{#35bf28}+0.47\%$
test_instantiation_td 1.6378ms 1.1652ms 858.2399 Ops/s 870.4144 Ops/s $\color{#d91a1a}-1.40\%$
test_exec_functorch 0.1912ms 0.1509ms 6.6249 KOps/s 6.5954 KOps/s $\color{#35bf28}+0.45\%$
test_exec_functional_call 0.2091ms 0.1440ms 6.9458 KOps/s 6.8469 KOps/s $\color{#35bf28}+1.44\%$
test_exec_td 0.1788ms 0.1355ms 7.3810 KOps/s 7.2366 KOps/s $\color{#35bf28}+2.00\%$
test_exec_td_decorator 0.7950ms 0.1710ms 5.8471 KOps/s 5.5294 KOps/s $\textbf{\color{#35bf28}+5.75\%}$
test_vmap_mlp_speed[True-True] 1.5317ms 1.0733ms 931.6941 Ops/s 955.4466 Ops/s $\color{#d91a1a}-2.49\%$
test_vmap_mlp_speed[True-False] 0.6488ms 0.5894ms 1.6965 KOps/s 1.6805 KOps/s $\color{#35bf28}+0.96\%$
test_vmap_mlp_speed[False-True] 1.2178ms 0.9723ms 1.0285 KOps/s 1.0379 KOps/s $\color{#d91a1a}-0.91\%$
test_vmap_mlp_speed[False-False] 0.5734ms 0.5320ms 1.8795 KOps/s 1.8956 KOps/s $\color{#d91a1a}-0.85\%$
test_vmap_mlp_speed_decorator[True-True] 2.7216ms 2.0202ms 494.9995 Ops/s 497.4890 Ops/s $\color{#d91a1a}-0.50\%$
test_vmap_mlp_speed_decorator[True-False] 1.1391ms 0.6414ms 1.5591 KOps/s 1.5421 KOps/s $\color{#35bf28}+1.10\%$
test_vmap_mlp_speed_decorator[False-True] 2.1416ms 1.7185ms 581.9097 Ops/s 577.0918 Ops/s $\color{#35bf28}+0.83\%$
test_vmap_mlp_speed_decorator[False-False] 1.0445ms 0.5335ms 1.8743 KOps/s 1.8397 KOps/s $\color{#35bf28}+1.88\%$
test_vmap_transformer_speed[True-True] 12.6679ms 12.2732ms 81.4782 Ops/s 81.3134 Ops/s $\color{#35bf28}+0.20\%$
test_vmap_transformer_speed[True-False] 8.0139ms 7.8758ms 126.9720 Ops/s 125.1492 Ops/s $\color{#35bf28}+1.46\%$
test_vmap_transformer_speed[False-True] 12.3404ms 12.0005ms 83.3295 Ops/s 82.3915 Ops/s $\color{#35bf28}+1.14\%$
test_vmap_transformer_speed[False-False] 8.0894ms 7.8053ms 128.1183 Ops/s 126.1403 Ops/s $\color{#35bf28}+1.57\%$
test_vmap_transformer_speed_decorator[True-True] 63.4798ms 62.5220ms 15.9944 Ops/s 15.9309 Ops/s $\color{#35bf28}+0.40\%$
test_vmap_transformer_speed_decorator[True-False] 21.1333ms 18.9633ms 52.7335 Ops/s 48.1198 Ops/s $\textbf{\color{#35bf28}+9.59\%}$
test_vmap_transformer_speed_decorator[False-True] 0.1351s 62.0451ms 16.1173 Ops/s 17.4227 Ops/s $\textbf{\color{#d91a1a}-7.49\%}$
test_vmap_transformer_speed_decorator[False-False] 20.8002ms 18.5889ms 53.7956 Ops/s 48.7598 Ops/s $\textbf{\color{#35bf28}+10.33\%}$

@vmoens merged commit fb1b589 into main on Nov 24, 2023
23 of 30 checks passed
@vmoens deleted the faster-init branch on November 24, 2023 14:43
Labels
CLA Signed, Performance
2 participants