
[BugFix] Faster empty_like for MemoryMappedTensor #585

Merged: 2 commits merged on Nov 30, 2023


@vmoens (Contributor) commented Nov 30, 2023

No description provided.

@facebook-github-bot added the CLA Signed label Nov 30, 2023 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
@vmoens added the bug and Performance labels Nov 30, 2023
@vmoens marked this pull request as ready for review November 30, 2023 10:12
@vmoens merged commit 795e39a into main Nov 30, 2023
3 checks passed
@vmoens deleted the fix_memmap_empty branch November 30, 2023 10:12

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 113. Improved: 12. Worsened: 4.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---:|---:|---:|---:|---:|
| test_plain_set_nested | 31.8500μs | 15.7819μs | 63.3636 KOps/s | 63.0403 KOps/s | +0.51% |
| test_plain_set_stack_nested | 0.1793ms | 0.1435ms | 6.9673 KOps/s | 6.9051 KOps/s | +0.90% |
| test_plain_set_nested_inplace | 43.5710μs | 18.7853μs | 53.2331 KOps/s | 52.2175 KOps/s | +1.94% |
| test_plain_set_stack_nested_inplace | 0.3254ms | 0.1720ms | 5.8125 KOps/s | 5.8058 KOps/s | +0.12% |
| test_items | 31.3580μs | 2.4211μs | 413.0283 KOps/s | 402.7880 KOps/s | +2.54% |
| test_items_nested | 0.4411ms | 0.2838ms | 3.5242 KOps/s | 3.6204 KOps/s | -2.66% |
| test_items_nested_locked | 0.3233ms | 0.2692ms | 3.7150 KOps/s | 3.5956 KOps/s | +3.32% |
| test_items_nested_leaf | 0.8253ms | 0.1686ms | 5.9317 KOps/s | 5.8707 KOps/s | +1.04% |
| test_items_stack_nested | 1.6585ms | 1.4835ms | 674.0658 Ops/s | 656.7914 Ops/s | +2.63% |
| test_items_stack_nested_leaf | 1.7671ms | 1.3481ms | 741.7949 Ops/s | 714.9756 Ops/s | +3.75% |
| test_items_stack_nested_locked | 0.8474ms | 0.7607ms | 1.3146 KOps/s | 1.2609 KOps/s | +4.26% |
| test_keys | 52.4780μs | 3.8851μs | 257.3912 KOps/s | 259.3479 KOps/s | -0.75% |
| test_keys_nested | 3.1677ms | 0.1409ms | 7.0960 KOps/s | 6.7144 KOps/s | **+5.68%** |
| test_keys_nested_locked | 0.1857ms | 0.1390ms | 7.1952 KOps/s | 7.0608 KOps/s | +1.90% |
| test_keys_nested_leaf | 0.3809ms | 0.1389ms | 7.2000 KOps/s | 6.9710 KOps/s | +3.28% |
| test_keys_stack_nested | 1.5095ms | 1.4072ms | 710.6479 Ops/s | 695.0822 Ops/s | +2.24% |
| test_keys_stack_nested_leaf | 1.7588ms | 1.4070ms | 710.7531 Ops/s | 697.4718 Ops/s | +1.90% |
| test_keys_stack_nested_locked | 0.7550ms | 0.6704ms | 1.4916 KOps/s | 1.4541 KOps/s | +2.58% |
| test_values | 6.5924μs | 1.1730μs | 852.5254 KOps/s | 772.4969 KOps/s | **+10.36%** |
| test_values_nested | 89.2960μs | 49.2461μs | 20.3062 KOps/s | 20.0247 KOps/s | +1.41% |
| test_values_nested_locked | 82.5730μs | 49.3704μs | 20.2551 KOps/s | 19.8548 KOps/s | +2.02% |
| test_values_nested_leaf | 99.2950μs | 44.6909μs | 22.3759 KOps/s | 22.3729 KOps/s | +0.01% |
| test_values_stack_nested | 1.4297ms | 1.1966ms | 835.7200 Ops/s | 813.3827 Ops/s | +2.75% |
| test_values_stack_nested_leaf | 1.4125ms | 1.1913ms | 839.4367 Ops/s | 824.1280 Ops/s | +1.86% |
| test_values_stack_nested_locked | 0.9665ms | 0.5118ms | 1.9540 KOps/s | 1.9014 KOps/s | +2.77% |
| test_membership | 9.7480μs | 1.3523μs | 739.4646 KOps/s | 727.6418 KOps/s | +1.62% |
| test_membership_nested | 20.7280μs | 2.8023μs | 356.8508 KOps/s | 351.4277 KOps/s | +1.54% |
| test_membership_nested_leaf | 28.1720μs | 2.8511μs | 350.7442 KOps/s | 348.1738 KOps/s | +0.74% |
| test_membership_stacked_nested | 46.7170μs | 11.8225μs | 84.5844 KOps/s | 83.1906 KOps/s | +1.68% |
| test_membership_stacked_nested_leaf | 34.9650μs | 11.8419μs | 84.4458 KOps/s | 79.7554 KOps/s | **+5.88%** |
| test_membership_nested_last | 34.5540μs | 5.9203μs | 168.9110 KOps/s | 157.1748 KOps/s | **+7.47%** |
| test_membership_nested_leaf_last | 35.3150μs | 6.0254μs | 165.9630 KOps/s | 166.2569 KOps/s | -0.18% |
| test_membership_stacked_nested_last | 0.2391ms | 0.1683ms | 5.9428 KOps/s | 5.9803 KOps/s | -0.63% |
| test_membership_stacked_nested_leaf_last | 47.4580μs | 13.7307μs | 72.8297 KOps/s | 71.8509 KOps/s | +1.36% |
| test_nested_getleaf | 39.7940μs | 10.6570μs | 93.8348 KOps/s | 94.1917 KOps/s | -0.38% |
| test_nested_get | 41.1470μs | 10.1225μs | 98.7900 KOps/s | 100.5221 KOps/s | -1.72% |
| test_stacked_getleaf | 1.0610ms | 0.6459ms | 1.5481 KOps/s | 1.5174 KOps/s | +2.02% |
| test_stacked_get | 1.1805ms | 0.6138ms | 1.6293 KOps/s | 1.5753 KOps/s | +3.43% |
| test_nested_getitemleaf | 29.9060μs | 10.6520μs | 93.8793 KOps/s | 93.0424 KOps/s | +0.90% |
| test_nested_getitem | 39.2730μs | 10.0967μs | 99.0420 KOps/s | 97.6376 KOps/s | +1.44% |
| test_stacked_getitemleaf | 0.7623ms | 0.6427ms | 1.5560 KOps/s | 1.5005 KOps/s | +3.69% |
| test_stacked_getitem | 1.3129ms | 0.6098ms | 1.6399 KOps/s | 1.5617 KOps/s | **+5.01%** |
| test_lock_nested | 7.1352ms | 0.5680ms | 1.7606 KOps/s | 1.7704 KOps/s | -0.56% |
| test_lock_stack_nested | 7.6001ms | 5.0558ms | 197.7943 Ops/s | 197.2205 Ops/s | +0.29% |
| test_unlock_nested | 70.3042ms | 0.5134ms | 1.9478 KOps/s | 2.2528 KOps/s | **-13.54%** |
| test_unlock_stack_nested | 66.0627ms | 6.7768ms | 147.5613 Ops/s | 147.0538 Ops/s | +0.35% |
| test_flatten_speed | 0.5727ms | 0.2676ms | 3.7364 KOps/s | 3.7207 KOps/s | +0.42% |
| test_unflatten_speed | 1.7455ms | 0.4835ms | 2.0684 KOps/s | 2.1800 KOps/s | **-5.12%** |
| test_common_ops | 1.2089ms | 0.6642ms | 1.5056 KOps/s | 1.4846 KOps/s | +1.42% |
| test_creation | 58.8390μs | 2.4758μs | 403.9046 KOps/s | 398.9798 KOps/s | +1.23% |
| test_creation_empty | 30.2060μs | 8.1581μs | 122.5778 KOps/s | 123.3529 KOps/s | -0.63% |
| test_creation_nested_1 | 40.4450μs | 11.2943μs | 88.5406 KOps/s | 87.6794 KOps/s | +0.98% |
| test_creation_nested_2 | 39.6440μs | 15.0413μs | 66.4835 KOps/s | 66.1438 KOps/s | +0.51% |
| test_clone | 87.8630μs | 13.3905μs | 74.6799 KOps/s | 74.5813 KOps/s | +0.13% |
| test_getitem[int] | 34.8650μs | 13.0017μs | 76.9133 KOps/s | 76.7222 KOps/s | +0.25% |
| test_getitem[slice_int] | 0.1301ms | 26.3031μs | 38.0183 KOps/s | 38.9085 KOps/s | -2.29% |
| test_getitem[range] | 87.9140μs | 44.0657μs | 22.6934 KOps/s | 22.3669 KOps/s | +1.46% |
| test_getitem[tuple] | 63.3180μs | 20.2697μs | 49.3347 KOps/s | 48.1991 KOps/s | +2.36% |
| test_getitem[list] | 94.2850μs | 38.6804μs | 25.8529 KOps/s | 25.2651 KOps/s | +2.33% |
| test_setitem_dim[int] | 59.4300μs | 28.7006μs | 34.8425 KOps/s | 35.0077 KOps/s | -0.47% |
| test_setitem_dim[slice_int] | 92.5210μs | 52.2875μs | 19.1250 KOps/s | 18.8562 KOps/s | +1.43% |
| test_setitem_dim[range] | 0.1487ms | 73.1579μs | 13.6691 KOps/s | 13.6812 KOps/s | -0.09% |
| test_setitem_dim[tuple] | 84.6870μs | 41.9583μs | 23.8332 KOps/s | 23.8323 KOps/s | +0.00% |
| test_setitem | 78.8260μs | 18.4254μs | 54.2729 KOps/s | 54.1930 KOps/s | +0.15% |
| test_set | 79.7080μs | 17.5207μs | 57.0754 KOps/s | 55.5369 KOps/s | +2.77% |
| test_set_shared | 1.6807ms | 0.1411ms | 7.0867 KOps/s | 7.0358 KOps/s | +0.72% |
| test_update | 0.1111ms | 19.4289μs | 51.4698 KOps/s | 53.3670 KOps/s | -3.56% |
| test_update_nested | 89.2860μs | 26.2988μs | 38.0245 KOps/s | 37.5058 KOps/s | +1.38% |
| test_set_nested | 89.8680μs | 19.3517μs | 51.6751 KOps/s | 51.2719 KOps/s | +0.79% |
| test_set_nested_new | 78.2360μs | 24.4348μs | 40.9252 KOps/s | 39.0178 KOps/s | +4.89% |
| test_select | 0.1050ms | 49.8648μs | 20.0542 KOps/s | 19.7768 KOps/s | +1.40% |
| test_unbind_speed | 0.4551ms | 0.3760ms | 2.6597 KOps/s | 2.6768 KOps/s | -0.64% |
| test_unbind_speed_stack0 | 63.5713ms | 4.7261ms | 211.5898 Ops/s | 224.1267 Ops/s | **-5.59%** |
| test_unbind_speed_stack1 | 1.6471μs | 0.6518μs | 1.5342 MOps/s | 1.5732 MOps/s | -2.48% |
| test_split | 54.9724ms | 1.7616ms | 567.6578 Ops/s | 567.6658 Ops/s | -0.00% |
| test_chunk | 52.6340ms | 1.7449ms | 573.0871 Ops/s | 549.7865 Ops/s | +4.24% |
| test_creation[device0] | 0.6078ms | 0.2940ms | 3.4013 KOps/s | 3.3334 KOps/s | +2.04% |
| test_creation_from_tensor | 2.6767ms | 0.3320ms | 3.0121 KOps/s | 3.0086 KOps/s | +0.11% |
| test_add_one[memmap_tensor0] | 93.8840μs | 25.3146μs | 39.5029 KOps/s | 39.4299 KOps/s | +0.19% |
| test_contiguous[memmap_tensor0] | 30.4970μs | 5.7286μs | 174.5633 KOps/s | 173.2443 KOps/s | +0.76% |
| test_stack[memmap_tensor0] | 87.5230μs | 19.5318μs | 51.1985 KOps/s | 53.9055 KOps/s | **-5.02%** |
| test_memmaptd_index | 1.2355ms | 0.2251ms | 4.4426 KOps/s | 2.4373 KOps/s | **+82.27%** |
| test_memmaptd_index_astensor | 0.3215ms | 0.2573ms | 3.8860 KOps/s | 2.1348 KOps/s | **+82.03%** |
| test_memmaptd_index_op | 0.6072ms | 0.4956ms | 2.0176 KOps/s | 1.4201 KOps/s | **+42.07%** |
| test_reshape_pytree | 55.8040μs | 23.4572μs | 42.6308 KOps/s | 43.2654 KOps/s | -1.47% |
| test_reshape_td | 0.3969ms | 31.6100μs | 31.6355 KOps/s | 30.6044 KOps/s | +3.37% |
| test_view_pytree | 58.5290μs | 23.4583μs | 42.6289 KOps/s | 43.0650 KOps/s | -1.01% |
| test_view_td | 20.3280μs | 4.8598μs | 205.7678 KOps/s | 207.6055 KOps/s | -0.89% |
| test_unbind_pytree | 76.8330μs | 26.8161μs | 37.2911 KOps/s | 38.1050 KOps/s | -2.14% |
| test_unbind_td | 0.1258ms | 59.3170μs | 16.8586 KOps/s | 15.2398 KOps/s | **+10.62%** |
| test_split_pytree | 55.5440μs | 26.7533μs | 37.3785 KOps/s | 38.4671 KOps/s | -2.83% |
| test_split_td | 88.8260μs | 46.3286μs | 21.5849 KOps/s | 20.9907 KOps/s | +2.83% |
| test_add_pytree | 73.7570μs | 32.2100μs | 31.0462 KOps/s | 27.3006 KOps/s | **+13.72%** |
| test_add_td | 0.1308ms | 45.4094μs | 22.0219 KOps/s | 21.8670 KOps/s | +0.71% |
| test_distributed | 49.1220μs | 5.9400μs | 168.3509 KOps/s | 167.9495 KOps/s | +0.24% |
| test_tdmodule | 0.1728ms | 20.8578μs | 47.9438 KOps/s | 44.3328 KOps/s | **+8.15%** |
| test_tdmodule_dispatch | 0.1682ms | 37.7168μs | 26.5134 KOps/s | 25.3878 KOps/s | +4.43% |
| test_tdseq | 50.9540μs | 23.7263μs | 42.1474 KOps/s | 40.6494 KOps/s | +3.69% |
| test_tdseq_dispatch | 0.1372ms | 42.6089μs | 23.4693 KOps/s | 22.9931 KOps/s | +2.07% |
| test_instantiation_functorch | 1.4130ms | 1.2995ms | 769.5415 Ops/s | 778.5909 Ops/s | -1.16% |
| test_instantiation_td | 1.6170ms | 1.0298ms | 971.0264 Ops/s | 923.1350 Ops/s | **+5.19%** |
| test_exec_functorch | 0.3696ms | 0.1589ms | 6.2947 KOps/s | 6.1495 KOps/s | +2.36% |
| test_exec_functional_call | 0.4108ms | 0.1495ms | 6.6896 KOps/s | 6.6385 KOps/s | +0.77% |
| test_exec_td | 0.3274ms | 0.1435ms | 6.9695 KOps/s | 6.8402 KOps/s | +1.89% |
| test_exec_td_decorator | 0.9495ms | 0.1772ms | 5.6420 KOps/s | 5.4875 KOps/s | +2.82% |
| test_vmap_mlp_speed[True-True] | 1.4163ms | 0.8990ms | 1.1124 KOps/s | 1.1181 KOps/s | -0.51% |
| test_vmap_mlp_speed[True-False] | 0.9511ms | 0.4808ms | 2.0799 KOps/s | 2.1178 KOps/s | -1.79% |
| test_vmap_mlp_speed[False-True] | 1.1643ms | 0.7811ms | 1.2802 KOps/s | 1.2923 KOps/s | -0.94% |
| test_vmap_mlp_speed[False-False] | 0.5431ms | 0.3814ms | 2.6222 KOps/s | 2.6003 KOps/s | +0.84% |
| test_vmap_mlp_speed_decorator[True-True] | 2.6583ms | 1.7782ms | 562.3693 Ops/s | 565.0756 Ops/s | -0.48% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9007ms | 0.5122ms | 1.9523 KOps/s | 1.9405 KOps/s | +0.61% |
| test_vmap_mlp_speed_decorator[False-True] | 2.0033ms | 1.4830ms | 674.3256 Ops/s | 681.4433 Ops/s | -1.04% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7601ms | 0.3961ms | 2.5247 KOps/s | 2.5088 KOps/s | +0.64% |
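
The Change column can be recomputed from the two throughput columns: it is the relative difference between Ops (this PR) and Ops on Repo HEAD, once Ops/s, KOps/s and MOps/s are brought to a common unit. The Python sketch below assumes exactly that; `to_ops_per_second` and `percent_change` are hypothetical helpers, not part of tensordict or of the benchmark bot. It recomputes three CPU rows and one GPU row (test_membership) whose two columns use different units.

```python
"""Recompute the Change column of the benchmark tables (hypothetical helpers)."""

# Assumption: throughput strings always carry one of these three suffixes,
# as they do in the CPU and GPU tables.
_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_second(value: str) -> float:
    """Convert a string such as '852.5254 KOps/s' to operations per second."""
    number, unit = value.split()
    return float(number) * _SCALE[unit]


def percent_change(ops: str, ops_on_head: str) -> float:
    """Throughput of this PR relative to repo HEAD, in percent.

    Positive values mean the PR runs more operations per second (faster).
    """
    return (to_ops_per_second(ops) / to_ops_per_second(ops_on_head) - 1.0) * 100.0


if __name__ == "__main__":
    rows = {
        "test_values (CPU)": ("852.5254 KOps/s", "772.4969 KOps/s"),
        "test_unlock_nested (CPU)": ("1.9478 KOps/s", "2.2528 KOps/s"),
        "test_unbind_speed_stack0 (CPU)": ("211.5898 Ops/s", "224.1267 Ops/s"),
        # Mixed units: MOps/s for this PR, KOps/s for HEAD.
        "test_membership (GPU)": ("1.0669 MOps/s", "948.6238 KOps/s"),
    }
    for name, (ops, head) in rows.items():
        # Prints +10.36%, -13.54%, -5.59% and +12.47%, matching the tables.
        print(f"{name}: {percent_change(ops, head):+.2f}%")
```

Normalising to plain operations per second before dividing is what keeps rows such as test_membership on the GPU correct; comparing the raw numbers 1.0669 and 948.6238 would be meaningless.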


⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: 12. Worsened: 2.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---:|---:|---:|---:|---:|
| test_plain_set_nested | 0.4615ms | 12.6032μs | 79.3450 KOps/s | 78.3180 KOps/s | +1.31% |
| test_plain_set_stack_nested | 0.1967ms | 0.1146ms | 8.7284 KOps/s | 8.6301 KOps/s | +1.14% |
| test_plain_set_nested_inplace | 33.4710μs | 14.8612μs | 67.2893 KOps/s | 66.2095 KOps/s | +1.63% |
| test_plain_set_stack_nested_inplace | 0.1805ms | 0.1399ms | 7.1486 KOps/s | 7.0619 KOps/s | +1.23% |
| test_items | 28.8110μs | 4.6702μs | 214.1250 KOps/s | 212.5502 KOps/s | +0.74% |
| test_items_nested | 0.3774ms | 0.3381ms | 2.9574 KOps/s | 2.9861 KOps/s | -0.96% |
| test_items_nested_locked | 0.4287ms | 0.3426ms | 2.9185 KOps/s | 2.9522 KOps/s | -1.14% |
| test_items_nested_leaf | 0.2438ms | 0.1991ms | 5.0218 KOps/s | 5.0531 KOps/s | -0.62% |
| test_items_stack_nested | 1.5435ms | 1.4793ms | 675.9844 Ops/s | 668.7865 Ops/s | +1.08% |
| test_items_stack_nested_leaf | 1.3606ms | 1.3065ms | 765.4036 Ops/s | 759.3990 Ops/s | +0.79% |
| test_items_stack_nested_locked | 0.8497ms | 0.8129ms | 1.2302 KOps/s | 1.2095 KOps/s | +1.71% |
| test_keys | 41.3110μs | 4.5990μs | 217.4367 KOps/s | 215.9634 KOps/s | +0.68% |
| test_keys_nested | 3.2786ms | 90.7523μs | 11.0190 KOps/s | 11.0600 KOps/s | -0.37% |
| test_keys_nested_locked | 0.1143ms | 90.2784μs | 11.0768 KOps/s | 11.0639 KOps/s | +0.12% |
| test_keys_nested_leaf | 41.3409ms | 86.7979μs | 11.5210 KOps/s | 12.1759 KOps/s | **-5.38%** |
| test_keys_stack_nested | 1.5592ms | 1.3013ms | 768.4596 Ops/s | 769.2222 Ops/s | -0.10% |
| test_keys_stack_nested_leaf | 1.3658ms | 1.2875ms | 776.7148 Ops/s | 767.7453 Ops/s | +1.17% |
| test_keys_stack_nested_locked | 0.6650ms | 0.6234ms | 1.6040 KOps/s | 1.5779 KOps/s | +1.65% |
| test_values | 14.9237μs | 1.8928μs | 528.3141 KOps/s | 526.2636 KOps/s | +0.39% |
| test_values_nested | 68.0830μs | 43.1929μs | 23.1519 KOps/s | 23.0592 KOps/s | +0.40% |
| test_values_nested_locked | 0.1073ms | 45.3581μs | 22.0468 KOps/s | 21.8343 KOps/s | +0.97% |
| test_values_nested_leaf | 58.0430μs | 37.2890μs | 26.8176 KOps/s | 26.4450 KOps/s | +1.41% |
| test_values_stack_nested | 1.2051ms | 1.1445ms | 873.7288 Ops/s | 866.9907 Ops/s | +0.78% |
| test_values_stack_nested_leaf | 1.1836ms | 1.1219ms | 891.3454 Ops/s | 883.8879 Ops/s | +0.84% |
| test_values_stack_nested_locked | 0.5468ms | 0.4992ms | 2.0032 KOps/s | 1.9616 KOps/s | +2.12% |
| test_membership | 6.1744μs | 0.9373μs | 1.0669 MOps/s | 948.6238 KOps/s | **+12.47%** |
| test_membership_nested | 18.6710μs | 2.1887μs | 456.8875 KOps/s | 456.7165 KOps/s | +0.04% |
| test_membership_nested_leaf | 14.0505μs | 2.1169μs | 472.3892 KOps/s | 473.1406 KOps/s | -0.16% |
| test_membership_stacked_nested | 40.7510μs | 11.0455μs | 90.5344 KOps/s | 91.2699 KOps/s | -0.81% |
| test_membership_stacked_nested_leaf | 73.6830μs | 11.0465μs | 90.5268 KOps/s | 90.8871 KOps/s | -0.40% |
| test_membership_nested_last | 35.8820μs | 4.6606μs | 214.5650 KOps/s | 218.4917 KOps/s | -1.80% |
| test_membership_nested_leaf_last | 22.7810μs | 4.6893μs | 213.2503 KOps/s | 218.3642 KOps/s | -2.34% |
| test_membership_stacked_nested_last | 0.1866ms | 0.1342ms | 7.4530 KOps/s | 7.3967 KOps/s | +0.76% |
| test_membership_stacked_nested_leaf_last | 51.6820μs | 12.7416μs | 78.4830 KOps/s | 78.2578 KOps/s | +0.29% |
| test_nested_getleaf | 29.2310μs | 8.3546μs | 119.6950 KOps/s | 118.8351 KOps/s | +0.72% |
| test_nested_get | 28.6510μs | 7.9278μs | 126.1391 KOps/s | 125.9206 KOps/s | +0.17% |
| test_stacked_getleaf | 0.6987ms | 0.5690ms | 1.7576 KOps/s | 1.7796 KOps/s | -1.24% |
| test_stacked_get | 0.6061ms | 0.5440ms | 1.8384 KOps/s | 1.8715 KOps/s | -1.77% |
| test_nested_getitemleaf | 27.7610μs | 8.5018μs | 117.6219 KOps/s | 118.0450 KOps/s | -0.36% |
| test_nested_getitem | 31.2610μs | 8.0607μs | 124.0594 KOps/s | 124.8637 KOps/s | -0.64% |
| test_stacked_getitemleaf | 0.8116ms | 0.5668ms | 1.7643 KOps/s | 1.7569 KOps/s | +0.42% |
| test_stacked_getitem | 0.5586ms | 0.5328ms | 1.8768 KOps/s | 1.8543 KOps/s | +1.21% |
| test_lock_nested | 3.2113ms | 0.5539ms | 1.8053 KOps/s | 1.7612 KOps/s | +2.51% |
| test_lock_stack_nested | 82.9370ms | 7.2093ms | 138.7105 Ops/s | 137.9740 Ops/s | +0.53% |
| test_unlock_nested | 2.3688ms | 0.4265ms | 2.3444 KOps/s | 2.3029 KOps/s | +1.80% |
| test_unlock_stack_nested | 66.5467ms | 6.2426ms | 160.1906 Ops/s | 158.0031 Ops/s | +1.38% |
| test_flatten_speed | 0.2246ms | 0.1861ms | 5.3732 KOps/s | 5.3233 KOps/s | +0.94% |
| test_unflatten_speed | 0.4375ms | 0.3647ms | 2.7420 KOps/s | 2.7481 KOps/s | -0.22% |
| test_common_ops | 1.1255ms | 0.5933ms | 1.6856 KOps/s | 1.6475 KOps/s | +2.31% |
| test_creation | 64.1220μs | 2.1034μs | 475.4099 KOps/s | 474.4200 KOps/s | +0.21% |
| test_creation_empty | 27.5710μs | 6.7310μs | 148.5669 KOps/s | 138.8676 KOps/s | **+6.98%** |
| test_creation_nested_1 | 42.2310μs | 9.1010μs | 109.8778 KOps/s | 104.8260 KOps/s | +4.82% |
| test_creation_nested_2 | 41.1420μs | 11.8089μs | 84.6816 KOps/s | 82.0787 KOps/s | +3.17% |
| test_clone | 97.3440μs | 14.2661μs | 70.0965 KOps/s | 68.7352 KOps/s | +1.98% |
| test_getitem[int] | 30.3310μs | 12.1694μs | 82.1736 KOps/s | 81.2715 KOps/s | +1.11% |
| test_getitem[slice_int] | 50.9020μs | 23.7061μs | 42.1833 KOps/s | 42.0469 KOps/s | +0.32% |
| test_getitem[range] | 0.2405ms | 40.1628μs | 24.8986 KOps/s | 24.8468 KOps/s | +0.21% |
| test_getitem[tuple] | 40.4810μs | 20.0523μs | 49.8696 KOps/s | 48.8674 KOps/s | +2.05% |
| test_getitem[list] | 0.2554ms | 36.5407μs | 27.3668 KOps/s | 26.4924 KOps/s | +3.30% |
| test_setitem_dim[int] | 56.7230μs | 25.3252μs | 39.4863 KOps/s | 37.9526 KOps/s | +4.04% |
| test_setitem_dim[slice_int] | 61.6930μs | 45.3231μs | 22.0638 KOps/s | 21.4143 KOps/s | +3.03% |
| test_setitem_dim[range] | 97.3540μs | 62.7407μs | 15.9386 KOps/s | 15.7419 KOps/s | +1.25% |
| test_setitem_dim[tuple] | 59.4820μs | 38.9700μs | 25.6608 KOps/s | 25.7412 KOps/s | -0.31% |
| test_setitem | 94.3340μs | 17.9339μs | 55.7603 KOps/s | 53.9342 KOps/s | +3.39% |
| test_set | 88.2050μs | 17.4950μs | 57.1593 KOps/s | 56.3603 KOps/s | +1.42% |
| test_set_shared | 2.8966ms | 0.1047ms | 9.5485 KOps/s | 8.6112 KOps/s | **+10.88%** |
| test_update | 94.9040μs | 18.4243μs | 54.2762 KOps/s | 52.2170 KOps/s | +3.94% |
| test_update_nested | 0.1077ms | 25.2293μs | 39.6364 KOps/s | 38.9236 KOps/s | +1.83% |
| test_set_nested | 98.4540μs | 18.7529μs | 53.3250 KOps/s | 52.2182 KOps/s | +2.12% |
| test_set_nested_new | 96.3240μs | 22.9429μs | 43.5864 KOps/s | 42.5748 KOps/s | +2.38% |
| test_select | 73.8830μs | 46.3572μs | 21.5716 KOps/s | 21.2255 KOps/s | +1.63% |
| test_to | 74.6230μs | 54.8299μs | 18.2382 KOps/s | 18.2716 KOps/s | -0.18% |
| test_to_nonblocking | 68.5530μs | 34.8684μs | 28.6793 KOps/s | 25.7870 KOps/s | **+11.22%** |
| test_unbind_speed | 0.4077ms | 0.3618ms | 2.7636 KOps/s | 2.7548 KOps/s | +0.32% |
| test_unbind_speed_stack0 | 63.2311ms | 4.3730ms | 228.6754 Ops/s | 233.1590 Ops/s | -1.92% |
| test_unbind_speed_stack1 | 1.6431μs | 0.5259μs | 1.9016 MOps/s | 1.8929 MOps/s | +0.46% |
| test_split | 54.0366ms | 1.8082ms | 553.0379 Ops/s | 541.5858 Ops/s | +2.11% |
| test_chunk | 54.3971ms | 1.7994ms | 555.7271 Ops/s | 544.9208 Ops/s | +1.98% |
| test_creation[device0] | 0.5046ms | 0.3092ms | 3.2346 KOps/s | 3.2283 KOps/s | +0.20% |
| test_creation[device1] | 0.8266ms | 0.3145ms | 3.1797 KOps/s | 3.1659 KOps/s | +0.44% |
| test_creation_from_tensor | 57.3963ms | 0.3675ms | 2.7208 KOps/s | 2.9052 KOps/s | **-6.35%** |
| test_add_one[memmap_tensor0] | 91.0340μs | 23.9931μs | 41.6787 KOps/s | 40.5487 KOps/s | +2.79% |
| test_add_one[memmap_tensor1] | 0.2175ms | 72.9038μs | 13.7167 KOps/s | 13.3934 KOps/s | +2.41% |
| test_contiguous[memmap_tensor0] | 20.4710μs | 5.7952μs | 172.5577 KOps/s | 172.1464 KOps/s | +0.24% |
| test_contiguous[memmap_tensor1] | 61.6230μs | 21.9253μs | 45.6094 KOps/s | 45.4119 KOps/s | +0.43% |
| test_stack[memmap_tensor0] | 88.9550μs | 19.2375μs | 51.9817 KOps/s | 50.8372 KOps/s | +2.25% |
| test_stack[memmap_tensor1] | 0.1515ms | 73.3245μs | 13.6380 KOps/s | 13.7088 KOps/s | -0.52% |
| test_memmaptd_index | 0.2721ms | 0.2350ms | 4.2548 KOps/s | 2.2685 KOps/s | **+87.56%** |
| test_memmaptd_index_astensor | 0.3734ms | 0.2938ms | 3.4039 KOps/s | 2.0154 KOps/s | **+68.90%** |
| test_memmaptd_index_op | 0.5966ms | 0.5452ms | 1.8341 KOps/s | 1.3118 KOps/s | **+39.82%** |
| test_reshape_pytree | 39.7710μs | 20.6530μs | 48.4192 KOps/s | 47.2713 KOps/s | +2.43% |
| test_reshape_td | 59.2020μs | 30.6095μs | 32.6696 KOps/s | 31.9493 KOps/s | +2.25% |
| test_view_pytree | 36.4320μs | 20.6473μs | 48.4324 KOps/s | 47.8569 KOps/s | +1.20% |
| test_view_td | 17.7210μs | 4.0879μs | 244.6240 KOps/s | 243.0415 KOps/s | +0.65% |
| test_unbind_pytree | 55.5120μs | 25.8276μs | 38.7183 KOps/s | 37.6220 KOps/s | +2.91% |
| test_unbind_td | 96.3440μs | 56.0317μs | 17.8470 KOps/s | 17.4211 KOps/s | +2.45% |
| test_split_pytree | 52.9920μs | 23.8551μs | 41.9197 KOps/s | 41.3863 KOps/s | +1.29% |
| test_split_td | 71.4130μs | 45.0372μs | 22.2039 KOps/s | 22.0811 KOps/s | +0.56% |
| test_add_pytree | 68.3730μs | 31.7818μs | 31.4646 KOps/s | 30.9881 KOps/s | +1.54% |
| test_add_td | 73.6630μs | 43.8924μs | 22.7830 KOps/s | 21.1781 KOps/s | **+7.58%** |
| test_distributed | 18.1410μs | 5.7663μs | 173.4202 KOps/s | 180.5504 KOps/s | -3.95% |
| test_tdmodule | 32.5110μs | 16.7638μs | 59.6523 KOps/s | 58.6250 KOps/s | +1.75% |
| test_tdmodule_dispatch | 0.1200ms | 32.8489μs | 30.4424 KOps/s | 30.0855 KOps/s | +1.19% |
| test_tdseq | 35.8810μs | 19.6918μs | 50.7826 KOps/s | 48.9194 KOps/s | +3.81% |
| test_tdseq_dispatch | 50.9820μs | 35.6687μs | 28.0358 KOps/s | 27.6472 KOps/s | +1.41% |
| test_instantiation_functorch | 1.7997ms | 1.6697ms | 598.9247 Ops/s | 597.3339 Ops/s | +0.27% |
| test_instantiation_td | 1.6844ms | 1.1908ms | 839.7842 Ops/s | 843.6504 Ops/s | -0.46% |
| test_exec_functorch | 0.2141ms | 0.1600ms | 6.2499 KOps/s | 6.1964 KOps/s | +0.86% |
| test_exec_functional_call | 0.2113ms | 0.1618ms | 6.1791 KOps/s | 6.3270 KOps/s | -2.34% |
| test_exec_td | 0.2088ms | 0.1516ms | 6.5962 KOps/s | 6.7300 KOps/s | -1.99% |
| test_exec_td_decorator | 0.7449ms | 0.1922ms | 5.2030 KOps/s | 5.3099 KOps/s | -2.01% |
| test_vmap_mlp_speed[True-True] | 1.1842ms | 1.0811ms | 925.0141 Ops/s | 881.7027 Ops/s | +4.91% |
| test_vmap_mlp_speed[True-False] | 0.6555ms | 0.6194ms | 1.6145 KOps/s | 1.5536 KOps/s | +3.92% |
| test_vmap_mlp_speed[False-True] | 1.0816ms | 0.9977ms | 1.0023 KOps/s | 970.9010 Ops/s | +3.24% |
| test_vmap_mlp_speed[False-False] | 0.6074ms | 0.5470ms | 1.8280 KOps/s | 1.7407 KOps/s | **+5.02%** |
| test_vmap_mlp_speed_decorator[True-True] | 3.0028ms | 2.0612ms | 485.1652 Ops/s | 467.5803 Ops/s | +3.76% |
| test_vmap_mlp_speed_decorator[True-False] | 1.1849ms | 0.6629ms | 1.5085 KOps/s | 1.4614 KOps/s | +3.22% |
| test_vmap_mlp_speed_decorator[False-True] | 2.2447ms | 1.7828ms | 560.9020 Ops/s | 531.9862 Ops/s | **+5.44%** |
| test_vmap_mlp_speed_decorator[False-False] | 0.9809ms | 0.5640ms | 1.7731 KOps/s | 1.6935 KOps/s | +4.70% |
| test_vmap_transformer_speed[True-True] | 12.8824ms | 12.7318ms | 78.5434 Ops/s | 76.7213 Ops/s | +2.37% |
| test_vmap_transformer_speed[True-False] | 8.3775ms | 8.2961ms | 120.5385 Ops/s | 118.3804 Ops/s | +1.82% |
| test_vmap_transformer_speed[False-True] | 12.7165ms | 12.6251ms | 79.2075 Ops/s | 77.4830 Ops/s | +2.23% |
| test_vmap_transformer_speed[False-False] | 8.2684ms | 8.1915ms | 122.0779 Ops/s | 119.0515 Ops/s | +2.54% |
| test_vmap_transformer_speed_decorator[True-True] | 66.1297ms | 65.0234ms | 15.3791 Ops/s | 14.0207 Ops/s | **+9.69%** |
| test_vmap_transformer_speed_decorator[True-False] | 22.0925ms | 20.0479ms | 49.8805 Ops/s | 48.9901 Ops/s | +1.82% |
| test_vmap_transformer_speed_decorator[False-True] | 0.1380s | 63.3128ms | 15.7946 Ops/s | 16.5634 Ops/s | -4.64% |
| test_vmap_transformer_speed_decorator[False-False] | 21.6904ms | 19.6205ms | 50.9670 Ops/s | 46.2853 Ops/s | **+10.11%** |
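
The Improved and Worsened counts in the summary lines do not include every positive or negative row. Judging from both tables, only changes larger than 5% in magnitude (the bold entries) are counted; the bot's actual rule is not stated on this page, so the 5% threshold below is an inference, and `tally` is a hypothetical helper. The sketch applies that rule to a few GPU rows that sit close to the boundary.

```python
"""Tally improved/worsened benchmarks the way the summary lines appear to."""

# Assumption (inferred from which entries are bold, not documented by the bot):
# a benchmark counts as improved or worsened only if |change| exceeds 5%.
THRESHOLD_PERCENT = 5.0


def tally(changes: dict[str, float], threshold: float = THRESHOLD_PERCENT) -> tuple[int, int]:
    """Return (improved, worsened) for a mapping of benchmark name -> % change."""
    improved = sum(1 for change in changes.values() if change > threshold)
    worsened = sum(1 for change in changes.values() if change < -threshold)
    return improved, worsened


if __name__ == "__main__":
    # GPU rows near the threshold, values copied from the table.
    sample = {
        "test_memmaptd_index": 87.56,
        "test_vmap_mlp_speed[False-False]": 5.02,
        "test_vmap_mlp_speed[True-True]": 4.91,
        "test_vmap_transformer_speed_decorator[False-True]": -4.64,
        "test_keys_nested_leaf": -5.38,
        "test_to": -0.18,
    }
    print(tally(sample))  # (2, 1): +4.91% and -4.64% fall inside the band
```

Applied to every row, this rule reproduces both summaries: 12 improved and 4 worsened on CPU, 12 improved and 2 worsened on GPU. Rows such as test_set_nested_new (+4.89%, CPU) or test_vmap_transformer_speed_decorator[False-True] (-4.64%, GPU) therefore count as unchanged.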

Labels: bug, CLA Signed, Performance
Projects: None yet
3 participants