[BugFix] Fix key ordering in pointwise ops #855

Merged (4 commits) on Jul 5, 2024

@vmoens (Contributor) commented Jul 5, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Jul 5, 2024
@vmoens added the bug label Jul 5, 2024

github-actions bot commented Jul 5, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 9. Worsened: 8.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 48.0400μs | 17.4676μs | 57.2487 KOps/s | 59.3692 KOps/s | -3.57% |
| test_plain_set_stack_nested | 44.9830μs | 17.6326μs | 56.7133 KOps/s | 59.0901 KOps/s | -4.02% |
| test_plain_set_nested_inplace | 52.9980μs | 19.7295μs | 50.6856 KOps/s | 51.5276 KOps/s | -1.63% |
| test_plain_set_stack_nested_inplace | 58.3480μs | 19.2812μs | 51.8640 KOps/s | 51.7272 KOps/s | +0.26% |
| test_items | 33.1020μs | 2.5346μs | 394.5341 KOps/s | 388.2307 KOps/s | +1.62% |
| test_items_nested | 0.8977ms | 0.2870ms | 3.4838 KOps/s | 3.6669 KOps/s | -4.99% |
| test_items_nested_locked | 0.4075ms | 0.2873ms | 3.4801 KOps/s | 3.6545 KOps/s | -4.77% |
| test_items_nested_leaf | 0.1500ms | 79.5179μs | 12.5758 KOps/s | 12.7497 KOps/s | -1.36% |
| test_items_stack_nested | 0.5658ms | 0.2888ms | 3.4628 KOps/s | 3.6509 KOps/s | **-5.15%** |
| test_items_stack_nested_leaf | 0.1464ms | 79.4202μs | 12.5913 KOps/s | 12.6915 KOps/s | -0.79% |
| test_items_stack_nested_locked | 0.7013ms | 0.2901ms | 3.4470 KOps/s | 3.5722 KOps/s | -3.50% |
| test_keys | 32.8320μs | 3.9897μs | 250.6479 KOps/s | 245.5290 KOps/s | +2.08% |
| test_keys_nested | 0.2269ms | 0.1372ms | 7.2860 KOps/s | 7.2812 KOps/s | +0.07% |
| test_keys_nested_locked | 0.7233ms | 0.1410ms | 7.0918 KOps/s | 6.8766 KOps/s | +3.13% |
| test_keys_nested_leaf | 0.2068ms | 0.1159ms | 8.6302 KOps/s | 8.4744 KOps/s | +1.84% |
| test_keys_stack_nested | 0.2325ms | 0.1371ms | 7.2936 KOps/s | 7.0867 KOps/s | +2.92% |
| test_keys_stack_nested_leaf | 0.2517ms | 0.1166ms | 8.5729 KOps/s | 8.5373 KOps/s | +0.42% |
| test_keys_stack_nested_locked | 0.2305ms | 0.1410ms | 7.0936 KOps/s | 6.9584 KOps/s | +1.94% |
| test_values | 5.1634μs | 1.1584μs | 863.2640 KOps/s | 866.1952 KOps/s | -0.34% |
| test_values_nested | 96.3490μs | 50.7123μs | 19.7191 KOps/s | 20.0204 KOps/s | -1.51% |
| test_values_nested_locked | 0.1062ms | 51.0613μs | 19.5843 KOps/s | 19.4873 KOps/s | +0.50% |
| test_values_nested_leaf | 90.6490μs | 46.1707μs | 21.6587 KOps/s | 22.1367 KOps/s | -2.16% |
| test_values_stack_nested | 0.1029ms | 51.2310μs | 19.5194 KOps/s | 19.6480 KOps/s | -0.65% |
| test_values_stack_nested_leaf | 96.8600μs | 45.8616μs | 21.8047 KOps/s | 21.9991 KOps/s | -0.88% |
| test_values_stack_nested_locked | 0.1022ms | 51.1773μs | 19.5399 KOps/s | 19.8016 KOps/s | -1.32% |
| test_membership | 33.4420μs | 1.3493μs | 741.1241 KOps/s | 742.3769 KOps/s | -0.17% |
| test_membership_nested | 39.9440μs | 3.4440μs | 290.3642 KOps/s | 292.1907 KOps/s | -0.63% |
| test_membership_nested_leaf | 32.2610μs | 3.4141μs | 292.9023 KOps/s | 286.1330 KOps/s | +2.37% |
| test_membership_stacked_nested | 32.0790μs | 3.3910μs | 294.8950 KOps/s | 292.6537 KOps/s | +0.77% |
| test_membership_stacked_nested_leaf | 19.3460μs | 3.4253μs | 291.9477 KOps/s | 284.5214 KOps/s | +2.61% |
| test_membership_nested_last | 23.7940μs | 4.2204μs | 236.9462 KOps/s | 237.5760 KOps/s | -0.27% |
| test_membership_nested_leaf_last | 21.7700μs | 4.2264μs | 236.6062 KOps/s | 238.7599 KOps/s | -0.90% |
| test_membership_stacked_nested_last | 45.0740μs | 4.1381μs | 241.6560 KOps/s | 239.7896 KOps/s | +0.78% |
| test_membership_stacked_nested_leaf_last | 26.8100μs | 4.1600μs | 240.3844 KOps/s | 236.6099 KOps/s | +1.60% |
| test_nested_getleaf | 43.9410μs | 10.7662μs | 92.8836 KOps/s | 90.3187 KOps/s | +2.84% |
| test_nested_get | 43.4710μs | 10.1914μs | 98.1222 KOps/s | 97.0166 KOps/s | +1.14% |
| test_stacked_getleaf | 90.4380μs | 10.8017μs | 92.5777 KOps/s | 92.5464 KOps/s | +0.03% |
| test_stacked_get | 27.4110μs | 10.2003μs | 98.0364 KOps/s | 97.5946 KOps/s | +0.45% |
| test_nested_getitemleaf | 50.9850μs | 11.3136μs | 88.3891 KOps/s | 87.1703 KOps/s | +1.40% |
| test_nested_getitem | 47.3980μs | 10.4270μs | 95.9046 KOps/s | 96.5153 KOps/s | -0.63% |
| test_stacked_getitemleaf | 65.9920μs | 11.1999μs | 89.2864 KOps/s | 88.2859 KOps/s | +1.13% |
| test_stacked_getitem | 49.5820μs | 10.4183μs | 95.9848 KOps/s | 94.4609 KOps/s | +1.61% |
| test_lock_nested | 1.9713ms | 0.3321ms | 3.0114 KOps/s | 3.0383 KOps/s | -0.88% |
| test_lock_stack_nested | 0.3647ms | 0.3019ms | 3.3125 KOps/s | 3.3095 KOps/s | +0.09% |
| test_unlock_nested | 0.8446ms | 0.3336ms | 2.9976 KOps/s | 3.0015 KOps/s | -0.13% |
| test_unlock_stack_nested | 0.4362ms | 0.3106ms | 3.2197 KOps/s | 3.2256 KOps/s | -0.18% |
| test_flatten_speed | 0.3599ms | 97.8982μs | 10.2147 KOps/s | 10.3779 KOps/s | -1.57% |
| test_unflatten_speed | 0.5928ms | 0.4081ms | 2.4502 KOps/s | 2.4476 KOps/s | +0.10% |
| test_common_ops | 3.9255ms | 0.7355ms | 1.3595 KOps/s | 1.3751 KOps/s | -1.13% |
| test_creation | 15.6990μs | 1.8890μs | 529.3852 KOps/s | 530.2646 KOps/s | -0.17% |
| test_creation_empty | 40.6960μs | 11.1645μs | 89.5696 KOps/s | 91.8493 KOps/s | -2.48% |
| test_creation_nested_1 | 58.1180μs | 13.5524μs | 73.7878 KOps/s | 73.9783 KOps/s | -0.26% |
| test_creation_nested_2 | 41.0770μs | 16.8276μs | 59.4261 KOps/s | 59.1302 KOps/s | +0.50% |
| test_clone | 86.9710μs | 12.7595μs | 78.3729 KOps/s | 75.6635 KOps/s | +3.58% |
| test_getitem[int] | 86.6910μs | 10.9375μs | 91.4282 KOps/s | 90.4363 KOps/s | +1.10% |
| test_getitem[slice_int] | 79.5280μs | 22.0188μs | 45.4157 KOps/s | 43.5293 KOps/s | +4.33% |
| test_getitem[range] | 78.2050μs | 59.0516μs | 16.9343 KOps/s | 17.1898 KOps/s | -1.49% |
| test_getitem[tuple] | 50.4640μs | 18.4174μs | 54.2965 KOps/s | 53.4881 KOps/s | +1.51% |
| test_getitem[list] | 77.9650μs | 39.6869μs | 25.1972 KOps/s | 24.7631 KOps/s | +1.75% |
| test_setitem_dim[int] | 0.1581ms | 35.5473μs | 28.1315 KOps/s | 28.4799 KOps/s | -1.22% |
| test_setitem_dim[slice_int] | 89.6670μs | 60.8280μs | 16.4398 KOps/s | 16.0174 KOps/s | +2.64% |
| test_setitem_dim[range] | 0.1454ms | 82.3422μs | 12.1444 KOps/s | 12.0598 KOps/s | +0.70% |
| test_setitem_dim[tuple] | 91.3500μs | 48.4054μs | 20.6588 KOps/s | 19.6323 KOps/s | **+5.23%** |
| test_setitem | 51.9960μs | 19.3406μs | 51.7046 KOps/s | 48.1569 KOps/s | **+7.37%** |
| test_set | 73.2740μs | 18.8158μs | 53.1468 KOps/s | 50.7435 KOps/s | +4.74% |
| test_set_shared | 3.9772ms | 0.1423ms | 7.0269 KOps/s | 6.9668 KOps/s | +0.86% |
| test_update | 0.1298ms | 22.2611μs | 44.9215 KOps/s | 43.2631 KOps/s | +3.83% |
| test_update_nested | 74.4080μs | 31.0185μs | 32.2388 KOps/s | 30.9600 KOps/s | +4.13% |
| test_update__nested | 0.1183ms | 24.8029μs | 40.3179 KOps/s | 39.3909 KOps/s | +2.35% |
| test_set_nested | 67.1250μs | 21.0451μs | 47.5170 KOps/s | 45.2224 KOps/s | **+5.07%** |
| test_set_nested_new | 83.0850μs | 25.1771μs | 39.7186 KOps/s | 38.0336 KOps/s | +4.43% |
| test_select | 0.1014ms | 40.3256μs | 24.7982 KOps/s | 24.0472 KOps/s | +3.12% |
| test_select_nested | 0.1173ms | 57.2151μs | 17.4779 KOps/s | 17.5885 KOps/s | -0.63% |
| test_exclude_nested | 0.2675ms | 0.1176ms | 8.5069 KOps/s | 8.3296 KOps/s | +2.13% |
| test_empty[True] | 0.5769ms | 0.3937ms | 2.5397 KOps/s | 2.5363 KOps/s | +0.13% |
| test_empty[False] | 8.2874μs | 1.0231μs | 977.4286 KOps/s | 989.4440 KOps/s | -1.21% |
| test_unbind_speed | 0.4523ms | 0.2435ms | 4.1065 KOps/s | 4.0806 KOps/s | +0.64% |
| test_unbind_speed_stack0 | 0.4401ms | 0.2438ms | 4.1018 KOps/s | 4.1893 KOps/s | -2.09% |
| test_unbind_speed_stack1 | 67.6612ms | 0.7046ms | 1.4192 KOps/s | 1.4664 KOps/s | -3.22% |
| test_split | 72.7578ms | 1.5814ms | 632.3617 Ops/s | 636.4504 Ops/s | -0.64% |
| test_chunk | 65.0146ms | 1.5658ms | 638.6522 Ops/s | 633.8638 Ops/s | +0.76% |
| test_creation[device0] | 0.1627ms | 82.4405μs | 12.1300 KOps/s | 11.8951 KOps/s | +1.97% |
| test_creation_from_tensor | 4.0303ms | 84.8635μs | 11.7836 KOps/s | 11.4287 KOps/s | +3.11% |
| test_add_one[memmap_tensor0] | 41.2670μs | 5.7853μs | 172.8504 KOps/s | 182.0908 KOps/s | **-5.07%** |
| test_contiguous[memmap_tensor0] | 20.0270μs | 0.6532μs | 1.5310 MOps/s | 1.5558 MOps/s | -1.60% |
| test_stack[memmap_tensor0] | 22.8430μs | 3.6951μs | 270.6283 KOps/s | 264.5751 KOps/s | +2.29% |
| test_memmaptd_index | 1.0457ms | 0.2629ms | 3.8040 KOps/s | 3.8622 KOps/s | -1.51% |
| test_memmaptd_index_astensor | 0.6812ms | 0.3355ms | 2.9808 KOps/s | 2.9821 KOps/s | -0.05% |
| test_memmaptd_index_op | 0.9768ms | 0.6388ms | 1.5655 KOps/s | 1.5752 KOps/s | -0.62% |
| test_serialize_model | 0.1703s | 0.1051s | 9.5115 Ops/s | 10.3705 Ops/s | **-8.28%** |
| test_serialize_model_pickle | 0.4564s | 0.3785s | 2.6422 Ops/s | 2.6248 Ops/s | +0.66% |
| test_serialize_weights | 97.3648ms | 93.5610ms | 10.6882 Ops/s | 9.5075 Ops/s | **+12.42%** |
| test_serialize_weights_returnearly | 0.1829s | 0.1273s | 7.8555 Ops/s | 8.6030 Ops/s | **-8.69%** |
| test_serialize_weights_pickle | 0.7621s | 0.4896s | 2.0423 Ops/s | 1.5805 Ops/s | **+29.22%** |
| test_serialize_weights_filesystem | 99.1071ms | 92.2741ms | 10.8373 Ops/s | 10.3740 Ops/s | +4.47% |
| test_serialize_model_filesystem | 0.1545s | 98.2059ms | 10.1827 Ops/s | 10.7650 Ops/s | **-5.41%** |
| test_reshape_pytree | 73.3960μs | 25.4517μs | 39.2902 KOps/s | 39.1798 KOps/s | +0.28% |
| test_reshape_td | 79.5080μs | 33.1326μs | 30.1817 KOps/s | 29.7268 KOps/s | +1.53% |
| test_view_pytree | 71.2320μs | 25.3080μs | 39.5131 KOps/s | 39.2397 KOps/s | +0.70% |
| test_view_td | 83.7760μs | 37.6721μs | 26.5448 KOps/s | 25.5600 KOps/s | +3.85% |
| test_unbind_pytree | 90.9590μs | 29.4025μs | 34.0108 KOps/s | 34.0353 KOps/s | -0.07% |
| test_unbind_td | 0.4424ms | 36.2495μs | 27.5866 KOps/s | 27.5263 KOps/s | +0.22% |
| test_split_pytree | 80.7210μs | 28.9392μs | 34.5553 KOps/s | 33.8029 KOps/s | +2.23% |
| test_split_td | 0.1193ms | 38.1275μs | 26.2278 KOps/s | 25.0536 KOps/s | +4.69% |
| test_add_pytree | 0.1209ms | 35.0464μs | 28.5336 KOps/s | 27.6699 KOps/s | +3.12% |
| test_add_td | 0.1332ms | 57.7992μs | 17.3013 KOps/s | 17.4066 KOps/s | -0.61% |
| test_distributed | 0.1780ms | 0.1033ms | 9.6787 KOps/s | 9.6895 KOps/s | -0.11% |
| test_tdmodule | 45.4650μs | 18.5516μs | 53.9036 KOps/s | 56.8430 KOps/s | **-5.17%** |
| test_tdmodule_dispatch | 53.7400μs | 36.3330μs | 27.5232 KOps/s | 27.8732 KOps/s | -1.26% |
| test_tdseq | 42.0080μs | 21.2272μs | 47.1093 KOps/s | 49.3227 KOps/s | -4.49% |
| test_tdseq_dispatch | 77.7740μs | 41.6460μs | 24.0119 KOps/s | 24.8745 KOps/s | -3.47% |
| test_instantiation_functorch | 1.9451ms | 1.3217ms | 756.5899 Ops/s | 742.2690 Ops/s | +1.93% |
| test_instantiation_td | 1.5508ms | 1.0194ms | 980.9335 Ops/s | 967.4085 Ops/s | +1.40% |
| test_exec_functorch | 0.2979ms | 0.1750ms | 5.7151 KOps/s | 6.1332 KOps/s | **-6.82%** |
| test_exec_functional_call | 0.2985ms | 0.1536ms | 6.5119 KOps/s | 6.5214 KOps/s | -0.15% |
| test_exec_td | 0.2819ms | 0.1477ms | 6.7690 KOps/s | 6.7960 KOps/s | -0.40% |
| test_exec_td_decorator | 0.3603ms | 0.2237ms | 4.4701 KOps/s | 4.4692 KOps/s | +0.02% |
| test_vmap_mlp_speed[True-True] | 0.8252ms | 0.4962ms | 2.0154 KOps/s | 2.0567 KOps/s | -2.01% |
| test_vmap_mlp_speed[True-False] | 0.9009ms | 0.4912ms | 2.0357 KOps/s | 2.0670 KOps/s | -1.51% |
| test_vmap_mlp_speed[False-True] | 0.6609ms | 0.4034ms | 2.4789 KOps/s | 2.5394 KOps/s | -2.38% |
| test_vmap_mlp_speed[False-False] | 0.6138ms | 0.4011ms | 2.4930 KOps/s | 2.5261 KOps/s | -1.31% |
| test_vmap_mlp_speed_decorator[True-True] | 0.8938ms | 0.5672ms | 1.7632 KOps/s | 1.7913 KOps/s | -1.57% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9672ms | 0.5669ms | 1.7640 KOps/s | 1.8062 KOps/s | -2.33% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7698ms | 0.4636ms | 2.1570 KOps/s | 2.2038 KOps/s | -2.12% |
| test_vmap_mlp_speed_decorator[False-False] | 0.8328ms | 0.4636ms | 2.1571 KOps/s | 2.1972 KOps/s | -1.83% |
| test_to_module_speed[True] | 2.7421ms | 1.6719ms | 598.1370 Ops/s | 593.3370 Ops/s | +0.81% |
| test_to_module_speed[False] | 1.7697ms | 1.6353ms | 611.4960 Ops/s | 613.9396 Ops/s | -0.40% |
| test_tc_init | 0.1193ms | 57.8828μs | 17.2763 KOps/s | 16.4827 KOps/s | +4.81% |
| test_tc_init_nested | 0.1974ms | 0.1187ms | 8.4245 KOps/s | 8.4609 KOps/s | -0.43% |
| test_tc_first_layer_tensor | 28.3130μs | 8.2675μs | 120.9561 KOps/s | 114.8011 KOps/s | **+5.36%** |
| test_tc_first_layer_nontensor | 48.9110μs | 8.1887μs | 122.1193 KOps/s | 113.8747 KOps/s | **+7.24%** |
| test_tc_second_layer_tensor | 25.7980μs | 2.5271μs | 395.7068 KOps/s | 379.3959 KOps/s | +4.30% |
| test_tc_second_layer_nontensor | 50.3230μs | 9.2188μs | 108.4736 KOps/s | 101.4017 KOps/s | **+6.97%** |
| test_unbind | 84.9729ms | 14.5708ms | 68.6304 Ops/s | 71.1493 Ops/s | -3.54% |
| test_full_like | 8.8912ms | 7.1534ms | 139.7940 Ops/s | 146.4651 Ops/s | -4.55% |
| test_zeros_like | 11.5723ms | 5.3233ms | 187.8549 Ops/s | 170.8059 Ops/s | **+9.98%** |
| test_ones_like | 12.3629ms | 6.2157ms | 160.8827 Ops/s | 168.7361 Ops/s | -4.65% |
| test_clone | 15.3356ms | 8.0565ms | 124.1230 Ops/s | 132.4394 Ops/s | **-6.28%** |
| test_squeeze | 66.8540μs | 12.8595μs | 77.7637 KOps/s | 76.5473 KOps/s | +1.59% |
| test_unsqueeze | 0.2603ms | 96.7769μs | 10.3330 KOps/s | 9.8892 KOps/s | +4.49% |
| test_split | 0.6016ms | 0.2798ms | 3.5740 KOps/s | 3.5770 KOps/s | -0.08% |
| test_permute | 0.4804ms | 0.2268ms | 4.4085 KOps/s | 4.3523 KOps/s | +1.29% |
| test_stack | 23.8765ms | 21.6894ms | 46.1054 Ops/s | 46.8003 Ops/s | -1.48% |
| test_cat | 24.0961ms | 21.2849ms | 46.9816 Ops/s | 47.5332 Ops/s | -1.16% |
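
For readers checking the numbers: the Change column in the table above appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below recomputes it for a few rows. The function names and the 5% cut-off for counting a benchmark as "Improved" or "Worsened" are assumptions inferred from which rows the bot highlights, not something the bot's output states.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative change in throughput of the PR versus repo HEAD, in percent."""
    return (ops / ops_head - 1.0) * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; `threshold` is an assumed cut-off, inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on Repo HEAD) copied from the CPU table; units cancel out.
rows = [
    ("test_plain_set_nested", 57.2487, 59.3692),
    ("test_setitem", 51.7046, 48.1569),
    ("test_serialize_model", 9.5115, 10.3705),
]

for name, ops, ops_head in rows:
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under that assumed threshold, the highlighted rows of the CPU table match the summary line (9 improved, 8 worsened).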


github-actions bot commented Jul 5, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 4. Worsened: 22.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 26.1500μs | 13.9320μs | 71.7771 KOps/s | 80.1548 KOps/s | **-10.45%** |
| test_plain_set_stack_nested | 32.1810μs | 13.8584μs | 72.1582 KOps/s | 79.9748 KOps/s | **-9.77%** |
| test_plain_set_nested_inplace | 39.3300μs | 15.1630μs | 65.9499 KOps/s | 72.6469 KOps/s | **-9.22%** |
| test_plain_set_stack_nested_inplace | 31.8510μs | 15.1347μs | 66.0733 KOps/s | 72.6988 KOps/s | **-9.11%** |
| test_items | 28.5200μs | 4.6457μs | 215.2550 KOps/s | 212.9098 KOps/s | +1.10% |
| test_items_nested | 0.3786ms | 0.3407ms | 2.9350 KOps/s | 2.9311 KOps/s | +0.13% |
| test_items_nested_locked | 0.3591ms | 0.3381ms | 2.9581 KOps/s | 2.8855 KOps/s | +2.51% |
| test_items_nested_leaf | 99.1610μs | 82.6127μs | 12.1047 KOps/s | 12.1198 KOps/s | -0.12% |
| test_items_stack_nested | 0.3712ms | 0.3452ms | 2.8970 KOps/s | 2.9347 KOps/s | -1.29% |
| test_items_stack_nested_leaf | 0.1058ms | 83.2768μs | 12.0081 KOps/s | 11.7012 KOps/s | +2.62% |
| test_items_stack_nested_locked | 0.3755ms | 0.3436ms | 2.9107 KOps/s | 2.9287 KOps/s | -0.61% |
| test_keys | 28.2210μs | 4.3562μs | 229.5588 KOps/s | 228.1846 KOps/s | +0.60% |
| test_keys_nested | 93.6020μs | 68.6672μs | 14.5630 KOps/s | 14.4130 KOps/s | +1.04% |
| test_keys_nested_locked | 0.7580ms | 74.2838μs | 13.4619 KOps/s | 13.4048 KOps/s | +0.43% |
| test_keys_nested_leaf | 84.8210μs | 59.5444μs | 16.7942 KOps/s | 16.8691 KOps/s | -0.44% |
| test_keys_stack_nested | 98.7410μs | 68.1472μs | 14.6741 KOps/s | 14.3885 KOps/s | +1.99% |
| test_keys_stack_nested_leaf | 82.8410μs | 57.4708μs | 17.4002 KOps/s | 16.8073 KOps/s | +3.53% |
| test_keys_stack_nested_locked | 97.4320μs | 72.9814μs | 13.7021 KOps/s | 13.3041 KOps/s | +2.99% |
| test_values | 9.7870μs | 1.8054μs | 553.8880 KOps/s | 555.2515 KOps/s | -0.25% |
| test_values_nested | 65.5410μs | 35.1973μs | 28.4113 KOps/s | 28.3311 KOps/s | +0.28% |
| test_values_nested_locked | 66.8010μs | 37.1937μs | 26.8863 KOps/s | 26.8611 KOps/s | +0.09% |
| test_values_nested_leaf | 48.9510μs | 31.4843μs | 31.7619 KOps/s | 31.5795 KOps/s | +0.58% |
| test_values_stack_nested | 66.9310μs | 36.2381μs | 27.5953 KOps/s | 27.7524 KOps/s | -0.57% |
| test_values_stack_nested_leaf | 59.2810μs | 32.3150μs | 30.9453 KOps/s | 31.2352 KOps/s | -0.93% |
| test_values_stack_nested_locked | 61.1010μs | 38.1045μs | 26.2436 KOps/s | 26.6896 KOps/s | -1.67% |
| test_membership | 1.7140μs | 0.6932μs | 1.4425 MOps/s | 1.4463 MOps/s | -0.27% |
| test_membership_nested | 17.6800μs | 2.6391μs | 378.9142 KOps/s | 388.1110 KOps/s | -2.37% |
| test_membership_nested_leaf | 30.5610μs | 2.6160μs | 382.2583 KOps/s | 388.8326 KOps/s | -1.69% |
| test_membership_stacked_nested | 20.2100μs | 2.5756μs | 388.2560 KOps/s | 392.9615 KOps/s | -1.20% |
| test_membership_stacked_nested_leaf | 17.8100μs | 2.5794μs | 387.6842 KOps/s | 388.9431 KOps/s | -0.32% |
| test_membership_nested_last | 32.8610μs | 3.0640μs | 326.3743 KOps/s | 326.7200 KOps/s | -0.11% |
| test_membership_nested_leaf_last | 21.3110μs | 3.1207μs | 320.4453 KOps/s | 325.7590 KOps/s | -1.63% |
| test_membership_stacked_nested_last | 33.9400μs | 3.1235μs | 320.1540 KOps/s | 323.4086 KOps/s | -1.01% |
| test_membership_stacked_nested_leaf_last | 21.6310μs | 3.1310μs | 319.3901 KOps/s | 325.6722 KOps/s | -1.93% |
| test_nested_getleaf | 34.1800μs | 8.2974μs | 120.5197 KOps/s | 119.0074 KOps/s | +1.27% |
| test_nested_get | 31.8900μs | 7.8031μs | 128.1541 KOps/s | 127.4612 KOps/s | +0.54% |
| test_stacked_getleaf | 27.7010μs | 8.3832μs | 119.2858 KOps/s | 118.8295 KOps/s | +0.38% |
| test_stacked_get | 30.8310μs | 7.7933μs | 128.3147 KOps/s | 127.0020 KOps/s | +1.03% |
| test_nested_getitemleaf | 30.0700μs | 8.4632μs | 118.1589 KOps/s | 116.4495 KOps/s | +1.47% |
| test_nested_getitem | 33.9410μs | 7.9988μs | 125.0180 KOps/s | 124.5599 KOps/s | +0.37% |
| test_stacked_getitemleaf | 35.6310μs | 8.5558μs | 116.8797 KOps/s | 116.1692 KOps/s | +0.61% |
| test_stacked_getitem | 25.0200μs | 7.9922μs | 125.1227 KOps/s | 124.2232 KOps/s | +0.72% |
| test_lock_nested | 57.4349ms | 0.3969ms | 2.5194 KOps/s | 2.4649 KOps/s | +2.21% |
| test_lock_stack_nested | 0.3425ms | 0.2919ms | 3.4258 KOps/s | 3.3111 KOps/s | +3.47% |
| test_unlock_nested | 59.8476ms | 0.3998ms | 2.5015 KOps/s | 2.4628 KOps/s | +1.57% |
| test_unlock_stack_nested | 0.3212ms | 0.3017ms | 3.3150 KOps/s | 3.2362 KOps/s | +2.43% |
| test_flatten_speed | 0.4123ms | 0.1017ms | 9.8362 KOps/s | 9.6626 KOps/s | +1.80% |
| test_unflatten_speed | 0.3139ms | 0.2898ms | 3.4502 KOps/s | 3.4180 KOps/s | +0.94% |
| test_common_ops | 1.0930ms | 0.6233ms | 1.6043 KOps/s | 1.7273 KOps/s | **-7.12%** |
| test_creation | 29.0000μs | 1.6481μs | 606.7445 KOps/s | 616.8917 KOps/s | -1.64% |
| test_creation_empty | 26.4110μs | 10.7088μs | 93.3812 KOps/s | 126.8563 KOps/s | **-26.39%** |
| test_creation_nested_1 | 28.0310μs | 12.5112μs | 79.9284 KOps/s | 103.4794 KOps/s | **-22.76%** |
| test_creation_nested_2 | 51.3810μs | 14.7616μs | 67.7434 KOps/s | 84.3790 KOps/s | **-19.72%** |
| test_clone | 56.9610μs | 11.6527μs | 85.8174 KOps/s | 85.3292 KOps/s | +0.57% |
| test_getitem[int] | 31.0600μs | 10.6368μs | 94.0133 KOps/s | 94.0386 KOps/s | -0.03% |
| test_getitem[slice_int] | 46.9210μs | 20.4875μs | 48.8102 KOps/s | 48.8220 KOps/s | -0.02% |
| test_getitem[range] | 64.4310μs | 46.5046μs | 21.5032 KOps/s | 22.0678 KOps/s | -2.56% |
| test_getitem[tuple] | 42.0410μs | 18.3026μs | 54.6370 KOps/s | 54.0202 KOps/s | +1.14% |
| test_getitem[list] | 0.1519ms | 31.9083μs | 31.3398 KOps/s | 30.5250 KOps/s | +2.67% |
| test_setitem_dim[int] | 66.8410μs | 28.4367μs | 35.1658 KOps/s | 37.2978 KOps/s | **-5.72%** |
| test_setitem_dim[slice_int] | 70.0310μs | 48.5305μs | 20.6056 KOps/s | 20.8238 KOps/s | -1.05% |
| test_setitem_dim[range] | 88.6910μs | 65.2632μs | 15.3226 KOps/s | 15.8482 KOps/s | -3.32% |
| test_setitem_dim[tuple] | 60.5010μs | 42.1918μs | 23.7013 KOps/s | 23.9793 KOps/s | -1.16% |
| test_setitem | 47.4700μs | 17.6002μs | 56.8174 KOps/s | 62.6276 KOps/s | **-9.28%** |
| test_set | 49.3110μs | 17.1548μs | 58.2927 KOps/s | 63.7642 KOps/s | **-8.58%** |
| test_set_shared | 1.6643ms | 98.4177μs | 10.1608 KOps/s | 10.1831 KOps/s | -0.22% |
| test_update | 90.7210μs | 20.5981μs | 48.5483 KOps/s | 55.5269 KOps/s | **-12.57%** |
| test_update_nested | 69.1020μs | 26.5024μs | 37.7325 KOps/s | 42.1088 KOps/s | **-10.39%** |
| test_update__nested | 67.7410μs | 22.3876μs | 44.6675 KOps/s | 43.9879 KOps/s | +1.55% |
| test_set_nested | 54.0410μs | 17.9797μs | 55.6183 KOps/s | 59.3294 KOps/s | **-6.26%** |
| test_set_nested_new | 69.0810μs | 20.8766μs | 47.9006 KOps/s | 51.3847 KOps/s | **-6.78%** |
| test_select | 66.6510μs | 33.4357μs | 29.9081 KOps/s | 30.4024 KOps/s | -1.63% |
| test_select_nested | 0.6063ms | 51.1052μs | 19.5675 KOps/s | 19.1088 KOps/s | +2.40% |
| test_exclude_nested | 0.1409ms | 0.1077ms | 9.2833 KOps/s | 9.0389 KOps/s | +2.70% |
| test_empty[True] | 0.3779ms | 0.3406ms | 2.9360 KOps/s | 2.8756 KOps/s | +2.10% |
| test_empty[False] | 2.7930μs | 0.7932μs | 1.2608 MOps/s | 1.2431 MOps/s | +1.42% |
| test_to | 89.8920μs | 58.9966μs | 16.9501 KOps/s | 17.1530 KOps/s | -1.18% |
| test_to_nonblocking | 75.2710μs | 35.2139μs | 28.3979 KOps/s | 27.6741 KOps/s | +2.62% |
| test_unbind_speed | 0.2927ms | 0.2540ms | 3.9376 KOps/s | 3.8755 KOps/s | +1.60% |
| test_unbind_speed_stack0 | 0.2927ms | 0.2547ms | 3.9258 KOps/s | 3.8177 KOps/s | +2.83% |
| test_unbind_speed_stack1 | 75.3276ms | 0.7656ms | 1.3062 KOps/s | 1.2705 KOps/s | +2.81% |
| test_split | 75.0175ms | 1.6498ms | 606.1403 Ops/s | 600.4106 Ops/s | +0.95% |
| test_chunk | 76.0235ms | 1.6517ms | 605.4437 Ops/s | 599.9829 Ops/s | +0.91% |
| test_creation[device0] | 0.1280ms | 59.3094μs | 16.8607 KOps/s | 17.5575 KOps/s | -3.97% |
| test_creation_from_tensor | 0.1334ms | 56.0179μs | 17.8514 KOps/s | 17.6168 KOps/s | +1.33% |
| test_add_one[memmap_tensor0] | 51.7810μs | 6.8567μs | 145.8435 KOps/s | 146.0735 KOps/s | -0.16% |
| test_contiguous[memmap_tensor0] | 29.5800μs | 0.6601μs | 1.5150 MOps/s | 1.5153 MOps/s | -0.02% |
| test_stack[memmap_tensor0] | 29.8500μs | 4.6944μs | 213.0203 KOps/s | 210.0070 KOps/s | +1.43% |
| test_memmaptd_index | 1.0432ms | 0.2764ms | 3.6177 KOps/s | 3.4875 KOps/s | +3.73% |
| test_memmaptd_index_astensor | 0.5899ms | 0.3329ms | 3.0043 KOps/s | 2.8611 KOps/s | **+5.00%** |
| test_memmaptd_index_op | 0.9466ms | 0.6587ms | 1.5182 KOps/s | 1.5932 KOps/s | -4.71% |
| test_serialize_model | 96.0782ms | 90.8790ms | 11.0036 Ops/s | 10.4370 Ops/s | **+5.43%** |
| test_serialize_model_pickle | 1.3485s | 1.2353s | 0.8095 Ops/s | 0.8083 Ops/s | +0.15% |
| test_serialize_weights | 93.0226ms | 89.2009ms | 11.2106 Ops/s | 9.6680 Ops/s | **+15.96%** |
| test_serialize_weights_returnearly | 0.2211s | 75.8420ms | 13.1853 Ops/s | 13.2590 Ops/s | -0.56% |
| test_serialize_weights_pickle | 1.4070s | 1.2542s | 0.7973 Ops/s | 0.7975 Ops/s | -0.02% |
| test_reshape_pytree | 51.2110μs | 26.0183μs | 38.4344 KOps/s | 38.2878 KOps/s | +0.38% |
| test_reshape_td | 56.0110μs | 31.6022μs | 31.6433 KOps/s | 31.5799 KOps/s | +0.20% |
| test_view_pytree | 48.8010μs | 25.6677μs | 38.9594 KOps/s | 38.7218 KOps/s | +0.61% |
| test_view_td | 67.1210μs | 36.4615μs | 27.4262 KOps/s | 26.1745 KOps/s | +4.78% |
| test_unbind_pytree | 54.9910μs | 31.8505μs | 31.3967 KOps/s | 30.5265 KOps/s | +2.85% |
| test_unbind_td | 0.4760ms | 39.5292μs | 25.2978 KOps/s | 24.7536 KOps/s | +2.20% |
| test_split_pytree | 65.4110μs | 34.5099μs | 28.9772 KOps/s | 28.2453 KOps/s | +2.59% |
| test_split_td | 0.4740ms | 38.9037μs | 25.7045 KOps/s | 25.6009 KOps/s | +0.40% |
| test_add_pytree | 64.7610μs | 37.7100μs | 26.5181 KOps/s | 26.4722 KOps/s | +0.17% |
| test_add_td | 82.1310μs | 52.8089μs | 18.9362 KOps/s | 19.6436 KOps/s | -3.60% |
| test_distributed | 1.7340ms | 85.8737μs | 11.6450 KOps/s | 11.4122 KOps/s | +2.04% |
| test_tdmodule | 37.3710μs | 16.4573μs | 60.7632 KOps/s | 65.2114 KOps/s | **-6.82%** |
| test_tdmodule_dispatch | 47.7100μs | 31.7356μs | 31.5103 KOps/s | 33.5855 KOps/s | **-6.18%** |
| test_tdseq | 42.2200μs | 17.6800μs | 56.5612 KOps/s | 59.2304 KOps/s | -4.51% |
| test_tdseq_dispatch | 63.1210μs | 36.1993μs | 27.6249 KOps/s | 30.6682 KOps/s | **-9.92%** |
| test_instantiation_functorch | 1.5290ms | 1.4049ms | 711.8021 Ops/s | 706.5376 Ops/s | +0.75% |
| test_instantiation_td | 77.9213ms | 1.0753ms | 929.9976 Ops/s | 922.8870 Ops/s | +0.77% |
| test_exec_functorch | 0.1887ms | 0.1432ms | 6.9826 KOps/s | 6.9129 KOps/s | +1.01% |
| test_exec_functional_call | 0.3081ms | 0.1323ms | 7.5589 KOps/s | 7.5565 KOps/s | +0.03% |
| test_exec_td | 0.1770ms | 0.1278ms | 7.8243 KOps/s | 7.7741 KOps/s | +0.64% |
| test_exec_td_decorator | 0.5043ms | 0.2011ms | 4.9725 KOps/s | 4.9529 KOps/s | +0.39% |
| test_vmap_mlp_speed[True-True] | 0.5914ms | 0.5585ms | 1.7904 KOps/s | 1.7964 KOps/s | -0.34% |
| test_vmap_mlp_speed[True-False] | 1.2862ms | 0.5592ms | 1.7883 KOps/s | 1.7968 KOps/s | -0.47% |
| test_vmap_mlp_speed[False-True] | 0.5506ms | 0.4856ms | 2.0594 KOps/s | 2.0485 KOps/s | +0.53% |
| test_vmap_mlp_speed[False-False] | 0.6313ms | 0.5115ms | 1.9549 KOps/s | 2.0096 KOps/s | -2.72% |
| test_vmap_mlp_speed_decorator[True-True] | 0.9343ms | 0.6594ms | 1.5165 KOps/s | 1.4251 KOps/s | **+6.41%** |
| test_vmap_mlp_speed_decorator[True-False] | 0.7961ms | 0.6583ms | 1.5190 KOps/s | 1.6195 KOps/s | **-6.21%** |
| test_vmap_mlp_speed_decorator[False-True] | 0.7185ms | 0.5780ms | 1.7302 KOps/s | 1.8290 KOps/s | **-5.41%** |
| test_vmap_mlp_speed_decorator[False-False] | 0.7028ms | 0.5761ms | 1.7358 KOps/s | 1.8171 KOps/s | -4.47% |
| test_vmap_transformer_speed[True-True] | 7.7595ms | 7.3822ms | 135.4607 Ops/s | 135.9745 Ops/s | -0.38% |
| test_vmap_transformer_speed[True-False] | 7.3799ms | 7.3087ms | 136.8223 Ops/s | 134.7067 Ops/s | +1.57% |
| test_vmap_transformer_speed[False-True] | 7.3217ms | 7.2553ms | 137.8304 Ops/s | 136.5672 Ops/s | +0.92% |
| test_vmap_transformer_speed[False-False] | 7.7074ms | 7.3509ms | 136.0377 Ops/s | 136.6042 Ops/s | -0.41% |
| test_vmap_transformer_speed_decorator[True-True] | 18.6781ms | 17.7377ms | 56.3771 Ops/s | 56.0912 Ops/s | +0.51% |
| test_vmap_transformer_speed_decorator[True-False] | 17.8692ms | 17.7769ms | 56.2529 Ops/s | 56.1483 Ops/s | +0.19% |
| test_vmap_transformer_speed_decorator[False-True] | 18.8808ms | 17.6272ms | 56.7305 Ops/s | 56.5991 Ops/s | +0.23% |
| test_vmap_transformer_speed_decorator[False-False] | 18.2325ms | 17.6550ms | 56.6411 Ops/s | 56.3115 Ops/s | +0.59% |
| test_to_module_speed[True] | 1.5995ms | 1.4811ms | 675.1798 Ops/s | 669.7717 Ops/s | +0.81% |
| test_to_module_speed[False] | 1.5838ms | 1.4791ms | 676.1063 Ops/s | 677.8007 Ops/s | -0.25% |
| test_tc_init | 83.4710μs | 59.5664μs | 16.7880 KOps/s | 19.4609 KOps/s | **-13.73%** |
| test_tc_init_nested | 0.1507ms | 0.1174ms | 8.5191 KOps/s | 9.7781 KOps/s | **-12.88%** |
| test_tc_first_layer_tensor | 23.8410μs | 3.6843μs | 271.4221 KOps/s | 268.8321 KOps/s | +0.96% |
| test_tc_first_layer_nontensor | 26.1200μs | 3.7084μs | 269.6562 KOps/s | 267.0216 KOps/s | +0.99% |
| test_tc_second_layer_tensor | 6.5002μs | 1.2089μs | 827.1746 KOps/s | 789.0325 KOps/s | +4.83% |
| test_tc_second_layer_nontensor | 24.3000μs | 4.2164μs | 237.1689 KOps/s | 234.1734 KOps/s | +1.28% |
| test_unbind | 0.1115s | 14.7457ms | 67.8162 Ops/s | 69.4159 Ops/s | -2.30% |
| test_full_like | 14.3288ms | 13.5820ms | 73.6267 Ops/s | 72.2418 Ops/s | +1.92% |
| test_zeros_like | 8.3559ms | 7.9687ms | 125.4908 Ops/s | 124.7093 Ops/s | +0.63% |
| test_ones_like | 8.5419ms | 7.9642ms | 125.5618 Ops/s | 124.9674 Ops/s | +0.48% |
| test_clone | 9.7521ms | 9.4975ms | 105.2905 Ops/s | 100.2970 Ops/s | +4.98% |
| test_squeeze | 72.1410μs | 10.8217μs | 92.4065 KOps/s | 94.4496 KOps/s | -2.16% |
| test_unsqueeze | 0.1345ms | 87.2276μs | 11.4643 KOps/s | 11.1088 KOps/s | +3.20% |
| test_split | 3.4263ms | 3.1011ms | 322.4684 Ops/s | 311.7268 Ops/s | +3.45% |
| test_permute | 0.2630ms | 0.2061ms | 4.8509 KOps/s | 4.9157 KOps/s | -1.32% |
| test_stack | 27.4510ms | 27.2169ms | 36.7419 Ops/s | 36.3331 Ops/s | +1.13% |
| test_cat | 27.2567ms | 27.0044ms | 37.0309 Ops/s | 36.3955 Ops/s | +1.75% |

@vmoens merged commit f9ef888 into main Jul 5, 2024
9 of 19 checks passed
@vmoens deleted the fix-values-ordering branch July 5, 2024 12:25
Labels: bug, CLA Signed
Projects: None yet
2 participants