
[Feature] deterministic_sample for composite dist #827

Merged: vmoens merged 1 commit into main from composite-dist-deterministic on Jun 24, 2024.


@vmoens (Contributor) commented on Jun 24, 2024

No description provided.

@facebook-github-bot added the CLA Signed label on Jun 24, 2024 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).
@vmoens added the enhancement label (New feature or request) on Jun 24, 2024.
@vmoens merged commit d612144 into main on Jun 24, 2024.
15 of 25 checks passed.
@vmoens deleted the composite-dist-deterministic branch on Jun 24, 2024, at 09:33.

⚠ Warning: Result of CPU Benchmark Tests

Total benchmarks: 144. Improved: 8. Worsened: 13 (these totals match the bolded Change entries in the table).

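Columns: Name, Max (per-call time), Mean (per-call time), Ops (throughput on this PR), Ops on Repo HEAD (throughput on the repository HEAD), Change (relative change in Ops versus HEAD; positive means faster)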
test_plain_set_nested 36.7590μs 16.7688μs 59.6344 KOps/s 61.9688 KOps/s $\color{#d91a1a}-3.77\%$
test_plain_set_stack_nested 38.2110μs 16.9339μs 59.0532 KOps/s 60.9168 KOps/s $\color{#d91a1a}-3.06\%$
test_plain_set_nested_inplace 56.8260μs 19.4013μs 51.5429 KOps/s 53.8980 KOps/s $\color{#d91a1a}-4.37\%$
test_plain_set_stack_nested_inplace 43.5410μs 18.9837μs 52.6769 KOps/s 54.3214 KOps/s $\color{#d91a1a}-3.03\%$
test_items 37.9110μs 2.5030μs 399.5129 KOps/s 396.4423 KOps/s $\color{#35bf28}+0.77\%$
test_items_nested 0.8117ms 0.2660ms 3.7591 KOps/s 3.7702 KOps/s $\color{#d91a1a}-0.29\%$
test_items_nested_locked 0.5080ms 0.2637ms 3.7925 KOps/s 3.7868 KOps/s $\color{#35bf28}+0.15\%$
test_items_nested_leaf 0.1301ms 77.7429μs 12.8629 KOps/s 12.9970 KOps/s $\color{#d91a1a}-1.03\%$
test_items_stack_nested 0.5125ms 0.2665ms 3.7524 KOps/s 3.7800 KOps/s $\color{#d91a1a}-0.73\%$
test_items_stack_nested_leaf 0.1651ms 76.6818μs 13.0409 KOps/s 13.0557 KOps/s $\color{#d91a1a}-0.11\%$
test_items_stack_nested_locked 0.4487ms 0.2663ms 3.7546 KOps/s 3.7969 KOps/s $\color{#d91a1a}-1.11\%$
test_keys 20.9990μs 3.8616μs 258.9621 KOps/s 172.9426 KOps/s $\textbf{\color{#35bf28}+49.74\%}$
test_keys_nested 0.2623ms 0.1398ms 7.1549 KOps/s 7.2432 KOps/s $\color{#d91a1a}-1.22\%$
test_keys_nested_locked 0.7095ms 0.1460ms 6.8487 KOps/s 6.9718 KOps/s $\color{#d91a1a}-1.77\%$
test_keys_nested_leaf 0.2376ms 0.1182ms 8.4582 KOps/s 8.4915 KOps/s $\color{#d91a1a}-0.39\%$
test_keys_stack_nested 0.2170ms 0.1399ms 7.1499 KOps/s 7.2724 KOps/s $\color{#d91a1a}-1.69\%$
test_keys_stack_nested_leaf 0.2496ms 0.1188ms 8.4178 KOps/s 8.5408 KOps/s $\color{#d91a1a}-1.44\%$
test_keys_stack_nested_locked 0.2366ms 0.1456ms 6.8675 KOps/s 7.0321 KOps/s $\color{#d91a1a}-2.34\%$
test_values 9.8760μs 1.1744μs 851.5007 KOps/s 861.5391 KOps/s $\color{#d91a1a}-1.17\%$
test_values_nested 0.1063ms 51.3600μs 19.4704 KOps/s 19.9910 KOps/s $\color{#d91a1a}-2.60\%$
test_values_nested_locked 0.1085ms 50.6017μs 19.7622 KOps/s 19.7940 KOps/s $\color{#d91a1a}-0.16\%$
test_values_nested_leaf 89.1660μs 46.0193μs 21.7300 KOps/s 21.8099 KOps/s $\color{#d91a1a}-0.37\%$
test_values_stack_nested 96.9310μs 53.8271μs 18.5780 KOps/s 19.7355 KOps/s $\textbf{\color{#d91a1a}-5.87\%}$
test_values_stack_nested_leaf 96.0090μs 46.9248μs 21.3107 KOps/s 21.9080 KOps/s $\color{#d91a1a}-2.73\%$
test_values_stack_nested_locked 98.6030μs 52.7847μs 18.9449 KOps/s 19.6365 KOps/s $\color{#d91a1a}-3.52\%$
test_membership 11.4010μs 1.3234μs 755.6380 KOps/s 759.4444 KOps/s $\color{#d91a1a}-0.50\%$
test_membership_nested 34.6550μs 3.4157μs 292.7670 KOps/s 293.4854 KOps/s $\color{#d91a1a}-0.24\%$
test_membership_nested_leaf 20.5080μs 3.4364μs 291.0010 KOps/s 291.1697 KOps/s $\color{#d91a1a}-0.06\%$
test_membership_stacked_nested 37.3100μs 3.4140μs 292.9139 KOps/s 296.4372 KOps/s $\color{#d91a1a}-1.19\%$
test_membership_stacked_nested_leaf 25.9290μs 3.3924μs 294.7757 KOps/s 296.0550 KOps/s $\color{#d91a1a}-0.43\%$
test_membership_nested_last 52.1710μs 4.1253μs 242.4067 KOps/s 239.1063 KOps/s $\color{#35bf28}+1.38\%$
test_membership_nested_leaf_last 22.3020μs 4.1565μs 240.5879 KOps/s 241.5540 KOps/s $\color{#d91a1a}-0.40\%$
test_membership_stacked_nested_last 37.2860μs 5.2327μs 191.1050 KOps/s 239.8694 KOps/s $\textbf{\color{#d91a1a}-20.33\%}$
test_membership_stacked_nested_leaf_last 38.9520μs 5.2248μs 191.3966 KOps/s 237.0256 KOps/s $\textbf{\color{#d91a1a}-19.25\%}$
test_nested_getleaf 50.9150μs 10.6291μs 94.0812 KOps/s 94.4616 KOps/s $\color{#d91a1a}-0.40\%$
test_nested_get 51.1960μs 9.9808μs 100.1923 KOps/s 100.1431 KOps/s $\color{#35bf28}+0.05\%$
test_stacked_getleaf 35.4060μs 10.3435μs 96.6794 KOps/s 95.4085 KOps/s $\color{#35bf28}+1.33\%$
test_stacked_get 47.2380μs 9.7865μs 102.1815 KOps/s 100.3556 KOps/s $\color{#35bf28}+1.82\%$
test_nested_getitemleaf 54.3520μs 11.1303μs 89.8449 KOps/s 88.0925 KOps/s $\color{#35bf28}+1.99\%$
test_nested_getitem 38.8320μs 10.3881μs 96.2644 KOps/s 96.8074 KOps/s $\color{#d91a1a}-0.56\%$
test_stacked_getitemleaf 52.8780μs 10.9969μs 90.9343 KOps/s 90.7921 KOps/s $\color{#35bf28}+0.16\%$
test_stacked_getitem 54.4710μs 10.2336μs 97.7177 KOps/s 98.4071 KOps/s $\color{#d91a1a}-0.70\%$
test_lock_nested 50.6845ms 0.3885ms 2.5743 KOps/s 2.9692 KOps/s $\textbf{\color{#d91a1a}-13.30\%}$
test_lock_stack_nested 0.4587ms 0.2996ms 3.3373 KOps/s 3.2367 KOps/s $\color{#35bf28}+3.11\%$
test_unlock_nested 0.6806ms 0.3390ms 2.9497 KOps/s 2.8940 KOps/s $\color{#35bf28}+1.92\%$
test_unlock_stack_nested 0.4014ms 0.3076ms 3.2511 KOps/s 3.1498 KOps/s $\color{#35bf28}+3.22\%$
test_flatten_speed 0.1887ms 94.0362μs 10.6342 KOps/s 10.4412 KOps/s $\color{#35bf28}+1.85\%$
test_unflatten_speed 0.5050ms 0.4079ms 2.4516 KOps/s 2.4834 KOps/s $\color{#d91a1a}-1.28\%$
test_common_ops 5.5372ms 0.7258ms 1.3777 KOps/s 1.4508 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$
test_creation 36.9890μs 1.8525μs 539.8020 KOps/s 529.4516 KOps/s $\color{#35bf28}+1.95\%$
test_creation_empty 26.0690μs 10.6905μs 93.5411 KOps/s 106.2533 KOps/s $\textbf{\color{#d91a1a}-11.96\%}$
test_creation_nested_1 43.5920μs 13.4427μs 74.3897 KOps/s 82.4778 KOps/s $\textbf{\color{#d91a1a}-9.81\%}$
test_creation_nested_2 66.3140μs 16.8214μs 59.4482 KOps/s 65.1451 KOps/s $\textbf{\color{#d91a1a}-8.74\%}$
test_clone 1.3117ms 13.3476μs 74.9201 KOps/s 74.4311 KOps/s $\color{#35bf28}+0.66\%$
test_getitem[int] 34.3740μs 11.4478μs 87.3530 KOps/s 88.9394 KOps/s $\color{#d91a1a}-1.78\%$
test_getitem[slice_int] 66.3940μs 22.8291μs 43.8037 KOps/s 43.5335 KOps/s $\color{#35bf28}+0.62\%$
test_getitem[range] 77.8040μs 60.3202μs 16.5782 KOps/s 17.0966 KOps/s $\color{#d91a1a}-3.03\%$
test_getitem[tuple] 58.4790μs 18.4156μs 54.3017 KOps/s 52.5147 KOps/s $\color{#35bf28}+3.40\%$
test_getitem[list] 0.1010ms 39.8187μs 25.1138 KOps/s 24.2512 KOps/s $\color{#35bf28}+3.56\%$
test_setitem_dim[int] 72.5250μs 34.9592μs 28.6048 KOps/s 30.1101 KOps/s $\color{#d91a1a}-5.00\%$
test_setitem_dim[slice_int] 0.1065ms 60.7791μs 16.4530 KOps/s 16.5049 KOps/s $\color{#d91a1a}-0.31\%$
test_setitem_dim[range] 0.1421ms 81.6169μs 12.2524 KOps/s 12.2710 KOps/s $\color{#d91a1a}-0.15\%$
test_setitem_dim[tuple] 96.9200μs 48.9510μs 20.4286 KOps/s 21.0836 KOps/s $\color{#d91a1a}-3.11\%$
test_setitem 59.1810μs 20.1030μs 49.7438 KOps/s 51.2136 KOps/s $\color{#d91a1a}-2.87\%$
test_set 61.4940μs 19.8021μs 50.4997 KOps/s 52.6100 KOps/s $\color{#d91a1a}-4.01\%$
test_set_shared 1.6563ms 0.1408ms 7.1034 KOps/s 7.0804 KOps/s $\color{#35bf28}+0.32\%$
test_update 0.1291ms 21.6985μs 46.0861 KOps/s 48.0211 KOps/s $\color{#d91a1a}-4.03\%$
test_update_nested 93.1640μs 30.9521μs 32.3079 KOps/s 34.1054 KOps/s $\textbf{\color{#d91a1a}-5.27\%}$
test_update__nested 58.6990μs 24.8234μs 40.2845 KOps/s 40.0234 KOps/s $\color{#35bf28}+0.65\%$
test_set_nested 72.6850μs 21.3275μs 46.8878 KOps/s 47.7203 KOps/s $\color{#d91a1a}-1.74\%$
test_set_nested_new 77.0430μs 25.4366μs 39.3134 KOps/s 39.9602 KOps/s $\color{#d91a1a}-1.62\%$
test_select 0.1007ms 40.0620μs 24.9613 KOps/s 24.8976 KOps/s $\color{#35bf28}+0.26\%$
test_select_nested 0.1134ms 59.0599μs 16.9320 KOps/s 17.1073 KOps/s $\color{#d91a1a}-1.02\%$
test_exclude_nested 0.2205ms 0.1181ms 8.4684 KOps/s 8.4910 KOps/s $\color{#d91a1a}-0.27\%$
test_empty[True] 1.1521ms 0.3924ms 2.5482 KOps/s 2.5748 KOps/s $\color{#d91a1a}-1.03\%$
test_empty[False] 8.8506μs 1.1328μs 882.7782 KOps/s 859.1529 KOps/s $\color{#35bf28}+2.75\%$
test_unbind_speed 0.5102ms 0.2573ms 3.8867 KOps/s 3.9650 KOps/s $\color{#d91a1a}-1.97\%$
test_unbind_speed_stack0 0.4229ms 0.2498ms 4.0038 KOps/s 3.9602 KOps/s $\color{#35bf28}+1.10\%$
test_unbind_speed_stack1 68.3351ms 0.7148ms 1.3989 KOps/s 1.3572 KOps/s $\color{#35bf28}+3.08\%$
test_split 65.8132ms 1.5941ms 627.3128 Ops/s 625.8951 Ops/s $\color{#35bf28}+0.23\%$
test_chunk 70.0107ms 1.6550ms 604.2236 Ops/s 624.2109 Ops/s $\color{#d91a1a}-3.20\%$
test_creation[device0] 0.2319ms 84.0966μs 11.8911 KOps/s 12.2035 KOps/s $\color{#d91a1a}-2.56\%$
test_creation_from_tensor 0.2629ms 82.6097μs 12.1051 KOps/s 11.6829 KOps/s $\color{#35bf28}+3.61\%$
test_add_one[memmap_tensor0] 77.9850μs 5.2730μs 189.6460 KOps/s 187.0717 KOps/s $\color{#35bf28}+1.38\%$
test_contiguous[memmap_tensor0] 13.3450μs 0.6448μs 1.5508 MOps/s 1.5741 MOps/s $\color{#d91a1a}-1.48\%$
test_stack[memmap_tensor0] 25.3170μs 3.5464μs 281.9778 KOps/s 279.3187 KOps/s $\color{#35bf28}+0.95\%$
test_memmaptd_index 1.0340ms 0.2459ms 4.0665 KOps/s 3.9169 KOps/s $\color{#35bf28}+3.82\%$
test_memmaptd_index_astensor 0.7527ms 0.3189ms 3.1354 KOps/s 3.0373 KOps/s $\color{#35bf28}+3.23\%$
test_memmaptd_index_op 1.0372ms 0.6021ms 1.6608 KOps/s 1.7262 KOps/s $\color{#d91a1a}-3.78\%$
test_serialize_model 0.1770s 0.1125s 8.8897 Ops/s 8.3393 Ops/s $\textbf{\color{#35bf28}+6.60\%}$
test_serialize_model_pickle 0.4493s 0.3779s 2.6462 Ops/s 2.6295 Ops/s $\color{#35bf28}+0.64\%$
test_serialize_weights 0.1654s 0.1090s 9.1714 Ops/s 8.7869 Ops/s $\color{#35bf28}+4.38\%$
test_serialize_weights_returnearly 0.1843s 0.1328s 7.5275 Ops/s 7.1323 Ops/s $\textbf{\color{#35bf28}+5.54\%}$
test_serialize_weights_pickle 1.1361s 0.6087s 1.6428 Ops/s 1.4547 Ops/s $\textbf{\color{#35bf28}+12.93\%}$
test_serialize_weights_filesystem 0.1024s 90.5629ms 11.0420 Ops/s 10.7003 Ops/s $\color{#35bf28}+3.19\%$
test_serialize_model_filesystem 0.1533s 98.9581ms 10.1053 Ops/s 10.0982 Ops/s $\color{#35bf28}+0.07\%$
test_reshape_pytree 67.4860μs 25.6125μs 39.0434 KOps/s 38.5157 KOps/s $\color{#35bf28}+1.37\%$
test_reshape_td 65.6120μs 34.5695μs 28.9272 KOps/s 28.5968 KOps/s $\color{#35bf28}+1.16\%$
test_view_pytree 59.1800μs 25.3340μs 39.4727 KOps/s 39.4012 KOps/s $\color{#35bf28}+0.18\%$
test_view_td 82.3430μs 37.7298μs 26.5042 KOps/s 25.5907 KOps/s $\color{#35bf28}+3.57\%$
test_unbind_pytree 60.4730μs 28.3480μs 35.2759 KOps/s 33.7152 KOps/s $\color{#35bf28}+4.63\%$
test_unbind_td 0.3899ms 37.1182μs 26.9410 KOps/s 26.4827 KOps/s $\color{#35bf28}+1.73\%$
test_split_pytree 74.4890μs 28.8684μs 34.6399 KOps/s 33.5916 KOps/s $\color{#35bf28}+3.12\%$
test_split_td 0.1223ms 40.3764μs 24.7669 KOps/s 24.5520 KOps/s $\color{#35bf28}+0.88\%$
test_add_pytree 87.4920μs 33.9013μs 29.4974 KOps/s 28.7973 KOps/s $\color{#35bf28}+2.43\%$
test_add_td 0.1168ms 56.6874μs 17.6406 KOps/s 19.4693 KOps/s $\textbf{\color{#d91a1a}-9.39\%}$
test_distributed 0.1920ms 0.1005ms 9.9472 KOps/s 9.7789 KOps/s $\color{#35bf28}+1.72\%$
test_tdmodule 39.3540μs 17.2312μs 58.0344 KOps/s 59.3356 KOps/s $\color{#d91a1a}-2.19\%$
test_tdmodule_dispatch 59.8620μs 34.5874μs 28.9123 KOps/s 30.2210 KOps/s $\color{#d91a1a}-4.33\%$
test_tdseq 39.5730μs 20.1639μs 49.5936 KOps/s 49.6319 KOps/s $\color{#d91a1a}-0.08\%$
test_tdseq_dispatch 68.6880μs 39.4473μs 25.3503 KOps/s 25.8873 KOps/s $\color{#d91a1a}-2.07\%$
test_instantiation_functorch 2.0526ms 1.3463ms 742.7785 Ops/s 767.9545 Ops/s $\color{#d91a1a}-3.28\%$
test_instantiation_td 1.5932ms 1.0051ms 994.9084 Ops/s 991.9273 Ops/s $\color{#35bf28}+0.30\%$
test_exec_functorch 0.3473ms 0.1579ms 6.3339 KOps/s 6.2268 KOps/s $\color{#35bf28}+1.72\%$
test_exec_functional_call 0.2953ms 0.1482ms 6.7466 KOps/s 6.8132 KOps/s $\color{#d91a1a}-0.98\%$
test_exec_td 0.3503ms 0.1431ms 6.9863 KOps/s 7.0037 KOps/s $\color{#d91a1a}-0.25\%$
test_exec_td_decorator 0.7028ms 0.2166ms 4.6165 KOps/s 4.6030 KOps/s $\color{#35bf28}+0.29\%$
test_vmap_mlp_speed[True-True] 0.9156ms 0.4783ms 2.0906 KOps/s 2.1135 KOps/s $\color{#d91a1a}-1.08\%$
test_vmap_mlp_speed[True-False] 0.7767ms 0.4716ms 2.1203 KOps/s 2.1588 KOps/s $\color{#d91a1a}-1.78\%$
test_vmap_mlp_speed[False-True] 0.4936ms 0.3824ms 2.6154 KOps/s 2.6273 KOps/s $\color{#d91a1a}-0.45\%$
test_vmap_mlp_speed[False-False] 8.8512ms 0.3892ms 2.5693 KOps/s 2.6324 KOps/s $\color{#d91a1a}-2.40\%$
test_vmap_mlp_speed_decorator[True-True] 1.0910ms 0.5431ms 1.8413 KOps/s 1.8533 KOps/s $\color{#d91a1a}-0.65\%$
test_vmap_mlp_speed_decorator[True-False] 0.6803ms 0.5429ms 1.8420 KOps/s 1.8772 KOps/s $\color{#d91a1a}-1.88\%$
test_vmap_mlp_speed_decorator[False-True] 0.5633ms 0.4447ms 2.2488 KOps/s 2.2487 KOps/s $+0.01\%$
test_vmap_mlp_speed_decorator[False-False] 0.8248ms 0.4466ms 2.2390 KOps/s 2.2594 KOps/s $\color{#d91a1a}-0.90\%$
test_to_module_speed[True] 2.5548ms 1.6682ms 599.4326 Ops/s 596.3715 Ops/s $\color{#35bf28}+0.51\%$
test_to_module_speed[False] 1.6959ms 1.6263ms 614.8925 Ops/s 541.6151 Ops/s $\textbf{\color{#35bf28}+13.53\%}$
test_tc_init 61.2040μs 28.8685μs 34.6399 KOps/s 38.8134 KOps/s $\textbf{\color{#d91a1a}-10.75\%}$
test_tc_init_nested 0.1378ms 59.8105μs 16.7195 KOps/s 19.1246 KOps/s $\textbf{\color{#d91a1a}-12.58\%}$
test_tc_first_layer_tensor 4.3710μs 0.6877μs 1.4541 MOps/s 1.4521 MOps/s $\color{#35bf28}+0.13\%$
test_tc_first_layer_nontensor 4.5915μs 0.6984μs 1.4319 MOps/s 1.4476 MOps/s $\color{#d91a1a}-1.08\%$
test_tc_second_layer_tensor 69.1390μs 1.9268μs 519.0057 KOps/s 538.6658 KOps/s $\color{#d91a1a}-3.65\%$
test_tc_second_layer_nontensor 52.5847μs 1.5189μs 658.3568 KOps/s 592.9207 KOps/s $\textbf{\color{#35bf28}+11.04\%}$
test_unbind 82.3391ms 7.1538ms 139.7862 Ops/s 151.9005 Ops/s $\textbf{\color{#d91a1a}-7.98\%}$
test_full_like 18.5020ms 10.6193ms 94.1678 Ops/s 91.8939 Ops/s $\color{#35bf28}+2.47\%$
test_zeros_like 11.5032ms 5.5313ms 180.7900 Ops/s 171.8965 Ops/s $\textbf{\color{#35bf28}+5.17\%}$
test_ones_like 12.0021ms 5.9599ms 167.7886 Ops/s 159.1026 Ops/s $\textbf{\color{#35bf28}+5.46\%}$
test_clone 14.0439ms 7.7876ms 128.4097 Ops/s 123.8621 Ops/s $\color{#35bf28}+3.67\%$
test_squeeze 60.0410μs 14.0664μs 71.0915 KOps/s 72.3519 KOps/s $\color{#d91a1a}-1.74\%$
test_unsqueeze 0.1353ms 59.8462μs 16.7095 KOps/s 16.3973 KOps/s $\color{#35bf28}+1.90\%$
test_split 0.2763ms 0.1132ms 8.8335 KOps/s 8.8799 KOps/s $\color{#d91a1a}-0.52\%$
test_permute 0.2291ms 0.1243ms 8.0438 KOps/s 7.8524 KOps/s $\color{#35bf28}+2.44\%$
test_stack 29.4246ms 22.2994ms 44.8442 Ops/s 44.3634 Ops/s $\color{#35bf28}+1.08\%$
test_cat 29.8169ms 22.0615ms 45.3278 Ops/s 45.3248 Ops/s $+0.01\%$
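The Ops and Change columns can be recomputed from the other columns. Below is a minimal Python sketch, assuming Ops is the reciprocal of Mean and Change is the relative difference between Ops and Ops on Repo HEAD. Both relationships check out against the rows above, but the benchmark bot's exact formula is not stated on this page, and the helper names are hypothetical.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean per-call time (calls per second)."""
    return 1.0 / mean_seconds


def relative_change_pct(ops: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD.

    Positive means the PR is faster (more operations per second).
    """
    return (ops - ops_head) / ops_head * 100.0


# Values copied from the test_plain_set_nested row of the CPU table.
mean_s = 16.7688e-6    # Mean: 16.7688 µs per call
ops_pr = 59.6344e3     # Ops: 59.6344 KOps/s
ops_head = 61.9688e3   # Ops on Repo HEAD: 61.9688 KOps/s

print(f"{ops_per_second(mean_s) / 1e3:.2f} KOps/s")      # 59.63 KOps/s
print(f"{relative_change_pct(ops_pr, ops_head):+.2f}%")  # -3.77%
```

The same check on the test_keys row (258.9621 vs 172.9426 KOps/s) reproduces its +49.74%.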


⚠ Warning: Result of GPU Benchmark Tests

Total benchmarks: 152. Improved: 6. Worsened: 18 (these totals match the bolded Change entries in the table).

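Columns: Name, Max (per-call time), Mean (per-call time), Ops (throughput on this PR), Ops on Repo HEAD (throughput on the repository HEAD), Change (relative change in Ops versus HEAD; positive means faster)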
test_plain_set_nested 31.1400μs 13.6011μs 73.5234 KOps/s 79.5879 KOps/s $\textbf{\color{#d91a1a}-7.62\%}$
test_plain_set_stack_nested 28.3210μs 13.7056μs 72.9628 KOps/s 78.7725 KOps/s $\textbf{\color{#d91a1a}-7.38\%}$
test_plain_set_nested_inplace 37.1300μs 14.8804μs 67.2024 KOps/s 72.1146 KOps/s $\textbf{\color{#d91a1a}-6.81\%}$
test_plain_set_stack_nested_inplace 36.0310μs 14.8955μs 67.1346 KOps/s 71.7004 KOps/s $\textbf{\color{#d91a1a}-6.37\%}$
test_items 23.4500μs 4.7558μs 210.2703 KOps/s 208.1048 KOps/s $\color{#35bf28}+1.04\%$
test_items_nested 0.3714ms 0.3379ms 2.9595 KOps/s 3.0103 KOps/s $\color{#d91a1a}-1.69\%$
test_items_nested_locked 0.3812ms 0.3535ms 2.8287 KOps/s 2.9521 KOps/s $\color{#d91a1a}-4.18\%$
test_items_nested_leaf 0.1058ms 83.5106μs 11.9745 KOps/s 12.1461 KOps/s $\color{#d91a1a}-1.41\%$
test_items_stack_nested 0.3644ms 0.3389ms 2.9509 KOps/s 2.9544 KOps/s $\color{#d91a1a}-0.12\%$
test_items_stack_nested_leaf 0.1009ms 84.5405μs 11.8287 KOps/s 11.9115 KOps/s $\color{#d91a1a}-0.70\%$
test_items_stack_nested_locked 0.3687ms 0.3393ms 2.9475 KOps/s 2.9447 KOps/s $\color{#35bf28}+0.10\%$
test_keys 22.3200μs 4.3264μs 231.1400 KOps/s 229.6119 KOps/s $\color{#35bf28}+0.67\%$
test_keys_nested 96.1910μs 68.8103μs 14.5327 KOps/s 14.7027 KOps/s $\color{#d91a1a}-1.16\%$
test_keys_nested_locked 2.1207ms 73.2431μs 13.6532 KOps/s 13.7518 KOps/s $\color{#d91a1a}-0.72\%$
test_keys_nested_leaf 78.7710μs 58.8983μs 16.9784 KOps/s 17.0655 KOps/s $\color{#d91a1a}-0.51\%$
test_keys_stack_nested 92.8510μs 68.0618μs 14.6925 KOps/s 14.7856 KOps/s $\color{#d91a1a}-0.63\%$
test_keys_stack_nested_leaf 81.8210μs 58.7078μs 17.0335 KOps/s 17.0779 KOps/s $\color{#d91a1a}-0.26\%$
test_keys_stack_nested_locked 0.1059ms 73.4163μs 13.6209 KOps/s 13.7437 KOps/s $\color{#d91a1a}-0.89\%$
test_values 8.7367μs 1.8483μs 541.0489 KOps/s 550.9556 KOps/s $\color{#d91a1a}-1.80\%$
test_values_nested 67.9610μs 35.3198μs 28.3127 KOps/s 28.0344 KOps/s $\color{#35bf28}+0.99\%$
test_values_nested_locked 55.1410μs 37.1544μs 26.9147 KOps/s 26.9196 KOps/s $\color{#d91a1a}-0.02\%$
test_values_nested_leaf 53.0600μs 31.5596μs 31.6860 KOps/s 31.8195 KOps/s $\color{#d91a1a}-0.42\%$
test_values_stack_nested 55.2500μs 36.2392μs 27.5945 KOps/s 27.5365 KOps/s $\color{#35bf28}+0.21\%$
test_values_stack_nested_leaf 56.2600μs 32.2400μs 31.0174 KOps/s 31.2157 KOps/s $\color{#d91a1a}-0.64\%$
test_values_stack_nested_locked 59.5800μs 38.0108μs 26.3083 KOps/s 26.5271 KOps/s $\color{#d91a1a}-0.82\%$
test_membership 1.7350μs 0.7049μs 1.4186 MOps/s 1.4170 MOps/s $\color{#35bf28}+0.12\%$
test_membership_nested 27.5200μs 2.5772μs 388.0153 KOps/s 395.0789 KOps/s $\color{#d91a1a}-1.79\%$
test_membership_nested_leaf 18.0600μs 2.5791μs 387.7374 KOps/s 389.0102 KOps/s $\color{#d91a1a}-0.33\%$
test_membership_stacked_nested 15.3100μs 2.5460μs 392.7739 KOps/s 388.6895 KOps/s $\color{#35bf28}+1.05\%$
test_membership_stacked_nested_leaf 33.2600μs 2.6009μs 384.4857 KOps/s 388.6486 KOps/s $\color{#d91a1a}-1.07\%$
test_membership_nested_last 21.3710μs 3.1411μs 318.3549 KOps/s 322.4275 KOps/s $\color{#d91a1a}-1.26\%$
test_membership_nested_leaf_last 57.4000μs 3.1072μs 321.8281 KOps/s 322.0654 KOps/s $\color{#d91a1a}-0.07\%$
test_membership_stacked_nested_last 14.4500μs 3.0851μs 324.1425 KOps/s 324.1752 KOps/s $\color{#d91a1a}-0.01\%$
test_membership_stacked_nested_leaf_last 21.1300μs 3.1227μs 320.2376 KOps/s 323.0815 KOps/s $\color{#d91a1a}-0.88\%$
test_nested_getleaf 38.3900μs 8.4275μs 118.6585 KOps/s 118.9320 KOps/s $\color{#d91a1a}-0.23\%$
test_nested_get 31.6710μs 7.9868μs 125.2068 KOps/s 126.2895 KOps/s $\color{#d91a1a}-0.86\%$
test_stacked_getleaf 25.6810μs 8.4666μs 118.1113 KOps/s 118.7480 KOps/s $\color{#d91a1a}-0.54\%$
test_stacked_get 36.2510μs 7.9608μs 125.6153 KOps/s 125.5631 KOps/s $\color{#35bf28}+0.04\%$
test_nested_getitemleaf 33.2400μs 8.6093μs 116.1529 KOps/s 116.4313 KOps/s $\color{#d91a1a}-0.24\%$
test_nested_getitem 22.3410μs 8.1261μs 123.0607 KOps/s 123.8858 KOps/s $\color{#d91a1a}-0.67\%$
test_stacked_getitemleaf 35.5200μs 8.6354μs 115.8022 KOps/s 115.6223 KOps/s $\color{#35bf28}+0.16\%$
test_stacked_getitem 23.2400μs 8.0920μs 123.5788 KOps/s 123.6282 KOps/s $\color{#d91a1a}-0.04\%$
test_lock_nested 57.6021ms 0.4025ms 2.4845 KOps/s 2.4782 KOps/s $\color{#35bf28}+0.26\%$
test_lock_stack_nested 0.3438ms 0.2979ms 3.3565 KOps/s 3.3136 KOps/s $\color{#35bf28}+1.29\%$
test_unlock_nested 59.5620ms 0.4010ms 2.4936 KOps/s 2.4601 KOps/s $\color{#35bf28}+1.36\%$
test_unlock_stack_nested 0.3302ms 0.3059ms 3.2687 KOps/s 3.2277 KOps/s $\color{#35bf28}+1.27\%$
test_flatten_speed 0.3332ms 0.1003ms 9.9711 KOps/s 9.7761 KOps/s $\color{#35bf28}+2.00\%$
test_unflatten_speed 0.3332ms 0.2894ms 3.4558 KOps/s 3.4528 KOps/s $\color{#35bf28}+0.09\%$
test_common_ops 1.0617ms 0.6113ms 1.6360 KOps/s 1.7071 KOps/s $\color{#d91a1a}-4.17\%$
test_creation 15.8400μs 1.6664μs 600.0833 KOps/s 592.4725 KOps/s $\color{#35bf28}+1.28\%$
test_creation_empty 30.0300μs 10.0362μs 99.6396 KOps/s 121.7439 KOps/s $\textbf{\color{#d91a1a}-18.16\%}$
test_creation_nested_1 40.9610μs 11.8528μs 84.3683 KOps/s 100.2829 KOps/s $\textbf{\color{#d91a1a}-15.87\%}$
test_creation_nested_2 30.7400μs 13.9572μs 71.6477 KOps/s 81.0803 KOps/s $\textbf{\color{#d91a1a}-11.63\%}$
test_clone 73.3510μs 11.9976μs 83.3499 KOps/s 82.2989 KOps/s $\color{#35bf28}+1.28\%$
test_getitem[int] 34.2010μs 10.7197μs 93.2861 KOps/s 93.5129 KOps/s $\color{#d91a1a}-0.24\%$
test_getitem[slice_int] 37.6600μs 20.5519μs 48.6574 KOps/s 44.4520 KOps/s $\textbf{\color{#35bf28}+9.46\%}$
test_getitem[range] 65.4010μs 47.2664μs 21.1567 KOps/s 20.9633 KOps/s $\color{#35bf28}+0.92\%$
test_getitem[tuple] 57.9210μs 18.6477μs 53.6260 KOps/s 55.3036 KOps/s $\color{#d91a1a}-3.03\%$
test_getitem[list] 0.1213ms 34.6526μs 28.8579 KOps/s 28.7955 KOps/s $\color{#35bf28}+0.22\%$
test_setitem_dim[int] 46.5000μs 30.3392μs 32.9607 KOps/s 35.0590 KOps/s $\textbf{\color{#d91a1a}-5.98\%}$
test_setitem_dim[slice_int] 68.0310μs 50.2780μs 19.8894 KOps/s 19.9888 KOps/s $\color{#d91a1a}-0.50\%$
test_setitem_dim[range] 91.4110μs 68.7103μs 14.5539 KOps/s 15.0561 KOps/s $\color{#d91a1a}-3.34\%$
test_setitem_dim[tuple] 61.3610μs 44.3869μs 22.5292 KOps/s 23.5986 KOps/s $\color{#d91a1a}-4.53\%$
test_setitem 55.4700μs 17.8116μs 56.1431 KOps/s 60.1325 KOps/s $\textbf{\color{#d91a1a}-6.63\%}$
test_set 47.7210μs 16.8989μs 59.1753 KOps/s 61.6229 KOps/s $\color{#d91a1a}-3.97\%$
test_set_shared 1.3410ms 99.4934μs 10.0509 KOps/s 9.9728 KOps/s $\color{#35bf28}+0.78\%$
test_update 90.2710μs 20.8175μs 48.0366 KOps/s 51.7720 KOps/s $\textbf{\color{#d91a1a}-7.22\%}$
test_update_nested 65.6800μs 26.8669μs 37.2205 KOps/s 37.9202 KOps/s $\color{#d91a1a}-1.85\%$
test_update__nested 64.0710μs 23.1564μs 43.1845 KOps/s 43.5216 KOps/s $\color{#d91a1a}-0.77\%$
test_set_nested 75.2110μs 18.4582μs 54.1763 KOps/s 54.9088 KOps/s $\color{#d91a1a}-1.33\%$
test_set_nested_new 54.3810μs 21.2430μs 47.0744 KOps/s 47.7396 KOps/s $\color{#d91a1a}-1.39\%$
test_select 75.2010μs 34.6907μs 28.8262 KOps/s 30.1349 KOps/s $\color{#d91a1a}-4.34\%$
test_select_nested 0.5340ms 55.7732μs 17.9297 KOps/s 17.7511 KOps/s $\color{#35bf28}+1.01\%$
test_exclude_nested 0.1363ms 0.1118ms 8.9460 KOps/s 8.9434 KOps/s $\color{#35bf28}+0.03\%$
test_empty[True] 0.3681ms 0.3462ms 2.8885 KOps/s 2.8651 KOps/s $\color{#35bf28}+0.82\%$
test_empty[False] 2.8081μs 0.9147μs 1.0933 MOps/s 1.0731 MOps/s $\color{#35bf28}+1.88\%$
test_to 0.1034ms 76.7541μs 13.0286 KOps/s 13.2109 KOps/s $\color{#d91a1a}-1.38\%$
test_to_nonblocking 94.2810μs 62.1432μs 16.0919 KOps/s 15.0977 KOps/s $\textbf{\color{#35bf28}+6.58\%}$
test_unbind_speed 0.2996ms 0.2627ms 3.8067 KOps/s 3.8044 KOps/s $\color{#35bf28}+0.06\%$
test_unbind_speed_stack0 0.3021ms 0.2650ms 3.7740 KOps/s 3.7951 KOps/s $\color{#d91a1a}-0.56\%$
test_unbind_speed_stack1 74.3637ms 0.7960ms 1.2563 KOps/s 1.2410 KOps/s $\color{#35bf28}+1.24\%$
test_split 75.1542ms 1.6724ms 597.9311 Ops/s 602.0993 Ops/s $\color{#d91a1a}-0.69\%$
test_chunk 1.6123ms 1.5584ms 641.6693 Ops/s 600.9901 Ops/s $\textbf{\color{#35bf28}+6.77\%}$
test_creation[device0] 0.1280ms 57.6364μs 17.3502 KOps/s 17.0750 KOps/s $\color{#35bf28}+1.61\%$
test_creation_from_tensor 0.1504ms 54.0414μs 18.5043 KOps/s 17.2961 KOps/s $\textbf{\color{#35bf28}+6.99\%}$
test_add_one[memmap_tensor0] 76.5620μs 7.1016μs 140.8129 KOps/s 140.6538 KOps/s $\color{#35bf28}+0.11\%$
test_contiguous[memmap_tensor0] 9.7500μs 0.6547μs 1.5275 MOps/s 1.5123 MOps/s $\color{#35bf28}+1.01\%$
test_stack[memmap_tensor0] 32.4610μs 4.9535μs 201.8777 KOps/s 205.0057 KOps/s $\color{#d91a1a}-1.53\%$
test_memmaptd_index 1.3183ms 0.2861ms 3.4954 KOps/s 3.5103 KOps/s $\color{#d91a1a}-0.42\%$
test_memmaptd_index_astensor 74.7010ms 0.3909ms 2.5584 KOps/s 2.8267 KOps/s $\textbf{\color{#d91a1a}-9.49\%}$
test_memmaptd_index_op 1.2411ms 0.6862ms 1.4572 KOps/s 1.5197 KOps/s $\color{#d91a1a}-4.11\%$
test_serialize_model 0.1838s 0.1102s 9.0737 Ops/s 8.7096 Ops/s $\color{#35bf28}+4.18\%$
test_serialize_model_pickle 1.3666s 1.2377s 0.8079 Ops/s 0.8065 Ops/s $\color{#35bf28}+0.18\%$
test_serialize_weights 0.1765s 0.1076s 9.2978 Ops/s 8.7909 Ops/s $\textbf{\color{#35bf28}+5.77\%}$
test_serialize_weights_returnearly 0.2842s 0.1040s 9.6144 Ops/s 10.0374 Ops/s $\color{#d91a1a}-4.21\%$
test_serialize_weights_pickle 1.3520s 1.2480s 0.8013 Ops/s 0.8010 Ops/s $\color{#35bf28}+0.04\%$
test_reshape_pytree 62.5900μs 26.3363μs 37.9704 KOps/s 38.4287 KOps/s $\color{#d91a1a}-1.19\%$
test_reshape_td 68.1010μs 30.5651μs 32.7170 KOps/s 32.2472 KOps/s $\color{#35bf28}+1.46\%$
test_view_pytree 53.7210μs 26.0632μs 38.3683 KOps/s 39.3884 KOps/s $\color{#d91a1a}-2.59\%$
test_view_td 66.5910μs 35.4348μs 28.2208 KOps/s 28.6055 KOps/s $\color{#d91a1a}-1.34\%$
test_unbind_pytree 59.9200μs 31.6995μs 31.5463 KOps/s 31.5395 KOps/s $\color{#35bf28}+0.02\%$
test_unbind_td 0.4264ms 39.6862μs 25.1977 KOps/s 24.8462 KOps/s $\color{#35bf28}+1.41\%$
test_split_pytree 65.2610μs 34.1991μs 29.2406 KOps/s 29.2216 KOps/s $\color{#35bf28}+0.07\%$
test_split_td 0.2661ms 38.9969μs 25.6431 KOps/s 25.2035 KOps/s $\color{#35bf28}+1.74\%$
test_add_pytree 72.2010μs 38.3644μs 26.0658 KOps/s 26.4564 KOps/s $\color{#d91a1a}-1.48\%$
test_add_td 93.9210μs 52.3284μs 19.1101 KOps/s 19.6840 KOps/s $\color{#d91a1a}-2.92\%$
test_distributed 1.8805ms 68.4335μs 14.6127 KOps/s 14.8248 KOps/s $\color{#d91a1a}-1.43\%$
test_tdmodule 84.8910μs 15.5024μs 64.5061 KOps/s 70.3843 KOps/s $\textbf{\color{#d91a1a}-8.35\%}$
test_tdmodule_dispatch 46.9710μs 30.7803μs 32.4883 KOps/s 35.0177 KOps/s $\textbf{\color{#d91a1a}-7.22\%}$
test_tdseq 33.5610μs 17.3278μs 57.7107 KOps/s 61.1428 KOps/s $\textbf{\color{#d91a1a}-5.61\%}$
test_tdseq_dispatch 58.1200μs 34.6527μs 28.8577 KOps/s 31.2830 KOps/s $\textbf{\color{#d91a1a}-7.75\%}$
test_instantiation_functorch 1.6301ms 1.5313ms 653.0482 Ops/s 653.1604 Ops/s $\color{#d91a1a}-0.02\%$
test_instantiation_td 1.5587ms 1.0533ms 949.3808 Ops/s 957.4899 Ops/s $\color{#d91a1a}-0.85\%$
test_exec_functorch 0.1831ms 0.1501ms 6.6619 KOps/s 6.5833 KOps/s $\color{#35bf28}+1.19\%$
test_exec_functional_call 0.1873ms 0.1424ms 7.0201 KOps/s 6.9979 KOps/s $\color{#35bf28}+0.32\%$
test_exec_td 0.1696ms 0.1386ms 7.2152 KOps/s 6.9613 KOps/s $\color{#35bf28}+3.65\%$
test_exec_td_decorator 0.8130ms 0.2149ms 4.6527 KOps/s 4.6276 KOps/s $\color{#35bf28}+0.54\%$
test_vmap_mlp_speed[True-True] 0.7767ms 0.5887ms 1.6987 KOps/s 1.6701 KOps/s $\color{#35bf28}+1.71\%$
test_vmap_mlp_speed[True-False] 0.6688ms 0.5883ms 1.6998 KOps/s 1.6995 KOps/s $\color{#35bf28}+0.02\%$
test_vmap_mlp_speed[False-True] 0.6131ms 0.5353ms 1.8680 KOps/s 1.9339 KOps/s $\color{#d91a1a}-3.41\%$
test_vmap_mlp_speed[False-False] 0.5680ms 0.5155ms 1.9398 KOps/s 1.9462 KOps/s $\color{#d91a1a}-0.33\%$
test_vmap_mlp_speed_decorator[True-True] 1.0188ms 0.6561ms 1.5242 KOps/s 1.5418 KOps/s $\color{#d91a1a}-1.14\%$
test_vmap_mlp_speed_decorator[True-False] 0.7358ms 0.6531ms 1.5311 KOps/s 1.5590 KOps/s $\color{#d91a1a}-1.79\%$
test_vmap_mlp_speed_decorator[False-True] 0.7494ms 0.5860ms 1.7064 KOps/s 1.7440 KOps/s $\color{#d91a1a}-2.15\%$
test_vmap_mlp_speed_decorator[False-False] 0.7157ms 0.5868ms 1.7041 KOps/s 1.7421 KOps/s $\color{#d91a1a}-2.18\%$
test_vmap_transformer_speed[True-True] 8.0426ms 7.6829ms 130.1585 Ops/s 129.6139 Ops/s $\color{#35bf28}+0.42\%$
test_vmap_transformer_speed[True-False] 8.0609ms 7.6827ms 130.1628 Ops/s 130.0732 Ops/s $\color{#35bf28}+0.07\%$
test_vmap_transformer_speed[False-True] 8.0864ms 7.7658ms 128.7702 Ops/s 130.5461 Ops/s $\color{#d91a1a}-1.36\%$
test_vmap_transformer_speed[False-False] 7.7545ms 7.5909ms 131.7367 Ops/s 130.4334 Ops/s $\color{#35bf28}+1.00\%$
test_vmap_transformer_speed_decorator[True-True] 19.1351ms 18.6795ms 53.5346 Ops/s 53.3123 Ops/s $\color{#35bf28}+0.42\%$
test_vmap_transformer_speed_decorator[True-False] 19.6616ms 18.7477ms 53.3399 Ops/s 53.2183 Ops/s $\color{#35bf28}+0.23\%$
test_vmap_transformer_speed_decorator[False-True] 19.4852ms 18.6032ms 53.7541 Ops/s 53.5445 Ops/s $\color{#35bf28}+0.39\%$
test_vmap_transformer_speed_decorator[False-False] 19.4327ms 18.5969ms 53.7723 Ops/s 53.5799 Ops/s $\color{#35bf28}+0.36\%$
test_to_module_speed[True] 1.7112ms 1.5907ms 628.6642 Ops/s 632.7236 Ops/s $\color{#d91a1a}-0.64\%$
test_to_module_speed[False] 2.5889ms 1.5680ms 637.7448 Ops/s 650.1753 Ops/s $\color{#d91a1a}-1.91\%$
test_tc_init 48.6100μs 28.7890μs 34.7355 KOps/s 41.7950 KOps/s $\textbf{\color{#d91a1a}-16.89\%}$
test_tc_init_nested 89.8710μs 63.1593μs 15.8330 KOps/s 20.4764 KOps/s $\textbf{\color{#d91a1a}-22.68\%}$
test_tc_first_layer_tensor 0.9513μs 0.3583μs 2.7912 MOps/s 2.7490 MOps/s $\color{#35bf28}+1.53\%$
test_tc_first_layer_nontensor 2.0377μs 0.3901μs 2.5632 MOps/s 2.5483 MOps/s $\color{#35bf28}+0.59\%$
test_tc_second_layer_tensor 5.8800μs 0.9673μs 1.0338 MOps/s 1.0215 MOps/s $\color{#35bf28}+1.20\%$
test_tc_second_layer_nontensor 2.3435μs 0.8051μs 1.2420 MOps/s 1.1909 MOps/s $\color{#35bf28}+4.30\%$
test_unbind 0.1106s 7.5044ms 133.2545 Ops/s 142.7374 Ops/s $\textbf{\color{#d91a1a}-6.64\%}$
test_full_like 11.9314ms 11.2422ms 88.9505 Ops/s 75.6575 Ops/s $\textbf{\color{#35bf28}+17.57\%}$
test_zeros_like 8.4526ms 7.8921ms 126.7085 Ops/s 127.8413 Ops/s $\color{#d91a1a}-0.89\%$
test_ones_like 8.3949ms 7.8577ms 127.2631 Ops/s 127.5286 Ops/s $\color{#d91a1a}-0.21\%$
test_clone 9.5048ms 9.2941ms 107.5950 Ops/s 108.0755 Ops/s $\color{#d91a1a}-0.44\%$
test_squeeze 53.5800μs 11.0309μs 90.6544 KOps/s 90.4437 KOps/s $\color{#35bf28}+0.23\%$
test_unsqueeze 93.3010μs 51.6321μs 19.3678 KOps/s 19.1641 KOps/s $\color{#35bf28}+1.06\%$
test_split 0.1326ms 0.1002ms 9.9758 KOps/s 10.3396 KOps/s $\color{#d91a1a}-3.52\%$
test_permute 0.1458ms 0.1130ms 8.8512 KOps/s 9.0495 KOps/s $\color{#d91a1a}-2.19\%$
test_stack 27.2375ms 26.8940ms 37.1830 Ops/s 37.0562 Ops/s $\color{#35bf28}+0.34\%$
test_cat 27.7132ms 26.9103ms 37.1605 Ops/s 37.2262 Ops/s $\color{#d91a1a}-0.18\%$
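The Improved and Worsened totals correspond to the bolded Change entries. The sketch below tallies changes against a 5% threshold; that threshold is an assumption inferred from the tables (in the CPU table, for example, -5.03% is bolded while -5.00% is not), not a rule documented on this page, and the helper name tally is hypothetical.

```python
from typing import Iterable, Tuple

# Assumption: inferred from the bolding in the tables; the benchmark bot's
# actual rule is not stated on this page.
THRESHOLD_PCT = 5.0


def tally(changes_pct: Iterable[float],
          threshold: float = THRESHOLD_PCT) -> Tuple[int, int]:
    """Return (improved, worsened): changes whose magnitude exceeds threshold."""
    improved = worsened = 0
    for change in changes_pct:
        if change > threshold:
            improved += 1
        elif change < -threshold:
            worsened += 1
    return improved, worsened


# Change values from the GPU table: test_tc_init_nested, test_full_like,
# test_getitem[slice_int], test_common_ops, test_items.
sample = [-22.68, 17.57, 9.46, -4.17, 1.04]
print(tally(sample))  # (2, 1)
```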
