[Refactor] Refactor split #555

Merged (1 commit) on Nov 10, 2023
@vmoens (Contributor) commented on Nov 10, 2023

Refactors `split` to use slices, so that the storage is kept consistent when `map` is used with regular memmap tensors.
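The description above can be illustrated with a minimal sketch, assuming plain `torch` tensors held in a dict rather than a real TensorDict. The helpers `split_slices` and `split_with_slices` are hypothetical names introduced here for illustration; they are not taken from the PR diff. The sketch only shows the general idea of splitting by slice indexing: every chunk is a view, so it keeps pointing at the storage of the source tensor.

```python
import torch


def split_slices(length, split_size):
    """Hypothetical helper: slices covering range(length) in steps of split_size."""
    return [
        slice(start, min(start + split_size, length))
        for start in range(0, length, split_size)
    ]


def split_with_slices(tensors, split_size, dim=0):
    """Hypothetical helper: split every tensor of a dict along `dim` by indexing."""
    length = next(iter(tensors.values())).shape[dim]
    prefix = (slice(None),) * dim  # leave the dimensions before `dim` untouched
    return [
        {key: value[prefix + (s,)] for key, value in tensors.items()}
        for s in split_slices(length, split_size)
    ]


source = {"a": torch.zeros(10, 3), "b": torch.arange(10)}
chunks = split_with_slices(source, 4)

# The last chunk holds the remainder.
print([tuple(chunk["a"].shape) for chunk in chunks])  # [(4, 3), (4, 3), (2, 3)]

# Basic slicing returns views: an in-place write on a chunk reaches the source.
chunks[0]["a"].fill_(1.0)
print(source["a"][:4].sum().item())  # 12.0

# Every chunk shares the storage of the tensor it was sliced from.
same_storage = all(
    chunk["a"].untyped_storage().data_ptr() == source["a"].untyped_storage().data_ptr()
    for chunk in chunks
)
print(same_storage)  # True
```

In tensordict the user-facing call is `td.split(split_size, dim)`. How the merged code builds its slices, and how memmap tensors behave under indexing, is not shown on this page, so treat the sketch as an approximation of the idea and not as the merged implementation. The benchmark comment in this thread reports `test_split_td` at -27.47% against repo HEAD, which suggests the consistency gain may come with a cost in raw split speed.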

@facebook-github-bot added the "CLA Signed" label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) on Nov 10, 2023
@vmoens marked this pull request as ready for review on November 10, 2023, 20:41
@vmoens added the "Refactor" label (Refactoring code - not a new feature) on Nov 10, 2023
@vmoens merged commit 924a46a into main on Nov 10, 2023
25 of 31 checks passed
@vmoens deleted the refactor_split branch on November 10, 2023, 20:43
⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 105. Improved: 2. Worsened: 9.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1181ms | 20.5671μs | 48.6214 KOps/s | 49.8113 KOps/s | -2.39% |
| test_plain_set_stack_nested | 0.2514ms | 0.1852ms | 5.4007 KOps/s | 5.3780 KOps/s | +0.42% |
| test_plain_set_nested_inplace | 49.0010μs | 23.9198μs | 41.8064 KOps/s | 42.4058 KOps/s | -1.41% |
| test_plain_set_stack_nested_inplace | 0.3315ms | 0.2206ms | 4.5327 KOps/s | 4.5166 KOps/s | +0.36% |
| test_items | 27.6000μs | 3.3992μs | 294.1910 KOps/s | 291.7217 KOps/s | +0.85% |
| test_items_nested | 0.5139ms | 0.3714ms | 2.6924 KOps/s | 2.7307 KOps/s | -1.40% |
| test_items_nested_locked | 0.4777ms | 0.3730ms | 2.6811 KOps/s | 2.7070 KOps/s | -0.96% |
| test_items_nested_leaf | 1.5269ms | 0.2320ms | 4.3105 KOps/s | 4.5504 KOps/s | **-5.27%** |
| test_items_stack_nested | 2.0286ms | 1.8441ms | 542.2598 Ops/s | 554.1212 Ops/s | -2.14% |
| test_items_stack_nested_leaf | 2.6096ms | 1.7276ms | 578.8421 Ops/s | 612.2907 Ops/s | **-5.46%** |
| test_items_stack_nested_locked | 1.1599ms | 0.9809ms | 1.0195 KOps/s | 1.0234 KOps/s | -0.38% |
| test_keys | 23.0010μs | 5.0548μs | 197.8300 KOps/s | 197.6463 KOps/s | +0.09% |
| test_keys_nested | 0.9409ms | 0.1826ms | 5.4759 KOps/s | 4.9791 KOps/s | **+9.98%** |
| test_keys_nested_locked | 0.3283ms | 0.1814ms | 5.5122 KOps/s | 5.4456 KOps/s | +1.22% |
| test_keys_nested_leaf | 0.3195ms | 0.1740ms | 5.7460 KOps/s | 5.7212 KOps/s | +0.43% |
| test_keys_stack_nested | 1.8798ms | 1.6970ms | 589.2724 Ops/s | 591.6872 Ops/s | -0.41% |
| test_keys_stack_nested_leaf | 1.8726ms | 1.6949ms | 589.9886 Ops/s | 595.5264 Ops/s | -0.93% |
| test_keys_stack_nested_locked | 1.0080ms | 0.8158ms | 1.2257 KOps/s | 1.2220 KOps/s | +0.31% |
| test_values | 22.5000μs | 1.5391μs | 649.7115 KOps/s | 649.3019 KOps/s | +0.06% |
| test_values_nested | 0.2239ms | 66.4263μs | 15.0543 KOps/s | 14.9267 KOps/s | +0.85% |
| test_values_nested_locked | 0.2119ms | 66.5167μs | 15.0338 KOps/s | 14.9722 KOps/s | +0.41% |
| test_values_nested_leaf | 0.2013ms | 58.3315μs | 17.1434 KOps/s | 17.0685 KOps/s | +0.44% |
| test_values_stack_nested | 1.6592ms | 1.4672ms | 681.5782 Ops/s | 689.9659 Ops/s | -1.22% |
| test_values_stack_nested_leaf | 2.4662ms | 1.4586ms | 685.5978 Ops/s | 698.7574 Ops/s | -1.88% |
| test_values_stack_nested_locked | 0.7760ms | 0.6400ms | 1.5625 KOps/s | 1.5558 KOps/s | +0.43% |
| test_membership | 19.2010μs | 1.8789μs | 532.2217 KOps/s | 526.6201 KOps/s | +1.06% |
| test_membership_nested | 22.6000μs | 3.6469μs | 274.2076 KOps/s | 281.0018 KOps/s | -2.42% |
| test_membership_nested_leaf | 59.6020μs | 3.6444μs | 274.3965 KOps/s | 281.9565 KOps/s | -2.68% |
| test_membership_stacked_nested | 38.8010μs | 14.7488μs | 67.8020 KOps/s | 69.9386 KOps/s | -3.05% |
| test_membership_stacked_nested_leaf | 64.8010μs | 14.5904μs | 68.5380 KOps/s | 69.3194 KOps/s | -1.13% |
| test_membership_nested_last | 29.0010μs | 7.6060μs | 131.4743 KOps/s | 135.4574 KOps/s | -2.94% |
| test_membership_nested_leaf_last | 32.3010μs | 7.5880μs | 131.7867 KOps/s | 132.9805 KOps/s | -0.90% |
| test_membership_stacked_nested_last | 0.3017ms | 0.2324ms | 4.3034 KOps/s | 4.4643 KOps/s | -3.61% |
| test_membership_stacked_nested_leaf_last | 95.5020μs | 16.8378μs | 59.3903 KOps/s | 59.5553 KOps/s | -0.28% |
| test_nested_getleaf | 0.2730ms | 16.8554μs | 59.3283 KOps/s | 63.9283 KOps/s | **-7.20%** |
| test_nested_get | 39.1010μs | 15.9028μs | 62.8819 KOps/s | 67.5464 KOps/s | **-6.91%** |
| test_stacked_getleaf | 1.1855ms | 0.7635ms | 1.3097 KOps/s | 1.3403 KOps/s | -2.28% |
| test_stacked_get | 0.8252ms | 0.7280ms | 1.3737 KOps/s | 1.4101 KOps/s | -2.58% |
| test_nested_getitemleaf | 69.4020μs | 16.0358μs | 62.3604 KOps/s | 64.1101 KOps/s | -2.73% |
| test_nested_getitem | 76.3020μs | 15.2618μs | 65.5232 KOps/s | 67.6321 KOps/s | -3.12% |
| test_stacked_getitemleaf | 0.8798ms | 0.7631ms | 1.3104 KOps/s | 1.3426 KOps/s | -2.40% |
| test_stacked_getitem | 0.7543ms | 0.7284ms | 1.3729 KOps/s | 1.4116 KOps/s | -2.74% |
| test_lock_nested | 79.8097ms | 1.2429ms | 804.5896 Ops/s | 844.7690 Ops/s | -4.76% |
| test_lock_stack_nested | 96.5137ms | 17.5584ms | 56.9528 Ops/s | 56.1210 Ops/s | +1.48% |
| test_unlock_nested | 76.8991ms | 1.2520ms | 798.7285 Ops/s | 781.2053 Ops/s | +2.24% |
| test_unlock_stack_nested | 98.8634ms | 18.2453ms | 54.8085 Ops/s | 53.6500 Ops/s | +2.16% |
| test_flatten_speed | 1.0475ms | 0.9295ms | 1.0758 KOps/s | 1.1368 KOps/s | **-5.36%** |
| test_unflatten_speed | 1.7228ms | 1.6043ms | 623.3395 Ops/s | 643.6987 Ops/s | -3.16% |
| test_common_ops | 5.5404ms | 0.8427ms | 1.1866 KOps/s | 1.2291 KOps/s | -3.46% |
| test_creation | 21.0010μs | 3.0883μs | 323.8048 KOps/s | 343.0292 KOps/s | **-5.60%** |
| test_creation_empty | 27.1010μs | 10.1883μs | 98.1518 KOps/s | 103.1246 KOps/s | -4.82% |
| test_creation_nested_1 | 74.5020μs | 15.4312μs | 64.8038 KOps/s | 67.7558 KOps/s | -4.36% |
| test_creation_nested_2 | 38.9010μs | 18.4510μs | 54.1975 KOps/s | 56.5219 KOps/s | -4.11% |
| test_clone | 78.9010μs | 14.6829μs | 68.1064 KOps/s | 68.4764 KOps/s | -0.54% |
| test_getitem[int] | 37.2010μs | 17.9282μs | 55.7779 KOps/s | 56.2003 KOps/s | -0.75% |
| test_getitem[slice_int] | 0.1453ms | 41.8658μs | 23.8858 KOps/s | 24.4022 KOps/s | -2.12% |
| test_getitem[range] | 0.1161ms | 66.1640μs | 15.1140 KOps/s | 14.8724 KOps/s | +1.62% |
| test_getitem[tuple] | 55.6010μs | 33.6409μs | 29.7258 KOps/s | 29.8567 KOps/s | -0.44% |
| test_getitem[list] | 0.1161ms | 61.8820μs | 16.1598 KOps/s | 15.7866 KOps/s | +2.36% |
| test_setitem_dim[int] | 50.8010μs | 33.3419μs | 29.9923 KOps/s | 30.2565 KOps/s | -0.87% |
| test_setitem_dim[slice_int] | 81.9020μs | 59.6224μs | 16.7722 KOps/s | 17.1178 KOps/s | -2.02% |
| test_setitem_dim[range] | 0.1127ms | 78.2707μs | 12.7762 KOps/s | 12.6736 KOps/s | +0.81% |
| test_setitem_dim[tuple] | 69.9020μs | 49.3383μs | 20.2682 KOps/s | 20.5200 KOps/s | -1.23% |
| test_setitem | 0.1051ms | 21.0009μs | 47.6170 KOps/s | 48.6871 KOps/s | -2.20% |
| test_set | 85.5020μs | 19.9509μs | 50.1231 KOps/s | 51.0937 KOps/s | -1.90% |
| test_set_shared | 5.2595ms | 0.1844ms | 5.4234 KOps/s | 5.4207 KOps/s | +0.05% |
| test_update | 0.1122ms | 27.6691μs | 36.1413 KOps/s | 37.0855 KOps/s | -2.55% |
| test_update_nested | 0.1473ms | 39.2547μs | 25.4746 KOps/s | 26.1865 KOps/s | -2.72% |
| test_set_nested | 0.1145ms | 23.0542μs | 43.3760 KOps/s | 45.3018 KOps/s | -4.25% |
| test_set_nested_new | 0.1393ms | 31.7447μs | 31.5013 KOps/s | 31.9210 KOps/s | -1.31% |
| test_select | 0.1633ms | 60.2683μs | 16.5925 KOps/s | 16.6066 KOps/s | -0.09% |
| test_unbind_speed | 0.4720ms | 0.3771ms | 2.6517 KOps/s | 2.6300 KOps/s | +0.83% |
| test_unbind_speed_stack0 | 85.7825ms | 6.1693ms | 162.0917 Ops/s | 165.4594 Ops/s | -2.04% |
| test_unbind_speed_stack1 | 14.3000μs | 1.2098μs | 826.6159 KOps/s | 1.0900 MOps/s | **-24.16%** |
| test_creation[device0] | 0.5158ms | 0.4423ms | 2.2608 KOps/s | 2.2114 KOps/s | +2.24% |
| test_creation_from_tensor | 3.4258ms | 0.4997ms | 2.0012 KOps/s | 2.0173 KOps/s | -0.80% |
| test_add_one[memmap_tensor0] | 2.2777ms | 32.6246μs | 30.6518 KOps/s | 29.2880 KOps/s | +4.66% |
| test_contiguous[memmap_tensor0] | 21.9010μs | 8.6906μs | 115.0670 KOps/s | 109.6026 KOps/s | +4.99% |
| test_stack[memmap_tensor0] | 0.1007ms | 26.0958μs | 38.3204 KOps/s | 36.9721 KOps/s | +3.65% |
| test_memmaptd_index | 0.5881ms | 0.3258ms | 3.0696 KOps/s | 3.2484 KOps/s | **-5.50%** |
| test_memmaptd_index_astensor | 1.2529ms | 1.2013ms | 832.3970 Ops/s | 828.9514 Ops/s | +0.42% |
| test_memmaptd_index_op | 2.7748ms | 2.6145ms | 382.4855 Ops/s | 379.7702 Ops/s | +0.71% |
| test_reshape_pytree | 0.1010ms | 34.1156μs | 29.3121 KOps/s | 29.9106 KOps/s | -2.00% |
| test_reshape_td | 0.1024ms | 29.1300μs | 34.3288 KOps/s | 35.0809 KOps/s | -2.14% |
| test_view_pytree | 0.1326ms | 33.8132μs | 29.5743 KOps/s | 30.2820 KOps/s | -2.34% |
| test_view_td | 21.1010μs | 5.6977μs | 175.5084 KOps/s | 178.8706 KOps/s | -1.88% |
| test_unbind_pytree | 0.1151ms | 37.9371μs | 26.3594 KOps/s | 26.1516 KOps/s | +0.79% |
| test_unbind_td | 83.5020μs | 54.3222μs | 18.4087 KOps/s | 18.5451 KOps/s | -0.74% |
| test_split_pytree | 82.2020μs | 37.3150μs | 26.7989 KOps/s | 27.0910 KOps/s | -1.08% |
| test_split_td | 0.2303ms | 0.1036ms | 9.6553 KOps/s | 13.3129 KOps/s | **-27.47%** |
| test_add_pytree | 0.1243ms | 47.0715μs | 21.2443 KOps/s | 21.2131 KOps/s | +0.15% |
| test_add_td | 0.1467ms | 60.8840μs | 16.4247 KOps/s | 16.8613 KOps/s | -2.59% |
| test_distributed | 58.5010μs | 8.7507μs | 114.2771 KOps/s | 111.5348 KOps/s | +2.46% |
| test_tdmodule | 0.2044ms | 25.5801μs | 39.0929 KOps/s | 36.0648 KOps/s | **+8.40%** |
| test_tdmodule_dispatch | 0.2328ms | 45.2036μs | 22.1221 KOps/s | 22.8560 KOps/s | -3.21% |
| test_tdseq | 50.1010μs | 30.9534μs | 32.3066 KOps/s | 31.5367 KOps/s | +2.44% |
| test_tdseq_dispatch | 0.1825ms | 55.5966μs | 17.9867 KOps/s | 17.9960 KOps/s | -0.05% |
| test_instantiation_functorch | 1.7475ms | 1.6519ms | 605.3477 Ops/s | 602.0002 Ops/s | +0.56% |
| test_instantiation_td | 1.9109ms | 1.3271ms | 753.5214 Ops/s | 759.5183 Ops/s | -0.79% |
| test_exec_functorch | 0.3062ms | 0.1957ms | 5.1107 KOps/s | 5.0650 KOps/s | +0.90% |
| test_exec_td | 0.2266ms | 0.1878ms | 5.3258 KOps/s | 5.3203 KOps/s | +0.10% |
| test_vmap_mlp_speed[True-True] | 7.7102ms | 1.1166ms | 895.5621 Ops/s | 910.6647 Ops/s | -1.66% |
| test_vmap_mlp_speed[True-False] | 8.6217ms | 0.6232ms | 1.6047 KOps/s | 1.6438 KOps/s | -2.38% |
| test_vmap_mlp_speed[False-True] | 8.5430ms | 0.9643ms | 1.0370 KOps/s | 1.0494 KOps/s | -1.18% |
| test_vmap_mlp_speed[False-False] | 6.4513ms | 0.4898ms | 2.0416 KOps/s | 2.0671 KOps/s | -1.23% |
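The benchmark comment does not define the Change column. A minimal sketch, assuming it is the relative difference between Ops and Ops on Repo HEAD, reproduces the printed values for the four rows checked below. The 5% cut-off used to flag rows is also an assumption: it is inferred from the fact that exactly 2 positive and 9 negative changes in the table exceed it, which matches the Improved and Worsened counts. The helper names are hypothetical.

```python
# Assumption: Change = (Ops - Ops on Repo HEAD) / Ops on Repo HEAD, in percent.
UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_second(text):
    """Hypothetical helper: parse a cell such as '826.6159 KOps/s'."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def change_percent(ops, ops_on_head):
    """Hypothetical helper: relative change against repo HEAD, rounded to 2 digits."""
    current = to_ops_per_second(ops)
    head = to_ops_per_second(ops_on_head)
    return round((current - head) / head * 100, 2)


# Four rows copied from the table: (Ops, Ops on Repo HEAD).
rows = {
    "test_plain_set_nested": ("48.6214 KOps/s", "49.8113 KOps/s"),
    "test_keys_nested": ("5.4759 KOps/s", "4.9791 KOps/s"),
    "test_unbind_speed_stack1": ("826.6159 KOps/s", "1.0900 MOps/s"),
    "test_split_td": ("9.6553 KOps/s", "13.3129 KOps/s"),
}

changes = {name: change_percent(*cells) for name, cells in rows.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
# test_plain_set_nested: -2.39%
# test_keys_nested: +9.98%
# test_unbind_speed_stack1: -24.16%
# test_split_td: -27.47%

# Assumption: a row counts as improved or worsened when |change| exceeds 5%.
flagged = sorted(name for name, change in changes.items() if abs(change) > 5)
print(flagged)  # ['test_keys_nested', 'test_split_td', 'test_unbind_speed_stack1']
```

Note that the units differ between cells (Ops/s, KOps/s, MOps/s), so the two throughput columns have to be converted to a common unit before they are compared, as in the `test_unbind_speed_stack1` row.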
