[BugFix] Remove _is_memmap and _is_shared from constructor #620

Closed
vmoens wants to merge 7 commits from the faster-clone branch

vmoens (Contributor) commented Jan 16, 2024

No description provided.

facebook-github-bot added the CLA Signed label Jan 16, 2024. (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
github-actions bot commented Jan 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 120. Improved: $\large\color{#35bf28}21$. Worsened: $\large\color{#d91a1a}20$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 36.2180μs | 17.2963μs | 57.8157 KOps/s | 60.1454 KOps/s | $\color{#d91a1a}-3.87\%$ |
| test_plain_set_stack_nested | 0.1838ms | 0.1442ms | 6.9364 KOps/s | 6.9024 KOps/s | $\color{#35bf28}+0.49\%$ |
| test_plain_set_nested_inplace | 44.0520μs | 19.4441μs | 51.4294 KOps/s | 53.9939 KOps/s | $\color{#d91a1a}-4.75\%$ |
| test_plain_set_stack_nested_inplace | 0.2539ms | 0.1780ms | 5.6165 KOps/s | 5.5791 KOps/s | $\color{#35bf28}+0.67\%$ |
| test_items | 24.8370μs | 2.5222μs | 396.4774 KOps/s | 407.4382 KOps/s | $\color{#d91a1a}-2.69\%$ |
| test_items_nested | 0.4910ms | 0.2700ms | 3.7041 KOps/s | 3.7142 KOps/s | $\color{#d91a1a}-0.27\%$ |
| test_items_nested_locked | 0.9296ms | 0.2734ms | 3.6573 KOps/s | 3.6998 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_items_nested_leaf | 0.3481ms | 0.1691ms | 5.9132 KOps/s | 5.9769 KOps/s | $\color{#d91a1a}-1.07\%$ |
| test_items_stack_nested | 1.4423ms | 1.3202ms | 757.4328 Ops/s | 747.9836 Ops/s | $\color{#35bf28}+1.26\%$ |
| test_items_stack_nested_leaf | 1.4940ms | 1.1851ms | 843.8390 Ops/s | 833.7898 Ops/s | $\color{#35bf28}+1.21\%$ |
| test_items_stack_nested_locked | 1.2228ms | 0.8618ms | 1.1603 KOps/s | 1.1398 KOps/s | $\color{#35bf28}+1.80\%$ |
| test_keys | 27.4620μs | 3.9067μs | 255.9730 KOps/s | 258.1062 KOps/s | $\color{#d91a1a}-0.83\%$ |
| test_keys_nested | 49.8768ms | 0.1553ms | 6.4391 KOps/s | 6.7453 KOps/s | $\color{#d91a1a}-4.54\%$ |
| test_keys_nested_locked | 0.2616ms | 0.1508ms | 6.6294 KOps/s | 6.7177 KOps/s | $\color{#d91a1a}-1.31\%$ |
| test_keys_nested_leaf | 0.2530ms | 0.1282ms | 7.8022 KOps/s | 7.7667 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_keys_stack_nested | 2.1845ms | 1.2742ms | 784.8344 Ops/s | 772.1630 Ops/s | $\color{#35bf28}+1.64\%$ |
| test_keys_stack_nested_leaf | 1.3779ms | 1.2641ms | 791.0600 Ops/s | 778.8130 Ops/s | $\color{#35bf28}+1.57\%$ |
| test_keys_stack_nested_locked | 1.3155ms | 0.7947ms | 1.2583 KOps/s | 1.1818 KOps/s | $\textbf{\color{#35bf28}+6.47\%}$ |
| test_values | 9.5703μs | 1.1636μs | 859.3771 KOps/s | 840.6687 KOps/s | $\color{#35bf28}+2.23\%$ |
| test_values_nested | 98.3740μs | 52.6211μs | 19.0038 KOps/s | 19.1854 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_values_nested_locked | 0.1203ms | 53.0464μs | 18.8514 KOps/s | 18.8912 KOps/s | $\color{#d91a1a}-0.21\%$ |
| test_values_nested_leaf | 96.9110μs | 47.2096μs | 21.1821 KOps/s | 21.3410 KOps/s | $\color{#d91a1a}-0.74\%$ |
| test_values_stack_nested | 1.7084ms | 1.0463ms | 955.7053 Ops/s | 936.3519 Ops/s | $\color{#35bf28}+2.07\%$ |
| test_values_stack_nested_leaf | 1.3517ms | 1.0223ms | 978.1791 Ops/s | 946.0441 Ops/s | $\color{#35bf28}+3.40\%$ |
| test_values_stack_nested_locked | 1.0506ms | 0.5960ms | 1.6779 KOps/s | 1.6327 KOps/s | $\color{#35bf28}+2.76\%$ |
| test_membership | 37.3290μs | 1.3680μs | 730.9835 KOps/s | 699.6947 KOps/s | $\color{#35bf28}+4.47\%$ |
| test_membership_nested | 38.2080μs | 3.4504μs | 289.8247 KOps/s | 349.4603 KOps/s | $\textbf{\color{#d91a1a}-17.07\%}$ |
| test_membership_nested_leaf | 33.5960μs | 3.5156μs | 284.4430 KOps/s | 342.5126 KOps/s | $\textbf{\color{#d91a1a}-16.95\%}$ |
| test_membership_stacked_nested | 35.8570μs | 12.0523μs | 82.9718 KOps/s | 70.5607 KOps/s | $\textbf{\color{#35bf28}+17.59\%}$ |
| test_membership_stacked_nested_leaf | 55.3230μs | 12.0963μs | 82.6700 KOps/s | 81.6225 KOps/s | $\color{#35bf28}+1.28\%$ |
| test_membership_nested_last | 23.5540μs | 6.6455μs | 150.4776 KOps/s | 166.3891 KOps/s | $\textbf{\color{#d91a1a}-9.56\%}$ |
| test_membership_nested_leaf_last | 25.7990μs | 6.7426μs | 148.3108 KOps/s | 166.2413 KOps/s | $\textbf{\color{#d91a1a}-10.79\%}$ |
| test_membership_stacked_nested_last | 0.2821ms | 0.1735ms | 5.7633 KOps/s | 5.8674 KOps/s | $\color{#d91a1a}-1.77\%$ |
| test_membership_stacked_nested_leaf_last | 32.5210μs | 14.1158μs | 70.8425 KOps/s | 69.0060 KOps/s | $\color{#35bf28}+2.66\%$ |
| test_nested_getleaf | 29.7060μs | 10.6165μs | 94.1928 KOps/s | 92.0377 KOps/s | $\color{#35bf28}+2.34\%$ |
| test_nested_get | 46.9770μs | 9.9257μs | 100.7482 KOps/s | 98.9852 KOps/s | $\color{#35bf28}+1.78\%$ |
| test_stacked_getleaf | 0.7287ms | 0.4036ms | 2.4776 KOps/s | 2.4517 KOps/s | $\color{#35bf28}+1.05\%$ |
| test_stacked_get | 0.5707ms | 0.3662ms | 2.7308 KOps/s | 2.6822 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_nested_getitemleaf | 0.2947ms | 11.1646μs | 89.5690 KOps/s | 93.2367 KOps/s | $\color{#d91a1a}-3.93\%$ |
| test_nested_getitem | 45.5050μs | 10.0547μs | 99.4561 KOps/s | 98.1471 KOps/s | $\color{#35bf28}+1.33\%$ |
| test_stacked_getitemleaf | 0.5847ms | 0.4045ms | 2.4722 KOps/s | 2.4604 KOps/s | $\color{#35bf28}+0.48\%$ |
| test_stacked_getitem | 0.6770ms | 0.3667ms | 2.7268 KOps/s | 2.6594 KOps/s | $\color{#35bf28}+2.53\%$ |
| test_lock_nested | 1.2201ms | 0.3931ms | 2.5437 KOps/s | 2.3445 KOps/s | $\textbf{\color{#35bf28}+8.50\%}$ |
| test_lock_stack_nested | 70.4652ms | 6.2016ms | 161.2496 Ops/s | 150.2578 Ops/s | $\textbf{\color{#35bf28}+7.32\%}$ |
| test_unlock_nested | 61.8251ms | 0.4540ms | 2.2026 KOps/s | 2.2990 KOps/s | $\color{#d91a1a}-4.20\%$ |
| test_unlock_stack_nested | 70.6607ms | 5.8322ms | 171.4612 Ops/s | 158.7963 Ops/s | $\textbf{\color{#35bf28}+7.98\%}$ |
| test_flatten_speed | 0.7141ms | 0.3669ms | 2.7254 KOps/s | 2.6788 KOps/s | $\color{#35bf28}+1.74\%$ |
| test_unflatten_speed | 0.8186ms | 0.4570ms | 2.1883 KOps/s | 2.1769 KOps/s | $\color{#35bf28}+0.52\%$ |
| test_common_ops | 3.9540ms | 0.6877ms | 1.4541 KOps/s | 1.5255 KOps/s | $\color{#d91a1a}-4.68\%$ |
| test_creation | 53.6500μs | 1.8800μs | 531.9124 KOps/s | 490.2730 KOps/s | $\textbf{\color{#35bf28}+8.49\%}$ |
| test_creation_empty | 71.5700μs | 10.6233μs | 94.1330 KOps/s | 118.1670 KOps/s | $\textbf{\color{#d91a1a}-20.34\%}$ |
| test_creation_nested_1 | 58.4920μs | 13.3361μs | 74.9846 KOps/s | 88.9980 KOps/s | $\textbf{\color{#d91a1a}-15.75\%}$ |
| test_creation_nested_2 | 40.5160μs | 16.6882μs | 59.9225 KOps/s | 66.6560 KOps/s | $\textbf{\color{#d91a1a}-10.10\%}$ |
| test_clone | 0.1131ms | 11.9468μs | 83.7044 KOps/s | 78.8915 KOps/s | $\textbf{\color{#35bf28}+6.10\%}$ |
| test_getitem[int] | 52.5980μs | 11.0710μs | 90.3257 KOps/s | 78.4067 KOps/s | $\textbf{\color{#35bf28}+15.20\%}$ |
| test_getitem[slice_int] | 52.6880μs | 22.3804μs | 44.6820 KOps/s | 39.8219 KOps/s | $\textbf{\color{#35bf28}+12.20\%}$ |
| test_getitem[range] | 99.7160μs | 41.2385μs | 24.2492 KOps/s | 22.6420 KOps/s | $\textbf{\color{#35bf28}+7.10\%}$ |
| test_getitem[tuple] | 52.4480μs | 17.9080μs | 55.8409 KOps/s | 48.7626 KOps/s | $\textbf{\color{#35bf28}+14.52\%}$ |
| test_getitem[list] | 0.3032ms | 35.8330μs | 27.9072 KOps/s | 25.9611 KOps/s | $\textbf{\color{#35bf28}+7.50\%}$ |
| test_setitem_dim[int] | 56.0850μs | 30.8770μs | 32.3866 KOps/s | 36.6500 KOps/s | $\textbf{\color{#d91a1a}-11.63\%}$ |
| test_setitem_dim[slice_int] | 0.1067ms | 56.5781μs | 17.6747 KOps/s | 18.6637 KOps/s | $\textbf{\color{#d91a1a}-5.30\%}$ |
| test_setitem_dim[range] | 0.1065ms | 74.7113μs | 13.3849 KOps/s | 13.7385 KOps/s | $\color{#d91a1a}-2.57\%$ |
| test_setitem_dim[tuple] | 82.3640μs | 45.3406μs | 22.0553 KOps/s | 23.6591 KOps/s | $\textbf{\color{#d91a1a}-6.78\%}$ |
| test_setitem | 0.1123ms | 18.7358μs | 53.3738 KOps/s | 55.7061 KOps/s | $\color{#d91a1a}-4.19\%$ |
| test_set | 0.1014ms | 18.3130μs | 54.6060 KOps/s | 57.6200 KOps/s | $\textbf{\color{#d91a1a}-5.23\%}$ |
| test_set_shared | 5.5516ms | 0.1390ms | 7.1935 KOps/s | 7.1251 KOps/s | $\color{#35bf28}+0.96\%$ |
| test_update | 0.1114ms | 22.0909μs | 45.2675 KOps/s | 50.9465 KOps/s | $\textbf{\color{#d91a1a}-11.15\%}$ |
| test_update_nested | 0.1056ms | 28.8991μs | 34.6032 KOps/s | 37.1387 KOps/s | $\textbf{\color{#d91a1a}-6.83\%}$ |
| test_set_nested | 0.1048ms | 19.9449μs | 50.1381 KOps/s | 51.5596 KOps/s | $\color{#d91a1a}-2.76\%$ |
| test_set_nested_new | 0.1091ms | 23.6399μs | 42.3014 KOps/s | 42.3090 KOps/s | $\color{#d91a1a}-0.02\%$ |
| test_select | 0.1252ms | 46.7431μs | 21.3935 KOps/s | 21.0519 KOps/s | $\color{#35bf28}+1.62\%$ |
| test_unbind_speed | 0.3678ms | 0.3149ms | 3.1753 KOps/s | 2.8221 KOps/s | $\textbf{\color{#35bf28}+12.52\%}$ |
| test_unbind_speed_stack0 | 68.9118ms | 4.0701ms | 245.6916 Ops/s | 247.2943 Ops/s | $\color{#d91a1a}-0.65\%$ |
| test_unbind_speed_stack1 | 4.8640μs | 0.6575μs | 1.5209 MOps/s | 1.5877 MOps/s | $\color{#d91a1a}-4.21\%$ |
| test_split | 61.8500ms | 1.6086ms | 621.6484 Ops/s | 575.9873 Ops/s | $\textbf{\color{#35bf28}+7.93\%}$ |
| test_chunk | 62.8010ms | 1.5376ms | 650.3498 Ops/s | 585.2710 Ops/s | $\textbf{\color{#35bf28}+11.12\%}$ |
| test_creation[device0] | 3.3559ms | 0.1027ms | 9.7347 KOps/s | 9.7082 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_creation_from_tensor | 2.5855ms | 80.5347μs | 12.4170 KOps/s | 12.4539 KOps/s | $\color{#d91a1a}-0.30\%$ |
| test_add_one[memmap_tensor0] | 0.2257ms | 5.2016μs | 192.2484 KOps/s | 191.4238 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_contiguous[memmap_tensor0] | 18.2940μs | 0.6481μs | 1.5429 MOps/s | 1.5801 MOps/s | $\color{#d91a1a}-2.35\%$ |
| test_stack[memmap_tensor0] | 51.7560μs | 3.4743μs | 287.8296 KOps/s | 275.6376 KOps/s | $\color{#35bf28}+4.42\%$ |
| test_memmaptd_index | 0.3812ms | 0.2013ms | 4.9684 KOps/s | 4.8502 KOps/s | $\color{#35bf28}+2.44\%$ |
| test_memmaptd_index_astensor | 0.3332ms | 0.2568ms | 3.8944 KOps/s | 3.7680 KOps/s | $\color{#35bf28}+3.35\%$ |
| test_memmaptd_index_op | 1.0067ms | 0.5572ms | 1.7946 KOps/s | 1.9367 KOps/s | $\textbf{\color{#d91a1a}-7.34\%}$ |
| test_serialize_model | 0.1565s | 0.1060s | 9.4305 Ops/s | 8.8333 Ops/s | $\textbf{\color{#35bf28}+6.76\%}$ |
| test_serialize_model_pickle | 0.4485s | 0.3753s | 2.6645 Ops/s | 2.6391 Ops/s | $\color{#35bf28}+0.96\%$ |
| test_serialize_weights | 0.1711s | 0.1054s | 9.4844 Ops/s | 9.5138 Ops/s | $\color{#d91a1a}-0.31\%$ |
| test_serialize_weights_returnearly | 0.1928s | 0.1343s | 7.4440 Ops/s | 7.7475 Ops/s | $\color{#d91a1a}-3.92\%$ |
| test_serialize_weights_pickle | 1.1515s | 0.5926s | 1.6874 Ops/s | 2.4578 Ops/s | $\textbf{\color{#d91a1a}-31.35\%}$ |
| test_serialize_weights_filesystem | 0.1703s | 97.1434ms | 10.2941 Ops/s | 10.7005 Ops/s | $\color{#d91a1a}-3.80\%$ |
| test_serialize_model_filesystem | 0.1039s | 92.3334ms | 10.8303 Ops/s | 10.3432 Ops/s | $\color{#35bf28}+4.71\%$ |
| test_reshape_pytree | 76.3720μs | 22.9683μs | 43.5382 KOps/s | 41.9101 KOps/s | $\color{#35bf28}+3.88\%$ |
| test_reshape_td | 64.5600μs | 29.8252μs | 33.5287 KOps/s | 31.7151 KOps/s | $\textbf{\color{#35bf28}+5.72\%}$ |
| test_view_pytree | 64.2400μs | 23.2828μs | 42.9501 KOps/s | 41.7887 KOps/s | $\color{#35bf28}+2.78\%$ |
| test_view_td | 38.5120μs | 4.9260μs | 203.0027 KOps/s | 201.7167 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_unbind_pytree | 66.0440μs | 26.5316μs | 37.6909 KOps/s | 37.1481 KOps/s | $\color{#35bf28}+1.46\%$ |
| test_unbind_td | 99.9670μs | 49.9878μs | 20.0049 KOps/s | 17.4290 KOps/s | $\textbf{\color{#35bf28}+14.78\%}$ |
| test_split_pytree | 56.7460μs | 26.0625μs | 38.3694 KOps/s | 37.3826 KOps/s | $\color{#35bf28}+2.64\%$ |
| test_split_td | 0.5515ms | 39.9694μs | 25.0191 KOps/s | 21.9937 KOps/s | $\textbf{\color{#35bf28}+13.76\%}$ |
| test_add_pytree | 76.4630μs | 31.5586μs | 31.6871 KOps/s | 30.5524 KOps/s | $\color{#35bf28}+3.71\%$ |
| test_add_td | 0.1160ms | 49.8612μs | 20.0557 KOps/s | 22.1664 KOps/s | $\textbf{\color{#d91a1a}-9.52\%}$ |
| test_distributed | 0.1958ms | 97.3413μs | 10.2731 KOps/s | 9.8908 KOps/s | $\color{#35bf28}+3.87\%$ |
| test_tdmodule | 0.7591ms | 22.8559μs | 43.7524 KOps/s | 46.4492 KOps/s | $\textbf{\color{#d91a1a}-5.81\%}$ |
| test_tdmodule_dispatch | 0.2034ms | 40.6554μs | 24.5970 KOps/s | 26.0776 KOps/s | $\textbf{\color{#d91a1a}-5.68\%}$ |
| test_tdseq | 49.2520μs | 25.0361μs | 39.9423 KOps/s | 42.1516 KOps/s | $\textbf{\color{#d91a1a}-5.24\%}$ |
| test_tdseq_dispatch | 0.1432ms | 44.4184μs | 22.5132 KOps/s | 23.6168 KOps/s | $\color{#d91a1a}-4.67\%$ |
| test_instantiation_functorch | 2.6521ms | 1.3049ms | 766.3526 Ops/s | 761.0148 Ops/s | $\color{#35bf28}+0.70\%$ |
| test_instantiation_td | 66.8046ms | 1.0764ms | 929.0113 Ops/s | 984.8333 Ops/s | $\textbf{\color{#d91a1a}-5.67\%}$ |
| test_exec_functorch | 0.2707ms | 0.1574ms | 6.3541 KOps/s | 6.3176 KOps/s | $\color{#35bf28}+0.58\%$ |
| test_exec_functional_call | 0.3305ms | 0.1483ms | 6.7448 KOps/s | 6.7955 KOps/s | $\color{#d91a1a}-0.75\%$ |
| test_exec_td | 0.2841ms | 0.1441ms | 6.9419 KOps/s | 7.1185 KOps/s | $\color{#d91a1a}-2.48\%$ |
| test_exec_td_decorator | 1.0678ms | 0.1788ms | 5.5922 KOps/s | 5.6286 KOps/s | $\color{#d91a1a}-0.65\%$ |
| test_vmap_mlp_speed[True-True] | 1.3529ms | 0.8929ms | 1.1199 KOps/s | 1.1325 KOps/s | $\color{#d91a1a}-1.11\%$ |
| test_vmap_mlp_speed[True-False] | 0.7784ms | 0.4724ms | 2.1168 KOps/s | 2.1151 KOps/s | $\color{#35bf28}+0.08\%$ |
| test_vmap_mlp_speed[False-True] | 1.5136ms | 0.7835ms | 1.2764 KOps/s | 1.2908 KOps/s | $\color{#d91a1a}-1.12\%$ |
| test_vmap_mlp_speed[False-False] | 0.6407ms | 0.3864ms | 2.5883 KOps/s | 2.4218 KOps/s | $\textbf{\color{#35bf28}+6.88\%}$ |
| test_vmap_mlp_speed_decorator[True-True] | 3.0808ms | 2.4238ms | 412.5798 Ops/s | 407.3119 Ops/s | $\color{#35bf28}+1.29\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8990ms | 0.5200ms | 1.9231 KOps/s | 1.9018 KOps/s | $\color{#35bf28}+1.12\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 2.6117ms | 1.9724ms | 506.9974 Ops/s | 476.5745 Ops/s | $\textbf{\color{#35bf28}+6.38\%}$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7809ms | 0.4001ms | 2.4993 KOps/s | 2.4417 KOps/s | $\color{#35bf28}+2.36\%$ |
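
The Change column in the table above appears to be the relative difference between Ops and Ops on Repo HEAD, and the Improved and Worsened totals match the number of bold rows, all of which differ from HEAD by more than 5%. A minimal sketch of that reading follows; the function names and the 5% threshold are assumptions inferred from the numbers, not taken from the benchmark tooling.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change. The 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table, both in KOps/s.
rows = {
    "test_plain_set_nested": (57.8157, 60.1454),
    "test_membership_nested": (289.8247, 349.4603),
    "test_getitem[int]": (90.3257, 78.4067),
}

for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Both values in a pair must use the same unit before dividing; a few rows mix KOps/s and MOps/s and would need converting first.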

github-actions bot commented Jan 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 128. Improved: $\large\color{#35bf28}30$. Worsened: $\large\color{#d91a1a}8$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 74.6910μs | 13.4853μs | 74.1549 KOps/s | 67.9238 KOps/s | $\textbf{\color{#35bf28}+9.17\%}$ |
| test_plain_set_stack_nested | 0.1373ms | 0.1163ms | 8.5975 KOps/s | 8.3488 KOps/s | $\color{#35bf28}+2.98\%$ |
| test_plain_set_nested_inplace | 37.4910μs | 14.8661μs | 67.2673 KOps/s | 62.3894 KOps/s | $\textbf{\color{#35bf28}+7.82\%}$ |
| test_plain_set_stack_nested_inplace | 0.1803ms | 0.1437ms | 6.9581 KOps/s | 6.7190 KOps/s | $\color{#35bf28}+3.56\%$ |
| test_items | 22.0700μs | 4.7426μs | 210.8565 KOps/s | 206.9704 KOps/s | $\color{#35bf28}+1.88\%$ |
| test_items_nested | 0.4327ms | 0.3375ms | 2.9629 KOps/s | 2.9509 KOps/s | $\color{#35bf28}+0.41\%$ |
| test_items_nested_locked | 0.3611ms | 0.3384ms | 2.9551 KOps/s | 2.9214 KOps/s | $\color{#35bf28}+1.16\%$ |
| test_items_nested_leaf | 0.2444ms | 0.1985ms | 5.0390 KOps/s | 5.0187 KOps/s | $\color{#35bf28}+0.41\%$ |
| test_items_stack_nested | 1.3756ms | 1.3175ms | 759.0258 Ops/s | 752.8556 Ops/s | $\color{#35bf28}+0.82\%$ |
| test_items_stack_nested_leaf | 1.4196ms | 1.1516ms | 868.3472 Ops/s | 864.9993 Ops/s | $\color{#35bf28}+0.39\%$ |
| test_items_stack_nested_locked | 1.0027ms | 0.9085ms | 1.1008 KOps/s | 1.0766 KOps/s | $\color{#35bf28}+2.25\%$ |
| test_keys | 19.2400μs | 4.6334μs | 215.8253 KOps/s | 215.8153 KOps/s | $+0.00\%$ |
| test_keys_nested | 0.9621ms | 94.8332μs | 10.5448 KOps/s | 10.6033 KOps/s | $\color{#d91a1a}-0.55\%$ |
| test_keys_nested_locked | 0.1198ms | 97.3862μs | 10.2684 KOps/s | 10.6893 KOps/s | $\color{#d91a1a}-3.94\%$ |
| test_keys_nested_leaf | 0.1966ms | 78.7256μs | 12.7023 KOps/s | 12.8142 KOps/s | $\color{#d91a1a}-0.87\%$ |
| test_keys_stack_nested | 1.3964ms | 1.1367ms | 879.7312 Ops/s | 859.7964 Ops/s | $\color{#35bf28}+2.32\%$ |
| test_keys_stack_nested_leaf | 1.2946ms | 1.1427ms | 875.1124 Ops/s | 835.2493 Ops/s | $\color{#35bf28}+4.77\%$ |
| test_keys_stack_nested_locked | 0.8008ms | 0.7268ms | 1.3759 KOps/s | 1.3415 KOps/s | $\color{#35bf28}+2.56\%$ |
| test_values | 9.5637μs | 1.9167μs | 521.7187 KOps/s | 522.9428 KOps/s | $\color{#d91a1a}-0.23\%$ |
| test_values_nested | 77.1310μs | 45.7320μs | 21.8665 KOps/s | 21.9551 KOps/s | $\color{#d91a1a}-0.40\%$ |
| test_values_nested_locked | 80.1810μs | 47.8848μs | 20.8835 KOps/s | 20.9738 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_values_nested_leaf | 55.5300μs | 39.7256μs | 25.1727 KOps/s | 25.0572 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_values_stack_nested | 1.0368ms | 0.9583ms | 1.0435 KOps/s | 1.0180 KOps/s | $\color{#35bf28}+2.51\%$ |
| test_values_stack_nested_leaf | 1.0257ms | 0.9527ms | 1.0497 KOps/s | 1.0377 KOps/s | $\color{#35bf28}+1.16\%$ |
| test_values_stack_nested_locked | 0.6266ms | 0.5745ms | 1.7406 KOps/s | 1.6956 KOps/s | $\color{#35bf28}+2.66\%$ |
| test_membership | 36.1710μs | 1.0952μs | 913.0605 KOps/s | 1.0559 MOps/s | $\textbf{\color{#d91a1a}-13.53\%}$ |
| test_membership_nested | 18.4700μs | 2.8782μs | 347.4338 KOps/s | 425.6437 KOps/s | $\textbf{\color{#d91a1a}-18.37\%}$ |
| test_membership_nested_leaf | 34.0910μs | 2.9299μs | 341.3090 KOps/s | 443.0265 KOps/s | $\textbf{\color{#d91a1a}-22.96\%}$ |
| test_membership_stacked_nested | 36.5900μs | 11.3220μs | 88.3234 KOps/s | 90.3172 KOps/s | $\color{#d91a1a}-2.21\%$ |
| test_membership_stacked_nested_leaf | 44.3600μs | 11.2922μs | 88.5564 KOps/s | 90.1732 KOps/s | $\color{#d91a1a}-1.79\%$ |
| test_membership_nested_last | 17.9500μs | 5.3734μs | 186.1013 KOps/s | 208.6761 KOps/s | $\textbf{\color{#d91a1a}-10.82\%}$ |
| test_membership_nested_leaf_last | 36.7700μs | 5.3734μs | 186.1005 KOps/s | 207.9081 KOps/s | $\textbf{\color{#d91a1a}-10.49\%}$ |
| test_membership_stacked_nested_last | 0.1770ms | 0.1430ms | 6.9909 KOps/s | 7.2890 KOps/s | $\color{#d91a1a}-4.09\%$ |
| test_membership_stacked_nested_leaf_last | 39.4810μs | 13.0192μs | 76.8098 KOps/s | 76.9170 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_nested_getleaf | 37.0600μs | 8.4008μs | 119.0356 KOps/s | 118.1142 KOps/s | $\color{#35bf28}+0.78\%$ |
| test_nested_get | 23.2400μs | 7.9236μs | 126.2054 KOps/s | 124.4589 KOps/s | $\color{#35bf28}+1.40\%$ |
| test_stacked_getleaf | 0.3974ms | 0.3214ms | 3.1119 KOps/s | 3.0317 KOps/s | $\color{#35bf28}+2.64\%$ |
| test_stacked_get | 0.3175ms | 0.2875ms | 3.4783 KOps/s | 3.4983 KOps/s | $\color{#d91a1a}-0.57\%$ |
| test_nested_getitemleaf | 27.2510μs | 8.4447μs | 118.4176 KOps/s | 116.4939 KOps/s | $\color{#35bf28}+1.65\%$ |
| test_nested_getitem | 28.2010μs | 7.9705μs | 125.4631 KOps/s | 123.4834 KOps/s | $\color{#35bf28}+1.60\%$ |
| test_stacked_getitemleaf | 0.3694ms | 0.3221ms | 3.1048 KOps/s | 3.0736 KOps/s | $\color{#35bf28}+1.01\%$ |
| test_stacked_getitem | 0.3200ms | 0.2890ms | 3.4604 KOps/s | 3.2948 KOps/s | $\textbf{\color{#35bf28}+5.03\%}$ |
| test_lock_nested | 7.1831ms | 0.4045ms | 2.4720 KOps/s | 2.4177 KOps/s | $\color{#35bf28}+2.25\%$ |
| test_lock_stack_nested | 85.9319ms | 6.3976ms | 156.3086 Ops/s | 155.0316 Ops/s | $\color{#35bf28}+0.82\%$ |
| test_unlock_nested | 0.8010ms | 0.4014ms | 2.4913 KOps/s | 2.4333 KOps/s | $\color{#35bf28}+2.38\%$ |
| test_unlock_stack_nested | 84.1611ms | 6.7300ms | 148.5887 Ops/s | 146.6003 Ops/s | $\color{#35bf28}+1.36\%$ |
| test_flatten_speed | 75.1917ms | 0.2903ms | 3.4449 KOps/s | 3.7884 KOps/s | $\textbf{\color{#d91a1a}-9.07\%}$ |
| test_unflatten_speed | 0.4609ms | 0.3614ms | 2.7673 KOps/s | 2.7897 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_common_ops | 1.0442ms | 0.5852ms | 1.7089 KOps/s | 1.5442 KOps/s | $\textbf{\color{#35bf28}+10.66\%}$ |
| test_creation | 32.8600μs | 1.5534μs | 643.7648 KOps/s | 616.7011 KOps/s | $\color{#35bf28}+4.39\%$ |
| test_creation_empty | 24.5700μs | 8.3918μs | 119.1635 KOps/s | 93.6448 KOps/s | $\textbf{\color{#35bf28}+27.25\%}$ |
| test_creation_nested_1 | 29.7210μs | 10.1501μs | 98.5208 KOps/s | 80.7648 KOps/s | $\textbf{\color{#35bf28}+21.98\%}$ |
| test_creation_nested_2 | 43.4910μs | 12.5738μs | 79.5302 KOps/s | 67.0483 KOps/s | $\textbf{\color{#35bf28}+18.62\%}$ |
| test_clone | 0.1360ms | 12.7591μs | 78.3757 KOps/s | 71.1898 KOps/s | $\textbf{\color{#35bf28}+10.09\%}$ |
| test_getitem[int] | 80.7610μs | 10.8902μs | 91.8257 KOps/s | 88.1087 KOps/s | $\color{#35bf28}+4.22\%$ |
| test_getitem[slice_int] | 51.6010μs | 21.1240μs | 47.3394 KOps/s | 44.3654 KOps/s | $\textbf{\color{#35bf28}+6.70\%}$ |
| test_getitem[range] | 61.8010μs | 36.0471μs | 27.7415 KOps/s | 25.5180 KOps/s | $\textbf{\color{#35bf28}+8.71\%}$ |
| test_getitem[tuple] | 34.4210μs | 18.6754μs | 53.5464 KOps/s | 52.0583 KOps/s | $\color{#35bf28}+2.86\%$ |
| test_getitem[list] | 0.3649ms | 32.2381μs | 31.0192 KOps/s | 28.0620 KOps/s | $\textbf{\color{#35bf28}+10.54\%}$ |
| test_setitem_dim[int] | 50.4710μs | 28.8268μs | 34.6900 KOps/s | 33.3974 KOps/s | $\color{#35bf28}+3.87\%$ |
| test_setitem_dim[slice_int] | 92.4110μs | 49.8245μs | 20.0705 KOps/s | 19.4421 KOps/s | $\color{#35bf28}+3.23\%$ |
| test_setitem_dim[range] | 89.1710μs | 67.4791μs | 14.8194 KOps/s | 14.8893 KOps/s | $\color{#d91a1a}-0.47\%$ |
| test_setitem_dim[tuple] | 79.4310μs | 44.8820μs | 22.2806 KOps/s | 21.9601 KOps/s | $\color{#35bf28}+1.46\%$ |
| test_setitem | 0.1587ms | 17.7204μs | 56.4321 KOps/s | 49.2855 KOps/s | $\textbf{\color{#35bf28}+14.50\%}$ |
| test_set | 0.1418ms | 17.2949μs | 57.8205 KOps/s | 51.3283 KOps/s | $\textbf{\color{#35bf28}+12.65\%}$ |
| test_set_shared | 2.8763ms | 0.1009ms | 9.9157 KOps/s | 9.6508 KOps/s | $\color{#35bf28}+2.74\%$ |
| test_update | 0.1072ms | 20.2712μs | 49.3312 KOps/s | 42.8930 KOps/s | $\textbf{\color{#35bf28}+15.01\%}$ |
| test_update_nested | 85.6910μs | 26.8817μs | 37.2001 KOps/s | 34.0609 KOps/s | $\textbf{\color{#35bf28}+9.22\%}$ |
| test_set_nested | 0.1116ms | 18.2614μs | 54.7604 KOps/s | 49.1540 KOps/s | $\textbf{\color{#35bf28}+11.41\%}$ |
| test_set_nested_new | 0.1179ms | 21.0710μs | 47.4586 KOps/s | 41.2544 KOps/s | $\textbf{\color{#35bf28}+15.04\%}$ |
| test_select | 77.4700μs | 43.4999μs | 22.9886 KOps/s | 21.7936 KOps/s | $\textbf{\color{#35bf28}+5.48\%}$ |
| test_to | 72.7410μs | 52.8246μs | 18.9306 KOps/s | 18.0876 KOps/s | $\color{#35bf28}+4.66\%$ |
| test_to_nonblocking | 77.6310μs | 33.9025μs | 29.4964 KOps/s | 29.3619 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_unbind_speed | 0.3627ms | 0.3165ms | 3.1592 KOps/s | 3.0237 KOps/s | $\color{#35bf28}+4.48\%$ |
| test_unbind_speed_stack0 | 3.6232ms | 3.3638ms | 297.2842 Ops/s | 245.9958 Ops/s | $\textbf{\color{#35bf28}+20.85\%}$ |
| test_unbind_speed_stack1 | 1.5120μs | 0.5429μs | 1.8421 MOps/s | 1.8584 MOps/s | $\color{#d91a1a}-0.88\%$ |
| test_split | 77.2211ms | 1.6613ms | 601.9546 Ops/s | 580.0715 Ops/s | $\color{#35bf28}+3.77\%$ |
| test_chunk | 73.8148ms | 1.6199ms | 617.3081 Ops/s | 636.1907 Ops/s | $\color{#d91a1a}-2.97\%$ |
| test_creation[device0] | 0.1415ms | 71.6900μs | 13.9489 KOps/s | 12.5121 KOps/s | $\textbf{\color{#35bf28}+11.48\%}$ |
| test_creation_from_tensor | 0.1334ms | 54.1590μs | 18.4641 KOps/s | 16.8141 KOps/s | $\textbf{\color{#35bf28}+9.81\%}$ |
| test_add_one[memmap_tensor0] | 78.0710μs | 6.7157μs | 148.9055 KOps/s | 136.5209 KOps/s | $\textbf{\color{#35bf28}+9.07\%}$ |
| test_contiguous[memmap_tensor0] | 15.9000μs | 0.6318μs | 1.5828 MOps/s | 1.5284 MOps/s | $\color{#35bf28}+3.56\%$ |
| test_stack[memmap_tensor0] | 22.3800μs | 4.3097μs | 232.0356 KOps/s | 210.7952 KOps/s | $\textbf{\color{#35bf28}+10.08\%}$ |
| test_memmaptd_index | 0.3040ms | 0.2384ms | 4.1949 KOps/s | 4.0842 KOps/s | $\color{#35bf28}+2.71\%$ |
| test_memmaptd_index_astensor | 0.3263ms | 0.2922ms | 3.4225 KOps/s | 3.3239 KOps/s | $\color{#35bf28}+2.97\%$ |
| test_memmaptd_index_op | 0.7321ms | 0.5798ms | 1.7248 KOps/s | 1.5729 KOps/s | $\textbf{\color{#35bf28}+9.66\%}$ |
| test_serialize_model | 0.1637s | 96.7012ms | 10.3411 Ops/s | 9.7380 Ops/s | $\textbf{\color{#35bf28}+6.19\%}$ |
| test_serialize_model_pickle | 1.3507s | 1.2375s | 0.8081 Ops/s | 0.8081 Ops/s | $-0.01\%$ |
| test_serialize_weights | 0.1647s | 95.2000ms | 10.5042 Ops/s | 9.9410 Ops/s | $\textbf{\color{#35bf28}+5.67\%}$ |
| test_serialize_weights_returnearly | 0.2531s | 78.0303ms | 12.8155 Ops/s | 13.2417 Ops/s | $\color{#d91a1a}-3.22\%$ |
| test_serialize_weights_pickle | 2.2007s | 1.2882s | 0.7763 Ops/s | 0.8103 Ops/s | $\color{#d91a1a}-4.20\%$ |
| test_reshape_pytree | 54.9310μs | 24.3095μs | 41.1361 KOps/s | 40.4759 KOps/s | $\color{#35bf28}+1.63\%$ |
| test_reshape_td | 0.1858ms | 28.8062μs | 34.7147 KOps/s | 34.9432 KOps/s | $\color{#d91a1a}-0.65\%$ |
| test_view_pytree | 0.1620ms | 24.8768μs | 40.1982 KOps/s | 41.2742 KOps/s | $\color{#d91a1a}-2.61\%$ |
| test_view_td | 41.5910μs | 4.1177μs | 242.8550 KOps/s | 244.5558 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_unbind_pytree | 52.0810μs | 29.6648μs | 33.7100 KOps/s | 32.7436 KOps/s | $\color{#35bf28}+2.95\%$ |
| test_unbind_td | 78.0110μs | 49.8724μs | 20.0512 KOps/s | 19.2840 KOps/s | $\color{#35bf28}+3.98\%$ |
| test_split_pytree | 52.1110μs | 27.8242μs | 35.9400 KOps/s | 35.0629 KOps/s | $\color{#35bf28}+2.50\%$ |
| test_split_td | 0.6673ms | 38.5577μs | 25.9352 KOps/s | 23.6456 KOps/s | $\textbf{\color{#35bf28}+9.68\%}$ |
| test_add_pytree | 0.1402ms | 37.0019μs | 27.0256 KOps/s | 26.5300 KOps/s | $\color{#35bf28}+1.87\%$ |
| test_add_td | 0.1033ms | 50.7295μs | 19.7124 KOps/s | 19.0287 KOps/s | $\color{#35bf28}+3.59\%$ |
| test_distributed | 2.2105ms | 73.8371μs | 13.5433 KOps/s | 14.2745 KOps/s | $\textbf{\color{#d91a1a}-5.12\%}$ |
| test_tdmodule | 0.1510ms | 18.1047μs | 55.2344 KOps/s | 52.9000 KOps/s | $\color{#35bf28}+4.41\%$ |
| test_tdmodule_dispatch | 0.1246ms | 33.9248μs | 29.4770 KOps/s | 27.4957 KOps/s | $\textbf{\color{#35bf28}+7.21\%}$ |
| test_tdseq | 36.1900μs | 20.7473μs | 48.1991 KOps/s | 45.7773 KOps/s | $\textbf{\color{#35bf28}+5.29\%}$ |
| test_tdseq_dispatch | 52.7410μs | 36.3761μs | 27.4906 KOps/s | 25.6074 KOps/s | $\textbf{\color{#35bf28}+7.35\%}$ |
| test_instantiation_functorch | 1.7296ms | 1.6910ms | 591.3663 Ops/s | 590.1820 Ops/s | $\color{#35bf28}+0.20\%$ |
| test_instantiation_td | 1.7557ms | 1.1732ms | 852.4053 Ops/s | 842.8182 Ops/s | $\color{#35bf28}+1.14\%$ |
| test_exec_functorch | 0.1783ms | 0.1586ms | 6.3068 KOps/s | 6.1873 KOps/s | $\color{#35bf28}+1.93\%$ |
| test_exec_functional_call | 0.1833ms | 0.1579ms | 6.3349 KOps/s | 6.3018 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_exec_td | 0.1817ms | 0.1498ms | 6.6746 KOps/s | 6.5083 KOps/s | $\color{#35bf28}+2.56\%$ |
| test_exec_td_decorator | 0.7347ms | 0.1869ms | 5.3499 KOps/s | 5.1673 KOps/s | $\color{#35bf28}+3.53\%$ |
| test_vmap_mlp_speed[True-True] | 1.2763ms | 1.1054ms | 904.6525 Ops/s | 895.6321 Ops/s | $\color{#35bf28}+1.01\%$ |
| test_vmap_mlp_speed[True-False] | 0.7214ms | 0.6588ms | 1.5180 KOps/s | 1.4983 KOps/s | $\color{#35bf28}+1.31\%$ |
| test_vmap_mlp_speed[False-True] | 1.0849ms | 1.0194ms | 981.0001 Ops/s | 976.1829 Ops/s | $\color{#35bf28}+0.49\%$ |
| test_vmap_mlp_speed[False-False] | 0.6455ms | 0.5917ms | 1.6902 KOps/s | 1.6837 KOps/s | $\color{#35bf28}+0.39\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 3.0222ms | 2.5065ms | 398.9581 Ops/s | 392.1772 Ops/s | $\color{#35bf28}+1.73\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 1.2550ms | 0.7081ms | 1.4122 KOps/s | 1.3986 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 2.5487ms | 2.1012ms | 475.9127 Ops/s | 469.5542 Ops/s | $\color{#35bf28}+1.35\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 1.0566ms | 0.6082ms | 1.6441 KOps/s | 1.6330 KOps/s | $\color{#35bf28}+0.68\%$ |
| test_vmap_transformer_speed[True-True] | 12.5503ms | 12.3738ms | 80.8159 Ops/s | 80.3345 Ops/s | $\color{#35bf28}+0.60\%$ |
| test_vmap_transformer_speed[True-False] | 8.2345ms | 8.1239ms | 123.0942 Ops/s | 122.2003 Ops/s | $\color{#35bf28}+0.73\%$ |
| test_vmap_transformer_speed[False-True] | 12.4781ms | 12.3203ms | 81.1667 Ops/s | 80.7341 Ops/s | $\color{#35bf28}+0.54\%$ |
| test_vmap_transformer_speed[False-False] | 8.2533ms | 8.0666ms | 123.9685 Ops/s | 122.9952 Ops/s | $\color{#35bf28}+0.79\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 0.1650s | 81.9156ms | 12.2077 Ops/s | 13.0492 Ops/s | $\textbf{\color{#d91a1a}-6.45\%}$ |
| test_vmap_transformer_speed_decorator[True-False] | 21.3931ms | 19.5722ms | 51.0930 Ops/s | 51.0126 Ops/s | $\color{#35bf28}+0.16\%$ |
| test_vmap_transformer_speed_decorator[False-True] | 69.2623ms | 68.2622ms | 14.6494 Ops/s | 14.4255 Ops/s | $\color{#35bf28}+1.55\%$ |
| test_vmap_transformer_speed_decorator[False-False] | 20.8623ms | 19.1680ms | 52.1703 Ops/s | 51.9902 Ops/s | $\color{#35bf28}+0.35\%$ |

vmoens changed the title from "[Performance] Faster clone" to "[Performance] Remove _is_memmap and _is_shared from constructor" Jan 16, 2024
vmoens changed the title from "[Performance] Remove _is_memmap and _is_shared from constructor" to "[BugFix] Remove _is_memmap and _is_shared from constructor" Jan 16, 2024

vmoens (Contributor, Author) commented Jan 16, 2024

Closing in favor of #621

vmoens closed this Jan 16, 2024
vmoens deleted the faster-clone branch October 21, 2024 14:02
Labels: CLA Signed, Performance