[Feature] Using native torch.Tensor for memmap #554

Closed
vmoens wants to merge 15 commits

Conversation

vmoens
Contributor

@vmoens vmoens commented Nov 8, 2023

Introduces a new backend for memory-mapped tensors that doesn't rely on NumPy.
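For readers unfamiliar with the idea, here is a minimal sketch of a NumPy-free memory-mapped tensor built on `torch.from_file`. It is not the code from this PR: the helper name `memmap_tensor` and its arguments are hypothetical, and the sketch assumes a CPU tensor and a dtype with a fixed element size.

```python
import os
import tempfile

import torch


def memmap_tensor(path, shape, dtype=torch.float32):
    """Map the file at `path` as a tensor of `shape` (hypothetical helper).

    The file is created and zero-filled if it is missing or too small.
    """
    numel = 1
    for dim in shape:
        numel *= dim
    nbytes = numel * torch.tensor([], dtype=dtype).element_size()

    # Make sure the backing file is large enough before mapping it.
    if not os.path.exists(path) or os.path.getsize(path) < nbytes:
        with open(path, "wb") as f:
            f.write(b"\x00" * nbytes)

    # shared=True requests a shared mapping, so in-place writes to the
    # tensor are reflected in the file.
    flat = torch.from_file(path, shared=True, size=numel, dtype=dtype)
    return flat.view(shape)
```

Mapping the same path twice should give two tensors that observe each other's in-place writes, because both views are backed by the same file.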

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 8, 2023

github-actions bot commented Nov 8, 2023

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 105. Improved: 4. Worsened: 4.

Detailed results (changes flagged as significant are in bold):

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.2165ms | 19.9547μs | 50.1136 KOps/s | 49.8169 KOps/s | +0.60% |
| test_plain_set_stack_nested | 0.2109ms | 0.1859ms | 5.3786 KOps/s | 5.3328 KOps/s | +0.86% |
| test_plain_set_nested_inplace | 41.7000μs | 23.6245μs | 42.3289 KOps/s | 42.3551 KOps/s | -0.06% |
| test_plain_set_stack_nested_inplace | 0.9053ms | 0.2205ms | 4.5351 KOps/s | 4.5077 KOps/s | +0.61% |
| test_items | 68.4010μs | 3.3933μs | 294.7023 KOps/s | 289.2369 KOps/s | +1.89% |
| test_items_nested | 2.2512ms | 0.3761ms | 2.6590 KOps/s | 2.7689 KOps/s | -3.97% |
| test_items_nested_locked | 0.4630ms | 0.3746ms | 2.6697 KOps/s | 2.7114 KOps/s | -1.54% |
| test_items_nested_leaf | 0.2573ms | 0.2266ms | 4.4134 KOps/s | 4.5251 KOps/s | -2.47% |
| test_items_stack_nested | 1.9300ms | 1.8507ms | 540.3428 Ops/s | 548.3711 Ops/s | -1.46% |
| test_items_stack_nested_leaf | 1.7733ms | 1.6782ms | 595.8665 Ops/s | 609.8826 Ops/s | -2.30% |
| test_items_stack_nested_locked | 2.9863ms | 0.9972ms | 1.0028 KOps/s | 1.0285 KOps/s | -2.50% |
| test_keys | 64.6010μs | 5.1100μs | 195.6930 KOps/s | 198.6078 KOps/s | -1.47% |
| test_keys_nested | 1.1433ms | 0.1834ms | 5.4529 KOps/s | 4.9614 KOps/s | **+9.91%** |
| test_keys_nested_locked | 0.2114ms | 0.1814ms | 5.5124 KOps/s | 5.4895 KOps/s | +0.42% |
| test_keys_nested_leaf | 0.3254ms | 0.1741ms | 5.7443 KOps/s | 5.7169 KOps/s | +0.48% |
| test_keys_stack_nested | 1.9701ms | 1.7131ms | 583.7210 Ops/s | 594.8389 Ops/s | -1.87% |
| test_keys_stack_nested_leaf | 1.9219ms | 1.7093ms | 585.0240 Ops/s | 592.1189 Ops/s | -1.20% |
| test_keys_stack_nested_locked | 1.2155ms | 0.8534ms | 1.1717 KOps/s | 1.2147 KOps/s | -3.54% |
| test_values | 17.6010μs | 1.5496μs | 645.3435 KOps/s | 646.6685 KOps/s | -0.20% |
| test_values_nested | 0.1098ms | 66.8500μs | 14.9589 KOps/s | 14.9438 KOps/s | +0.10% |
| test_values_nested_locked | 0.1250ms | 67.1448μs | 14.8932 KOps/s | 14.9604 KOps/s | -0.45% |
| test_values_nested_leaf | 0.1112ms | 58.4599μs | 17.1057 KOps/s | 17.0415 KOps/s | +0.38% |
| test_values_stack_nested | 2.5307ms | 1.5751ms | 634.8784 Ops/s | 688.3212 Ops/s | **-7.76%** |
| test_values_stack_nested_leaf | 1.6564ms | 1.4746ms | 678.1706 Ops/s | 690.9528 Ops/s | -1.85% |
| test_values_stack_nested_locked | 0.7508ms | 0.6528ms | 1.5320 KOps/s | 1.5469 KOps/s | -0.97% |
| test_membership | 16.7000μs | 1.8025μs | 554.7919 KOps/s | 538.0480 KOps/s | +3.11% |
| test_membership_nested | 72.5010μs | 3.6758μs | 272.0470 KOps/s | 280.8171 KOps/s | -3.12% |
| test_membership_nested_leaf | 37.9010μs | 3.7307μs | 268.0475 KOps/s | 282.3751 KOps/s | **-5.07%** |
| test_membership_stacked_nested | 28.1000μs | 14.4149μs | 69.3726 KOps/s | 69.8285 KOps/s | -0.65% |
| test_membership_stacked_nested_leaf | 44.5010μs | 14.3137μs | 69.8634 KOps/s | 69.9738 KOps/s | -0.16% |
| test_membership_nested_last | 25.3000μs | 7.6266μs | 131.1195 KOps/s | 133.4385 KOps/s | -1.74% |
| test_membership_nested_leaf_last | 40.7010μs | 7.6233μs | 131.1763 KOps/s | 133.4076 KOps/s | -1.67% |
| test_membership_stacked_nested_last | 0.2586ms | 0.2288ms | 4.3709 KOps/s | 4.4373 KOps/s | -1.50% |
| test_membership_stacked_nested_leaf_last | 0.1041ms | 16.9126μs | 59.1276 KOps/s | 60.3171 KOps/s | -1.97% |
| test_nested_getleaf | 46.3010μs | 15.8512μs | 63.0869 KOps/s | 63.4089 KOps/s | -0.51% |
| test_nested_get | 41.1010μs | 15.0983μs | 66.2324 KOps/s | 66.8117 KOps/s | -0.87% |
| test_stacked_getleaf | 0.8806ms | 0.7751ms | 1.2902 KOps/s | 1.3298 KOps/s | -2.98% |
| test_stacked_get | 0.7789ms | 0.7403ms | 1.3507 KOps/s | 1.3887 KOps/s | -2.73% |
| test_nested_getitemleaf | 45.3000μs | 15.8170μs | 63.2229 KOps/s | 63.8070 KOps/s | -0.92% |
| test_nested_getitem | 45.6000μs | 15.0693μs | 66.3603 KOps/s | 66.5420 KOps/s | -0.27% |
| test_stacked_getitemleaf | 0.8661ms | 0.7773ms | 1.2865 KOps/s | 1.3270 KOps/s | -3.05% |
| test_stacked_getitem | 0.7882ms | 0.7412ms | 1.3492 KOps/s | 1.3871 KOps/s | -2.73% |
| test_lock_nested | 87.3496ms | 1.2588ms | 794.4000 Ops/s | 859.3824 Ops/s | **-7.56%** |
| test_lock_stack_nested | 0.1103s | 18.5665ms | 53.8604 Ops/s | 53.0010 Ops/s | +1.62% |
| test_unlock_nested | 85.6639ms | 1.2647ms | 790.6730 Ops/s | 791.6967 Ops/s | -0.13% |
| test_unlock_stack_nested | 0.1224s | 18.9277ms | 52.8325 Ops/s | 52.1645 Ops/s | +1.28% |
| test_flatten_speed | 0.9479ms | 0.8913ms | 1.1220 KOps/s | 1.1435 KOps/s | -1.88% |
| test_unflatten_speed | 1.5999ms | 1.5631ms | 639.7538 Ops/s | 643.2815 Ops/s | -0.55% |
| test_common_ops | 7.1427ms | 0.8486ms | 1.1785 KOps/s | 1.1826 KOps/s | -0.35% |
| test_creation | 30.0000μs | 3.0429μs | 328.6351 KOps/s | 335.7273 KOps/s | -2.11% |
| test_creation_empty | 40.0000μs | 9.6262μs | 103.8832 KOps/s | 104.7657 KOps/s | -0.84% |
| test_creation_nested_1 | 41.1000μs | 14.9491μs | 66.8937 KOps/s | 66.8246 KOps/s | +0.10% |
| test_creation_nested_2 | 52.3000μs | 18.2782μs | 54.7100 KOps/s | 56.3633 KOps/s | -2.93% |
| test_clone | 0.1030ms | 14.9301μs | 66.9789 KOps/s | 66.6809 KOps/s | +0.45% |
| test_getitem[int] | 51.5010μs | 17.7106μs | 56.4634 KOps/s | 56.9871 KOps/s | -0.92% |
| test_getitem[slice_int] | 0.1009ms | 42.8428μs | 23.3411 KOps/s | 24.3608 KOps/s | -4.19% |
| test_getitem[range] | 0.1076ms | 66.2080μs | 15.1039 KOps/s | 14.7580 KOps/s | +2.34% |
| test_getitem[tuple] | 53.6000μs | 33.3563μs | 29.9794 KOps/s | 30.0005 KOps/s | -0.07% |
| test_getitem[list] | 0.1236ms | 61.6785μs | 16.2131 KOps/s | 15.6210 KOps/s | +3.79% |
| test_setitem_dim[int] | 57.5010μs | 33.4385μs | 29.9056 KOps/s | 30.3521 KOps/s | -1.47% |
| test_setitem_dim[slice_int] | 86.4010μs | 59.0476μs | 16.9355 KOps/s | 16.9177 KOps/s | +0.11% |
| test_setitem_dim[range] | 99.7020μs | 77.7468μs | 12.8623 KOps/s | 12.5608 KOps/s | +2.40% |
| test_setitem_dim[tuple] | 67.6010μs | 49.1656μs | 20.3394 KOps/s | 20.2549 KOps/s | +0.42% |
| test_setitem | 0.1308ms | 20.7524μs | 48.1872 KOps/s | 47.1550 KOps/s | +2.19% |
| test_set | 97.6010μs | 19.9024μs | 50.2452 KOps/s | 49.9589 KOps/s | +0.57% |
| test_set_shared | 4.1728ms | 0.1897ms | 5.2727 KOps/s | 5.3118 KOps/s | -0.74% |
| test_update | 0.1290ms | 27.3040μs | 36.6246 KOps/s | 36.1333 KOps/s | +1.36% |
| test_update_nested | 0.2052ms | 38.3315μs | 26.0882 KOps/s | 25.7119 KOps/s | +1.46% |
| test_set_nested | 0.1297ms | 22.4563μs | 44.5310 KOps/s | 43.6582 KOps/s | +2.00% |
| test_set_nested_new | 0.1045ms | 31.9442μs | 31.3046 KOps/s | 31.6632 KOps/s | -1.13% |
| test_select | 0.2523ms | 61.0972μs | 16.3674 KOps/s | 16.6373 KOps/s | -1.62% |
| test_unbind_speed | 0.4089ms | 0.3736ms | 2.6766 KOps/s | 2.6513 KOps/s | +0.95% |
| test_unbind_speed_stack0 | 0.1036s | 6.5598ms | 152.4433 Ops/s | 155.7020 Ops/s | -2.09% |
| test_unbind_speed_stack1 | 29.0010μs | 1.1630μs | 859.8182 KOps/s | 1.0840 MOps/s | **-20.68%** |
| test_creation[device0] | 5.0859ms | 0.4586ms | 2.1804 KOps/s | 2.1703 KOps/s | +0.47% |
| test_creation_from_tensor | 4.5601ms | 0.5268ms | 1.8984 KOps/s | 1.9830 KOps/s | -4.27% |
| test_add_one[memmap_tensor0] | 1.9731ms | 33.2242μs | 30.0986 KOps/s | 29.0448 KOps/s | +3.63% |
| test_contiguous[memmap_tensor0] | 39.4010μs | 8.5837μs | 116.4993 KOps/s | 110.3460 KOps/s | **+5.58%** |
| test_stack[memmap_tensor0] | 87.4010μs | 27.5748μs | 36.2649 KOps/s | 37.1110 KOps/s | -2.28% |
| test_memmaptd_index | 0.4145ms | 0.3092ms | 3.2337 KOps/s | 3.2317 KOps/s | +0.06% |
| test_memmaptd_index_astensor | 1.5341ms | 1.2336ms | 810.6350 Ops/s | 826.4057 Ops/s | -1.91% |
| test_memmaptd_index_op | 4.9454ms | 2.6570ms | 376.3674 Ops/s | 373.1715 Ops/s | +0.86% |
| test_reshape_pytree | 93.7010μs | 32.8187μs | 30.4704 KOps/s | 30.0254 KOps/s | +1.48% |
| test_reshape_td | 81.0010μs | 28.1854μs | 35.4794 KOps/s | 35.0693 KOps/s | +1.17% |
| test_view_pytree | 93.9010μs | 32.7482μs | 30.5361 KOps/s | 29.1430 KOps/s | +4.78% |
| test_view_td | 22.9000μs | 5.7097μs | 175.1398 KOps/s | 176.0301 KOps/s | -0.51% |
| test_unbind_pytree | 80.0010μs | 37.8529μs | 26.4181 KOps/s | 26.3873 KOps/s | +0.12% |
| test_unbind_td | 93.6010μs | 54.6265μs | 18.3061 KOps/s | 18.5444 KOps/s | -1.28% |
| test_split_pytree | 0.1296ms | 37.6170μs | 26.5837 KOps/s | 26.7130 KOps/s | -0.48% |
| test_split_td | 0.1427ms | 0.1043ms | 9.5846 KOps/s | 9.9171 KOps/s | -3.35% |
| test_add_pytree | 90.6010μs | 47.2979μs | 21.1426 KOps/s | 21.0150 KOps/s | +0.61% |
| test_add_td | 0.1155ms | 59.9673μs | 16.6758 KOps/s | 16.8787 KOps/s | -1.20% |
| test_distributed | 49.8000μs | 8.8974μs | 112.3919 KOps/s | 109.7695 KOps/s | +2.39% |
| test_tdmodule | 0.1285ms | 25.7310μs | 38.8637 KOps/s | 34.9672 KOps/s | **+11.14%** |
| test_tdmodule_dispatch | 0.2872ms | 45.8991μs | 21.7869 KOps/s | 21.7395 KOps/s | +0.22% |
| test_tdseq | 55.8000μs | 31.0622μs | 32.1934 KOps/s | 30.0316 KOps/s | **+7.20%** |
| test_tdseq_dispatch | 0.6151ms | 55.1967μs | 18.1170 KOps/s | 17.7016 KOps/s | +2.35% |
| test_instantiation_functorch | 1.9643ms | 1.7000ms | 588.2423 Ops/s | 604.3940 Ops/s | -2.67% |
| test_instantiation_td | 2.0972ms | 1.3354ms | 748.8308 Ops/s | 754.3229 Ops/s | -0.73% |
| test_exec_functorch | 0.2503ms | 0.1980ms | 5.0499 KOps/s | 5.0515 KOps/s | -0.03% |
| test_exec_td | 0.2392ms | 0.1881ms | 5.3170 KOps/s | 5.3010 KOps/s | +0.30% |
| test_vmap_mlp_speed[True-True] | 11.0368ms | 1.1717ms | 853.4732 Ops/s | 878.5634 Ops/s | -2.86% |
| test_vmap_mlp_speed[True-False] | 14.3520ms | 0.6421ms | 1.5575 KOps/s | 1.6186 KOps/s | -3.78% |
| test_vmap_mlp_speed[False-True] | 5.0708ms | 0.9808ms | 1.0196 KOps/s | 978.6893 Ops/s | +4.18% |
| test_vmap_mlp_speed[False-False] | 9.2924ms | 0.4948ms | 2.0211 KOps/s | 2.0354 KOps/s | -0.70% |
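The Change column appears to be the relative difference between the PR's throughput and the throughput on repo HEAD, that is `(Ops - Ops on HEAD) / Ops on HEAD`. This formula is inferred from the reported numbers rather than taken from the benchmark bot's source. A small sketch for re-deriving a row, with hypothetical helper names `parse_ops` and `relative_change`:

```python
# Multipliers for the throughput units that appear in the report.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text):
    """Convert a string such as '859.8182 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops_pr, ops_head):
    """Percent change of the PR throughput relative to repo HEAD.

    Assumed formula: 100 * (pr - head) / head.
    """
    pr = parse_ops(ops_pr)
    head = parse_ops(ops_head)
    return 100.0 * (pr - head) / head
```

For example, `relative_change("859.8182 KOps/s", "1.0840 MOps/s")` rounds to -20.68, which matches the reported change for test_unbind_speed_stack1 once the mixed units are normalized.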

@vmoens vmoens added enhancement New feature or request Refactor Refactoring code - not a new feature labels Nov 13, 2023
Contributor Author

vmoens commented Dec 4, 2023

Because sharing a file-backed, non-shared tensor serializes it, we're closing this PR for the time being.

@vmoens vmoens closed this Dec 4, 2023
@vmoens vmoens deleted the memory_map branch October 21, 2024 14:02