[Feature] Multithread memmap #592
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 32.6910μs | 16.4405μs | 60.8253 KOps/s | 62.0754 KOps/s | |
test_plain_set_stack_nested | 0.2049ms | 0.1433ms | 6.9772 KOps/s | 7.0160 KOps/s | |
test_plain_set_nested_inplace | 68.1970μs | 18.9244μs | 52.8419 KOps/s | 54.8609 KOps/s | |
test_plain_set_stack_nested_inplace | 0.2440ms | 0.1788ms | 5.5936 KOps/s | 5.6648 KOps/s | |
test_items | 25.3770μs | 2.5491μs | 392.2952 KOps/s | 403.8522 KOps/s | |
test_items_nested | 0.3398ms | 0.2698ms | 3.7062 KOps/s | 3.6800 KOps/s | |
test_items_nested_locked | 0.9370ms | 0.2706ms | 3.6951 KOps/s | 3.6821 KOps/s | |
test_items_nested_leaf | 0.2417ms | 0.1668ms | 5.9948 KOps/s | 5.9776 KOps/s | |
test_items_stack_nested | 2.3760ms | 1.4973ms | 667.8644 Ops/s | 673.2859 Ops/s | |
test_items_stack_nested_leaf | 2.1307ms | 1.3408ms | 745.8178 Ops/s | 742.7414 Ops/s | |
test_items_stack_nested_locked | 2.9134ms | 0.7817ms | 1.2792 KOps/s | 1.2996 KOps/s | |
test_keys | 33.8530μs | 3.8593μs | 259.1157 KOps/s | 256.0052 KOps/s | |
test_keys_nested | 0.5508ms | 0.1413ms | 7.0757 KOps/s | 6.6715 KOps/s | |
test_keys_nested_locked | 0.3052ms | 0.1399ms | 7.1480 KOps/s | 7.1026 KOps/s | |
test_keys_nested_leaf | 0.2866ms | 0.1401ms | 7.1390 KOps/s | 7.0905 KOps/s | |
test_keys_stack_nested | 1.5466ms | 1.4040ms | 712.2384 Ops/s | 710.7292 Ops/s | |
test_keys_stack_nested_leaf | 2.2087ms | 1.4017ms | 713.4438 Ops/s | 711.5908 Ops/s | |
test_keys_stack_nested_locked | 0.7901ms | 0.6819ms | 1.4665 KOps/s | 1.4492 KOps/s | |
test_values | 8.5310μs | 1.1789μs | 848.2589 KOps/s | 864.3089 KOps/s | |
test_values_nested | 91.9210μs | 49.5734μs | 20.1721 KOps/s | 20.0002 KOps/s | |
test_values_nested_locked | 98.4130μs | 49.6480μs | 20.1418 KOps/s | 20.1219 KOps/s | |
test_values_nested_leaf | 0.1033ms | 44.5761μs | 22.4335 KOps/s | 22.5617 KOps/s | |
test_values_stack_nested | 1.8326ms | 1.1991ms | 833.9643 Ops/s | 839.2398 Ops/s | |
test_values_stack_nested_leaf | 1.3569ms | 1.1875ms | 842.1101 Ops/s | 847.3135 Ops/s | |
test_values_stack_nested_locked | 1.0054ms | 0.5093ms | 1.9633 KOps/s | 1.9099 KOps/s | |
test_membership | 15.8600μs | 1.3630μs | 733.6513 KOps/s | 746.2782 KOps/s | |
test_membership_nested | 21.8210μs | 2.7867μs | 358.8453 KOps/s | 359.7178 KOps/s | |
test_membership_nested_leaf | 19.6860μs | 2.7989μs | 357.2769 KOps/s | 358.2138 KOps/s | |
test_membership_stacked_nested | 41.6270μs | 11.8173μs | 84.6217 KOps/s | 80.1730 KOps/s | |
test_membership_stacked_nested_leaf | 34.5240μs | 11.8211μs | 84.5947 KOps/s | 84.6542 KOps/s | |
test_membership_nested_last | 25.8580μs | 5.8348μs | 171.3842 KOps/s | 172.3043 KOps/s | |
test_membership_nested_leaf_last | 38.3110μs | 5.9580μs | 167.8419 KOps/s | 171.2949 KOps/s | |
test_membership_stacked_nested_last | 0.2283ms | 0.1660ms | 6.0243 KOps/s | 6.0194 KOps/s | |
test_membership_stacked_nested_leaf_last | 43.6020μs | 13.8338μs | 72.2868 KOps/s | 72.3531 KOps/s | |
test_nested_getleaf | 37.0990μs | 10.5126μs | 95.1243 KOps/s | 93.2589 KOps/s | |
test_nested_get | 30.0760μs | 10.0211μs | 99.7895 KOps/s | 98.7054 KOps/s | |
test_stacked_getleaf | 0.7322ms | 0.6335ms | 1.5784 KOps/s | 1.5520 KOps/s | |
test_stacked_get | 1.1473ms | 0.6059ms | 1.6505 KOps/s | 1.6523 KOps/s | |
test_nested_getitemleaf | 63.3880μs | 10.5863μs | 94.4615 KOps/s | 93.9827 KOps/s | |
test_nested_getitem | 31.0480μs | 10.1386μs | 98.6327 KOps/s | 99.6718 KOps/s | |
test_stacked_getitemleaf | 0.7318ms | 0.6391ms | 1.5646 KOps/s | 1.5556 KOps/s | |
test_stacked_getitem | 0.7759ms | 0.6077ms | 1.6457 KOps/s | 1.6351 KOps/s | |
test_lock_nested | 56.2178ms | 0.4762ms | 2.1000 KOps/s | 2.4052 KOps/s | |
test_lock_stack_nested | 72.2923ms | 6.3834ms | 156.6569 Ops/s | 152.4783 Ops/s | |
test_unlock_nested | 1.0170ms | 0.4255ms | 2.3503 KOps/s | 2.0695 KOps/s | |
test_unlock_stack_nested | 70.2606ms | 6.1245ms | 163.2792 Ops/s | 163.4588 Ops/s | |
test_flatten_speed | 0.4588ms | 0.2657ms | 3.7630 KOps/s | 3.7561 KOps/s | |
test_unflatten_speed | 0.7782ms | 0.4459ms | 2.2427 KOps/s | 2.2160 KOps/s | |
test_common_ops | 5.5184ms | 0.7094ms | 1.4097 KOps/s | 1.5658 KOps/s | |
test_creation | 15.1580μs | 2.0118μs | 497.0572 KOps/s | 500.8651 KOps/s | |
test_creation_empty | 27.9920μs | 9.3741μs | 106.6769 KOps/s | 122.5054 KOps/s | |
test_creation_nested_1 | 28.9840μs | 12.1371μs | 82.3919 KOps/s | 90.7723 KOps/s | |
test_creation_nested_2 | 41.4070μs | 17.4693μs | 57.2432 KOps/s | 68.6027 KOps/s | |
test_clone | 0.2575ms | 12.4902μs | 80.0626 KOps/s | 80.7736 KOps/s | |
test_getitem[int] | 62.5770μs | 12.0051μs | 83.2982 KOps/s | 82.8571 KOps/s | |
test_getitem[slice_int] | 74.9890μs | 23.7434μs | 42.1170 KOps/s | 42.4021 KOps/s | |
test_getitem[range] | 0.1223ms | 42.8248μs | 23.3510 KOps/s | 23.4048 KOps/s | |
test_getitem[tuple] | 57.5770μs | 19.4986μs | 51.2857 KOps/s | 51.5905 KOps/s | |
test_getitem[list] | 0.1004ms | 37.4006μs | 26.7375 KOps/s | 27.1882 KOps/s | |
test_setitem_dim[int] | 51.9570μs | 30.1005μs | 33.2220 KOps/s | 34.4650 KOps/s | |
test_setitem_dim[slice_int] | 96.7610μs | 53.8245μs | 18.5789 KOps/s | 18.9882 KOps/s | |
test_setitem_dim[range] | 0.1399ms | 73.3625μs | 13.6309 KOps/s | 14.0054 KOps/s | |
test_setitem_dim[tuple] | 72.2450μs | 43.4562μs | 23.0117 KOps/s | 24.1594 KOps/s | |
test_setitem | 0.2168ms | 18.2514μs | 54.7903 KOps/s | 58.4206 KOps/s | |
test_set | 0.2227ms | 17.8169μs | 56.1266 KOps/s | 60.5281 KOps/s | |
test_set_shared | 4.9355ms | 0.1397ms | 7.1591 KOps/s | 7.1091 KOps/s | |
test_update | 0.1533ms | 20.2069μs | 49.4880 KOps/s | 54.5509 KOps/s | |
test_update_nested | 0.1465ms | 27.8067μs | 35.9626 KOps/s | 39.4380 KOps/s | |
test_set_nested | 0.1462ms | 19.6319μs | 50.9375 KOps/s | 55.2200 KOps/s | |
test_set_nested_new | 0.1690ms | 23.8536μs | 41.9224 KOps/s | 45.0039 KOps/s | |
test_select | 97.5220μs | 48.4772μs | 20.6282 KOps/s | 22.0468 KOps/s | |
test_unbind_speed | 0.4019ms | 0.3446ms | 2.9017 KOps/s | 2.9253 KOps/s | |
test_unbind_speed_stack0 | 62.1491ms | 4.1986ms | 238.1741 Ops/s | 226.7453 Ops/s | |
test_unbind_speed_stack1 | 2.0839μs | 0.6296μs | 1.5884 MOps/s | 1.5556 MOps/s | |
test_split | 59.2533ms | 1.6703ms | 598.7016 Ops/s | 599.3525 Ops/s | |
test_chunk | 56.9097ms | 1.6505ms | 605.8761 Ops/s | 604.7532 Ops/s | |
test_creation[device0] | 0.7547ms | 0.2944ms | 3.3969 KOps/s | 3.3701 KOps/s | |
test_creation_from_tensor | 58.7447ms | 0.3653ms | 2.7376 KOps/s | 3.0586 KOps/s | |
test_add_one[memmap_tensor0] | 0.2885ms | 25.4628μs | 39.2730 KOps/s | 38.8741 KOps/s | |
test_contiguous[memmap_tensor0] | 30.4360μs | 5.7998μs | 172.4190 KOps/s | 172.6585 KOps/s | |
test_stack[memmap_tensor0] | 0.1179ms | 19.5108μs | 51.2537 KOps/s | 52.4564 KOps/s | |
test_memmaptd_index | 0.4094ms | 0.2011ms | 4.9715 KOps/s | 4.9746 KOps/s | |
test_memmaptd_index_astensor | 0.3559ms | 0.2586ms | 3.8665 KOps/s | 3.8916 KOps/s | |
test_memmaptd_index_op | 1.0442ms | 0.5186ms | 1.9282 KOps/s | 1.9958 KOps/s | |
test_reshape_pytree | 76.5430μs | 22.8308μs | 43.8006 KOps/s | 42.3029 KOps/s | |
test_reshape_td | 69.8100μs | 31.0751μs | 32.1801 KOps/s | 33.0390 KOps/s | |
test_view_pytree | 57.1270μs | 22.8623μs | 43.7401 KOps/s | 42.1337 KOps/s | |
test_view_td | 22.7730μs | 4.9168μs | 203.3824 KOps/s | 203.5568 KOps/s | |
test_unbind_pytree | 57.1660μs | 26.6390μs | 37.5389 KOps/s | 37.3591 KOps/s | |
test_unbind_td | 99.3150μs | 55.0312μs | 18.1715 KOps/s | 18.2722 KOps/s | |
test_split_pytree | 90.1960μs | 25.9858μs | 38.4825 KOps/s | 36.9721 KOps/s | |
test_split_td | 95.0460μs | 43.7170μs | 22.8744 KOps/s | 23.1311 KOps/s | |
test_add_pytree | 71.6130μs | 31.6149μs | 31.6306 KOps/s | 30.6951 KOps/s | |
test_add_td | 0.1544ms | 46.5804μs | 21.4683 KOps/s | 22.0936 KOps/s | |
test_distributed | 22.2010μs | 6.1994μs | 161.3062 KOps/s | 167.1240 KOps/s | |
test_tdmodule | 0.9687ms | 23.6401μs | 42.3010 KOps/s | 47.2084 KOps/s | |
test_tdmodule_dispatch | 0.1915ms | 41.2759μs | 24.2272 KOps/s | 25.5508 KOps/s | |
test_tdseq | 44.6930μs | 25.8464μs | 38.6901 KOps/s | 39.3421 KOps/s | |
test_tdseq_dispatch | 0.3840ms | 45.3841μs | 22.0341 KOps/s | 23.1165 KOps/s | |
test_instantiation_functorch | 1.9069ms | 1.3335ms | 749.9312 Ops/s | 767.5280 Ops/s | |
test_instantiation_td | 1.5532ms | 1.0172ms | 983.0669 Ops/s | 991.5911 Ops/s | |
test_exec_functorch | 0.2177ms | 0.1607ms | 6.2221 KOps/s | 6.2603 KOps/s | |
test_exec_functional_call | 0.3592ms | 0.1477ms | 6.7683 KOps/s | 6.7604 KOps/s | |
test_exec_td | 0.2564ms | 0.1494ms | 6.6916 KOps/s | 7.1193 KOps/s | |
test_exec_td_decorator | 0.8052ms | 0.1769ms | 5.6537 KOps/s | 5.8255 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.0294ms | 0.8972ms | 1.1146 KOps/s | 1.1201 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.5834ms | 0.4687ms | 2.1333 KOps/s | 2.1316 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.1721ms | 0.7810ms | 1.2804 KOps/s | 1.2804 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6177ms | 0.3850ms | 2.5976 KOps/s | 2.5528 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 2.6808ms | 1.7925ms | 557.8855 Ops/s | 567.7900 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.1523ms | 0.5216ms | 1.9171 KOps/s | 1.9531 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 1.8863ms | 1.4939ms | 669.3800 Ops/s | 663.4180 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.8070ms | 0.3932ms | 2.5431 KOps/s | 2.5259 KOps/s | |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 77.6440μs | 12.4817μs | 80.1174 KOps/s | 78.7275 KOps/s | |
test_plain_set_stack_nested | 0.1467ms | 0.1139ms | 8.7804 KOps/s | 8.3753 KOps/s | |
test_plain_set_nested_inplace | 40.4620μs | 13.7963μs | 72.4831 KOps/s | 71.0111 KOps/s | |
test_plain_set_stack_nested_inplace | 0.1764ms | 0.1418ms | 7.0541 KOps/s | 7.0112 KOps/s | |
test_items | 28.2220μs | 4.6421μs | 215.4197 KOps/s | 214.1616 KOps/s | |
test_items_nested | 0.3984ms | 0.3425ms | 2.9197 KOps/s | 2.8959 KOps/s | |
test_items_nested_locked | 0.3877ms | 0.3447ms | 2.9010 KOps/s | 2.8641 KOps/s | |
test_items_nested_leaf | 0.2217ms | 0.2004ms | 4.9902 KOps/s | 4.9036 KOps/s | |
test_items_stack_nested | 1.5319ms | 1.4707ms | 679.9390 Ops/s | 688.0857 Ops/s | |
test_items_stack_nested_leaf | 1.3768ms | 1.2824ms | 779.8028 Ops/s | 777.3654 Ops/s | |
test_items_stack_nested_locked | 2.3611ms | 0.8154ms | 1.2265 KOps/s | 1.2194 KOps/s | |
test_keys | 15.1610μs | 4.5538μs | 219.5957 KOps/s | 219.0237 KOps/s | |
test_keys_nested | 0.4602ms | 90.4366μs | 11.0575 KOps/s | 10.9615 KOps/s | |
test_keys_nested_locked | 0.1339ms | 89.4325μs | 11.1816 KOps/s | 11.0821 KOps/s | |
test_keys_nested_leaf | 42.3907ms | 86.2191μs | 11.5984 KOps/s | 12.1600 KOps/s | |
test_keys_stack_nested | 1.3235ms | 1.2530ms | 798.0933 Ops/s | 789.6106 Ops/s | |
test_keys_stack_nested_leaf | 1.3268ms | 1.2536ms | 797.7321 Ops/s | 803.0791 Ops/s | |
test_keys_stack_nested_locked | 0.6823ms | 0.6102ms | 1.6388 KOps/s | 1.6334 KOps/s | |
test_values | 8.8307μs | 1.8896μs | 529.2050 KOps/s | 527.2241 KOps/s | |
test_values_nested | 63.2940μs | 42.8338μs | 23.3461 KOps/s | 23.3856 KOps/s | |
test_values_nested_locked | 65.0330μs | 45.0455μs | 22.1998 KOps/s | 22.1791 KOps/s | |
test_values_nested_leaf | 56.7930μs | 37.2403μs | 26.8526 KOps/s | 26.8809 KOps/s | |
test_values_stack_nested | 1.1663ms | 1.1004ms | 908.7643 Ops/s | 908.0421 Ops/s | |
test_values_stack_nested_leaf | 1.1699ms | 1.0854ms | 921.3085 Ops/s | 918.0863 Ops/s | |
test_values_stack_nested_locked | 0.6822ms | 0.4836ms | 2.0680 KOps/s | 2.0210 KOps/s | |
test_membership | 4.8162μs | 0.9255μs | 1.0805 MOps/s | 1.0721 MOps/s | |
test_membership_nested | 14.1610μs | 2.0514μs | 487.4800 KOps/s | 457.9140 KOps/s | |
test_membership_nested_leaf | 16.4210μs | 2.0580μs | 485.9077 KOps/s | 473.5376 KOps/s | |
test_membership_stacked_nested | 31.6010μs | 10.7905μs | 92.6739 KOps/s | 94.0408 KOps/s | |
test_membership_stacked_nested_leaf | 44.4120μs | 10.8011μs | 92.5830 KOps/s | 94.2439 KOps/s | |
test_membership_nested_last | 20.3210μs | 4.5185μs | 221.3137 KOps/s | 221.8566 KOps/s | |
test_membership_nested_leaf_last | 39.8320μs | 4.5326μs | 220.6243 KOps/s | 222.3804 KOps/s | |
test_membership_stacked_nested_last | 0.2108ms | 0.1337ms | 7.4809 KOps/s | 7.4223 KOps/s | |
test_membership_stacked_nested_leaf_last | 36.3120μs | 12.6675μs | 78.9422 KOps/s | 79.7596 KOps/s | |
test_nested_getleaf | 18.6410μs | 8.3566μs | 119.6657 KOps/s | 118.3269 KOps/s | |
test_nested_get | 31.1320μs | 7.8982μs | 126.6116 KOps/s | 125.4415 KOps/s | |
test_stacked_getleaf | 0.6295ms | 0.5612ms | 1.7820 KOps/s | 1.8019 KOps/s | |
test_stacked_get | 0.5566ms | 0.5243ms | 1.9074 KOps/s | 1.9326 KOps/s | |
test_nested_getitemleaf | 70.5520μs | 8.4318μs | 118.5990 KOps/s | 118.3853 KOps/s | |
test_nested_getitem | 30.7500μs | 7.9550μs | 125.7077 KOps/s | 125.0268 KOps/s | |
test_stacked_getitemleaf | 0.5943ms | 0.5596ms | 1.7870 KOps/s | 1.7972 KOps/s | |
test_stacked_getitem | 0.6096ms | 0.5424ms | 1.8435 KOps/s | 1.9145 KOps/s | |
test_lock_nested | 1.5012ms | 0.4069ms | 2.4575 KOps/s | 2.4302 KOps/s | |
test_lock_stack_nested | 62.7289ms | 5.8282ms | 171.5792 Ops/s | 167.3592 Ops/s | |
test_unlock_nested | 0.9317ms | 0.4036ms | 2.4777 KOps/s | 2.4492 KOps/s | |
test_unlock_stack_nested | 61.8630ms | 5.9549ms | 167.9285 Ops/s | 167.2287 Ops/s | |
test_flatten_speed | 0.4500ms | 0.1876ms | 5.3316 KOps/s | 5.3043 KOps/s | |
test_unflatten_speed | 0.4046ms | 0.3539ms | 2.8258 KOps/s | 2.8457 KOps/s | |
test_common_ops | 1.0575ms | 0.5656ms | 1.7681 KOps/s | 1.8104 KOps/s | |
test_creation | 13.9510μs | 1.5913μs | 628.4023 KOps/s | 632.7778 KOps/s | |
test_creation_empty | 36.1020μs | 6.2766μs | 159.3208 KOps/s | 152.9666 KOps/s | |
test_creation_nested_1 | 25.1810μs | 8.1939μs | 122.0417 KOps/s | 118.9544 KOps/s | |
test_creation_nested_2 | 38.5720μs | 12.5508μs | 79.6763 KOps/s | 90.9810 KOps/s | |
test_clone | 76.3930μs | 12.5559μs | 79.6440 KOps/s | 79.1519 KOps/s | |
test_getitem[int] | 34.8220μs | 10.8676μs | 92.0169 KOps/s | 90.6512 KOps/s | |
test_getitem[slice_int] | 43.2820μs | 20.2533μs | 49.3746 KOps/s | 48.7174 KOps/s | |
test_getitem[range] | 67.9030μs | 36.0313μs | 27.7537 KOps/s | 27.9547 KOps/s | |
test_getitem[tuple] | 38.4420μs | 18.3812μs | 54.4035 KOps/s | 53.7479 KOps/s | |
test_getitem[list] | 0.2933ms | 32.1920μs | 31.0636 KOps/s | 30.2395 KOps/s | |
test_setitem_dim[int] | 41.3520μs | 23.3936μs | 42.7467 KOps/s | 41.2910 KOps/s | |
test_setitem_dim[slice_int] | 58.7430μs | 41.4552μs | 24.1224 KOps/s | 23.2084 KOps/s | |
test_setitem_dim[range] | 80.8540μs | 58.9638μs | 16.9596 KOps/s | 16.7180 KOps/s | |
test_setitem_dim[tuple] | 66.4930μs | 37.1156μs | 26.9428 KOps/s | 26.5970 KOps/s | |
test_setitem | 83.3350μs | 16.0252μs | 62.4019 KOps/s | 59.5563 KOps/s | |
test_set | 85.0250μs | 15.5542μs | 64.2914 KOps/s | 64.0821 KOps/s | |
test_set_shared | 3.0273ms | 99.9083μs | 10.0092 KOps/s | 10.0757 KOps/s | |
test_update | 94.5650μs | 17.2443μs | 57.9902 KOps/s | 57.5475 KOps/s | |
test_update_nested | 0.1170ms | 23.1561μs | 43.1852 KOps/s | 43.4081 KOps/s | |
test_set_nested | 94.3150μs | 16.4407μs | 60.8247 KOps/s | 60.3421 KOps/s | |
test_set_nested_new | 97.2450μs | 19.5751μs | 51.0854 KOps/s | 50.6414 KOps/s | |
test_select | 0.1059ms | 40.5126μs | 24.6837 KOps/s | 24.2847 KOps/s | |
test_to | 70.0430μs | 49.1901μs | 20.3293 KOps/s | 19.6753 KOps/s | |
test_to_nonblocking | 66.3040μs | 30.9163μs | 32.3454 KOps/s | 31.3330 KOps/s | |
test_unbind_speed | 0.3613ms | 0.3277ms | 3.0518 KOps/s | 3.0688 KOps/s | |
test_unbind_speed_stack0 | 60.7399ms | 3.8979ms | 256.5460 Ops/s | 239.4895 Ops/s | |
test_unbind_speed_stack1 | 1.8771μs | 0.5229μs | 1.9125 MOps/s | 1.9121 MOps/s | |
test_split | 54.3458ms | 1.6303ms | 613.3768 Ops/s | 605.3905 Ops/s | |
test_chunk | 53.8432ms | 1.6158ms | 618.8695 Ops/s | 610.7712 Ops/s | |
test_creation[device0] | 0.3780ms | 0.3054ms | 3.2739 KOps/s | 3.2652 KOps/s | |
test_creation[device1] | 0.7026ms | 0.3127ms | 3.1978 KOps/s | 3.2344 KOps/s | |
test_creation_from_tensor | 59.7632ms | 0.3694ms | 2.7069 KOps/s | 2.9891 KOps/s | |
test_add_one[memmap_tensor0] | 0.1384ms | 23.3121μs | 42.8962 KOps/s | 43.3167 KOps/s | |
test_add_one[memmap_tensor1] | 0.1908ms | 70.6277μs | 14.1588 KOps/s | 14.1322 KOps/s | |
test_contiguous[memmap_tensor0] | 25.7710μs | 5.8550μs | 170.7939 KOps/s | 170.5869 KOps/s | |
test_contiguous[memmap_tensor1] | 50.7430μs | 20.7591μs | 48.1716 KOps/s | 46.9792 KOps/s | |
test_stack[memmap_tensor0] | 48.4330μs | 18.9893μs | 52.6612 KOps/s | 52.7061 KOps/s | |
test_stack[memmap_tensor1] | 0.1199ms | 70.1977μs | 14.2455 KOps/s | 14.2003 KOps/s | |
test_memmaptd_index | 0.2814ms | 0.2383ms | 4.1969 KOps/s | 4.2452 KOps/s | |
test_memmaptd_index_astensor | 0.3553ms | 0.2938ms | 3.4039 KOps/s | 3.4865 KOps/s | |
test_memmaptd_index_op | 0.6194ms | 0.5335ms | 1.8745 KOps/s | 1.8713 KOps/s | |
test_reshape_pytree | 44.3820μs | 20.6634μs | 48.3947 KOps/s | 48.3375 KOps/s | |
test_reshape_td | 54.2330μs | 28.9875μs | 34.4976 KOps/s | 35.6441 KOps/s | |
test_view_pytree | 35.3020μs | 20.3417μs | 49.1600 KOps/s | 49.0640 KOps/s | |
test_view_td | 19.2110μs | 3.9900μs | 250.6240 KOps/s | 250.7046 KOps/s | |
test_unbind_pytree | 49.0620μs | 25.2068μs | 39.6719 KOps/s | 39.2588 KOps/s | |
test_unbind_td | 76.0540μs | 51.0447μs | 19.5907 KOps/s | 19.7212 KOps/s | |
test_split_pytree | 46.6330μs | 23.2825μs | 42.9507 KOps/s | 42.7756 KOps/s | |
test_split_td | 67.4030μs | 39.5805μs | 25.2649 KOps/s | 25.8405 KOps/s | |
test_add_pytree | 65.2530μs | 30.1745μs | 33.1406 KOps/s | 32.9230 KOps/s | |
test_add_td | 59.3130μs | 40.0119μs | 24.9926 KOps/s | 24.7951 KOps/s | |
test_distributed | 25.5410μs | 5.5325μs | 180.7490 KOps/s | 183.7950 KOps/s | |
test_tdmodule | 30.1820μs | 16.0433μs | 62.3314 KOps/s | 61.1235 KOps/s | |
test_tdmodule_dispatch | 0.1917ms | 31.2264μs | 32.0242 KOps/s | 31.6915 KOps/s | |
test_tdseq | 38.4220μs | 19.0708μs | 52.4363 KOps/s | 51.4407 KOps/s | |
test_tdseq_dispatch | 52.6120μs | 34.2138μs | 29.2280 KOps/s | 29.1254 KOps/s | |
test_instantiation_functorch | 1.9302ms | 1.6668ms | 599.9675 Ops/s | 605.8343 Ops/s | |
test_instantiation_td | 1.7322ms | 1.1543ms | 866.3127 Ops/s | 876.3498 Ops/s | |
test_exec_functorch | 0.2150ms | 0.1518ms | 6.5893 KOps/s | 6.5393 KOps/s | |
test_exec_functional_call | 0.2071ms | 0.1479ms | 6.7607 KOps/s | 6.7650 KOps/s | |
test_exec_td | 0.1727ms | 0.1392ms | 7.1865 KOps/s | 7.2394 KOps/s | |
test_exec_td_decorator | 0.6784ms | 0.1723ms | 5.8029 KOps/s | 5.7536 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.5129ms | 1.0126ms | 987.5767 Ops/s | 971.8905 Ops/s | |
test_vmap_mlp_speed[True-False] | 0.6509ms | 0.5862ms | 1.7060 KOps/s | 1.6906 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.0162ms | 0.9284ms | 1.0772 KOps/s | 1.0546 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.5586ms | 0.5166ms | 1.9356 KOps/s | 1.8864 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 2.4623ms | 1.9245ms | 519.6022 Ops/s | 510.0448 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.0093ms | 0.6260ms | 1.5975 KOps/s | 1.5852 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 2.0483ms | 1.6772ms | 596.2231 Ops/s | 589.2840 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.8051ms | 0.5323ms | 1.8787 KOps/s | 1.8687 KOps/s | |
test_vmap_transformer_speed[True-True] | 12.1137ms | 11.8623ms | 84.3007 Ops/s | 83.2874 Ops/s | |
test_vmap_transformer_speed[True-False] | 7.8972ms | 7.8155ms | 127.9502 Ops/s | 126.7045 Ops/s | |
test_vmap_transformer_speed[False-True] | 11.9941ms | 11.7674ms | 84.9804 Ops/s | 84.3543 Ops/s | |
test_vmap_transformer_speed[False-False] | 7.8000ms | 7.7220ms | 129.4994 Ops/s | 127.6952 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 61.4318ms | 60.5925ms | 16.5037 Ops/s | 16.1076 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 20.6922ms | 18.9282ms | 52.8312 Ops/s | 52.2232 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 0.1341s | 59.0991ms | 16.9207 Ops/s | 17.9130 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 20.2969ms | 18.5478ms | 53.9148 Ops/s | 53.3061 Ops/s | |

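The Change column is empty in this export. Assuming it is meant to hold the relative difference between Ops and Ops on Repo HEAD, here is a minimal sketch for computing it from a table row. `parse_ops` and `relative_change` are hypothetical helpers, not part of the benchmark tooling:

```python
# Sketch: derive the empty "Change" column from the two throughput columns.
# Assumption: Change = relative difference of "Ops" vs. "Ops on Repo HEAD".

_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '2.1000 KOps/s' into operations per second."""
    value, unit = cell.strip().split()
    return float(value) * _SCALE[unit]


def relative_change(row: str) -> float:
    """Return the percent change of 'Ops' relative to 'Ops on Repo HEAD'."""
    cells = [c.strip() for c in row.split("|")]
    ops, ops_head = parse_ops(cells[3]), parse_ops(cells[4])
    return 100.0 * (ops - ops_head) / ops_head


row = "test_lock_nested | 56.2178ms | 0.4762ms | 2.1000 KOps/s | 2.4052 KOps/s | |"
print(f"{relative_change(row):+.2f}%")  # prints -12.69%
```

A negative value means the PR branch ran fewer operations per second than HEAD on that run. Single-run differences of a few percent are usually within noise, so it is worth checking them against the Max and Mean columns.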
@laurencer thanks for the review. I also prevented `executor` and `num_threads` from being passed together, since they conflict, but eventually I think …
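Here is a minimal sketch of the mutually exclusive `executor`/`num_threads` check, with a thread pool that writes one memory-mapped file per array. This is not the tensordict implementation. `memmap_arrays` is a hypothetical helper, and NumPy arrays with `np.memmap` stand in for torch tensors:

```python
import os
import tempfile
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Dict, Optional

import numpy as np


def memmap_arrays(
    arrays: Dict[str, np.ndarray],
    prefix: str,
    num_threads: int = 0,
    executor: Optional[Executor] = None,
) -> Dict[str, np.memmap]:
    """Hypothetical helper: copy each array into its own memory-mapped file.

    `num_threads` and `executor` are mutually exclusive: pass one or neither.
    """
    if executor is not None and num_threads > 0:
        raise ValueError("Pass either `executor` or `num_threads`, not both.")

    def _write(item):
        name, arr = item
        path = os.path.join(prefix, f"{name}.bin")
        mm = np.memmap(path, dtype=arr.dtype, mode="w+", shape=arr.shape)
        mm[...] = arr  # copy the data into the mapped file
        mm.flush()
        return name, mm

    items = list(arrays.items())
    if executor is not None:
        # The caller owns the executor and is responsible for shutting it down.
        return dict(executor.map(_write, items))
    if num_threads > 0:
        # Create a short-lived pool just for this call.
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            return dict(pool.map(_write, items))
    return dict(map(_write, items))  # sequential fallback


with tempfile.TemporaryDirectory() as tmp:
    data = {"a": np.arange(6, dtype=np.float32).reshape(2, 3), "b": np.ones(4)}
    out = memmap_arrays(data, tmp, num_threads=2)
    total = float(out["a"].sum())  # 15.0
    del out  # release the mappings before the directory is removed
```

Each array goes to its own file, so the workers share no state. How much the threads actually help depends on the storage and on how much of the copy runs outside the GIL.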
I also implemented some benchmarks, including `torch.save` for comparison.
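Below is a rough timing harness in the same spirit. It uses NumPy as a stand-in: sequential `np.save` plays the role of the `torch.save` baseline, and a thread pool writes one `np.memmap` per array. All function names are hypothetical. The `root` directory is where files land, so pointing it at a tmpfs mount (for example `/dev/shm` on many Linux systems) and then at a disk-backed directory is one way to compare in-memory and on-disk storage:

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def save_baseline(arrays, root):
    """Sequential np.save, standing in for a torch.save-style baseline."""
    for name, arr in arrays.items():
        np.save(os.path.join(root, f"{name}.npy"), arr)


def save_memmap_threaded(arrays, root, num_threads=4):
    """Write each array into its own np.memmap file using a thread pool."""

    def _write(item):
        name, arr = item
        mm = np.memmap(os.path.join(root, f"{name}.bin"), dtype=arr.dtype,
                       mode="w+", shape=arr.shape)
        mm[...] = arr
        mm.flush()

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(_write, arrays.items()))  # list() surfaces worker errors


def time_it(fn, *args, repeats=3, **kwargs):
    """Return the best wall-clock time over `repeats` runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


arrays = {f"w{i}": np.random.rand(256, 256).astype(np.float32) for i in range(8)}
with tempfile.TemporaryDirectory() as root:  # e.g. dir="/dev/shm" for tmpfs
    t_save = time_it(save_baseline, arrays, root)
    t_mmap = time_it(save_memmap_threaded, arrays, root, num_threads=4)
print(f"np.save: {t_save * 1e3:.2f} ms, threaded memmap: {t_mmap * 1e3:.2f} ms")
```

Taking the best of several runs reduces the influence of page-cache warm-up. Numbers from this toy setup will not transfer directly to large checkpoints such as LLAMA2 7B.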
@vmoens Thanks for the information. Just out of curiosity, have you tried comparing the performance when the target storage is an in-memory file system, like tmpfs?
That would be …

EDIT: The difference between an on-disk file system and an in-memory one seems to be quite machine/distro-dependent, though. With LLAMA2 7B serialization on disk, we get:
TODO:
- `_memmap_`
- `memmap` version that does not change the tensordict in place
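The sketch below illustrates the distinction in the last TODO item, assuming a plain dict of NumPy arrays stands in for a tensordict. The trailing-underscore `memmap_` rewrites the container in place, while `memmap` leaves its input untouched and returns a new container. Both functions are hypothetical and only mirror that naming convention; they are not the tensordict API:

```python
import os
import tempfile

import numpy as np


def _to_memmap(arr, path):
    """Copy `arr` into a new memory-mapped file at `path`."""
    mm = np.memmap(path, dtype=arr.dtype, mode="w+", shape=arr.shape)
    mm[...] = arr
    mm.flush()
    return mm


def memmap_(data, prefix):
    """In place: replace every value of `data` with a memmap copy."""
    for name, arr in list(data.items()):
        data[name] = _to_memmap(arr, os.path.join(prefix, f"{name}.bin"))
    return data


def memmap(data, prefix):
    """Out of place: return a new dict of memmaps, leaving `data` unchanged."""
    return {
        name: _to_memmap(arr, os.path.join(prefix, f"{name}.bin"))
        for name, arr in data.items()
    }


# Separate directories so the two variants never overwrite each other's files.
with tempfile.TemporaryDirectory() as out_dir, tempfile.TemporaryDirectory() as inplace_dir:
    original = {"obs": np.zeros(3, dtype=np.float32)}
    copy = memmap(original, out_dir)
    unchanged = not isinstance(original["obs"], np.memmap)  # True
    memmap_(original, inplace_dir)
    changed = isinstance(original["obs"], np.memmap)  # True
    del copy, original  # release the mappings before the directories are removed
```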