[Performance] Faster params and buffer registration in TensorDictParams #569
Merged
facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Nov 23, 2023.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 50.1130μs | 15.8994μs | 62.8955 KOps/s | 64.7885 KOps/s | |
test_plain_set_stack_nested | 0.2062ms | 0.1470ms | 6.8048 KOps/s | 7.0605 KOps/s | |
test_plain_set_nested_inplace | 42.8200μs | 18.9600μs | 52.7427 KOps/s | 54.3493 KOps/s | |
test_plain_set_stack_nested_inplace | 0.2586ms | 0.1758ms | 5.6886 KOps/s | 5.9193 KOps/s | |
test_items | 23.8350μs | 2.3424μs | 426.9195 KOps/s | 414.5356 KOps/s | |
test_items_nested | 0.4182ms | 0.2671ms | 3.7432 KOps/s | 3.8469 KOps/s | |
test_items_nested_locked | 0.9954ms | 0.2697ms | 3.7074 KOps/s | 3.8182 KOps/s | |
test_items_nested_leaf | 0.5435ms | 0.1646ms | 6.0735 KOps/s | 6.2890 KOps/s | |
test_items_stack_nested | 1.5859ms | 1.4785ms | 676.3412 Ops/s | 702.4100 Ops/s | |
test_items_stack_nested_leaf | 1.4354ms | 1.3469ms | 742.4647 Ops/s | 769.4412 Ops/s | |
test_items_stack_nested_locked | 0.8632ms | 0.7644ms | 1.3081 KOps/s | 1.3412 KOps/s | |
test_keys | 26.4890μs | 3.9648μs | 252.2187 KOps/s | 267.7743 KOps/s | |
test_keys_nested | 1.3604ms | 0.1416ms | 7.0619 KOps/s | 7.0025 KOps/s | |
test_keys_nested_locked | 0.3255ms | 0.1400ms | 7.1431 KOps/s | 7.2131 KOps/s | |
test_keys_nested_leaf | 0.2969ms | 0.1406ms | 7.1108 KOps/s | 7.0771 KOps/s | |
test_keys_stack_nested | 1.8425ms | 1.4086ms | 709.9382 Ops/s | 727.3636 Ops/s | |
test_keys_stack_nested_leaf | 1.6858ms | 1.4035ms | 712.4884 Ops/s | 740.0463 Ops/s | |
test_keys_stack_nested_locked | 0.7542ms | 0.6711ms | 1.4901 KOps/s | 1.5573 KOps/s | |
test_values | 6.8605μs | 1.1720μs | 853.2122 KOps/s | 862.3639 KOps/s | |
test_values_nested | 0.1165ms | 49.6535μs | 20.1396 KOps/s | 20.2775 KOps/s | |
test_values_nested_locked | 78.5360μs | 49.5049μs | 20.2000 KOps/s | 20.1332 KOps/s | |
test_values_nested_leaf | 62.2960μs | 44.5641μs | 22.4396 KOps/s | 22.5771 KOps/s | |
test_values_stack_nested | 1.5657ms | 1.2142ms | 823.5757 Ops/s | 863.8438 Ops/s | |
test_values_stack_nested_leaf | 1.2936ms | 1.2000ms | 833.3159 Ops/s | 884.7348 Ops/s | |
test_values_stack_nested_locked | 0.8244ms | 0.5128ms | 1.9502 KOps/s | 1.9844 KOps/s | |
test_membership | 11.9530μs | 1.3576μs | 736.6166 KOps/s | 748.7768 KOps/s | |
test_membership_nested | 26.5600μs | 2.7992μs | 357.2402 KOps/s | 358.6144 KOps/s | |
test_membership_nested_leaf | 28.1520μs | 2.8096μs | 355.9271 KOps/s | 374.9869 KOps/s | |
test_membership_stacked_nested | 51.4760μs | 11.8029μs | 84.7249 KOps/s | 87.8429 KOps/s | |
test_membership_stacked_nested_leaf | 34.9150μs | 11.7732μs | 84.9390 KOps/s | 88.7420 KOps/s | |
test_membership_nested_last | 33.5720μs | 5.8564μs | 170.7547 KOps/s | 176.3483 KOps/s | |
test_membership_nested_leaf_last | 19.4160μs | 5.9428μs | 168.2698 KOps/s | 176.4126 KOps/s | |
test_membership_stacked_nested_last | 0.3385ms | 0.1698ms | 5.8878 KOps/s | 6.2331 KOps/s | |
test_membership_stacked_nested_leaf_last | 50.5840μs | 13.9243μs | 71.8169 KOps/s | 73.7655 KOps/s | |
test_nested_getleaf | 33.2220μs | 10.8501μs | 92.1648 KOps/s | 96.1868 KOps/s | |
test_nested_get | 37.2290μs | 10.2687μs | 97.3835 KOps/s | 99.8696 KOps/s | |
test_stacked_getleaf | 1.1769ms | 0.6493ms | 1.5401 KOps/s | 1.6313 KOps/s | |
test_stacked_get | 1.2903ms | 0.6117ms | 1.6347 KOps/s | 1.7031 KOps/s | |
test_nested_getitemleaf | 38.3010μs | 10.7315μs | 93.1834 KOps/s | 94.7444 KOps/s | |
test_nested_getitem | 40.2350μs | 10.2817μs | 97.2603 KOps/s | 102.3342 KOps/s | |
test_stacked_getitemleaf | 1.2384ms | 0.6345ms | 1.5760 KOps/s | 1.6345 KOps/s | |
test_stacked_getitem | 0.8348ms | 0.6008ms | 1.6643 KOps/s | 1.6937 KOps/s | |
test_lock_nested | 53.5346ms | 0.5370ms | 1.8623 KOps/s | 2.0887 KOps/s | |
test_lock_stack_nested | 69.9764ms | 7.8091ms | 128.0562 Ops/s | 134.8617 Ops/s | |
test_unlock_nested | 58.2626ms | 0.5094ms | 1.9631 KOps/s | 2.0236 KOps/s | |
test_unlock_stack_nested | 62.8419ms | 7.5608ms | 132.2617 Ops/s | 206.5750 Ops/s | |
test_flatten_speed | 0.5367ms | 0.2668ms | 3.7487 KOps/s | 3.8225 KOps/s | |
test_unflatten_speed | 0.8683ms | 0.4690ms | 2.1324 KOps/s | 2.2428 KOps/s | |
test_common_ops | 2.8302ms | 0.6848ms | 1.4603 KOps/s | 1.4982 KOps/s | |
test_creation | 49.8230μs | 2.4307μs | 411.4026 KOps/s | 421.1741 KOps/s | |
test_creation_empty | 39.8950μs | 8.1685μs | 122.4214 KOps/s | 122.8991 KOps/s | |
test_creation_nested_1 | 45.7950μs | 11.5286μs | 86.7406 KOps/s | 89.4600 KOps/s | |
test_creation_nested_2 | 34.4640μs | 15.1832μs | 65.8624 KOps/s | 67.6852 KOps/s | |
test_clone | 90.0780μs | 13.5365μs | 73.8743 KOps/s | 76.4908 KOps/s | |
test_getitem[int] | 55.9040μs | 13.1572μs | 76.0042 KOps/s | 76.8686 KOps/s | |
test_getitem[slice_int] | 64.0690μs | 25.1341μs | 39.7866 KOps/s | 39.4468 KOps/s | |
test_getitem[range] | 0.2022ms | 42.2052μs | 23.6938 KOps/s | 22.3820 KOps/s | |
test_getitem[tuple] | 63.8790μs | 20.8067μs | 48.0615 KOps/s | 48.5792 KOps/s | |
test_getitem[list] | 0.2558ms | 38.3661μs | 26.0647 KOps/s | 25.5200 KOps/s | |
test_setitem_dim[int] | 53.6600μs | 28.1646μs | 35.5056 KOps/s | 36.9804 KOps/s | |
test_setitem_dim[slice_int] | 0.1204ms | 53.8913μs | 18.5559 KOps/s | 20.3673 KOps/s | |
test_setitem_dim[range] | 0.1206ms | 71.3533μs | 14.0148 KOps/s | 13.7529 KOps/s | |
test_setitem_dim[tuple] | 68.9880μs | 41.0921μs | 24.3356 KOps/s | 25.1132 KOps/s | |
test_setitem | 82.5240μs | 18.5597μs | 53.8803 KOps/s | 55.2216 KOps/s | |
test_set | 78.8670μs | 18.0650μs | 55.3555 KOps/s | 58.1057 KOps/s | |
test_set_shared | 1.8502ms | 0.1379ms | 7.2528 KOps/s | 7.3408 KOps/s | |
test_update | 95.6380μs | 23.5938μs | 42.3841 KOps/s | 43.3697 KOps/s | |
test_update_nested | 0.1012ms | 34.8542μs | 28.6909 KOps/s | 30.0434 KOps/s | |
test_set_nested | 79.7190μs | 19.8445μs | 50.3917 KOps/s | 52.6650 KOps/s | |
test_set_nested_new | 97.7320μs | 26.3600μs | 37.9363 KOps/s | 41.2183 KOps/s | |
test_select | 0.1084ms | 51.3400μs | 19.4780 KOps/s | 20.7869 KOps/s | |
test_unbind_speed | 0.6965ms | 0.3794ms | 2.6354 KOps/s | 2.6837 KOps/s | |
test_unbind_speed_stack0 | 66.7466ms | 5.3629ms | 186.4669 Ops/s | 250.2493 Ops/s | |
test_unbind_speed_stack1 | 2.4145μs | 0.6338μs | 1.5779 MOps/s | 1.6030 MOps/s | |
test_split | 55.9680ms | 1.7799ms | 561.8276 Ops/s | 556.5728 Ops/s | |
test_chunk | 50.4416ms | 1.7431ms | 573.6765 Ops/s | 568.6560 Ops/s | |
test_creation[device0] | 0.3886ms | 0.2946ms | 3.3943 KOps/s | 3.4662 KOps/s | |
test_creation_from_tensor | 3.4794ms | 0.3287ms | 3.0425 KOps/s | 3.0520 KOps/s | |
test_add_one[memmap_tensor0] | 73.8880μs | 25.3452μs | 39.4552 KOps/s | 40.2115 KOps/s | |
test_contiguous[memmap_tensor0] | 46.4170μs | 5.6985μs | 175.4851 KOps/s | 174.1955 KOps/s | |
test_stack[memmap_tensor0] | 60.2220μs | 19.5464μs | 51.1604 KOps/s | 51.7114 KOps/s | |
test_memmaptd_index | 0.2595ms | 0.1924ms | 5.1974 KOps/s | 5.1989 KOps/s | |
test_memmaptd_index_astensor | 0.4076ms | 0.2480ms | 4.0329 KOps/s | 4.0138 KOps/s | |
test_memmaptd_index_op | 0.6558ms | 0.4885ms | 2.0469 KOps/s | 2.0261 KOps/s | |
test_reshape_pytree | 67.5150μs | 23.5601μs | 42.4446 KOps/s | 44.8024 KOps/s | |
test_reshape_td | 66.6840μs | 31.2562μs | 31.9937 KOps/s | 32.8544 KOps/s | |
test_view_pytree | 55.5730μs | 23.0130μs | 43.4536 KOps/s | 44.7210 KOps/s | |
test_view_td | 29.5550μs | 4.8362μs | 206.7744 KOps/s | 209.4473 KOps/s | |
test_unbind_pytree | 0.6363ms | 26.1998μs | 38.1682 KOps/s | 38.6905 KOps/s | |
test_unbind_td | 0.1108ms | 60.0244μs | 16.6599 KOps/s | 16.8776 KOps/s | |
test_split_pytree | 57.9880μs | 26.3791μs | 37.9088 KOps/s | 39.2936 KOps/s | |
test_split_td | 90.0470μs | 46.0257μs | 21.7270 KOps/s | 21.5111 KOps/s | |
test_add_pytree | 79.6790μs | 32.3212μs | 30.9395 KOps/s | 31.2846 KOps/s | |
test_add_td | 95.5980μs | 44.3351μs | 22.5555 KOps/s | 22.7875 KOps/s | |
test_distributed | 20.1980μs | 5.9936μs | 166.8446 KOps/s | 172.0924 KOps/s | |
test_tdmodule | 1.5871ms | 22.4976μs | 44.4493 KOps/s | 47.6618 KOps/s | |
test_tdmodule_dispatch | 0.2051ms | 38.6877μs | 25.8480 KOps/s | 26.1992 KOps/s | |
test_tdseq | 42.5890μs | 23.4047μs | 42.7265 KOps/s | 42.3866 KOps/s | |
test_tdseq_dispatch | 0.1366ms | 42.0017μs | 23.8085 KOps/s | 23.8229 KOps/s | |
test_instantiation_functorch | 1.4487ms | 1.3375ms | 747.6533 Ops/s | 791.8239 Ops/s | |
test_instantiation_td | 69.5943ms | 1.1217ms | 891.5276 Ops/s | 950.4616 Ops/s | |
test_exec_functorch | 0.2310ms | 0.1610ms | 6.2098 KOps/s | 6.3443 KOps/s | |
test_exec_functional_call | 0.2278ms | 0.1501ms | 6.6630 KOps/s | 6.7826 KOps/s | |
test_exec_td | 0.2248ms | 0.1465ms | 6.8267 KOps/s | 6.8537 KOps/s | |
test_exec_td_decorator | 0.6893ms | 0.2234ms | 4.4767 KOps/s | 4.6542 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.9700ms | 0.8885ms | 1.1254 KOps/s | 1.1369 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7681ms | 0.4653ms | 2.1494 KOps/s | 2.1761 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.5787ms | 0.7809ms | 1.2806 KOps/s | 1.3111 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.7039ms | 0.3826ms | 2.6137 KOps/s | 2.6167 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 2.2493ms | 1.5739ms | 635.3515 Ops/s | 643.6554 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.0332ms | 0.5455ms | 1.8333 KOps/s | 1.8423 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 1.9228ms | 1.3604ms | 735.1042 Ops/s | 749.8734 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.8358ms | 0.4247ms | 2.3548 KOps/s | 2.3844 KOps/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.4567ms | 12.8286μs | 77.9507 KOps/s | 78.3521 KOps/s | |
test_plain_set_stack_nested | 0.1961ms | 0.1159ms | 8.6247 KOps/s | 8.3774 KOps/s | |
test_plain_set_nested_inplace | 31.6400μs | 15.6392μs | 63.9417 KOps/s | 65.8399 KOps/s | |
test_plain_set_stack_nested_inplace | 0.1759ms | 0.1447ms | 6.9127 KOps/s | 7.0749 KOps/s | |
test_items | 23.6400μs | 4.7129μs | 212.1835 KOps/s | 210.7644 KOps/s | |
test_items_nested | 0.3823ms | 0.3384ms | 2.9549 KOps/s | 2.9603 KOps/s | |
test_items_nested_locked | 0.3599ms | 0.3373ms | 2.9647 KOps/s | 2.9468 KOps/s | |
test_items_nested_leaf | 0.2235ms | 0.1978ms | 5.0549 KOps/s | 5.0043 KOps/s | |
test_items_stack_nested | 1.5352ms | 1.4793ms | 675.9928 Ops/s | 677.6973 Ops/s | |
test_items_stack_nested_leaf | 1.3825ms | 1.3186ms | 758.3694 Ops/s | 764.7870 Ops/s | |
test_items_stack_nested_locked | 0.8913ms | 0.8229ms | 1.2153 KOps/s | 1.2391 KOps/s | |
test_keys | 22.7600μs | 4.6217μs | 216.3690 KOps/s | 212.8548 KOps/s | |
test_keys_nested | 0.5788ms | 90.9418μs | 10.9960 KOps/s | 11.0957 KOps/s | |
test_keys_nested_locked | 0.1112ms | 89.8591μs | 11.1285 KOps/s | 11.1918 KOps/s | |
test_keys_nested_leaf | 42.6074ms | 86.6209μs | 11.5446 KOps/s | 12.1860 KOps/s | |
test_keys_stack_nested | 1.3389ms | 1.3016ms | 768.2778 Ops/s | 763.8173 Ops/s | |
test_keys_stack_nested_leaf | 1.3866ms | 1.2982ms | 770.2936 Ops/s | 767.4561 Ops/s | |
test_keys_stack_nested_locked | 0.7084ms | 0.6244ms | 1.6015 KOps/s | 1.6303 KOps/s | |
test_values | 10.3937μs | 1.8829μs | 531.0966 KOps/s | 526.9657 KOps/s | |
test_values_nested | 58.6310μs | 43.2600μs | 23.1160 KOps/s | 23.1413 KOps/s | |
test_values_nested_locked | 73.4110μs | 43.0648μs | 23.2208 KOps/s | 23.1102 KOps/s | |
test_values_nested_leaf | 0.1101ms | 37.5784μs | 26.6110 KOps/s | 26.7814 KOps/s | |
test_values_stack_nested | 1.1960ms | 1.1407ms | 876.6554 Ops/s | 890.2080 Ops/s | |
test_values_stack_nested_leaf | 1.1655ms | 1.1291ms | 885.6979 Ops/s | 899.1234 Ops/s | |
test_values_stack_nested_locked | 0.5819ms | 0.4978ms | 2.0087 KOps/s | 2.0517 KOps/s | |
test_membership | 3.9200μs | 0.9449μs | 1.0583 MOps/s | 1.0578 MOps/s | |
test_membership_nested | 16.2510μs | 2.2105μs | 452.3838 KOps/s | 445.6173 KOps/s | |
test_membership_nested_leaf | 13.3050μs | 2.1401μs | 467.2646 KOps/s | 465.3064 KOps/s | |
test_membership_stacked_nested | 45.3010μs | 10.8143μs | 92.4699 KOps/s | 90.9826 KOps/s | |
test_membership_stacked_nested_leaf | 30.6410μs | 10.8942μs | 91.7919 KOps/s | 90.6306 KOps/s | |
test_membership_nested_last | 38.2710μs | 4.6523μs | 214.9485 KOps/s | 215.5704 KOps/s | |
test_membership_nested_leaf_last | 27.0510μs | 4.6591μs | 214.6334 KOps/s | 216.4922 KOps/s | |
test_membership_stacked_nested_last | 0.2047ms | 0.1343ms | 7.4478 KOps/s | 7.5429 KOps/s | |
test_membership_stacked_nested_leaf_last | 81.8610μs | 12.8309μs | 77.9369 KOps/s | 78.2151 KOps/s | |
test_nested_getleaf | 31.0910μs | 8.3599μs | 119.6193 KOps/s | 119.3469 KOps/s | |
test_nested_get | 22.3410μs | 7.9436μs | 125.8880 KOps/s | 125.9552 KOps/s | |
test_stacked_getleaf | 0.6361ms | 0.5739ms | 1.7425 KOps/s | 1.7231 KOps/s | |
test_stacked_get | 0.6532ms | 0.5352ms | 1.8684 KOps/s | 1.8468 KOps/s | |
test_nested_getitemleaf | 27.8000μs | 8.4531μs | 118.2997 KOps/s | 118.1335 KOps/s | |
test_nested_getitem | 30.6100μs | 8.0113μs | 124.8231 KOps/s | 125.1091 KOps/s | |
test_stacked_getitemleaf | 0.6319ms | 0.5778ms | 1.7306 KOps/s | 1.7434 KOps/s | |
test_stacked_getitem | 0.6607ms | 0.5362ms | 1.8651 KOps/s | 1.8465 KOps/s | |
test_lock_nested | 4.4657ms | 0.4632ms | 2.1590 KOps/s | 2.1483 KOps/s | |
test_lock_stack_nested | 71.5859ms | 6.6803ms | 149.6929 Ops/s | 149.2541 Ops/s | |
test_unlock_nested | 1.3050ms | 0.4381ms | 2.2828 KOps/s | 1.9912 KOps/s | |
test_unlock_stack_nested | 67.7370ms | 7.4377ms | 134.4493 Ops/s | 135.1676 Ops/s | |
test_flatten_speed | 0.5211ms | 0.1877ms | 5.3289 KOps/s | 5.3870 KOps/s | |
test_unflatten_speed | 0.3925ms | 0.3604ms | 2.7745 KOps/s | 2.7689 KOps/s | |
test_common_ops | 1.0912ms | 0.6359ms | 1.5725 KOps/s | 1.6002 KOps/s | |
test_creation | 37.1400μs | 1.9575μs | 510.8514 KOps/s | 508.4584 KOps/s | |
test_creation_empty | 33.8310μs | 7.1542μs | 139.7780 KOps/s | 136.7844 KOps/s | |
test_creation_nested_1 | 32.5010μs | 9.5509μs | 104.7023 KOps/s | 104.0046 KOps/s | |
test_creation_nested_2 | 71.2920μs | 12.1751μs | 82.1350 KOps/s | 81.9819 KOps/s | |
test_clone | 93.9120μs | 14.8206μs | 67.4735 KOps/s | 71.4441 KOps/s | |
test_getitem[int] | 66.4520μs | 12.2180μs | 81.8466 KOps/s | 81.7625 KOps/s | |
test_getitem[slice_int] | 48.2710μs | 23.8226μs | 41.9770 KOps/s | 42.4479 KOps/s | |
test_getitem[range] | 68.2810μs | 41.5742μs | 24.0534 KOps/s | 25.4460 KOps/s | |
test_getitem[tuple] | 42.7710μs | 20.4790μs | 48.8305 KOps/s | 48.5173 KOps/s | |
test_getitem[list] | 0.2495ms | 38.4304μs | 26.0211 KOps/s | 27.4512 KOps/s | |
test_setitem_dim[int] | 43.3110μs | 27.3272μs | 36.5935 KOps/s | 38.6876 KOps/s | |
test_setitem_dim[slice_int] | 82.0320μs | 47.7186μs | 20.9562 KOps/s | 21.5820 KOps/s | |
test_setitem_dim[range] | 90.2520μs | 63.1909μs | 15.8251 KOps/s | 15.9215 KOps/s | |
test_setitem_dim[tuple] | 60.4110μs | 39.9395μs | 25.0379 KOps/s | 25.7163 KOps/s | |
test_setitem | 0.1034ms | 19.1817μs | 52.1331 KOps/s | 55.3744 KOps/s | |
test_set | 99.4120μs | 18.5442μs | 53.9252 KOps/s | 57.3446 KOps/s | |
test_set_shared | 0.5668ms | 0.1027ms | 9.7327 KOps/s | 9.9260 KOps/s | |
test_update | 0.1135ms | 22.9937μs | 43.4902 KOps/s | 45.8345 KOps/s | |
test_update_nested | 0.1284ms | 32.1817μs | 31.0735 KOps/s | 32.1069 KOps/s | |
test_set_nested | 98.6820μs | 19.9733μs | 50.0668 KOps/s | 52.6531 KOps/s | |
test_set_nested_new | 0.1090ms | 23.9336μs | 41.7822 KOps/s | 43.8990 KOps/s | |
test_select | 76.8920μs | 46.3370μs | 21.5810 KOps/s | 21.2858 KOps/s | |
test_to | 73.6220μs | 52.3354μs | 19.1075 KOps/s | 18.8297 KOps/s | |
test_to_nonblocking | 70.3810μs | 34.8765μs | 28.6726 KOps/s | 28.4916 KOps/s | |
test_unbind_speed | 0.3927ms | 0.3565ms | 2.8049 KOps/s | 2.8222 KOps/s | |
test_unbind_speed_stack0 | 63.3412ms | 5.2621ms | 190.0377 Ops/s | 191.1943 Ops/s | |
test_unbind_speed_stack1 | 1.2430μs | 0.5244μs | 1.9068 MOps/s | 1.9284 MOps/s | |
test_split | 54.2278ms | 1.8369ms | 544.4051 Ops/s | 558.3058 Ops/s | |
test_chunk | 54.0882ms | 1.8191ms | 549.7133 Ops/s | 564.2835 Ops/s | |
test_creation[device0] | 0.4516ms | 0.3097ms | 3.2290 KOps/s | 3.1815 KOps/s | |
test_creation[device1] | 0.7904ms | 0.3115ms | 3.2103 KOps/s | 3.1816 KOps/s | |
test_creation_from_tensor | 57.2059ms | 0.3640ms | 2.7472 KOps/s | 2.9490 KOps/s | |
test_add_one[memmap_tensor0] | 0.2546ms | 24.5831μs | 40.6783 KOps/s | 41.9724 KOps/s | |
test_add_one[memmap_tensor1] | 0.1845ms | 75.1746μs | 13.3024 KOps/s | 13.2829 KOps/s | |
test_contiguous[memmap_tensor0] | 32.1500μs | 6.0829μs | 164.3948 KOps/s | 169.5108 KOps/s | |
test_contiguous[memmap_tensor1] | 50.8800μs | 22.5668μs | 44.3129 KOps/s | 45.1950 KOps/s | |
test_stack[memmap_tensor0] | 49.3610μs | 21.5523μs | 46.3988 KOps/s | 50.8129 KOps/s | |
test_stack[memmap_tensor1] | 0.1623ms | 75.4035μs | 13.2620 KOps/s | 13.4455 KOps/s | |
test_memmaptd_index | 0.2619ms | 0.2262ms | 4.4212 KOps/s | 4.4292 KOps/s | |
test_memmaptd_index_astensor | 0.3879ms | 0.2811ms | 3.5579 KOps/s | 3.5861 KOps/s | |
test_memmaptd_index_op | 0.6509ms | 0.5683ms | 1.7596 KOps/s | 1.8521 KOps/s | |
test_reshape_pytree | 0.2609ms | 21.4457μs | 46.6293 KOps/s | 47.8269 KOps/s | |
test_reshape_td | 60.2110μs | 30.8421μs | 32.4232 KOps/s | 33.4734 KOps/s | |
test_view_pytree | 43.8300μs | 21.2342μs | 47.0939 KOps/s | 48.6227 KOps/s | |
test_view_td | 19.3600μs | 4.1562μs | 240.6035 KOps/s | 245.4158 KOps/s | |
test_unbind_pytree | 44.2310μs | 26.6925μs | 37.4637 KOps/s | 38.5549 KOps/s | |
test_unbind_td | 83.8310μs | 57.3718μs | 17.4302 KOps/s | 17.6359 KOps/s | |
test_split_pytree | 94.8720μs | 25.7152μs | 38.8875 KOps/s | 42.1588 KOps/s | |
test_split_td | 71.3520μs | 44.9081μs | 22.2677 KOps/s | 22.7556 KOps/s | |
test_add_pytree | 56.6310μs | 33.6811μs | 29.6902 KOps/s | 32.1449 KOps/s | |
test_add_td | 76.2010μs | 46.8318μs | 21.3530 KOps/s | 22.7685 KOps/s | |
test_distributed | 20.8110μs | 5.6930μs | 175.6552 KOps/s | 176.3318 KOps/s | |
test_tdmodule | 89.4720μs | 16.9690μs | 58.9311 KOps/s | 58.8565 KOps/s | |
test_tdmodule_dispatch | 0.2298ms | 33.2376μs | 30.0865 KOps/s | 30.1235 KOps/s | |
test_tdseq | 39.7800μs | 20.2016μs | 49.5011 KOps/s | 48.4943 KOps/s | |
test_tdseq_dispatch | 0.1357ms | 36.3542μs | 27.5071 KOps/s | 27.1658 KOps/s | |
test_instantiation_functorch | 1.7490ms | 1.7130ms | 583.7809 Ops/s | 595.4224 Ops/s | |
test_instantiation_td | 1.8487ms | 1.1924ms | 838.6647 Ops/s | 847.1615 Ops/s | |
test_exec_functorch | 0.2055ms | 0.1625ms | 6.1541 KOps/s | 6.2629 KOps/s | |
test_exec_functional_call | 0.2208ms | 0.1643ms | 6.0867 KOps/s | 6.2422 KOps/s | |
test_exec_td | 0.1883ms | 0.1542ms | 6.4845 KOps/s | 6.6362 KOps/s | |
test_exec_td_decorator | 1.0352ms | 0.2261ms | 4.4232 KOps/s | 4.4516 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.1796ms | 1.0949ms | 913.3494 Ops/s | 905.0262 Ops/s | |
test_vmap_mlp_speed[True-False] | 0.7440ms | 0.6384ms | 1.5665 KOps/s | 1.5885 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.0706ms | 1.0082ms | 991.8382 Ops/s | 991.0404 Ops/s | |
test_vmap_mlp_speed[False-False] | 0.6086ms | 0.5549ms | 1.8021 KOps/s | 1.7826 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 2.6210ms | 1.8237ms | 548.3348 Ops/s | 544.1566 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.1735ms | 0.7077ms | 1.4131 KOps/s | 1.4225 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 2.1100ms | 1.6404ms | 609.6198 Ops/s | 605.7532 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 1.0398ms | 0.5955ms | 1.6793 KOps/s | 1.6746 KOps/s | |
test_vmap_transformer_speed[True-True] | 12.9590ms | 12.8752ms | 77.6687 Ops/s | 77.5838 Ops/s | |
test_vmap_transformer_speed[True-False] | 10.7422ms | 8.4431ms | 118.4392 Ops/s | 118.5362 Ops/s | |
test_vmap_transformer_speed[False-True] | 12.8261ms | 12.7340ms | 78.5302 Ops/s | 78.2186 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.6754ms | 8.3677ms | 119.5065 Ops/s | 119.4152 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 46.4936ms | 44.8865ms | 22.2784 Ops/s | 22.8169 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 99.3917ms | 22.2757ms | 44.8919 Ops/s | 44.8315 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 44.4971ms | 43.3295ms | 23.0790 Ops/s | 23.0208 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 0.1012s | 21.9076ms | 45.6463 Ops/s | 45.4709 Ops/s |
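
The Change column is empty in the tables above. As a rough aid for reading them, the sketch below computes a relative throughput difference between the Ops and Ops on Repo HEAD cells of one row. The helper names `parse_ops` and `relative_change` are hypothetical, and treating Change as (Ops - HEAD) / HEAD is an assumption, not something this PR states. Single benchmark runs are noisy, so small differences should not be over-interpreted.

```python
import re

# Scale factors for the throughput units that appear in the tables.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '62.8955 KOps/s' to operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*((?:K|M)?Ops/s)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised throughput cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Throughput of this PR relative to repo HEAD, in percent (assumed definition)."""
    new, base = parse_ops(ops), parse_ops(ops_head)
    return (new - base) / base * 100.0


# One row copied verbatim from the second table.
row = "test_unlock_nested | 1.3050ms | 0.4381ms | 2.2828 KOps/s | 1.9912 KOps/s | |"
cells = [c.strip() for c in row.split("|")]
change = relative_change(cells[3], cells[4])
print(f"{cells[0]}: {change:+.1f}%")
```

A positive value means the PR branch measured more operations per second than repo HEAD for that test; a negative value means fewer.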
Labels: CLA Signed, Performance
No description provided.