[Feature] Unbind and stack tds in map with chunksize=0 #589

Merged (5 commits) on Dec 4, 2023

@vmoens (Contributor) commented Dec 4, 2023

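This PR allows map to be called on the individual elements of a tensordict: with chunksize=0, the tensordict is unbound along dim, the function is applied to each element, and the results are stacked back together.
This is useful whenever we want to work independently on each element of a stack and the stack dimension should be discarded before the function is called.

The example below uses transforms.v2 from torchvision and returns tensors of the same types as the original ones, which wouldn't be possible without this PR: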

import torch
from tensordict import TensorDict
from torchvision.transforms.v2 import Compose, Grayscale, Resize
from torchvision.tv_tensors import BoundingBoxes, Image

if __name__ == "__main__":
    # 5 RGB images and 5 boxes, wrapped in torchvision's tv_tensors types
    image = Image(torch.randint(255, (5, 3, 64, 64), dtype=torch.uint8))
    box = BoundingBoxes(
        torch.randint(0, 64, size=(5, 4)),
        format="XYXY",
        canvas_size=(64, 64),
    )
    label = torch.randint(10, ())

    td = TensorDict(
        {"image": image, "label": label, "meta": {"box": box}},
        [],
        device="cpu",
    )

    t = Compose([Resize((32, 32)), Grayscale()])

    # The transform can be applied directly to a single tensordict
    tdt = t(td)
    # Makes a lazy stack of the tensordicts
    td = torch.stack([td.clone() for _ in range(100)])
    # Map the transform over all items on 2 separate procs
    print("calling map on", td)
    tdt = td.map(t, dim=0, num_workers=2, chunksize=0)
    print(tdt[0])  # the first tensordict of the lazy stack contains the original types!
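
The difference between a positive chunksize and chunksize=0 can be pictured with a pure-Python stand-in. This is only a sketch of the semantics, not tensordict's implementation: map_sketch, per_item and per_chunk are hypothetical names, nested lists play the role of a stacked tensordict, and no worker processes are involved.

```python
from typing import Callable, List, Sequence


def map_sketch(fn: Callable, batch: Sequence, chunksize: int) -> List:
    """Hypothetical stand-in for mapping over the first dimension of a batch.

    chunksize > 0: fn receives sub-batches of up to `chunksize` items and
                   must return one result per item.
    chunksize == 0: the batch is unbound, fn receives one item at a time
                    (the leading dimension is gone), and the per-item
                    results are stacked back into a list.
    """
    if chunksize < 0:
        raise ValueError("chunksize must be non-negative")
    if chunksize == 0:
        # Unbind: call fn on single items, then stack the results.
        return [fn(item) for item in batch]
    out: List = []
    for start in range(0, len(batch), chunksize):
        chunk = list(batch[start:start + chunksize])
        # fn sees a chunk that still carries the leading dimension.
        out.extend(fn(chunk))
    return out


def per_item(row):
    # Written for a single element: sums one row.
    return sum(row)


def per_chunk(rows):
    # Written for a sub-batch: sums every row in the chunk.
    return [sum(row) for row in rows]


batch = [[1, 2], [3, 4], [5, 6]]
unbound = map_sketch(per_item, batch, chunksize=0)
chunked = map_sketch(per_chunk, batch, chunksize=2)
print(unbound, chunked)
```

Both calls produce the same values here; what changes is the shape of the input handed to the function. A function written for one sample, such as the torchvision pipeline above, fits the chunksize=0 path.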

@facebook-github-bot added the CLA Signed label Dec 4, 2023
@vmoens marked this pull request as ready for review December 4, 2023 12:31

github-actions bot commented Dec 4, 2023

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 113. Improved: 5. Worsened: 2.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 26.2490μs | 16.0729μs | 62.2166 KOps/s | 63.7023 KOps/s | -2.33% |
| test_plain_set_stack_nested | 0.1775ms | 0.1424ms | 7.0240 KOps/s | 7.0553 KOps/s | -0.44% |
| test_plain_set_nested_inplace | 41.8070μs | 18.1678μs | 55.0425 KOps/s | 57.1476 KOps/s | -3.68% |
| test_plain_set_stack_nested_inplace | 0.3347ms | 0.1768ms | 5.6576 KOps/s | 5.7121 KOps/s | -0.95% |
| test_items | 23.5930μs | 2.4470μs | 408.6644 KOps/s | 387.5942 KOps/s | **+5.44%** |
| test_items_nested | 0.3707ms | 0.2744ms | 3.6445 KOps/s | 3.6757 KOps/s | -0.85% |
| test_items_nested_locked | 1.2992ms | 0.2692ms | 3.7143 KOps/s | 3.7146 KOps/s | -0.01% |
| test_items_nested_leaf | 0.7269ms | 0.1639ms | 6.1007 KOps/s | 5.9352 KOps/s | +2.79% |
| test_items_stack_nested | 1.5978ms | 1.4865ms | 672.7170 Ops/s | 676.2021 Ops/s | -0.52% |
| test_items_stack_nested_leaf | 1.4773ms | 1.3594ms | 735.6319 Ops/s | 744.9070 Ops/s | -1.25% |
| test_items_stack_nested_locked | 1.8572ms | 0.7654ms | 1.3065 KOps/s | 1.3103 KOps/s | -0.29% |
| test_keys | 20.5280μs | 3.8278μs | 261.2499 KOps/s | 260.7809 KOps/s | +0.18% |
| test_keys_nested | 0.5008ms | 0.1410ms | 7.0902 KOps/s | 6.6272 KOps/s | **+6.99%** |
| test_keys_nested_locked | 0.3323ms | 0.1403ms | 7.1287 KOps/s | 7.1299 KOps/s | -0.02% |
| test_keys_nested_leaf | 0.4015ms | 0.1383ms | 7.2326 KOps/s | 7.1290 KOps/s | +1.45% |
| test_keys_stack_nested | 1.6785ms | 1.3994ms | 714.6050 Ops/s | 713.4334 Ops/s | +0.16% |
| test_keys_stack_nested_leaf | 2.0549ms | 1.4007ms | 713.9338 Ops/s | 712.6524 Ops/s | +0.18% |
| test_keys_stack_nested_locked | 0.8215ms | 0.6662ms | 1.5010 KOps/s | 1.4891 KOps/s | +0.79% |
| test_values | 7.4288μs | 1.1639μs | 859.1852 KOps/s | 830.3258 KOps/s | +3.48% |
| test_values_nested | 94.1250μs | 48.9625μs | 20.4238 KOps/s | 20.1741 KOps/s | +1.24% |
| test_values_nested_locked | 0.1127ms | 49.4222μs | 20.2338 KOps/s | 20.0689 KOps/s | +0.82% |
| test_values_nested_leaf | 87.8340μs | 43.9645μs | 22.7456 KOps/s | 22.7732 KOps/s | -0.12% |
| test_values_stack_nested | 2.0071ms | 1.1981ms | 834.6320 Ops/s | 832.1607 Ops/s | +0.30% |
| test_values_stack_nested_leaf | 1.8801ms | 1.1878ms | 841.8999 Ops/s | 842.2989 Ops/s | -0.05% |
| test_values_stack_nested_locked | 0.8788ms | 0.5100ms | 1.9609 KOps/s | 1.9671 KOps/s | -0.32% |
| test_membership | 16.2800μs | 1.3288μs | 752.5444 KOps/s | 744.3440 KOps/s | +1.10% |
| test_membership_nested | 19.6970μs | 2.7912μs | 358.2720 KOps/s | 352.1577 KOps/s | +1.74% |
| test_membership_nested_leaf | 20.7880μs | 2.8021μs | 356.8703 KOps/s | 350.9733 KOps/s | +1.68% |
| test_membership_stacked_nested | 36.8290μs | 11.6346μs | 85.9502 KOps/s | 83.2797 KOps/s | +3.21% |
| test_membership_stacked_nested_leaf | 64.4100μs | 11.7014μs | 85.4602 KOps/s | 83.5885 KOps/s | +2.24% |
| test_membership_nested_last | 33.7120μs | 5.9290μs | 168.6618 KOps/s | 164.3784 KOps/s | +2.61% |
| test_membership_nested_leaf_last | 28.2620μs | 5.8695μs | 170.3735 KOps/s | 171.7498 KOps/s | -0.80% |
| test_membership_stacked_nested_last | 0.2180ms | 0.1677ms | 5.9637 KOps/s | 5.9617 KOps/s | +0.03% |
| test_membership_stacked_nested_leaf_last | 38.3310μs | 13.7737μs | 72.6020 KOps/s | 71.1527 KOps/s | +2.04% |
| test_nested_getleaf | 34.5840μs | 10.5811μs | 94.5084 KOps/s | 94.6763 KOps/s | -0.18% |
| test_nested_get | 30.1970μs | 10.1722μs | 98.3071 KOps/s | 99.4247 KOps/s | -1.12% |
| test_stacked_getleaf | 1.1799ms | 0.6420ms | 1.5575 KOps/s | 1.5448 KOps/s | +0.82% |
| test_stacked_get | 1.2336ms | 0.6104ms | 1.6383 KOps/s | 1.6144 KOps/s | +1.48% |
| test_nested_getitemleaf | 31.6690μs | 10.5114μs | 95.1351 KOps/s | 93.6094 KOps/s | +1.63% |
| test_nested_getitem | 32.3800μs | 9.9975μs | 100.0250 KOps/s | 100.0532 KOps/s | -0.03% |
| test_stacked_getitemleaf | 0.7902ms | 0.6483ms | 1.5426 KOps/s | 1.5428 KOps/s | -0.01% |
| test_stacked_getitem | 1.0540ms | 0.6139ms | 1.6290 KOps/s | 1.6222 KOps/s | +0.42% |
| test_lock_nested | 59.2189ms | 0.6206ms | 1.6113 KOps/s | 1.7681 KOps/s | **-8.86%** |
| test_lock_stack_nested | 9.2049ms | 5.0943ms | 196.2986 Ops/s | 192.8540 Ops/s | +1.79% |
| test_unlock_nested | 1.0874ms | 0.4474ms | 2.2351 KOps/s | 2.2447 KOps/s | -0.43% |
| test_unlock_stack_nested | 70.0117ms | 6.7504ms | 148.1397 Ops/s | 144.0595 Ops/s | +2.83% |
| test_flatten_speed | 0.4067ms | 0.2682ms | 3.7287 KOps/s | 3.7361 KOps/s | -0.20% |
| test_unflatten_speed | 0.9001ms | 0.4606ms | 2.1713 KOps/s | 2.1758 KOps/s | -0.21% |
| test_common_ops | 3.8868ms | 0.6753ms | 1.4809 KOps/s | 1.4519 KOps/s | +2.00% |
| test_creation | 20.9990μs | 2.4745μs | 404.1141 KOps/s | 391.7043 KOps/s | +3.17% |
| test_creation_empty | 21.8000μs | 8.0960μs | 123.5176 KOps/s | 119.8698 KOps/s | +3.04% |
| test_creation_nested_1 | 48.5610μs | 11.3994μs | 87.7240 KOps/s | 86.5649 KOps/s | +1.34% |
| test_creation_nested_2 | 36.0070μs | 14.8855μs | 67.1793 KOps/s | 65.5320 KOps/s | +2.51% |
| test_clone | 55.8040μs | 13.9470μs | 71.6998 KOps/s | 71.6404 KOps/s | +0.08% |
| test_getitem[int] | 43.2400μs | 13.2060μs | 75.7229 KOps/s | 75.2974 KOps/s | +0.57% |
| test_getitem[slice_int] | 63.2080μs | 26.3672μs | 37.9259 KOps/s | 38.6658 KOps/s | -1.91% |
| test_getitem[range] | 82.0130μs | 43.7995μs | 22.8313 KOps/s | 22.1563 KOps/s | +3.05% |
| test_getitem[tuple] | 61.2840μs | 20.9084μs | 47.8278 KOps/s | 46.9783 KOps/s | +1.81% |
| test_getitem[list] | 0.1768ms | 38.6786μs | 25.8541 KOps/s | 25.0042 KOps/s | +3.40% |
| test_setitem_dim[int] | 47.7290μs | 27.6988μs | 36.1026 KOps/s | 36.0449 KOps/s | +0.16% |
| test_setitem_dim[slice_int] | 80.4300μs | 52.7041μs | 18.9739 KOps/s | 18.9997 KOps/s | -0.14% |
| test_setitem_dim[range] | 0.1202ms | 70.3028μs | 14.2242 KOps/s | 13.8421 KOps/s | +2.76% |
| test_setitem_dim[tuple] | 61.8250μs | 40.9375μs | 24.4275 KOps/s | 24.0742 KOps/s | +1.47% |
| test_setitem | 73.4270μs | 18.8064μs | 53.1734 KOps/s | 52.6565 KOps/s | +0.98% |
| test_set | 87.8640μs | 17.9769μs | 55.6269 KOps/s | 54.1487 KOps/s | +2.73% |
| test_set_shared | 3.2228ms | 0.1385ms | 7.2191 KOps/s | 6.8497 KOps/s | **+5.39%** |
| test_update | 93.6650μs | 19.1854μs | 52.1230 KOps/s | 49.6919 KOps/s | +4.89% |
| test_update_nested | 88.2740μs | 27.5824μs | 36.2550 KOps/s | 35.9558 KOps/s | +0.83% |
| test_set_nested | 69.7200μs | 20.5606μs | 48.6367 KOps/s | 48.5017 KOps/s | +0.28% |
| test_set_nested_new | 88.4550μs | 26.1056μs | 38.3060 KOps/s | 39.1683 KOps/s | -2.20% |
| test_select | 0.1041ms | 52.2659μs | 19.1330 KOps/s | 19.5127 KOps/s | -1.95% |
| test_unbind_speed | 0.7175ms | 0.3818ms | 2.6194 KOps/s | 2.6257 KOps/s | -0.24% |
| test_unbind_speed_stack0 | 66.2016ms | 4.7034ms | 212.6120 Ops/s | 210.7403 Ops/s | +0.89% |
| test_unbind_speed_stack1 | 2.0053μs | 0.6445μs | 1.5517 MOps/s | 1.5887 MOps/s | -2.33% |
| test_split | 56.2655ms | 1.7791ms | 562.0754 Ops/s | 560.0625 Ops/s | +0.36% |
| test_chunk | 58.7017ms | 1.7535ms | 570.2729 Ops/s | 567.4537 Ops/s | +0.50% |
| test_creation[device0] | 0.4948ms | 0.2933ms | 3.4091 KOps/s | 3.4243 KOps/s | -0.44% |
| test_creation_from_tensor | 3.5844ms | 0.3301ms | 3.0298 KOps/s | 3.0163 KOps/s | +0.45% |
| test_add_one[memmap_tensor0] | 70.3510μs | 25.4034μs | 39.3649 KOps/s | 39.6142 KOps/s | -0.63% |
| test_contiguous[memmap_tensor0] | 25.5080μs | 5.8870μs | 169.8655 KOps/s | 162.0815 KOps/s | +4.80% |
| test_stack[memmap_tensor0] | 60.3430μs | 19.4528μs | 51.4065 KOps/s | 50.0121 KOps/s | +2.79% |
| test_memmaptd_index | 0.3635ms | 0.2081ms | 4.8045 KOps/s | 4.8451 KOps/s | -0.84% |
| test_memmaptd_index_astensor | 0.5307ms | 0.2673ms | 3.7413 KOps/s | 3.7326 KOps/s | +0.23% |
| test_memmaptd_index_op | 0.8038ms | 0.5046ms | 1.9816 KOps/s | 1.9458 KOps/s | +1.84% |
| test_reshape_pytree | 0.2339ms | 23.6248μs | 42.3284 KOps/s | 42.2182 KOps/s | +0.26% |
| test_reshape_td | 71.2830μs | 32.1026μs | 31.1502 KOps/s | 30.2129 KOps/s | +3.10% |
| test_view_pytree | 0.4033ms | 23.1696μs | 43.1599 KOps/s | 42.4550 KOps/s | +1.66% |
| test_view_td | 17.9630μs | 4.9004μs | 204.0662 KOps/s | 201.8268 KOps/s | +1.11% |
| test_unbind_pytree | 87.0720μs | 26.4076μs | 37.8679 KOps/s | 37.5133 KOps/s | +0.95% |
| test_unbind_td | 0.1458ms | 59.5665μs | 16.7880 KOps/s | 16.5535 KOps/s | +1.42% |
| test_split_pytree | 60.6430μs | 26.4421μs | 37.8185 KOps/s | 37.6835 KOps/s | +0.36% |
| test_split_td | 0.1002ms | 46.3669μs | 21.5671 KOps/s | 21.0127 KOps/s | +2.64% |
| test_add_pytree | 84.8590μs | 32.2172μs | 31.0393 KOps/s | 30.9275 KOps/s | +0.36% |
| test_add_td | 0.1069ms | 45.6218μs | 21.9194 KOps/s | 21.9585 KOps/s | -0.18% |
| test_distributed | 36.1470μs | 6.3103μs | 158.4712 KOps/s | 166.1509 KOps/s | -4.62% |
| test_tdmodule | 1.7198ms | 22.7683μs | 43.9207 KOps/s | 46.4709 KOps/s | **-5.49%** |
| test_tdmodule_dispatch | 0.1754ms | 38.7191μs | 25.8271 KOps/s | 25.6657 KOps/s | +0.63% |
| test_tdseq | 43.8920μs | 24.4628μs | 40.8784 KOps/s | 42.0138 KOps/s | -2.70% |
| test_tdseq_dispatch | 0.1392ms | 43.5245μs | 22.9755 KOps/s | 23.0294 KOps/s | -0.23% |
| test_instantiation_functorch | 1.5215ms | 1.3195ms | 757.8363 Ops/s | 755.8897 Ops/s | +0.26% |
| test_instantiation_td | 1.5487ms | 1.0390ms | 962.5052 Ops/s | 893.0799 Ops/s | **+7.77%** |
| test_exec_functorch | 0.2219ms | 0.1590ms | 6.2908 KOps/s | 6.2716 KOps/s | +0.31% |
| test_exec_functional_call | 0.3994ms | 0.1484ms | 6.7383 KOps/s | 6.6676 KOps/s | +1.06% |
| test_exec_td | 0.2130ms | 0.1423ms | 7.0268 KOps/s | 6.0609 KOps/s | **+15.94%** |
| test_exec_td_decorator | 0.7598ms | 0.1767ms | 5.6609 KOps/s | 5.5551 KOps/s | +1.90% |
| test_vmap_mlp_speed[True-True] | 1.4205ms | 0.8776ms | 1.1395 KOps/s | 1.1188 KOps/s | +1.86% |
| test_vmap_mlp_speed[True-False] | 0.5789ms | 0.4642ms | 2.1544 KOps/s | 2.1523 KOps/s | +0.10% |
| test_vmap_mlp_speed[False-True] | 1.0678ms | 0.7612ms | 1.3138 KOps/s | 1.2926 KOps/s | +1.64% |
| test_vmap_mlp_speed[False-False] | 0.7795ms | 0.3891ms | 2.5699 KOps/s | 2.6281 KOps/s | -2.22% |
| test_vmap_mlp_speed_decorator[True-True] | 2.7122ms | 1.7416ms | 574.1725 Ops/s | 566.7688 Ops/s | +1.31% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9776ms | 0.5173ms | 1.9331 KOps/s | 1.9446 KOps/s | -0.60% |
| test_vmap_mlp_speed_decorator[False-True] | 1.9201ms | 1.4516ms | 688.8769 Ops/s | 673.8862 Ops/s | +2.22% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7772ms | 0.3988ms | 2.5076 KOps/s | 2.5132 KOps/s | -0.22% |
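
The Change column appears to be the relative difference between Ops and Ops on Repo HEAD, and the rows counted as improved or worsened all lie beyond roughly ±5%. The sketch below recomputes a few CPU rows under those assumptions. The 5% threshold and the helper names are guesses made for illustration and are not taken from the benchmark bot.

```python
# (name, ops, ops_on_head) copied from the CPU results; all values in KOps/s.
ROWS = [
    ("test_plain_set_nested", 62.2166, 63.7023),
    ("test_items", 408.6644, 387.5942),
    ("test_lock_nested", 1.6113, 1.7681),
    ("test_exec_td", 7.0268, 6.0609),
]

THRESHOLD = 0.05  # assumed significance threshold (5%), not confirmed by the bot


def relative_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of the PR with respect to repo HEAD."""
    return (ops - ops_head) / ops_head


def classify(change: float, threshold: float = THRESHOLD) -> str:
    """Label a change as improved, worsened, or unchanged (within threshold)."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


results = {name: relative_change(ops, head) for name, ops, head in ROWS}
for name, change in results.items():
    print(f"{name}: {change:+.2%} ({classify(change)})")
```

Because the table reports rounded throughputs, a recomputed percentage can differ from the printed one in the last digit.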


github-actions bot commented Dec 4, 2023

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: 6. Worsened: 4.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.6691ms | 12.5737μs | 79.5308 KOps/s | 79.8783 KOps/s | -0.44% |
| test_plain_set_stack_nested | 0.1436ms | 0.1157ms | 8.6427 KOps/s | 8.3623 KOps/s | +3.35% |
| test_plain_set_nested_inplace | 39.9310μs | 13.8574μs | 72.1634 KOps/s | 72.2631 KOps/s | -0.14% |
| test_plain_set_stack_nested_inplace | 0.1770ms | 0.1425ms | 7.0179 KOps/s | 7.0398 KOps/s | -0.31% |
| test_items | 24.1710μs | 4.6637μs | 214.4198 KOps/s | 214.8345 KOps/s | -0.19% |
| test_items_nested | 0.3891ms | 0.3379ms | 2.9597 KOps/s | 2.9537 KOps/s | +0.20% |
| test_items_nested_locked | 0.3927ms | 0.3384ms | 2.9552 KOps/s | 2.9188 KOps/s | +1.25% |
| test_items_nested_leaf | 0.2402ms | 0.1980ms | 5.0496 KOps/s | 4.9813 KOps/s | +1.37% |
| test_items_stack_nested | 1.5848ms | 1.4793ms | 676.0038 Ops/s | 677.8422 Ops/s | -0.27% |
| test_items_stack_nested_leaf | 1.3977ms | 1.3060ms | 765.7243 Ops/s | 764.3955 Ops/s | +0.17% |
| test_items_stack_nested_locked | 0.8737ms | 0.8145ms | 1.2278 KOps/s | 1.1841 KOps/s | +3.69% |
| test_keys | 27.6110μs | 4.5786μs | 218.4095 KOps/s | 218.9096 KOps/s | -0.23% |
| test_keys_nested | 3.5756ms | 90.8047μs | 11.0126 KOps/s | 11.0778 KOps/s | -0.59% |
| test_keys_nested_locked | 0.1170ms | 90.4965μs | 11.0501 KOps/s | 11.1765 KOps/s | -1.13% |
| test_keys_nested_leaf | 41.9019ms | 86.3293μs | 11.5836 KOps/s | 12.2639 KOps/s | **-5.55%** |
| test_keys_stack_nested | 1.3663ms | 1.2895ms | 775.5183 Ops/s | 778.7913 Ops/s | -0.42% |
| test_keys_stack_nested_leaf | 1.3423ms | 1.2757ms | 783.8609 Ops/s | 780.8083 Ops/s | +0.39% |
| test_keys_stack_nested_locked | 0.7013ms | 0.6186ms | 1.6166 KOps/s | 1.5852 KOps/s | +1.98% |
| test_values | 9.0473μs | 1.8898μs | 529.1602 KOps/s | 529.3324 KOps/s | -0.03% |
| test_values_nested | 62.9330μs | 42.7495μs | 23.3921 KOps/s | 23.4151 KOps/s | -0.10% |
| test_values_nested_locked | 67.3220μs | 45.0829μs | 22.1814 KOps/s | 22.0521 KOps/s | +0.59% |
| test_values_nested_leaf | 57.7520μs | 37.2287μs | 26.8610 KOps/s | 26.8618 KOps/s | -0.00% |
| test_values_stack_nested | 1.2581ms | 1.1284ms | 886.1831 Ops/s | 880.4665 Ops/s | +0.65% |
| test_values_stack_nested_leaf | 1.1970ms | 1.1302ms | 884.8298 Ops/s | 880.2564 Ops/s | +0.52% |
| test_values_stack_nested_locked | 0.5912ms | 0.4977ms | 2.0093 KOps/s | 1.9550 KOps/s | +2.78% |
| test_membership | 5.2762μs | 0.9412μs | 1.0624 MOps/s | 1.0638 MOps/s | -0.13% |
| test_membership_nested | 16.9610μs | 2.1618μs | 462.5743 KOps/s | 453.4986 KOps/s | +2.00% |
| test_membership_nested_leaf | 11.6740μs | 2.0255μs | 493.7067 KOps/s | 478.0759 KOps/s | +3.27% |
| test_membership_stacked_nested | 45.6710μs | 10.8029μs | 92.5673 KOps/s | 90.7485 KOps/s | +2.00% |
| test_membership_stacked_nested_leaf | 28.5410μs | 10.8234μs | 92.3928 KOps/s | 92.2739 KOps/s | +0.13% |
| test_membership_nested_last | 32.7720μs | 4.5251μs | 220.9911 KOps/s | 219.0454 KOps/s | +0.89% |
| test_membership_nested_leaf_last | 20.2910μs | 4.5165μs | 221.4099 KOps/s | 219.5337 KOps/s | +0.85% |
| test_membership_stacked_nested_last | 0.1683ms | 0.1335ms | 7.4929 KOps/s | 7.4293 KOps/s | +0.86% |
| test_membership_stacked_nested_leaf_last | 41.4020μs | 12.6537μs | 79.0280 KOps/s | 79.4016 KOps/s | -0.47% |
| test_nested_getleaf | 28.7010μs | 8.3534μs | 119.7113 KOps/s | 117.3702 KOps/s | +1.99% |
| test_nested_get | 29.5710μs | 7.9157μs | 126.3314 KOps/s | 124.3579 KOps/s | +1.59% |
| test_stacked_getleaf | 0.6228ms | 0.5622ms | 1.7788 KOps/s | 1.7824 KOps/s | -0.20% |
| test_stacked_get | 0.6347ms | 0.5455ms | 1.8333 KOps/s | 1.8834 KOps/s | -2.66% |
| test_nested_getitemleaf | 31.8010μs | 8.4367μs | 118.5299 KOps/s | 118.3964 KOps/s | +0.11% |
| test_nested_getitem | 28.5220μs | 7.9544μs | 125.7164 KOps/s | 123.8566 KOps/s | +1.50% |
| test_stacked_getitemleaf | 0.6305ms | 0.5673ms | 1.7628 KOps/s | 1.7832 KOps/s | -1.15% |
| test_stacked_getitem | 0.5921ms | 0.5418ms | 1.8458 KOps/s | 1.8993 KOps/s | -2.82% |
| test_lock_nested | 3.2547ms | 0.5524ms | 1.8103 KOps/s | 1.8270 KOps/s | -0.92% |
| test_lock_stack_nested | 81.3943ms | 7.2070ms | 138.7540 Ops/s | 137.9253 Ops/s | +0.60% |
| test_unlock_nested | 2.3936ms | 0.4269ms | 2.3426 KOps/s | 2.3490 KOps/s | -0.27% |
| test_unlock_stack_nested | 67.4711ms | 6.2365ms | 160.3453 Ops/s | 163.2099 Ops/s | -1.76% |
| test_flatten_speed | 0.2338ms | 0.1865ms | 5.3628 KOps/s | 5.3392 KOps/s | +0.44% |
| test_unflatten_speed | 0.4276ms | 0.3631ms | 2.7538 KOps/s | 2.7515 KOps/s | +0.09% |
| test_common_ops | 1.1081ms | 0.5874ms | 1.7023 KOps/s | 1.7081 KOps/s | -0.34% |
| test_creation | 32.1910μs | 2.1194μs | 471.8379 KOps/s | 484.3657 KOps/s | -2.59% |
| test_creation_empty | 22.6120μs | 6.5741μs | 152.1114 KOps/s | 152.6133 KOps/s | -0.33% |
| test_creation_nested_1 | 40.9520μs | 8.8675μs | 112.7720 KOps/s | 113.6153 KOps/s | -0.74% |
| test_creation_nested_2 | 29.2610μs | 11.4055μs | 87.6767 KOps/s | 87.1290 KOps/s | +0.63% |
| test_clone | 0.1052ms | 14.0145μs | 71.3547 KOps/s | 70.0824 KOps/s | +1.82% |
| test_getitem[int] | 39.0110μs | 11.9711μs | 83.5346 KOps/s | 83.1167 KOps/s | +0.50% |
| test_getitem[slice_int] | 39.9720μs | 22.0066μs | 45.4409 KOps/s | 44.3595 KOps/s | +2.44% |
| test_getitem[range] | 62.4420μs | 38.5965μs | 25.9091 KOps/s | 25.5435 KOps/s | +1.43% |
| test_getitem[tuple] | 47.6430μs | 20.2312μs | 49.4287 KOps/s | 49.3468 KOps/s | +0.17% |
| test_getitem[list] | 0.2366ms | 34.6509μs | 28.8593 KOps/s | 28.7219 KOps/s | +0.48% |
| test_setitem_dim[int] | 41.6220μs | 25.3295μs | 39.4797 KOps/s | 40.5974 KOps/s | -2.75% |
| test_setitem_dim[slice_int] | 71.7340μs | 44.3288μs | 22.5587 KOps/s | 23.3350 KOps/s | -3.33% |
| test_setitem_dim[range] | 96.6140μs | 59.9948μs | 16.6681 KOps/s | 16.6969 KOps/s | -0.17% |
| test_setitem_dim[tuple] | 54.3420μs | 37.7515μs | 26.4890 KOps/s | 26.3681 KOps/s | +0.46% |
| test_setitem | 97.4030μs | 17.2003μs | 58.1386 KOps/s | 55.7457 KOps/s | +4.29% |
| test_set | 95.3430μs | 16.6128μs | 60.1946 KOps/s | 57.2660 KOps/s | **+5.11%** |
| test_set_shared | 2.7193ms | 0.1031ms | 9.7038 KOps/s | 8.7529 KOps/s | **+10.86%** |
| test_update | 87.4040μs | 17.9454μs | 55.7247 KOps/s | 53.6129 KOps/s | +3.94% |
| test_update_nested | 98.2330μs | 24.2706μs | 41.2021 KOps/s | 39.4117 KOps/s | +4.54% |
| test_set_nested | 58.9920μs | 18.7007μs | 53.4741 KOps/s | 54.3543 KOps/s | -1.62% |
| test_set_nested_new | 89.1730μs | 22.6595μs | 44.1317 KOps/s | 43.4220 KOps/s | +1.63% |
| test_select | 0.1172ms | 44.7271μs | 22.3578 KOps/s | 21.7877 KOps/s | +2.62% |
| test_to | 73.6630μs | 52.8666μs | 18.9156 KOps/s | 18.9183 KOps/s | -0.01% |
| test_to_nonblocking | 65.0120μs | 33.8022μs | 29.5839 KOps/s | 28.8592 KOps/s | +2.51% |
| test_unbind_speed | 0.4056ms | 0.3577ms | 2.7958 KOps/s | 2.8413 KOps/s | -1.60% |
| test_unbind_speed_stack0 | 62.0898ms | 4.5716ms | 218.7432 Ops/s | 235.0193 Ops/s | **-6.93%** |
| test_unbind_speed_stack1 | 2.0021μs | 0.5251μs | 1.9043 MOps/s | 1.8645 MOps/s | +2.14% |
| test_split | 1.9353ms | 1.6443ms | 608.1760 Ops/s | 573.2499 Ops/s | **+6.09%** |
| test_chunk | 53.3118ms | 1.7317ms | 577.4709 Ops/s | 580.1378 Ops/s | -0.46% |
| test_creation[device0] | 0.4149ms | 0.3055ms | 3.2736 KOps/s | 3.2825 KOps/s | -0.27% |
| test_creation[device1] | 55.4840ms | 0.3303ms | 3.0274 KOps/s | 3.2365 KOps/s | **-6.46%** |
| test_creation_from_tensor | 0.5687ms | 0.3328ms | 3.0052 KOps/s | 3.0040 KOps/s | +0.04% |
| test_add_one[memmap_tensor0] | 0.2669ms | 23.2078μs | 43.0890 KOps/s | 42.7841 KOps/s | +0.71% |
| test_add_one[memmap_tensor1] | 0.2107ms | 72.0568μs | 13.8779 KOps/s | 13.7892 KOps/s | +0.64% |
| test_contiguous[memmap_tensor0] | 26.2300μs | 5.7688μs | 173.3471 KOps/s | 177.2258 KOps/s | -2.19% |
| test_contiguous[memmap_tensor1] | 43.0120μs | 21.2719μs | 47.0104 KOps/s | 46.4226 KOps/s | +1.27% |
| test_stack[memmap_tensor0] | 49.7620μs | 18.7181μs | 53.4244 KOps/s | 52.4914 KOps/s | +1.78% |
| test_stack[memmap_tensor1] | 0.1524ms | 71.8149μs | 13.9247 KOps/s | 13.0345 KOps/s | **+6.83%** |
| test_memmaptd_index | 0.2982ms | 0.2278ms | 4.3908 KOps/s | 4.3376 KOps/s | +1.23% |
| test_memmaptd_index_astensor | 0.4017ms | 0.2850ms | 3.5082 KOps/s | 3.5030 KOps/s | +0.15% |
| test_memmaptd_index_op | 0.6090ms | 0.5414ms | 1.8471 KOps/s | 1.8072 KOps/s | +2.21% |
| test_reshape_pytree | 47.7320μs | 20.3050μs | 49.2490 KOps/s | 48.3486 KOps/s | +1.86% |
| test_reshape_td | 53.7530μs | 29.8198μs | 33.5348 KOps/s | 33.2508 KOps/s | +0.85% |
| test_view_pytree | 51.0030μs | 19.9471μs | 50.1325 KOps/s | 49.5953 KOps/s | +1.08% |
| test_view_td | 17.6210μs | 3.9776μs | 251.4073 KOps/s | 246.5422 KOps/s | +1.97% |
| test_unbind_pytree | 43.4110μs | 25.7506μs | 38.8340 KOps/s | 39.4009 KOps/s | -1.44% |
| test_unbind_td | 78.1730μs | 55.0730μs | 18.1577 KOps/s | 17.8616 KOps/s | +1.66% |
| test_split_pytree | 47.7710μs | 23.4933μs | 42.5652 KOps/s | 42.0999 KOps/s | +1.11% |
| test_split_td | 67.4220μs | 42.1480μs | 23.7259 KOps/s | 23.7794 KOps/s | -0.22% |
| test_add_pytree | 49.6620μs | 31.2964μs | 31.9525 KOps/s | 30.4371 KOps/s | +4.98% |
| test_add_td | 83.8530μs | 43.3759μs | 23.0543 KOps/s | 22.4379 KOps/s | +2.75% |
| test_distributed | 23.5910μs | 5.4655μs | 182.9673 KOps/s | 182.5189 KOps/s | +0.25% |
| test_tdmodule | 37.3220μs | 16.0774μs | 62.1991 KOps/s | 60.6416 KOps/s | +2.57% |
| test_tdmodule_dispatch | 0.1952ms | 32.0408μs | 31.2102 KOps/s | 31.0099 KOps/s | +0.65% |
| test_tdseq | 34.6110μs | 19.4069μs | 51.5280 KOps/s | 51.8164 KOps/s | -0.56% |
| test_tdseq_dispatch | 78.9430μs | 34.8034μs | 28.7328 KOps/s | 28.3503 KOps/s | +1.35% |
| test_instantiation_functorch | 2.2430ms | 1.6833ms | 594.0625 Ops/s | 606.2592 Ops/s | -2.01% |
| test_instantiation_td | 65.0585ms | 1.2454ms | 802.9470 Ops/s | 860.2498 Ops/s | **-6.66%** |
| test_exec_functorch | 0.2062ms | 0.1547ms | 6.4646 KOps/s | 6.3867 KOps/s | +1.22% |
| test_exec_functional_call | 0.2145ms | 0.1528ms | 6.5428 KOps/s | 6.3857 KOps/s | +2.46% |
| test_exec_td | 0.2311ms | 0.1461ms | 6.8466 KOps/s | 6.8478 KOps/s | -0.02% |
| test_exec_td_decorator | 0.8067ms | 0.1853ms | 5.3973 KOps/s | 5.4245 KOps/s | -0.50% |
| test_vmap_mlp_speed[True-True] | 1.1315ms | 1.0482ms | 954.0189 Ops/s | 948.6899 Ops/s | +0.56% |
| test_vmap_mlp_speed[True-False] | 0.6799ms | 0.6014ms | 1.6627 KOps/s | 1.6330 KOps/s | +1.82% |
| test_vmap_mlp_speed[False-True] | 1.0485ms | 0.9620ms | 1.0395 KOps/s | 1.0428 KOps/s | -0.32% |
| test_vmap_mlp_speed[False-False] | 0.6119ms | 0.5365ms | 1.8639 KOps/s | 1.8457 KOps/s | +0.98% |
| test_vmap_mlp_speed_decorator[True-True] | 2.8557ms | 1.9670ms | 508.3947 Ops/s | 506.3380 Ops/s | +0.41% |
| test_vmap_mlp_speed_decorator[True-False] | 1.1071ms | 0.6445ms | 1.5517 KOps/s | 1.5320 KOps/s | +1.29% |
| test_vmap_mlp_speed_decorator[False-True] | 2.1351ms | 1.7017ms | 587.6359 Ops/s | 582.1585 Ops/s | +0.94% |
| test_vmap_mlp_speed_decorator[False-False] | 0.9265ms | 0.5464ms | 1.8303 KOps/s | 1.7905 KOps/s | +2.22% |
| test_vmap_transformer_speed[True-True] | 12.4069ms | 12.2611ms | 81.5588 Ops/s | 80.3128 Ops/s | +1.55% |
| test_vmap_transformer_speed[True-False] | 8.4357ms | 8.1133ms | 123.2544 Ops/s | 121.2130 Ops/s | +1.68% |
| test_vmap_transformer_speed[False-True] | 12.8488ms | 12.1675ms | 82.1864 Ops/s | 80.9779 Ops/s | +1.49% |
| test_vmap_transformer_speed[False-False] | 8.0498ms | 7.9955ms | 125.0709 Ops/s | 121.8171 Ops/s | +2.67% |
| test_vmap_transformer_speed_decorator[True-True] | 63.4635ms | 62.5962ms | 15.9754 Ops/s | 14.8472 Ops/s | **+7.60%** |
| test_vmap_transformer_speed_decorator[True-False] | 21.8868ms | 19.6784ms | 50.8171 Ops/s | 49.3829 Ops/s | +2.90% |
| test_vmap_transformer_speed_decorator[False-True] | 58.3044ms | 57.0333ms | 17.5336 Ops/s | 17.4943 Ops/s | +0.22% |
| test_vmap_transformer_speed_decorator[False-False] | 21.4439ms | 19.2787ms | 51.8708 Ops/s | 46.5763 Ops/s | **+11.37%** |

@vmoens added the enhancement label Dec 4, 2023
@vmoens merged commit f16c076 into main Dec 4, 2023
13 of 19 checks passed
@vmoens deleted the unbind-map branch December 4, 2023 14:51