[Feature] Memory-mapped nested tensors #618

Merged: vmoens merged 14 commits into main from nested-memmap on May 8, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Jan 15, 2024

This PR makes it possible to create memory-mapped tensors with heterogeneous (jagged) shapes using nested tensors as a backend.

The use case I have in mind is amortizing the cost of reading files: this can be done once and for all, provided there is enough space on some scratch storage to hold a single very large uint8 tensor. More elaborate pipelines are possible, for instance if the dataset is split into several chunks.

Code example with torchvision: we decode a small dataset into a single tensor to cache the decoding phase, then apply a resize preprocessing step that writes its results into a pre-allocated buffer of resized images:

import torchvision
from tensordict import MemoryMappedTensor, TensorDict
import tempfile
import torch
from torchvision.transforms.v2 import ToTensor, Resize

with tempfile.TemporaryDirectory() as path:
    # Create a fake dataset
    shapes = []
    for i in range(1000):
        w, h = torch.randint(low=200, high=300, size=(2,)).tolist()
        image = torch.randint(256, (3, w, h), dtype=torch.uint8)
        shapes.append(image.shape)
        torchvision.io.write_jpeg(image, path + "/" + str(i) + ".jpeg")

    # Allocate a zero-filled memory-mapped tensor with one (jagged) entry per image shape
    mmap_tensor = MemoryMappedTensor.zeros(torch.tensor(shapes), dtype=torch.uint8)
    td = TensorDict({"image": mmap_tensor, "index": torch.arange(1000)}, batch_size=[1000])
    totensor = ToTensor()

    def load_preproc(td):
        # Decode image `i` from disk and write it in place into the memory-mapped buffer
        i = td["index"].item()
        img = torchvision.io.read_image(path + "/" + str(i) + ".jpeg")
        td.set_("image", totensor(img))
        return None

    print("Loading images")
    td.map(load_preproc, chunksize=0, mp_start_method="fork", num_workers=8, pbar=True)

    print("Resizing images")
    resize = Resize((32, 32))

    def preproc_resize(td):
        # Resize a single image to 32x32
        td["image"] = resize(td["image"])
        return td

    # Build the output buffer: resize one sample, expand it to the full batch shape, then memory-map it
    out = preproc_resize(td[0]).expand(td.shape).memmap_(path + "/resized", num_threads=32)
    td.map(preproc_resize, chunksize=0, mp_start_method="fork", num_workers=8, pbar=True, out=out)
    print(out)
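
To illustrate the storage idea behind this example, here is a minimal standard-library sketch of keeping differently shaped uint8 items in one flat memory-mapped file, addressed by offsets derived from their shapes. It is not tensordict's actual implementation, and the names `jagged_offsets`, `images.bin`, and the three shapes are hypothetical, chosen only for illustration.

```python
import math
import mmap
import os
import tempfile


def jagged_offsets(shapes):
    """Return the start offset of each item and the total size in bytes (1 byte per uint8 element)."""
    offsets, total = [], 0
    for shape in shapes:
        offsets.append(total)
        total += math.prod(shape)
    return offsets, total


# Hypothetical jagged "image" shapes (channels, width, height)
shapes = [(3, 2, 4), (3, 5, 1), (3, 3, 3)]
offsets, total = jagged_offsets(shapes)

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "images.bin")
    # Pre-allocate the whole zero-filled buffer once
    with open(filename, "wb") as f:
        f.write(bytes(total))
    with open(filename, "r+b") as f:
        buf = mmap.mmap(f.fileno(), total)
        # Write item 1 in place, leaving the other items untouched
        start = offsets[1]
        stop = start + math.prod(shapes[1])
        buf[start:stop] = bytes([7]) * (stop - start)
        buf.flush()
        # Read items back by slicing the flat buffer
        item0 = bytes(buf[offsets[0]:offsets[1]])
        item1 = bytes(buf[start:stop])
        buf.close()

print(offsets, total)
```

Because each item occupies its own byte range, separate workers can fill different items of the same file without overlapping writes, which is the property the `td.map` calls above rely on conceptually.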

cc @albanD @cpuhrsch @NicolasHug @mikaylagawarecki

Gist: https://gist.github.com/vmoens/d50dc6a7defe823444bcc80143bf37fd

@facebook-github-bot facebook-github-bot added the CLA Signed label Jan 15, 2024
@vmoens vmoens added the enhancement label Jan 15, 2024

github-actions bot commented Jan 15, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}8$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_plain_set_nested 45.7860μs 18.2277μs 54.8614 KOps/s 56.5695 KOps/s $\color{#d91a1a}-3.02\%$
test_plain_set_stack_nested 40.1240μs 18.0632μs 55.3613 KOps/s 56.9798 KOps/s $\color{#d91a1a}-2.84\%$
test_plain_set_nested_inplace 75.2600μs 20.4980μs 48.7853 KOps/s 50.6004 KOps/s $\color{#d91a1a}-3.59\%$
test_plain_set_stack_nested_inplace 48.9820μs 20.5627μs 48.6316 KOps/s 50.2712 KOps/s $\color{#d91a1a}-3.26\%$
test_items 53.9800μs 2.6472μs 377.7578 KOps/s 399.0030 KOps/s $\textbf{\color{#d91a1a}-5.32\%}$
test_items_nested 0.4647ms 0.2733ms 3.6596 KOps/s 3.7740 KOps/s $\color{#d91a1a}-3.03\%$
test_items_nested_locked 1.3142ms 0.2720ms 3.6762 KOps/s 3.7642 KOps/s $\color{#d91a1a}-2.34\%$
test_items_nested_leaf 0.1410ms 78.9915μs 12.6596 KOps/s 12.8765 KOps/s $\color{#d91a1a}-1.68\%$
test_items_stack_nested 0.4515ms 0.2721ms 3.6751 KOps/s 3.7083 KOps/s $\color{#d91a1a}-0.90\%$
test_items_stack_nested_leaf 0.1633ms 78.5239μs 12.7350 KOps/s 12.5144 KOps/s $\color{#35bf28}+1.76\%$
test_items_stack_nested_locked 0.3547ms 0.2760ms 3.6231 KOps/s 3.7516 KOps/s $\color{#d91a1a}-3.43\%$
test_keys 17.9040μs 3.9339μs 254.2022 KOps/s 259.1716 KOps/s $\color{#d91a1a}-1.92\%$
test_keys_nested 0.2645ms 0.1385ms 7.2207 KOps/s 7.3393 KOps/s $\color{#d91a1a}-1.62\%$
test_keys_nested_locked 0.6909ms 0.1438ms 6.9560 KOps/s 7.1204 KOps/s $\color{#d91a1a}-2.31\%$
test_keys_nested_leaf 0.2199ms 0.1179ms 8.4782 KOps/s 8.8102 KOps/s $\color{#d91a1a}-3.77\%$
test_keys_stack_nested 0.2291ms 0.1359ms 7.3573 KOps/s 7.4408 KOps/s $\color{#d91a1a}-1.12\%$
test_keys_stack_nested_leaf 0.2251ms 0.1165ms 8.5819 KOps/s 8.7188 KOps/s $\color{#d91a1a}-1.57\%$
test_keys_stack_nested_locked 0.2461ms 0.1412ms 7.0800 KOps/s 7.1558 KOps/s $\color{#d91a1a}-1.06\%$
test_values 6.3217μs 1.1716μs 853.5398 KOps/s 864.8169 KOps/s $\color{#d91a1a}-1.30\%$
test_values_nested 0.1023ms 51.2144μs 19.5258 KOps/s 19.7718 KOps/s $\color{#d91a1a}-1.24\%$
test_values_nested_locked 99.6250μs 51.8848μs 19.2735 KOps/s 19.7298 KOps/s $\color{#d91a1a}-2.31\%$
test_values_nested_leaf 0.1249ms 46.4955μs 21.5074 KOps/s 21.7105 KOps/s $\color{#d91a1a}-0.94\%$
test_values_stack_nested 0.1067ms 52.3423μs 19.1050 KOps/s 19.0051 KOps/s $\color{#35bf28}+0.53\%$
test_values_stack_nested_leaf 88.1530μs 45.7597μs 21.8533 KOps/s 21.6479 KOps/s $\color{#35bf28}+0.95\%$
test_values_stack_nested_locked 0.1068ms 51.9422μs 19.2522 KOps/s 19.1400 KOps/s $\color{#35bf28}+0.59\%$
test_membership 23.5240μs 1.3278μs 753.1529 KOps/s 711.6572 KOps/s $\textbf{\color{#35bf28}+5.83\%}$
test_membership_nested 22.0810μs 3.4764μs 287.6559 KOps/s 288.0907 KOps/s $\color{#d91a1a}-0.15\%$
test_membership_nested_leaf 26.6690μs 3.4624μs 288.8191 KOps/s 285.8401 KOps/s $\color{#35bf28}+1.04\%$
test_membership_stacked_nested 18.1840μs 3.4450μs 290.2788 KOps/s 288.2840 KOps/s $\color{#35bf28}+0.69\%$
test_membership_stacked_nested_leaf 26.4900μs 3.4557μs 289.3757 KOps/s 289.4047 KOps/s $\color{#d91a1a}-0.01\%$
test_membership_nested_last 31.6790μs 4.2850μs 233.3735 KOps/s 234.8705 KOps/s $\color{#d91a1a}-0.64\%$
test_membership_nested_leaf_last 24.7460μs 4.2758μs 233.8760 KOps/s 240.7023 KOps/s $\color{#d91a1a}-2.84\%$
test_membership_stacked_nested_last 31.7890μs 13.8196μs 72.3608 KOps/s 211.2828 KOps/s $\textbf{\color{#d91a1a}-65.75\%}$
test_membership_stacked_nested_leaf_last 35.1550μs 13.7313μs 72.8263 KOps/s 207.6757 KOps/s $\textbf{\color{#d91a1a}-64.93\%}$
test_nested_getleaf 34.5650μs 10.8989μs 91.7524 KOps/s 92.6570 KOps/s $\color{#d91a1a}-0.98\%$
test_nested_get 49.3620μs 10.2502μs 97.5594 KOps/s 98.0809 KOps/s $\color{#d91a1a}-0.53\%$
test_stacked_getleaf 31.7690μs 10.7638μs 92.9036 KOps/s 93.9817 KOps/s $\color{#d91a1a}-1.15\%$
test_stacked_get 43.6250μs 10.0050μs 99.9496 KOps/s 98.4121 KOps/s $\color{#35bf28}+1.56\%$
test_nested_getitemleaf 51.3750μs 11.3175μs 88.3587 KOps/s 87.4075 KOps/s $\color{#35bf28}+1.09\%$
test_nested_getitem 41.5470μs 10.3518μs 96.6016 KOps/s 95.1406 KOps/s $\color{#35bf28}+1.54\%$
test_stacked_getitemleaf 43.7820μs 11.2580μs 88.8257 KOps/s 88.3316 KOps/s $\color{#35bf28}+0.56\%$
test_stacked_getitem 29.5250μs 10.4874μs 95.3528 KOps/s 95.2134 KOps/s $\color{#35bf28}+0.15\%$
test_lock_nested 51.3768ms 0.4000ms 2.5000 KOps/s 2.8217 KOps/s $\textbf{\color{#d91a1a}-11.40\%}$
test_lock_stack_nested 0.5846ms 0.2964ms 3.3741 KOps/s 3.2670 KOps/s $\color{#35bf28}+3.28\%$
test_unlock_nested 0.7024ms 0.3494ms 2.8621 KOps/s 2.4651 KOps/s $\textbf{\color{#35bf28}+16.11\%}$
test_unlock_stack_nested 0.3925ms 0.3029ms 3.3016 KOps/s 3.1690 KOps/s $\color{#35bf28}+4.18\%$
test_flatten_speed 0.1791ms 96.6209μs 10.3497 KOps/s 10.5530 KOps/s $\color{#d91a1a}-1.93\%$
test_unflatten_speed 1.3479ms 0.4315ms 2.3177 KOps/s 2.4194 KOps/s $\color{#d91a1a}-4.20\%$
test_common_ops 4.1537ms 0.7836ms 1.2762 KOps/s 1.3427 KOps/s $\color{#d91a1a}-4.95\%$
test_creation 23.4440μs 1.9530μs 512.0210 KOps/s 525.9846 KOps/s $\color{#d91a1a}-2.65\%$
test_creation_empty 30.9170μs 12.4589μs 80.2638 KOps/s 85.5076 KOps/s $\textbf{\color{#d91a1a}-6.13\%}$
test_creation_nested_1 49.8020μs 15.5432μs 64.3370 KOps/s 66.2247 KOps/s $\color{#d91a1a}-2.85\%$
test_creation_nested_2 53.7300μs 18.9296μs 52.8273 KOps/s 55.8277 KOps/s $\textbf{\color{#d91a1a}-5.37\%}$
test_clone 76.1420μs 13.5443μs 73.8316 KOps/s 72.6302 KOps/s $\color{#35bf28}+1.65\%$
test_getitem[int] 39.3430μs 11.5166μs 86.8313 KOps/s 84.6807 KOps/s $\color{#35bf28}+2.54\%$
test_getitem[slice_int] 53.8800μs 22.9536μs 43.5661 KOps/s 42.7721 KOps/s $\color{#35bf28}+1.86\%$
test_getitem[range] 85.1080μs 61.0112μs 16.3904 KOps/s 15.6229 KOps/s $\color{#35bf28}+4.91\%$
test_getitem[tuple] 50.3640μs 19.3397μs 51.7071 KOps/s 51.1106 KOps/s $\color{#35bf28}+1.17\%$
test_getitem[list] 0.1088ms 41.3608μs 24.1775 KOps/s 23.6225 KOps/s $\color{#35bf28}+2.35\%$
test_setitem_dim[int] 56.1050μs 36.8686μs 27.1233 KOps/s 27.5362 KOps/s $\color{#d91a1a}-1.50\%$
test_setitem_dim[slice_int] 0.3353ms 63.6019μs 15.7228 KOps/s 15.4846 KOps/s $\color{#35bf28}+1.54\%$
test_setitem_dim[range] 0.2355ms 85.0775μs 11.7540 KOps/s 11.5190 KOps/s $\color{#35bf28}+2.04\%$
test_setitem_dim[tuple] 88.6740μs 52.1191μs 19.1868 KOps/s 19.1052 KOps/s $\color{#35bf28}+0.43\%$
test_setitem 0.2940ms 22.0002μs 45.4541 KOps/s 46.8677 KOps/s $\color{#d91a1a}-3.02\%$
test_set 66.4440μs 21.1004μs 47.3925 KOps/s 45.8582 KOps/s $\color{#35bf28}+3.35\%$
test_set_shared 1.8132ms 0.1437ms 6.9584 KOps/s 6.8679 KOps/s $\color{#35bf28}+1.32\%$
test_update 0.3016ms 24.6384μs 40.5871 KOps/s 42.5529 KOps/s $\color{#d91a1a}-4.62\%$
test_update_nested 76.9830μs 32.3593μs 30.9030 KOps/s 30.6379 KOps/s $\color{#35bf28}+0.87\%$
test_update__nested 72.4140μs 25.0365μs 39.9418 KOps/s 38.7920 KOps/s $\color{#35bf28}+2.96\%$
test_set_nested 83.2550μs 23.2616μs 42.9893 KOps/s 43.9523 KOps/s $\color{#d91a1a}-2.19\%$
test_set_nested_new 70.4410μs 27.1063μs 36.8918 KOps/s 37.6120 KOps/s $\color{#d91a1a}-1.91\%$
test_select 89.1560μs 43.1938μs 23.1515 KOps/s 24.0197 KOps/s $\color{#d91a1a}-3.61\%$
test_select_nested 0.1130ms 60.8250μs 16.4406 KOps/s 16.1877 KOps/s $\color{#35bf28}+1.56\%$
test_exclude_nested 0.2870ms 0.1229ms 8.1337 KOps/s 8.1149 KOps/s $\color{#35bf28}+0.23\%$
test_empty[True] 1.0314ms 0.4028ms 2.4827 KOps/s 2.5113 KOps/s $\color{#d91a1a}-1.14\%$
test_empty[False] 20.4740μs 1.0787μs 927.0072 KOps/s 917.9824 KOps/s $\color{#35bf28}+0.98\%$
test_unbind_speed 1.6171ms 0.2626ms 3.8080 KOps/s 3.7414 KOps/s $\color{#35bf28}+1.78\%$
test_unbind_speed_stack0 0.3678ms 0.2485ms 4.0241 KOps/s 3.9386 KOps/s $\color{#35bf28}+2.17\%$
test_unbind_speed_stack1 65.7003ms 0.7391ms 1.3530 KOps/s 1.3140 KOps/s $\color{#35bf28}+2.97\%$
test_split 1.7195ms 1.4950ms 668.9017 Ops/s 622.0807 Ops/s $\textbf{\color{#35bf28}+7.53\%}$
test_chunk 66.7037ms 1.6088ms 621.5862 Ops/s 623.0959 Ops/s $\color{#d91a1a}-0.24\%$
test_creation[device0] 0.1970ms 0.1060ms 9.4336 KOps/s 9.2353 KOps/s $\color{#35bf28}+2.15\%$
test_creation_from_tensor 3.3174ms 85.2446μs 11.7309 KOps/s 11.8106 KOps/s $\color{#d91a1a}-0.67\%$
test_add_one[memmap_tensor0] 51.5560μs 5.3353μs 187.4303 KOps/s 177.7118 KOps/s $\textbf{\color{#35bf28}+5.47\%}$
test_contiguous[memmap_tensor0] 16.6710μs 0.6377μs 1.5681 MOps/s 1.5663 MOps/s $\color{#35bf28}+0.12\%$
test_stack[memmap_tensor0] 47.3540μs 3.5913μs 278.4491 KOps/s 275.1956 KOps/s $\color{#35bf28}+1.18\%$
test_memmaptd_index 0.9927ms 0.2499ms 4.0014 KOps/s 4.1409 KOps/s $\color{#d91a1a}-3.37\%$
test_memmaptd_index_astensor 66.4264ms 0.3506ms 2.8521 KOps/s 3.1706 KOps/s $\textbf{\color{#d91a1a}-10.04\%}$
test_memmaptd_index_op 0.9467ms 0.6383ms 1.5667 KOps/s 1.6119 KOps/s $\color{#d91a1a}-2.80\%$
test_serialize_model 0.1727s 0.1085s 9.2153 Ops/s 8.9537 Ops/s $\color{#35bf28}+2.92\%$
test_serialize_model_pickle 0.4470s 0.3818s 2.6190 Ops/s 2.6059 Ops/s $\color{#35bf28}+0.50\%$
test_serialize_weights 0.1702s 0.1079s 9.2700 Ops/s 9.3285 Ops/s $\color{#d91a1a}-0.63\%$
test_serialize_weights_returnearly 0.1838s 0.1303s 7.6740 Ops/s 7.5341 Ops/s $\color{#35bf28}+1.86\%$
test_serialize_weights_pickle 0.7037s 0.4587s 2.1801 Ops/s 2.4446 Ops/s $\textbf{\color{#d91a1a}-10.82\%}$
test_serialize_weights_filesystem 98.0380ms 91.5172ms 10.9269 Ops/s 10.5872 Ops/s $\color{#35bf28}+3.21\%$
test_serialize_model_filesystem 0.1616s 97.6320ms 10.2425 Ops/s 8.9940 Ops/s $\textbf{\color{#35bf28}+13.88\%}$
test_reshape_pytree 62.8770μs 25.1758μs 39.7206 KOps/s 35.7473 KOps/s $\textbf{\color{#35bf28}+11.12\%}$
test_reshape_td 73.3460μs 33.9450μs 29.4594 KOps/s 29.1542 KOps/s $\color{#35bf28}+1.05\%$
test_view_pytree 60.2320μs 25.2184μs 39.6536 KOps/s 38.8682 KOps/s $\color{#35bf28}+2.02\%$
test_view_td 68.2870μs 37.2996μs 26.8099 KOps/s 26.2492 KOps/s $\color{#35bf28}+2.14\%$
test_unbind_pytree 71.8640μs 28.9349μs 34.5603 KOps/s 34.0054 KOps/s $\color{#35bf28}+1.63\%$
test_unbind_td 0.3873ms 38.4378μs 26.0161 KOps/s 26.0593 KOps/s $\color{#d91a1a}-0.17\%$
test_split_pytree 67.1850μs 28.7848μs 34.7406 KOps/s 33.1698 KOps/s $\color{#35bf28}+4.74\%$
test_split_td 0.1204ms 41.4001μs 24.1545 KOps/s 24.0333 KOps/s $\color{#35bf28}+0.50\%$
test_add_pytree 74.1180μs 34.7226μs 28.7997 KOps/s 27.8086 KOps/s $\color{#35bf28}+3.56\%$
test_add_td 0.1301ms 57.4540μs 17.4052 KOps/s 17.3799 KOps/s $\color{#35bf28}+0.15\%$
test_distributed 0.1781ms 99.0099μs 10.1000 KOps/s 9.7654 KOps/s $\color{#35bf28}+3.43\%$
test_tdmodule 64.3890μs 18.4375μs 54.2374 KOps/s 46.0387 KOps/s $\textbf{\color{#35bf28}+17.81\%}$
test_tdmodule_dispatch 54.9430μs 36.6283μs 27.3013 KOps/s 23.2146 KOps/s $\textbf{\color{#35bf28}+17.60\%}$
test_tdseq 41.6280μs 21.4955μs 46.5215 KOps/s 41.5284 KOps/s $\textbf{\color{#35bf28}+12.02\%}$
test_tdseq_dispatch 78.5360μs 42.5859μs 23.4820 KOps/s 22.3103 KOps/s $\textbf{\color{#35bf28}+5.25\%}$
test_instantiation_functorch 1.7756ms 1.3408ms 745.8326 Ops/s 730.0239 Ops/s $\color{#35bf28}+2.17\%$
test_instantiation_td 1.5530ms 1.0322ms 968.8174 Ops/s 951.7012 Ops/s $\color{#35bf28}+1.80\%$
test_exec_functorch 0.2848ms 0.1602ms 6.2438 KOps/s 5.5431 KOps/s $\textbf{\color{#35bf28}+12.64\%}$
test_exec_functional_call 0.3278ms 0.1508ms 6.6319 KOps/s 5.8650 KOps/s $\textbf{\color{#35bf28}+13.08\%}$
test_exec_td 0.2957ms 0.1500ms 6.6652 KOps/s 6.3201 KOps/s $\textbf{\color{#35bf28}+5.46\%}$
test_exec_td_decorator 0.7981ms 0.2240ms 4.4642 KOps/s 4.3312 KOps/s $\color{#35bf28}+3.07\%$
test_vmap_mlp_speed[True-True] 0.8655ms 0.4878ms 2.0502 KOps/s 1.9373 KOps/s $\textbf{\color{#35bf28}+5.83\%}$
test_vmap_mlp_speed[True-False] 0.7520ms 0.4859ms 2.0579 KOps/s 1.9658 KOps/s $\color{#35bf28}+4.69\%$
test_vmap_mlp_speed[False-True] 0.6073ms 0.3960ms 2.5252 KOps/s 2.4110 KOps/s $\color{#35bf28}+4.73\%$
test_vmap_mlp_speed[False-False] 0.6047ms 0.4122ms 2.4262 KOps/s 2.4016 KOps/s $\color{#35bf28}+1.02\%$
test_vmap_mlp_speed_decorator[True-True] 1.2645ms 0.5556ms 1.7999 KOps/s 1.7437 KOps/s $\color{#35bf28}+3.22\%$
test_vmap_mlp_speed_decorator[True-False] 0.7033ms 0.5520ms 1.8115 KOps/s 1.7556 KOps/s $\color{#35bf28}+3.19\%$
test_vmap_mlp_speed_decorator[False-True] 0.7050ms 0.4556ms 2.1948 KOps/s 2.1071 KOps/s $\color{#35bf28}+4.16\%$
test_vmap_mlp_speed_decorator[False-False] 0.6511ms 0.4537ms 2.2039 KOps/s 2.1026 KOps/s $\color{#35bf28}+4.82\%$
test_to_module_speed[True] 1.7797ms 1.6938ms 590.3917 Ops/s 574.4005 Ops/s $\color{#35bf28}+2.78\%$
test_to_module_speed[False] 1.7628ms 1.6669ms 599.9133 Ops/s 590.7810 Ops/s $\color{#35bf28}+1.55\%$
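
For readers interpreting the table, the columns appear to be related as follows: Ops is the reciprocal of Mean, and Change is the relative difference between Ops and Ops on Repo HEAD. This is inferred from the rows themselves, not from the benchmark tooling, and the helper names below are hypothetical. A minimal sketch that reproduces the `test_plain_set_nested` row:

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime."""
    return 1.0 / mean_seconds


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr / ops_head - 1.0) * 100.0


# test_plain_set_nested (CPU): mean 18.2277 μs, 54.8614 KOps/s vs 56.5695 KOps/s on HEAD
kops = ops_per_second(18.2277e-6) / 1e3
change = percent_change(54.8614, 56.5695)
print(f"{kops:.2f} KOps/s, {change:+.2f}%")
```

In the CPU table, the rows set in bold are those whose change exceeds about 5% in magnitude, and their counts match the Improved (14) and Worsened (8) totals, which suggests that smaller differences are not counted as changes.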


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 128. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}24$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_plain_set_nested 78.6910μs 13.9830μs 71.5156 KOps/s 77.6362 KOps/s $\textbf{\color{#d91a1a}-7.88\%}$
test_plain_set_stack_nested 0.1476ms 0.1186ms 8.4301 KOps/s 8.4200 KOps/s $\color{#35bf28}+0.12\%$
test_plain_set_nested_inplace 36.8310μs 15.2954μs 65.3791 KOps/s 70.3905 KOps/s $\textbf{\color{#d91a1a}-7.12\%}$
test_plain_set_stack_nested_inplace 0.1861ms 0.1444ms 6.9254 KOps/s 6.8348 KOps/s $\color{#35bf28}+1.33\%$
test_items 19.6810μs 4.7096μs 212.3331 KOps/s 210.0282 KOps/s $\color{#35bf28}+1.10\%$
test_items_nested 0.3950ms 0.3406ms 2.9362 KOps/s 2.9094 KOps/s $\color{#35bf28}+0.92\%$
test_items_nested_locked 0.3919ms 0.3414ms 2.9294 KOps/s 2.8880 KOps/s $\color{#35bf28}+1.43\%$
test_items_nested_leaf 0.2471ms 0.1995ms 5.0133 KOps/s 4.9477 KOps/s $\color{#35bf28}+1.33\%$
test_items_stack_nested 1.4001ms 1.3100ms 763.3530 Ops/s 752.2565 Ops/s $\color{#35bf28}+1.48\%$
test_items_stack_nested_leaf 1.2439ms 1.1509ms 868.8751 Ops/s 856.1777 Ops/s $\color{#35bf28}+1.48\%$
test_items_stack_nested_locked 0.9757ms 0.9140ms 1.0940 KOps/s 1.0753 KOps/s $\color{#35bf28}+1.75\%$
test_keys 24.8310μs 4.6356μs 215.7197 KOps/s 208.0632 KOps/s $\color{#35bf28}+3.68\%$
test_keys_nested 0.7451ms 95.7431μs 10.4446 KOps/s 10.5487 KOps/s $\color{#d91a1a}-0.99\%$
test_keys_nested_locked 0.1230ms 95.1373μs 10.5111 KOps/s 10.6231 KOps/s $\color{#d91a1a}-1.05\%$
test_keys_nested_leaf 0.1821ms 78.7503μs 12.6984 KOps/s 12.8588 KOps/s $\color{#d91a1a}-1.25\%$
test_keys_stack_nested 1.2602ms 1.1610ms 861.3324 Ops/s 856.8784 Ops/s $\color{#35bf28}+0.52\%$
test_keys_stack_nested_leaf 1.2693ms 1.1408ms 876.5694 Ops/s 871.9504 Ops/s $\color{#35bf28}+0.53\%$
test_keys_stack_nested_locked 0.7946ms 0.7336ms 1.3631 KOps/s 1.3696 KOps/s $\color{#d91a1a}-0.47\%$
test_values 8.1770μs 1.9073μs 524.3097 KOps/s 525.6596 KOps/s $\color{#d91a1a}-0.26\%$
test_values_nested 66.7310μs 44.9970μs 22.2237 KOps/s 21.9759 KOps/s $\color{#35bf28}+1.13\%$
test_values_nested_locked 67.7810μs 46.9872μs 21.2824 KOps/s 21.0426 KOps/s $\color{#35bf28}+1.14\%$
test_values_nested_leaf 61.4710μs 39.2926μs 25.4501 KOps/s 25.3630 KOps/s $\color{#35bf28}+0.34\%$
test_values_stack_nested 1.0456ms 0.9617ms 1.0398 KOps/s 1.0194 KOps/s $\color{#35bf28}+2.01\%$
test_values_stack_nested_leaf 1.0169ms 0.9559ms 1.0461 KOps/s 1.0393 KOps/s $\color{#35bf28}+0.66\%$
test_values_stack_nested_locked 0.6764ms 0.5929ms 1.6865 KOps/s 1.7029 KOps/s $\color{#d91a1a}-0.96\%$
test_membership 4.9982μs 0.9345μs 1.0701 MOps/s 937.5690 KOps/s $\textbf{\color{#35bf28}+14.14\%}$
test_membership_nested 28.7500μs 2.2541μs 443.6363 KOps/s 433.1978 KOps/s $\color{#35bf28}+2.41\%$
test_membership_nested_leaf 12.0250μs 2.1823μs 458.2336 KOps/s 448.8892 KOps/s $\color{#35bf28}+2.08\%$
test_membership_stacked_nested 30.1210μs 10.9830μs 91.0500 KOps/s 90.6520 KOps/s $\color{#35bf28}+0.44\%$
test_membership_stacked_nested_leaf 36.7600μs 10.9325μs 91.4707 KOps/s 91.0358 KOps/s $\color{#35bf28}+0.48\%$
test_membership_nested_last 37.6310μs 4.6744μs 213.9325 KOps/s 214.5814 KOps/s $\color{#d91a1a}-0.30\%$
test_membership_nested_leaf_last 20.1100μs 4.6652μs 214.3551 KOps/s 214.2584 KOps/s $\color{#35bf28}+0.05\%$
test_membership_stacked_nested_last 0.1872ms 0.1367ms 7.3144 KOps/s 7.3445 KOps/s $\color{#d91a1a}-0.41\%$
test_membership_stacked_nested_leaf_last 40.6610μs 12.9835μs 77.0209 KOps/s 78.1081 KOps/s $\color{#d91a1a}-1.39\%$
test_nested_getleaf 33.2600μs 8.4535μs 118.2938 KOps/s 119.2126 KOps/s $\color{#d91a1a}-0.77\%$
test_nested_get 29.4600μs 7.9830μs 125.2659 KOps/s 125.8032 KOps/s $\color{#d91a1a}-0.43\%$
test_stacked_getleaf 0.3860ms 0.3241ms 3.0854 KOps/s 3.1171 KOps/s $\color{#d91a1a}-1.02\%$
test_stacked_get 0.3542ms 0.2911ms 3.4356 KOps/s 3.4771 KOps/s $\color{#d91a1a}-1.19\%$
test_nested_getitemleaf 29.9200μs 8.4899μs 117.7865 KOps/s 118.1802 KOps/s $\color{#d91a1a}-0.33\%$
test_nested_getitem 40.8100μs 8.0514μs 124.2019 KOps/s 125.6241 KOps/s $\color{#d91a1a}-1.13\%$
test_stacked_getitemleaf 0.3925ms 0.3261ms 3.0666 KOps/s 3.0963 KOps/s $\color{#d91a1a}-0.96\%$
test_stacked_getitem 0.3636ms 0.2925ms 3.4186 KOps/s 3.4866 KOps/s $\color{#d91a1a}-1.95\%$
test_lock_nested 4.2707ms 0.4195ms 2.3837 KOps/s 2.3965 KOps/s $\color{#d91a1a}-0.54\%$
test_lock_stack_nested 84.7477ms 6.6316ms 150.7937 Ops/s 152.2350 Ops/s $\color{#d91a1a}-0.95\%$
test_unlock_nested 0.8417ms 0.4149ms 2.4103 KOps/s 2.3959 KOps/s $\color{#35bf28}+0.60\%$
test_unlock_stack_nested 83.2413ms 6.9509ms 143.8668 Ops/s 143.4190 Ops/s $\color{#35bf28}+0.31\%$
test_flatten_speed 0.8159ms 0.2663ms 3.7558 KOps/s 3.8123 KOps/s $\color{#d91a1a}-1.48\%$
test_unflatten_speed 0.4125ms 0.3578ms 2.7952 KOps/s 2.7814 KOps/s $\color{#35bf28}+0.50\%$
test_common_ops 1.1133ms 0.6304ms 1.5863 KOps/s 1.7145 KOps/s $\textbf{\color{#d91a1a}-7.47\%}$
test_creation 16.7400μs 1.6029μs 623.8618 KOps/s 620.0926 KOps/s $\color{#35bf28}+0.61\%$
test_creation_empty 36.6600μs 9.2012μs 108.6817 KOps/s 155.4726 KOps/s $\textbf{\color{#d91a1a}-30.10\%}$
test_creation_nested_1 24.8300μs 11.0711μs 90.3257 KOps/s 119.9234 KOps/s $\textbf{\color{#d91a1a}-24.68\%}$
test_creation_nested_2 35.4710μs 15.4865μs 64.5725 KOps/s 77.0570 KOps/s $\textbf{\color{#d91a1a}-16.20\%}$
test_clone 0.1077ms 13.2683μs 75.3673 KOps/s 74.8796 KOps/s $\color{#35bf28}+0.65\%$
test_getitem[int] 33.2700μs 11.5322μs 86.7141 KOps/s 87.6510 KOps/s $\color{#d91a1a}-1.07\%$
test_getitem[slice_int] 42.0900μs 22.3651μs 44.7126 KOps/s 46.1288 KOps/s $\color{#d91a1a}-3.07\%$
test_getitem[range] 69.4210μs 37.4616μs 26.6940 KOps/s 26.4708 KOps/s $\color{#35bf28}+0.84\%$
test_getitem[tuple] 81.8710μs 19.6886μs 50.7907 KOps/s 50.6524 KOps/s $\color{#35bf28}+0.27\%$
test_getitem[list] 69.6110μs 34.1562μs 29.2773 KOps/s 28.4869 KOps/s $\color{#35bf28}+2.77\%$
test_setitem_dim[int] 44.6910μs 29.6466μs 33.7307 KOps/s 38.8304 KOps/s $\textbf{\color{#d91a1a}-13.13\%}$
test_setitem_dim[slice_int] 0.1129ms 49.6823μs 20.1279 KOps/s 21.3500 KOps/s $\textbf{\color{#d91a1a}-5.72\%}$
test_setitem_dim[range] 81.9010μs 64.4931μs 15.5055 KOps/s 16.3804 KOps/s $\textbf{\color{#d91a1a}-5.34\%}$
test_setitem_dim[tuple] 60.8310μs 43.4990μs 22.9890 KOps/s 24.9983 KOps/s $\textbf{\color{#d91a1a}-8.04\%}$
test_setitem 0.1047ms 18.2752μs 54.7189 KOps/s 58.6208 KOps/s $\textbf{\color{#d91a1a}-6.66\%}$
test_set 0.1036ms 17.7703μs 56.2737 KOps/s 61.5016 KOps/s $\textbf{\color{#d91a1a}-8.50\%}$
test_set_shared 2.6725ms 0.1059ms 9.4456 KOps/s 9.5698 KOps/s $\color{#d91a1a}-1.30\%$
test_update 0.1052ms 20.8988μs 47.8497 KOps/s 54.8069 KOps/s $\textbf{\color{#d91a1a}-12.69\%}$
test_update_nested 0.1149ms 27.2181μs 36.7403 KOps/s 40.9936 KOps/s $\textbf{\color{#d91a1a}-10.38\%}$
test_set_nested 0.1037ms 19.2626μs 51.9142 KOps/s 57.0433 KOps/s $\textbf{\color{#d91a1a}-8.99\%}$
test_set_nested_new 0.1029ms 22.0281μs 45.3966 KOps/s 48.5748 KOps/s $\textbf{\color{#d91a1a}-6.54\%}$
test_select 72.1810μs 43.2788μs 23.1060 KOps/s 24.2165 KOps/s $\color{#d91a1a}-4.59\%$
test_to 74.2810μs 54.5266μs 18.3397 KOps/s 17.9176 KOps/s $\color{#35bf28}+2.36\%$
test_to_nonblocking 60.0310μs 34.9344μs 28.6251 KOps/s 28.5408 KOps/s $\color{#35bf28}+0.30\%$
test_unbind_speed 0.3938ms 0.3311ms 3.0198 KOps/s 3.0484 KOps/s $\color{#d91a1a}-0.94\%$
test_unbind_speed_stack0 79.8306ms 3.9118ms 255.6381 Ops/s 258.2614 Ops/s $\color{#d91a1a}-1.02\%$
test_unbind_speed_stack1 1.7020μs 0.5404μs 1.8506 MOps/s 1.8817 MOps/s $\color{#d91a1a}-1.65\%$
test_split 74.4525ms 1.7626ms 567.3481 Ops/s 567.1699 Ops/s $\color{#35bf28}+0.03\%$
test_chunk 1.7427ms 1.6126ms 620.1160 Ops/s 575.4602 Ops/s $\textbf{\color{#35bf28}+7.76\%}$
test_creation[device0] 0.1455ms 72.9066μs 13.7162 KOps/s 13.7046 KOps/s $\color{#35bf28}+0.08\%$
test_creation_from_tensor 0.1557ms 54.8215μs 18.2410 KOps/s 17.5030 KOps/s $\color{#35bf28}+4.22\%$
test_add_one[memmap_tensor0] 0.1309ms 7.1415μs 140.0263 KOps/s 139.4473 KOps/s $\color{#35bf28}+0.42\%$
test_contiguous[memmap_tensor0] 10.5100μs 0.6594μs 1.5164 MOps/s 1.5558 MOps/s $\color{#d91a1a}-2.53\%$
test_stack[memmap_tensor0] 28.7600μs 4.7133μs 212.1642 KOps/s 215.9260 KOps/s $\color{#d91a1a}-1.74\%$
test_memmaptd_index 0.3078ms 0.2473ms 4.0440 KOps/s 4.0337 KOps/s $\color{#35bf28}+0.26\%$
test_memmaptd_index_astensor 0.3792ms 0.3054ms 3.2741 KOps/s 3.2479 KOps/s $\color{#35bf28}+0.80\%$
test_memmaptd_index_op 0.7982ms 0.6117ms 1.6348 KOps/s 1.7084 KOps/s $\color{#d91a1a}-4.31\%$
test_serialize_model 92.3941ms 88.7824ms 11.2635 Ops/s 9.7365 Ops/s $\textbf{\color{#35bf28}+15.68\%}$
test_serialize_model_pickle 1.6736s 1.3043s 0.7667 Ops/s 0.8078 Ops/s $\textbf{\color{#d91a1a}-5.09\%}$
test_serialize_weights 0.1642s 94.2504ms 10.6100 Ops/s 9.8813 Ops/s $\textbf{\color{#35bf28}+7.37\%}$
test_serialize_weights_returnearly 0.2556s 77.3048ms 12.9358 Ops/s 14.7583 Ops/s $\textbf{\color{#d91a1a}-12.35\%}$
test_serialize_weights_pickle 1.3508s 1.2364s 0.8088 Ops/s 0.8086 Ops/s $\color{#35bf28}+0.02\%$
test_reshape_pytree 52.0010μs 24.5403μs 40.7493 KOps/s 40.7775 KOps/s $\color{#d91a1a}-0.07\%$
test_reshape_td 45.9610μs 29.7057μs 33.6636 KOps/s 34.0989 KOps/s $\color{#d91a1a}-1.28\%$
test_view_pytree 48.0310μs 24.3914μs 40.9980 KOps/s 40.9617 KOps/s $\color{#35bf28}+0.09\%$
test_view_td 16.9610μs 4.0917μs 244.3992 KOps/s 243.8218 KOps/s $\color{#35bf28}+0.24\%$
test_unbind_pytree 53.2110μs 30.7500μs 32.5203 KOps/s 32.8811 KOps/s $\color{#d91a1a}-1.10\%$
test_unbind_td 84.9110μs 52.9478μs 18.8865 KOps/s 19.2954 KOps/s $\color{#d91a1a}-2.12\%$
test_split_pytree 56.4810μs 28.6968μs 34.8471 KOps/s 35.1274 KOps/s $\color{#d91a1a}-0.80\%$
test_split_td 0.7338ms 42.1409μs 23.7299 KOps/s 24.7466 KOps/s $\color{#d91a1a}-4.11\%$
test_add_pytree 63.6910μs 36.4969μs 27.3996 KOps/s 25.8637 KOps/s $\textbf{\color{#35bf28}+5.94\%}$
test_add_td 84.9510μs 49.9184μs 20.0327 KOps/s 21.3978 KOps/s $\textbf{\color{#d91a1a}-6.38\%}$
test_distributed 1.9565ms 77.3753μs 12.9240 KOps/s 13.8942 KOps/s $\textbf{\color{#d91a1a}-6.98\%}$
test_tdmodule 0.1091ms 18.5853μs 53.8059 KOps/s 59.7216 KOps/s $\textbf{\color{#d91a1a}-9.91\%}$
test_tdmodule_dispatch 0.1528ms 35.9143μs 27.8441 KOps/s 30.9851 KOps/s $\textbf{\color{#d91a1a}-10.14\%}$
test_tdseq 36.3500μs 21.7804μs 45.9129 KOps/s 51.0085 KOps/s $\textbf{\color{#d91a1a}-9.99\%}$
test_tdseq_dispatch 65.8600μs 39.2804μs 25.4580 KOps/s 28.5701 KOps/s $\textbf{\color{#d91a1a}-10.89\%}$
test_instantiation_functorch 1.8348ms 1.7069ms 585.8599 Ops/s 597.4758 Ops/s $\color{#d91a1a}-1.94\%$
test_instantiation_td 1.8416ms 1.1939ms 837.5702 Ops/s 854.5622 Ops/s $\color{#d91a1a}-1.99\%$
test_exec_functorch 0.2185ms 0.1634ms 6.1216 KOps/s 6.1819 KOps/s $\color{#d91a1a}-0.98\%$
test_exec_functional_call 0.2210ms 0.1626ms 6.1503 KOps/s 5.9580 KOps/s $\color{#35bf28}+3.23\%$
test_exec_td 0.2168ms 0.1555ms 6.4288 KOps/s 6.4201 KOps/s $\color{#35bf28}+0.14\%$
test_exec_td_decorator 0.9811ms 0.1975ms 5.0628 KOps/s 5.1379 KOps/s $\color{#d91a1a}-1.46\%$
test_vmap_mlp_speed[True-True] 1.2178ms 1.1361ms 880.2297 Ops/s 895.9307 Ops/s $\color{#d91a1a}-1.75\%$
test_vmap_mlp_speed[True-False] 0.8477ms 0.6769ms 1.4774 KOps/s 1.4907 KOps/s $\color{#d91a1a}-0.89\%$
test_vmap_mlp_speed[False-True] 1.1215ms 1.0403ms 961.2550 Ops/s 959.9178 Ops/s $\color{#35bf28}+0.14\%$
test_vmap_mlp_speed[False-False] 0.6912ms 0.6014ms 1.6628 KOps/s 1.5998 KOps/s $\color{#35bf28}+3.93\%$
test_vmap_mlp_speed_decorator[True-True] 3.2957ms 2.5889ms 386.2585 Ops/s 404.0691 Ops/s $\color{#d91a1a}-4.41\%$
test_vmap_mlp_speed_decorator[True-False] 0.9916ms 0.7241ms 1.3810 KOps/s 1.3785 KOps/s $\color{#35bf28}+0.18\%$
test_vmap_mlp_speed_decorator[False-True] 2.5446ms 2.1532ms 464.4220 Ops/s 484.3077 Ops/s $\color{#d91a1a}-4.11\%$
test_vmap_mlp_speed_decorator[False-False] 1.0656ms 0.6204ms 1.6118 KOps/s 1.6002 KOps/s $\color{#35bf28}+0.73\%$
test_vmap_transformer_speed[True-True] 12.8833ms 12.6478ms 79.0652 Ops/s 80.0815 Ops/s $\color{#d91a1a}-1.27\%$
test_vmap_transformer_speed[True-False] 8.6056ms 8.3497ms 119.7645 Ops/s 120.6891 Ops/s $\color{#d91a1a}-0.77\%$
test_vmap_transformer_speed[False-True] 12.9328ms 12.6050ms 79.3335 Ops/s 80.9700 Ops/s $\color{#d91a1a}-2.02\%$
test_vmap_transformer_speed[False-False] 8.5080ms 8.2825ms 120.7365 Ops/s 121.5871 Ops/s $\color{#d91a1a}-0.70\%$
test_vmap_transformer_speed_decorator[True-True] 0.1669s 84.6779ms 11.8095 Ops/s 12.2013 Ops/s $\color{#d91a1a}-3.21\%$
test_vmap_transformer_speed_decorator[True-False] 21.7837ms 20.0070ms 49.9826 Ops/s 50.0742 Ops/s $\color{#d91a1a}-0.18\%$
test_vmap_transformer_speed_decorator[False-True] 71.5679ms 70.2682ms 14.2312 Ops/s 14.7320 Ops/s $\color{#d91a1a}-3.40\%$
test_vmap_transformer_speed_decorator[False-False] 0.1180s 21.5114ms 46.4870 Ops/s 46.4841 Ops/s $+0.01\%$

@vmoens
Contributor Author

vmoens commented Mar 20, 2024

Blocked by pytorch/pytorch#117711

@vmoens vmoens merged commit 04e52a1 into main May 8, 2024
29 of 37 checks passed
@vmoens vmoens deleted the nested-memmap branch May 8, 2024 19:11

2 participants