[Refactor] Avoid TDParams parameters and buffers construction when obvious + new constructor #1100

Merged Nov 25, 2024 (7 commits)

vmoens
Contributor

@vmoens vmoens commented Nov 21, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 21, 2024
…vious

ghstack-source-id: 6c833eb5b6144174e733bc7eedae435a6e9fce18
Pull Request resolved: #1100
@facebook-github-bot facebook-github-bot added the CLA Signed label Nov 21, 2024
@vmoens vmoens changed the title from "[Refactor] Avoid TDParams parameters and buffers construction when obvious" to "[Refactor] Avoid TDParams parameters and buffers construction when obvious + new constructor" Nov 21, 2024
@vmoens
Contributor Author

vmoens commented Nov 21, 2024

@kurtamohler
I'm investigating some issues with compile and I'd appreciate some help if you have time (ofc!)

This works:

from tensordict import from_module, TensorDictParams, TensorDict
import torch.nn

module = torch.nn.Module()
module.params = torch.nn.Parameter(torch.randn(3))
params2 = from_module(module).data.clone()
params2 *= 0
params2 = TensorDictParams(params2)

@torch.compile(fullgraph=True)
def func(z, params2):
    with params2.to_module(module):
        out = z + module.params
    return out

print(func(torch.zeros(()), params2))

None of the following work:

  1. Using a plain tensordict: _dynamo doesn't like that we set an attribute to a MutableMapping
from tensordict import from_module, TensorDictParams, TensorDict
import torch.nn

module = torch.nn.Module()
module.params = torch.nn.Parameter(torch.randn(3))
params2 = from_module(module).data.clone()
params2 *= 0
params2 = TensorDictParams(params2)
# Isolate the inner tensordict
params2 = params2._param_td

@torch.compile(fullgraph=True)
def func(z, params2):
    with params2.to_module(module):
        out = z + module.params
    return out

print(func(torch.zeros(()), params2))
  2. This doesn't work because we have a TensorDictParams in our module (the error is somewhat similar to the one above)
from tensordict import from_module, TensorDictParams, TensorDict
import torch.nn

module = torch.nn.Module()
module.params = TensorDictParams(
    # string="a string!",
    TensorDict(a=0.0)
)
params2 = from_module(module).data.clone()
params2 *= 0
params2 = TensorDictParams(params2)

@torch.compile(fullgraph=True)
def func(z, params2):
    with params2.to_module(module):
        out = z + module.params["a"]
    return out

print(func(torch.zeros(()), params2))

The use case where we have a non-tensor defined in the tensordict (see the commented-out "string" key above) is also important because we might have a TensorDictParams with non-tensors somewhere in the module.

It's crucial that _dynamo works fine with these kinds of ops. I suspect that just handling setattr for MutableMapping might be enough, but I'm not sure.
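
For readers unfamiliar with the pattern at issue, here is a library-free toy sketch of "temporarily set an attribute to a MutableMapping, then restore it on exit". The names `ToyParams`, `Holder` and `swap_attr` are hypothetical and exist only for this illustration. This is not how tensordict's `to_module` is implemented, and it does not reproduce the Dynamo failure; it only shows the shape of the setattr operation that the compiler would need to trace.

```python
from collections.abc import MutableMapping
from contextlib import contextmanager


class ToyParams(MutableMapping):
    """Hypothetical stand-in for a mapping-like parameter container."""

    def __init__(self, **entries):
        self._entries = dict(entries)

    def __getitem__(self, key):
        return self._entries[key]

    def __setitem__(self, key, value):
        self._entries[key] = value

    def __delitem__(self, key):
        del self._entries[key]

    def __iter__(self):
        return iter(self._entries)

    def __len__(self):
        return len(self._entries)


class Holder:
    """Hypothetical stand-in for a module that owns a `params` attribute."""

    def __init__(self, params):
        self.params = params


@contextmanager
def swap_attr(obj, name, new_value):
    """Set `obj.<name>` to `new_value` inside the block, then restore it."""
    old_value = getattr(obj, name)
    setattr(obj, name, new_value)  # the setattr-with-a-mapping step
    try:
        yield obj
    finally:
        setattr(obj, name, old_value)


holder = Holder(ToyParams(a=1.0))
zeros = ToyParams(a=0.0)

with swap_attr(holder, "params", zeros):
    inside = 2.0 + holder.params["a"]   # uses the swapped-in mapping

outside = 2.0 + holder.params["a"]      # original mapping is back
```

Inside the `with` block the holder sees the swapped-in values, and the original mapping is restored afterwards, which mirrors the role `to_module` plays in the snippets above.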

(commenting on this PR as it's part of the effort to rationalize TensorDictParams, see also pytorch/pytorch#141118 for a related issue)
cc @anijain2305


github-actions bot commented Nov 21, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 217. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}29$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 37.5200μs 17.7718μs 56.2690 KOps/s 60.5652 KOps/s $\textbf{\color{#d91a1a}-7.09\%}$
test_plain_set_stack_nested 46.6470μs 18.0200μs 55.4939 KOps/s 60.3031 KOps/s $\textbf{\color{#d91a1a}-7.98\%}$
test_plain_set_nested_inplace 63.7390μs 19.6052μs 51.0068 KOps/s 53.2813 KOps/s $\color{#d91a1a}-4.27\%$
test_plain_set_stack_nested_inplace 62.8770μs 19.6325μs 50.9360 KOps/s 54.4696 KOps/s $\textbf{\color{#d91a1a}-6.49\%}$
test_items 41.9280μs 4.1561μs 240.6095 KOps/s 242.7487 KOps/s $\color{#d91a1a}-0.88\%$
test_items_nested 0.6360ms 0.3995ms 2.5033 KOps/s 2.5069 KOps/s $\color{#d91a1a}-0.14\%$
test_items_nested_locked 0.5435ms 0.3958ms 2.5265 KOps/s 2.5228 KOps/s $\color{#35bf28}+0.15\%$
test_items_nested_leaf 0.1341ms 71.0420μs 14.0762 KOps/s 13.9968 KOps/s $\color{#35bf28}+0.57\%$
test_items_stack_nested 0.5159ms 0.4012ms 2.4923 KOps/s 2.4942 KOps/s $\color{#d91a1a}-0.08\%$
test_items_stack_nested_leaf 0.1708ms 73.6840μs 13.5715 KOps/s 13.3787 KOps/s $\color{#35bf28}+1.44\%$
test_items_stack_nested_locked 0.5504ms 0.3984ms 2.5097 KOps/s 2.5151 KOps/s $\color{#d91a1a}-0.21\%$
test_keys 42.9000μs 3.4675μs 288.3888 KOps/s 286.6262 KOps/s $\color{#35bf28}+0.61\%$
test_keys_nested 0.2310ms 0.1359ms 7.3611 KOps/s 7.3130 KOps/s $\color{#35bf28}+0.66\%$
test_keys_nested_locked 1.7814ms 0.1398ms 7.1518 KOps/s 6.9696 KOps/s $\color{#35bf28}+2.61\%$
test_keys_nested_leaf 0.1961ms 0.1158ms 8.6356 KOps/s 8.3802 KOps/s $\color{#35bf28}+3.05\%$
test_keys_stack_nested 0.2261ms 0.1356ms 7.3723 KOps/s 7.3073 KOps/s $\color{#35bf28}+0.89\%$
test_keys_stack_nested_leaf 0.2512ms 0.1166ms 8.5734 KOps/s 8.4847 KOps/s $\color{#35bf28}+1.04\%$
test_keys_stack_nested_locked 0.2379ms 0.1420ms 7.0443 KOps/s 7.0577 KOps/s $\color{#d91a1a}-0.19\%$
test_values 8.6160μs 1.0246μs 975.9560 KOps/s 950.0466 KOps/s $\color{#35bf28}+2.73\%$
test_values_nested 0.1044ms 55.0898μs 18.1522 KOps/s 17.6642 KOps/s $\color{#35bf28}+2.76\%$
test_values_nested_locked 0.1068ms 55.0887μs 18.1525 KOps/s 17.0400 KOps/s $\textbf{\color{#35bf28}+6.53\%}$
test_values_nested_leaf 0.1061ms 59.7719μs 16.7303 KOps/s 16.4579 KOps/s $\color{#35bf28}+1.65\%$
test_values_stack_nested 0.1060ms 57.0571μs 17.5263 KOps/s 17.7627 KOps/s $\color{#d91a1a}-1.33\%$
test_values_stack_nested_leaf 0.1459ms 60.5929μs 16.5036 KOps/s 16.1652 KOps/s $\color{#35bf28}+2.09\%$
test_values_stack_nested_locked 0.1108ms 56.2071μs 17.7913 KOps/s 17.7906 KOps/s $+0.00\%$
test_membership 38.7420μs 0.8889μs 1.1250 MOps/s 1.1613 MOps/s $\color{#d91a1a}-3.12\%$
test_membership_nested 31.0180μs 2.9269μs 341.6584 KOps/s 344.0245 KOps/s $\color{#d91a1a}-0.69\%$
test_membership_nested_leaf 44.0820μs 2.9372μs 340.4555 KOps/s 333.8411 KOps/s $\color{#35bf28}+1.98\%$
test_membership_stacked_nested 25.2570μs 2.8999μs 344.8426 KOps/s 350.2106 KOps/s $\color{#d91a1a}-1.53\%$
test_membership_stacked_nested_leaf 44.8440μs 2.8973μs 345.1486 KOps/s 349.0789 KOps/s $\color{#d91a1a}-1.13\%$
test_membership_nested_last 32.8680μs 4.1853μs 238.9312 KOps/s 240.1866 KOps/s $\color{#d91a1a}-0.52\%$
test_membership_nested_leaf_last 35.4160μs 4.2498μs 235.3028 KOps/s 241.0408 KOps/s $\color{#d91a1a}-2.38\%$
test_membership_stacked_nested_last 24.9770μs 4.2179μs 237.0838 KOps/s 210.1122 KOps/s $\textbf{\color{#35bf28}+12.84\%}$
test_membership_stacked_nested_leaf_last 22.6530μs 4.1856μs 238.9168 KOps/s 210.2320 KOps/s $\textbf{\color{#35bf28}+13.64\%}$
test_nested_getleaf 33.3320μs 10.6844μs 93.5948 KOps/s 93.9898 KOps/s $\color{#d91a1a}-0.42\%$
test_nested_get 38.5720μs 10.0721μs 99.2844 KOps/s 98.6059 KOps/s $\color{#35bf28}+0.69\%$
test_stacked_getleaf 37.5900μs 10.6499μs 93.8973 KOps/s 93.6311 KOps/s $\color{#35bf28}+0.28\%$
test_stacked_get 36.0370μs 10.0965μs 99.0446 KOps/s 97.5684 KOps/s $\color{#35bf28}+1.51\%$
test_nested_getitemleaf 37.8110μs 11.0705μs 90.3305 KOps/s 90.9144 KOps/s $\color{#d91a1a}-0.64\%$
test_nested_getitem 44.5670μs 10.3433μs 96.6806 KOps/s 96.9396 KOps/s $\color{#d91a1a}-0.27\%$
test_stacked_getitemleaf 37.9910μs 11.1043μs 90.0556 KOps/s 90.8437 KOps/s $\color{#d91a1a}-0.87\%$
test_stacked_getitem 57.4570μs 10.4879μs 95.3479 KOps/s 96.6725 KOps/s $\color{#d91a1a}-1.37\%$
test_lock_nested 3.3389ms 0.4459ms 2.2425 KOps/s 2.2569 KOps/s $\color{#d91a1a}-0.64\%$
test_lock_stack_nested 0.7865ms 0.4129ms 2.4219 KOps/s 2.4157 KOps/s $\color{#35bf28}+0.26\%$
test_unlock_nested 0.6716ms 0.3581ms 2.7929 KOps/s 2.7550 KOps/s $\color{#35bf28}+1.38\%$
test_unlock_stack_nested 1.2056ms 0.3318ms 3.0139 KOps/s 3.0551 KOps/s $\color{#d91a1a}-1.35\%$
test_flatten_speed 0.5941ms 98.1445μs 10.1891 KOps/s 10.5127 KOps/s $\color{#d91a1a}-3.08\%$
test_unflatten_speed 0.6165ms 0.4900ms 2.0408 KOps/s 2.0309 KOps/s $\color{#35bf28}+0.49\%$
test_common_ops 4.7274ms 0.7899ms 1.2660 KOps/s 1.3880 KOps/s $\textbf{\color{#d91a1a}-8.79\%}$
test_creation 22.3610μs 2.1098μs 473.9829 KOps/s 491.2926 KOps/s $\color{#d91a1a}-3.52\%$
test_creation_empty 45.3150μs 10.9852μs 91.0312 KOps/s 114.3637 KOps/s $\textbf{\color{#d91a1a}-20.40\%}$
test_creation_nested_1 50.5950μs 13.6014μs 73.5221 KOps/s 88.0584 KOps/s $\textbf{\color{#d91a1a}-16.51\%}$
test_creation_nested_2 63.0880μs 17.7648μs 56.2912 KOps/s 63.2149 KOps/s $\textbf{\color{#d91a1a}-10.95\%}$
test_clone 0.2059ms 13.0927μs 76.3787 KOps/s 77.7749 KOps/s $\color{#d91a1a}-1.80\%$
test_getitem[int] 1.0644ms 12.8644μs 77.7342 KOps/s 80.4661 KOps/s $\color{#d91a1a}-3.40\%$
test_getitem[slice_int] 0.1545ms 25.6610μs 38.9696 KOps/s 41.1266 KOps/s $\textbf{\color{#d91a1a}-5.24\%}$
test_getitem[range] 0.1852ms 50.8388μs 19.6700 KOps/s 21.9900 KOps/s $\textbf{\color{#d91a1a}-10.55\%}$
test_getitem[tuple] 0.1588ms 20.5853μs 48.5783 KOps/s 50.4903 KOps/s $\color{#d91a1a}-3.79\%$
test_getitem[list] 0.3423ms 46.5229μs 21.4948 KOps/s 23.7212 KOps/s $\textbf{\color{#d91a1a}-9.39\%}$
test_setitem_dim[int] 54.6420μs 26.0998μs 38.3145 KOps/s 31.6433 KOps/s $\textbf{\color{#35bf28}+21.08\%}$
test_setitem_dim[slice_int] 99.4760μs 53.2522μs 18.7786 KOps/s 18.7787 KOps/s $-0.00\%$
test_setitem_dim[range] 0.1331ms 77.5365μs 12.8971 KOps/s 13.8328 KOps/s $\textbf{\color{#d91a1a}-6.76\%}$
test_setitem_dim[tuple] 0.1143ms 41.5637μs 24.0595 KOps/s 23.4739 KOps/s $\color{#35bf28}+2.49\%$
test_setitem 91.1300μs 19.9476μs 50.1313 KOps/s 52.5229 KOps/s $\color{#d91a1a}-4.55\%$
test_set 0.1429ms 19.6318μs 50.9378 KOps/s 55.0237 KOps/s $\textbf{\color{#d91a1a}-7.43\%}$
test_set_shared 3.6787ms 0.1701ms 5.8776 KOps/s 5.9883 KOps/s $\color{#d91a1a}-1.85\%$
test_update 0.1500ms 22.7302μs 43.9943 KOps/s 51.2418 KOps/s $\textbf{\color{#d91a1a}-14.14\%}$
test_update_nested 93.0140μs 32.1860μs 31.0694 KOps/s 31.8976 KOps/s $\color{#d91a1a}-2.60\%$
test_update__nested 0.6807ms 32.2360μs 31.0212 KOps/s 31.7999 KOps/s $\color{#d91a1a}-2.45\%$
test_set_nested 89.1460μs 21.9397μs 45.5795 KOps/s 49.4529 KOps/s $\textbf{\color{#d91a1a}-7.83\%}$
test_set_nested_new 0.4547ms 28.5111μs 35.0741 KOps/s 39.6675 KOps/s $\textbf{\color{#d91a1a}-11.58\%}$
test_select 0.1102ms 43.2832μs 23.1036 KOps/s 24.3586 KOps/s $\textbf{\color{#d91a1a}-5.15\%}$
test_select_nested 0.1469ms 59.2179μs 16.8868 KOps/s 16.9315 KOps/s $\color{#d91a1a}-0.26\%$
test_exclude_nested 0.1492ms 77.5872μs 12.8887 KOps/s 12.9206 KOps/s $\color{#d91a1a}-0.25\%$
test_empty[True] 0.6890ms 0.3797ms 2.6336 KOps/s 2.6364 KOps/s $\color{#d91a1a}-0.11\%$
test_empty[False] 11.7317μs 1.2588μs 794.3869 KOps/s 820.5689 KOps/s $\color{#d91a1a}-3.19\%$
test_unbind_speed 0.5439ms 0.2631ms 3.8008 KOps/s 3.8175 KOps/s $\color{#d91a1a}-0.44\%$
test_unbind_speed_stack0 0.7690ms 0.2586ms 3.8671 KOps/s 3.8759 KOps/s $\color{#d91a1a}-0.23\%$
test_unbind_speed_stack1 0.1038s 0.7745ms 1.2911 KOps/s 1.4237 KOps/s $\textbf{\color{#d91a1a}-9.31\%}$
test_split 0.1017s 1.7512ms 571.0257 Ops/s 568.7830 Ops/s $\color{#35bf28}+0.39\%$
test_chunk 0.1022s 1.7554ms 569.6702 Ops/s 565.4031 Ops/s $\color{#35bf28}+0.75\%$
test_consolidate_njt[False-None] 8.4264ms 8.2149ms 121.7296 Ops/s 120.0505 Ops/s $\color{#35bf28}+1.40\%$
test_creation[device0] 0.2185ms 92.6805μs 10.7898 KOps/s 10.7893 KOps/s $+0.00\%$
test_creation_from_tensor 3.9865ms 95.5234μs 10.4686 KOps/s 10.3484 KOps/s $\color{#35bf28}+1.16\%$
test_add_one[memmap_tensor0] 0.1574ms 4.8192μs 207.5039 KOps/s 203.0677 KOps/s $\color{#35bf28}+2.18\%$
test_contiguous[memmap_tensor0] 26.5700μs 0.5163μs 1.9367 MOps/s 1.9824 MOps/s $\color{#d91a1a}-2.31\%$
test_stack[memmap_tensor0] 29.5150μs 3.3374μs 299.6360 KOps/s 294.8212 KOps/s $\color{#35bf28}+1.63\%$
test_memmaptd_index 0.8364ms 0.2429ms 4.1170 KOps/s 4.2266 KOps/s $\color{#d91a1a}-2.59\%$
test_memmaptd_index_astensor 1.3755ms 0.3292ms 3.0375 KOps/s 3.1916 KOps/s $\color{#d91a1a}-4.83\%$
test_memmaptd_index_op 0.9583ms 0.5783ms 1.7293 KOps/s 1.8326 KOps/s $\textbf{\color{#d91a1a}-5.63\%}$
test_serialize_model 0.1253s 0.1135s 8.8144 Ops/s 7.3362 Ops/s $\textbf{\color{#35bf28}+20.15\%}$
test_serialize_model_pickle 0.4457s 0.3907s 2.5594 Ops/s 2.5295 Ops/s $\color{#35bf28}+1.18\%$
test_serialize_weights 0.2162s 0.1273s 7.8554 Ops/s 8.6245 Ops/s $\textbf{\color{#d91a1a}-8.92\%}$
test_serialize_weights_returnearly 0.1743s 0.1584s 6.3120 Ops/s 6.3093 Ops/s $\color{#35bf28}+0.04\%$
test_serialize_weights_pickle 0.6024s 0.4488s 2.2283 Ops/s 2.3864 Ops/s $\textbf{\color{#d91a1a}-6.62\%}$
test_serialize_weights_filesystem 0.1465s 0.1410s 7.0925 Ops/s 7.0290 Ops/s $\color{#35bf28}+0.90\%$
test_serialize_model_filesystem 0.2750s 0.1685s 5.9350 Ops/s 6.5525 Ops/s $\textbf{\color{#d91a1a}-9.42\%}$
test_reshape_pytree 67.3160μs 27.0471μs 36.9725 KOps/s 37.3610 KOps/s $\color{#d91a1a}-1.04\%$
test_reshape_td 79.2180μs 31.9825μs 31.2671 KOps/s 31.1081 KOps/s $\color{#35bf28}+0.51\%$
test_view_pytree 86.0240μs 27.0911μs 36.9125 KOps/s 37.7652 KOps/s $\color{#d91a1a}-2.26\%$
test_view_td 74.3190μs 38.5364μs 25.9495 KOps/s 26.5413 KOps/s $\color{#d91a1a}-2.23\%$
test_unbind_pytree 61.5650μs 29.8761μs 33.4716 KOps/s 33.7374 KOps/s $\color{#d91a1a}-0.79\%$
test_unbind_td 0.3220ms 38.8036μs 25.7708 KOps/s 26.6517 KOps/s $\color{#d91a1a}-3.31\%$
test_split_pytree 64.2500μs 29.7305μs 33.6355 KOps/s 33.5452 KOps/s $\color{#35bf28}+0.27\%$
test_split_td 0.4950ms 44.3371μs 22.5545 KOps/s 22.8858 KOps/s $\color{#d91a1a}-1.45\%$
test_add_pytree 82.7140μs 36.0168μs 27.7648 KOps/s 25.9122 KOps/s $\textbf{\color{#35bf28}+7.15\%}$
test_add_td 0.1222ms 56.9907μs 17.5467 KOps/s 19.1445 KOps/s $\textbf{\color{#d91a1a}-8.35\%}$
test_compile_add_one_nested[tensordict-compile] 0.1289ms 61.2723μs 16.3206 KOps/s 16.3860 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_add_one_nested[tensordict-eager] 1.3991ms 0.1603ms 6.2389 KOps/s 6.2866 KOps/s $\color{#d91a1a}-0.76\%$
test_compile_add_one_nested[pytree-compile] 0.1057ms 45.6893μs 21.8869 KOps/s 21.8378 KOps/s $\color{#35bf28}+0.23\%$
test_compile_add_one_nested[pytree-eager] 0.2707ms 0.1178ms 8.4896 KOps/s 8.3059 KOps/s $\color{#35bf28}+2.21\%$
test_compile_copy_nested[tensordict-compile] 60.0920μs 25.8561μs 38.6757 KOps/s 38.9260 KOps/s $\color{#d91a1a}-0.64\%$
test_compile_copy_nested[tensordict-eager] 0.1184ms 53.0457μs 18.8517 KOps/s 18.5055 KOps/s $\color{#35bf28}+1.87\%$
test_compile_copy_nested[pytree-compile] 0.1667ms 78.3625μs 12.7612 KOps/s 12.8349 KOps/s $\color{#d91a1a}-0.57\%$
test_compile_copy_nested[pytree-eager] 0.1295ms 67.7222μs 14.7662 KOps/s 14.8267 KOps/s $\color{#d91a1a}-0.41\%$
test_compile_add_one_flat[tensordict-compile] 0.1830ms 0.1056ms 9.4681 KOps/s 9.6577 KOps/s $\color{#d91a1a}-1.96\%$
test_compile_add_one_flat[tensordict-eager] 0.4185ms 0.1965ms 5.0903 KOps/s 5.0473 KOps/s $\color{#35bf28}+0.85\%$
test_compile_add_one_flat[tensorclass-compile] 96.0000μs 44.6646μs 22.3891 KOps/s 22.7047 KOps/s $\color{#d91a1a}-1.39\%$
test_compile_add_one_flat[tensorclass-eager] 0.5132ms 61.0907μs 16.3691 KOps/s 16.4554 KOps/s $\color{#d91a1a}-0.52\%$
test_compile_add_one_flat[pytree-compile] 0.2339ms 0.1038ms 9.6333 KOps/s 9.9685 KOps/s $\color{#d91a1a}-3.36\%$
test_compile_add_one_flat[pytree-eager] 0.3696ms 0.2018ms 4.9557 KOps/s 4.9278 KOps/s $\color{#35bf28}+0.56\%$
test_compile_add_self_flat[tensordict-eager] 0.3899ms 0.2081ms 4.8045 KOps/s 4.7944 KOps/s $\color{#35bf28}+0.21\%$
test_compile_add_self_flat[tensordict-compile] 0.1838ms 0.1071ms 9.3328 KOps/s 9.5650 KOps/s $\color{#d91a1a}-2.43\%$
test_compile_add_self_flat[tensorclass-eager] 0.2162ms 54.1135μs 18.4797 KOps/s 18.7843 KOps/s $\color{#d91a1a}-1.62\%$
test_compile_add_self_flat[tensorclass-compile] 0.1035ms 47.4682μs 21.0667 KOps/s 22.3377 KOps/s $\textbf{\color{#d91a1a}-5.69\%}$
test_compile_add_self_flat[pytree-eager] 0.6323ms 0.1600ms 6.2513 KOps/s 6.2507 KOps/s $\color{#35bf28}+0.01\%$
test_compile_add_self_flat[pytree-compile] 0.1956ms 0.1043ms 9.5875 KOps/s 9.8230 KOps/s $\color{#d91a1a}-2.40\%$
test_compile_copy_flat[tensordict-compile] 72.4050μs 20.7560μs 48.1789 KOps/s 48.2453 KOps/s $\color{#d91a1a}-0.14\%$
test_compile_copy_flat[tensordict-eager] 0.1568ms 60.5011μs 16.5286 KOps/s 16.8786 KOps/s $\color{#d91a1a}-2.07\%$
test_compile_copy_flat[pytree-compile] 0.1570ms 81.9717μs 12.1993 KOps/s 12.5200 KOps/s $\color{#d91a1a}-2.56\%$
test_compile_copy_flat[pytree-eager] 0.1333ms 70.1679μs 14.2515 KOps/s 14.3859 KOps/s $\color{#d91a1a}-0.93\%$
test_compile_assign_and_add[tensordict-compile] 0.3080ms 0.2088ms 4.7900 KOps/s 4.9392 KOps/s $\color{#d91a1a}-3.02\%$
test_compile_assign_and_add[tensordict-eager] 2.4650ms 1.2749ms 784.3582 Ops/s 786.7951 Ops/s $\color{#d91a1a}-0.31\%$
test_compile_assign_and_add[pytree-compile] 0.2976ms 0.2029ms 4.9274 KOps/s 5.0518 KOps/s $\color{#d91a1a}-2.46\%$
test_compile_assign_and_add[pytree-eager] 0.9752ms 0.7726ms 1.2944 KOps/s 1.2946 KOps/s $\color{#d91a1a}-0.02\%$
test_compile_assign_and_add_stack[compile] 0.8092ms 0.4602ms 2.1728 KOps/s 2.2720 KOps/s $\color{#d91a1a}-4.37\%$
test_compile_assign_and_add_stack[eager] 3.7653ms 2.6694ms 374.6115 Ops/s 403.5179 Ops/s $\textbf{\color{#d91a1a}-7.16\%}$
test_compile_indexing[tensor-tensordict-compile] 0.1071ms 36.6510μs 27.2844 KOps/s 29.2727 KOps/s $\textbf{\color{#d91a1a}-6.79\%}$
test_compile_indexing[tensor-tensordict-eager] 0.5195ms 33.6921μs 29.6805 KOps/s 30.6632 KOps/s $\color{#d91a1a}-3.20\%$
test_compile_indexing[tensor-tensorclass-compile] 80.0300μs 28.9048μs 34.5964 KOps/s 35.2370 KOps/s $\color{#d91a1a}-1.82\%$
test_compile_indexing[tensor-tensorclass-eager] 75.3800μs 23.3987μs 42.7375 KOps/s 43.1994 KOps/s $\color{#d91a1a}-1.07\%$
test_compile_indexing[tensor-pytree-compile] 79.9490μs 29.5253μs 33.8692 KOps/s 34.0809 KOps/s $\color{#d91a1a}-0.62\%$
test_compile_indexing[tensor-pytree-eager] 70.2710μs 23.2727μs 42.9689 KOps/s 42.7391 KOps/s $\color{#35bf28}+0.54\%$
test_compile_indexing[slice-tensordict-compile] 0.1252ms 53.0416μs 18.8531 KOps/s 19.4881 KOps/s $\color{#d91a1a}-3.26\%$
test_compile_indexing[slice-tensordict-eager] 0.5973ms 20.6311μs 48.4704 KOps/s 49.7367 KOps/s $\color{#d91a1a}-2.55\%$
test_compile_indexing[slice-tensorclass-compile] 0.2688ms 44.6806μs 22.3811 KOps/s 22.6535 KOps/s $\color{#d91a1a}-1.20\%$
test_compile_indexing[slice-tensorclass-eager] 51.9280μs 19.1465μs 52.2290 KOps/s 52.9354 KOps/s $\color{#d91a1a}-1.33\%$
test_compile_indexing[slice-pytree-compile] 0.1077ms 45.1204μs 22.1629 KOps/s 22.2743 KOps/s $\color{#d91a1a}-0.50\%$
test_compile_indexing[slice-pytree-eager] 0.3895ms 19.7492μs 50.6349 KOps/s 52.9362 KOps/s $\color{#d91a1a}-4.35\%$
test_compile_indexing[int-tensordict-compile] 0.1102ms 53.4855μs 18.6967 KOps/s 19.3953 KOps/s $\color{#d91a1a}-3.60\%$
test_compile_indexing[int-tensordict-eager] 0.9422ms 20.6925μs 48.3267 KOps/s 51.3132 KOps/s $\textbf{\color{#d91a1a}-5.82\%}$
test_compile_indexing[int-tensorclass-compile] 0.2836ms 45.2140μs 22.1170 KOps/s 22.3694 KOps/s $\color{#d91a1a}-1.13\%$
test_compile_indexing[int-tensorclass-eager] 58.5100μs 19.2102μs 52.0556 KOps/s 53.0178 KOps/s $\color{#d91a1a}-1.81\%$
test_compile_indexing[int-pytree-compile] 0.1017ms 44.8239μs 22.3095 KOps/s 22.0816 KOps/s $\color{#35bf28}+1.03\%$
test_compile_indexing[int-pytree-eager] 58.3090μs 18.9932μs 52.6503 KOps/s 52.6361 KOps/s $\color{#35bf28}+0.03\%$
test_mod_add[eager] 87.5640μs 34.0142μs 29.3995 KOps/s 29.3914 KOps/s $\color{#35bf28}+0.03\%$
test_mod_add[compile] 0.1025ms 49.2650μs 20.2984 KOps/s 20.7642 KOps/s $\color{#d91a1a}-2.24\%$
test_mod_add[compile-overhead] 0.1397ms 47.7289μs 20.9517 KOps/s 21.0539 KOps/s $\color{#d91a1a}-0.49\%$
test_mod_wrap[eager] 0.3920ms 0.2270ms 4.4047 KOps/s 4.4791 KOps/s $\color{#d91a1a}-1.66\%$
test_mod_wrap[compile] 0.3031ms 0.2067ms 4.8374 KOps/s 4.8289 KOps/s $\color{#35bf28}+0.18\%$
test_mod_wrap[compile-overhead] 0.3624ms 0.2047ms 4.8843 KOps/s 4.9404 KOps/s $\color{#d91a1a}-1.14\%$
test_mod_wrap_and_backward[eager] 12.2917ms 11.1636ms 89.5771 Ops/s 92.5519 Ops/s $\color{#d91a1a}-3.21\%$
test_mod_wrap_and_backward[compile] 12.2613ms 11.1039ms 90.0588 Ops/s 79.8660 Ops/s $\textbf{\color{#35bf28}+12.76\%}$
test_mod_wrap_and_backward[compile-overhead] 12.0727ms 11.1135ms 89.9807 Ops/s 79.9735 Ops/s $\textbf{\color{#35bf28}+12.51\%}$
test_seq_add[eager] 0.2324ms 0.1128ms 8.8635 KOps/s 8.9034 KOps/s $\color{#d91a1a}-0.45\%$
test_seq_add[compile] 0.1287ms 62.7777μs 15.9292 KOps/s 16.3092 KOps/s $\color{#d91a1a}-2.33\%$
test_seq_add[compile-overhead] 0.1292ms 59.6980μs 16.7510 KOps/s 16.5432 KOps/s $\color{#35bf28}+1.26\%$
test_seq_wrap[eager] 0.7195ms 0.4463ms 2.2408 KOps/s 2.3191 KOps/s $\color{#d91a1a}-3.37\%$
test_seq_wrap[compile] 0.3464ms 0.2259ms 4.4269 KOps/s 4.4813 KOps/s $\color{#d91a1a}-1.21\%$
test_seq_wrap[compile-overhead] 0.3454ms 0.2271ms 4.4040 KOps/s 4.5036 KOps/s $\color{#d91a1a}-2.21\%$
test_func_call_runtime[False-eager] 0.8425ms 0.5472ms 1.8276 KOps/s 1.8378 KOps/s $\color{#d91a1a}-0.56\%$
test_func_call_runtime[False-compile] 0.7622ms 0.4278ms 2.3375 KOps/s 2.3734 KOps/s $\color{#d91a1a}-1.51\%$
test_func_call_runtime[False-compile-overhead] 0.6721ms 0.4323ms 2.3131 KOps/s 2.3171 KOps/s $\color{#d91a1a}-0.17\%$
test_func_call_runtime[True-eager] 1.0132ms 0.7591ms 1.3173 KOps/s 1.3440 KOps/s $\color{#d91a1a}-1.98\%$
test_func_call_runtime[True-compile] 0.6221ms 0.4699ms 2.1283 KOps/s 2.1828 KOps/s $\color{#d91a1a}-2.50\%$
test_func_call_runtime[True-compile-overhead] 0.6116ms 0.4707ms 2.1243 KOps/s 2.1634 KOps/s $\color{#d91a1a}-1.81\%$
test_func_call_cm_runtime[False-eager] 0.8440ms 0.5430ms 1.8416 KOps/s 1.8226 KOps/s $\color{#35bf28}+1.04\%$
test_func_call_cm_runtime[False-compile] 0.8076ms 0.4314ms 2.3181 KOps/s 2.3825 KOps/s $\color{#d91a1a}-2.70\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5622ms 0.4283ms 2.3346 KOps/s 2.3667 KOps/s $\color{#d91a1a}-1.36\%$
test_func_call_cm_runtime[True-eager] 1.0355ms 0.9065ms 1.1031 KOps/s 1.1270 KOps/s $\color{#d91a1a}-2.12\%$
test_func_call_cm_runtime[True-compile] 0.6967ms 0.4973ms 2.0110 KOps/s 2.0740 KOps/s $\color{#d91a1a}-3.04\%$
test_func_call_cm_runtime[True-compile-overhead] 0.7547ms 0.4971ms 2.0118 KOps/s 2.0594 KOps/s $\color{#d91a1a}-2.31\%$
test_vmap_func_call_cm_runtime[eager] 2.5078ms 1.8854ms 530.4018 Ops/s 528.0010 Ops/s $\color{#35bf28}+0.45\%$
test_vmap_func_call_cm_runtime[compile] 0.8691ms 0.5250ms 1.9047 KOps/s 1.9540 KOps/s $\color{#d91a1a}-2.52\%$
test_vmap_func_call_cm_runtime[compile-overhead] 1.0097ms 0.5243ms 1.9072 KOps/s 1.9283 KOps/s $\color{#d91a1a}-1.09\%$
test_distributed 0.3248ms 0.1265ms 7.9064 KOps/s 7.8791 KOps/s $\color{#35bf28}+0.35\%$
test_tdmodule 58.1790μs 26.5360μs 37.6846 KOps/s 40.4123 KOps/s $\textbf{\color{#d91a1a}-6.75\%}$
test_tdmodule_dispatch 80.9710μs 48.1567μs 20.7655 KOps/s 21.8165 KOps/s $\color{#d91a1a}-4.82\%$
test_tdseq 48.2000μs 26.1955μs 38.1745 KOps/s 39.6970 KOps/s $\color{#d91a1a}-3.84\%$
test_tdseq_dispatch 99.8760μs 50.6693μs 19.7358 KOps/s 20.4946 KOps/s $\color{#d91a1a}-3.70\%$
test_instantiation_functorch 2.2863ms 1.5160ms 659.6262 Ops/s 643.7796 Ops/s $\color{#35bf28}+2.46\%$
test_exec_functorch 0.3226ms 0.1773ms 5.6396 KOps/s 5.4686 KOps/s $\color{#35bf28}+3.13\%$
test_exec_functional_call 0.3135ms 0.1731ms 5.7776 KOps/s 5.7591 KOps/s $\color{#35bf28}+0.32\%$
test_exec_td_decorator 0.4913ms 0.2317ms 4.3159 KOps/s 4.2847 KOps/s $\color{#35bf28}+0.73\%$
test_vmap_mlp_speed_decorator[True-True] 0.9153ms 0.6564ms 1.5234 KOps/s 1.5487 KOps/s $\color{#d91a1a}-1.63\%$
test_vmap_mlp_speed_decorator[True-False] 1.3172ms 0.6787ms 1.4733 KOps/s 1.5512 KOps/s $\textbf{\color{#d91a1a}-5.02\%}$
test_vmap_mlp_speed_decorator[False-True] 0.7977ms 0.5223ms 1.9147 KOps/s 1.9070 KOps/s $\color{#35bf28}+0.40\%$
test_vmap_mlp_speed_decorator[False-False] 0.8181ms 0.5235ms 1.9103 KOps/s 1.9030 KOps/s $\color{#35bf28}+0.38\%$
test_to_module_speed[True] 2.0904ms 1.2927ms 773.5484 Ops/s 775.7532 Ops/s $\color{#d91a1a}-0.28\%$
test_to_module_speed[False] 2.0047ms 1.2623ms 792.2227 Ops/s 785.2880 Ops/s $\color{#35bf28}+0.88\%$
test_tc_init 94.3460μs 45.7334μs 21.8658 KOps/s 23.0979 KOps/s $\textbf{\color{#d91a1a}-5.33\%}$
test_tc_init_nested 0.1680ms 90.6991μs 11.0255 KOps/s 11.5223 KOps/s $\color{#d91a1a}-4.31\%$
test_tc_first_layer_tensor 28.4330μs 1.5177μs 658.8964 KOps/s 639.6708 KOps/s $\color{#35bf28}+3.01\%$
test_tc_first_layer_nontensor 25.3270μs 4.7540μs 210.3483 KOps/s 206.7197 KOps/s $\color{#35bf28}+1.76\%$
test_tc_second_layer_tensor 32.7810μs 2.8300μs 353.3576 KOps/s 347.7769 KOps/s $\color{#35bf28}+1.60\%$
test_tc_second_layer_nontensor 40.3580μs 6.0546μs 165.1629 KOps/s 161.2841 KOps/s $\color{#35bf28}+2.40\%$
test_unbind 0.2257s 12.5676ms 79.5700 Ops/s 81.1200 Ops/s $\color{#d91a1a}-1.91\%$
test_full_like 17.2218ms 11.8555ms 84.3492 Ops/s 84.0951 Ops/s $\color{#35bf28}+0.30\%$
test_zeros_like 10.7106ms 7.2331ms 138.2539 Ops/s 138.3671 Ops/s $\color{#d91a1a}-0.08\%$
test_ones_like 16.1535ms 7.8974ms 126.6234 Ops/s 121.0681 Ops/s $\color{#35bf28}+4.59\%$
test_clone 12.8547ms 9.6884ms 103.2163 Ops/s 100.0379 Ops/s $\color{#35bf28}+3.18\%$
test_squeeze 59.5320μs 11.9572μs 83.6317 KOps/s 83.4679 KOps/s $\color{#35bf28}+0.20\%$
test_unsqueeze 0.1895ms 90.7052μs 11.0247 KOps/s 11.1871 KOps/s $\color{#d91a1a}-1.45\%$
test_split 0.5069ms 0.1943ms 5.1461 KOps/s 5.1697 KOps/s $\color{#d91a1a}-0.45\%$
test_permute 0.3219ms 0.2203ms 4.5384 KOps/s 4.5361 KOps/s $\color{#35bf28}+0.05\%$
test_stack 31.8853ms 24.9190ms 40.1300 Ops/s 39.6509 Ops/s $\color{#35bf28}+1.21\%$
test_cat 29.6519ms 24.5538ms 40.7269 Ops/s 40.4676 Ops/s $\color{#35bf28}+0.64\%$
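
As a reading aid for the table above: the Change column appears to be the relative difference between the PR's throughput (Ops) and the throughput on repo HEAD. The sketch below recomputes it for three rows copied from the table. The function names are hypothetical, and the 5% cutoff used to flag a row is an assumption inferred from which rows are shown in bold, not taken from the bot's configuration.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption, see the note above."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (name, Ops, Ops on Repo HEAD), both in KOps/s, copied from the CPU table.
rows = [
    ("test_plain_set_nested", 56.2690, 60.5652),
    ("test_setitem_dim[int]", 38.3145, 31.6433),
    ("test_items", 240.6095, 242.7487),
]

for name, ops_pr, ops_head in rows:
    pct = relative_change(ops_pr, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Both throughput values in a row must be expressed in the same unit (for example both in KOps/s) for the ratio to be meaningful.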


github-actions bot commented Nov 21, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 229. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}31$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 30.3110μs 10.5243μs 95.0185 KOps/s 98.4916 KOps/s $\color{#d91a1a}-3.53\%$
test_plain_set_stack_nested 44.3800μs 10.5713μs 94.5960 KOps/s 97.7609 KOps/s $\color{#d91a1a}-3.24\%$
test_plain_set_nested_inplace 37.0310μs 11.3783μs 87.8867 KOps/s 90.6325 KOps/s $\color{#d91a1a}-3.03\%$
test_plain_set_stack_nested_inplace 47.6010μs 11.4420μs 87.3976 KOps/s 90.2947 KOps/s $\color{#d91a1a}-3.21\%$
test_items 30.5810μs 2.8859μs 346.5121 KOps/s 339.9630 KOps/s $\color{#35bf28}+1.93\%$
test_items_nested 0.4138ms 0.3491ms 2.8648 KOps/s 2.8037 KOps/s $\color{#35bf28}+2.18\%$
test_items_nested_locked 0.4166ms 0.3507ms 2.8512 KOps/s 2.7920 KOps/s $\color{#35bf28}+2.12\%$
test_items_nested_leaf 0.1405ms 58.4545μs 17.1073 KOps/s 17.2500 KOps/s $\color{#d91a1a}-0.83\%$
test_items_stack_nested 0.3921ms 0.3512ms 2.8470 KOps/s 2.7965 KOps/s $\color{#35bf28}+1.81\%$
test_items_stack_nested_leaf 84.0620μs 59.6644μs 16.7604 KOps/s 17.3847 KOps/s $\color{#d91a1a}-3.59\%$
test_items_stack_nested_locked 0.4132ms 0.3523ms 2.8387 KOps/s 2.7751 KOps/s $\color{#35bf28}+2.29\%$
test_keys 27.5500μs 3.4408μs 290.6310 KOps/s 286.2105 KOps/s $\color{#35bf28}+1.54\%$
test_keys_nested 0.2584ms 70.4304μs 14.1984 KOps/s 14.2138 KOps/s $\color{#d91a1a}-0.11\%$
test_keys_nested_locked 0.7081ms 75.5220μs 13.2412 KOps/s 13.1525 KOps/s $\color{#35bf28}+0.67\%$
test_keys_nested_leaf 0.2449ms 61.1582μs 16.3510 KOps/s 16.2390 KOps/s $\color{#35bf28}+0.69\%$
test_keys_stack_nested 0.2640ms 71.1068μs 14.0634 KOps/s 14.1932 KOps/s $\color{#d91a1a}-0.91\%$
test_keys_stack_nested_leaf 91.9820μs 62.0235μs 16.1229 KOps/s 16.3589 KOps/s $\color{#d91a1a}-1.44\%$
test_keys_stack_nested_locked 0.1252ms 76.4041μs 13.0883 KOps/s 13.2127 KOps/s $\color{#d91a1a}-0.94\%$
test_values 6.1168μs 0.8437μs 1.1852 MOps/s 1.1844 MOps/s $\color{#35bf28}+0.07\%$
test_values_nested 0.1897ms 31.1968μs 32.0546 KOps/s 32.2884 KOps/s $\color{#d91a1a}-0.72\%$
test_values_nested_locked 61.9710μs 32.8663μs 30.4263 KOps/s 30.8446 KOps/s $\color{#d91a1a}-1.36\%$
test_values_nested_leaf 89.8520μs 33.5065μs 29.8449 KOps/s 30.0127 KOps/s $\color{#d91a1a}-0.56\%$
test_values_stack_nested 68.3510μs 31.8361μs 31.4109 KOps/s 31.9896 KOps/s $\color{#d91a1a}-1.81\%$
test_values_stack_nested_leaf 99.0510μs 34.0912μs 29.3331 KOps/s 29.8416 KOps/s $\color{#d91a1a}-1.70\%$
test_values_stack_nested_locked 95.1620μs 33.2666μs 30.0602 KOps/s 30.6668 KOps/s $\color{#d91a1a}-1.98\%$
test_membership 2.4716μs 0.5110μs 1.9569 MOps/s 1.9839 MOps/s $\color{#d91a1a}-1.36\%$
test_membership_nested 15.6355μs 2.0301μs 492.5803 KOps/s 506.5291 KOps/s $\color{#d91a1a}-2.75\%$
test_membership_nested_leaf 21.7005μs 2.0220μs 494.5525 KOps/s 490.5059 KOps/s $\color{#35bf28}+0.82\%$
test_membership_stacked_nested 28.3210μs 2.1431μs 466.6125 KOps/s 480.4990 KOps/s $\color{#d91a1a}-2.89\%$
test_membership_stacked_nested_leaf 57.6110μs 2.1461μs 465.9671 KOps/s 481.1816 KOps/s $\color{#d91a1a}-3.16\%$
test_membership_nested_last 64.7810μs 2.9797μs 335.6071 KOps/s 341.9729 KOps/s $\color{#d91a1a}-1.86\%$
test_membership_nested_leaf_last 38.4510μs 2.9974μs 333.6213 KOps/s 342.7576 KOps/s $\color{#d91a1a}-2.67\%$
test_membership_stacked_nested_last 45.4510μs 4.4617μs 224.1307 KOps/s 345.0212 KOps/s $\textbf{\color{#d91a1a}-35.04\%}$
test_membership_stacked_nested_leaf_last 0.1456ms 4.4559μs 224.4204 KOps/s 343.8837 KOps/s $\textbf{\color{#d91a1a}-34.74\%}$
test_nested_getleaf 34.0100μs 6.1823μs 161.7526 KOps/s 162.7630 KOps/s $\color{#d91a1a}-0.62\%$
test_nested_get 47.0110μs 5.8702μs 170.3532 KOps/s 171.7547 KOps/s $\color{#d91a1a}-0.82\%$
test_stacked_getleaf 39.7410μs 6.1641μs 162.2293 KOps/s 163.0706 KOps/s $\color{#d91a1a}-0.52\%$
test_stacked_get 41.0810μs 5.8604μs 170.6361 KOps/s 170.2058 KOps/s $\color{#35bf28}+0.25\%$
test_nested_getitemleaf 35.9500μs 6.2281μs 160.5619 KOps/s 157.8922 KOps/s $\color{#35bf28}+1.69\%$
test_nested_getitem 48.1010μs 5.9562μs 167.8922 KOps/s 168.0920 KOps/s $\color{#d91a1a}-0.12\%$
test_stacked_getitemleaf 31.4400μs 6.2707μs 159.4714 KOps/s 159.5180 KOps/s $\color{#d91a1a}-0.03\%$
test_stacked_getitem 38.4210μs 5.9273μs 168.7112 KOps/s 168.1465 KOps/s $\color{#35bf28}+0.34\%$
test_lock_nested 9.2430ms 0.3729ms 2.6818 KOps/s 2.7208 KOps/s $\color{#d91a1a}-1.43\%$
test_lock_stack_nested 0.3721ms 0.3320ms 3.0118 KOps/s 3.0242 KOps/s $\color{#d91a1a}-0.41\%$
test_unlock_nested 0.6521ms 0.3066ms 3.2617 KOps/s 3.3248 KOps/s $\color{#d91a1a}-1.90\%$
test_unlock_stack_nested 0.3606ms 0.2725ms 3.6698 KOps/s 3.7169 KOps/s $\color{#d91a1a}-1.27\%$
test_flatten_speed 0.1095ms 74.1411μs 13.4878 KOps/s 13.5119 KOps/s $\color{#d91a1a}-0.18\%$
test_unflatten_speed 0.3386ms 0.3014ms 3.3178 KOps/s 3.2740 KOps/s $\color{#35bf28}+1.34\%$
test_common_ops 1.6665ms 0.5966ms 1.6761 KOps/s 1.8007 KOps/s $\textbf{\color{#d91a1a}-6.92\%}$
test_creation 0.1147ms 1.4567μs 686.4974 KOps/s 688.2564 KOps/s $\color{#d91a1a}-0.26\%$
test_creation_empty 0.1570ms 7.1304μs 140.2444 KOps/s 154.4362 KOps/s $\textbf{\color{#d91a1a}-9.19\%}$
test_creation_nested_1 0.1716ms 8.6149μs 116.0776 KOps/s 125.1023 KOps/s $\textbf{\color{#d91a1a}-7.21\%}$
test_creation_nested_2 0.1791ms 11.1082μs 90.0235 KOps/s 95.6558 KOps/s $\textbf{\color{#d91a1a}-5.89\%}$
test_clone 0.1107ms 11.0919μs 90.1560 KOps/s 98.9329 KOps/s $\textbf{\color{#d91a1a}-8.87\%}$
test_getitem[int] 1.5114ms 10.7005μs 93.4531 KOps/s 94.9045 KOps/s $\color{#d91a1a}-1.53\%$
test_getitem[slice_int] 0.1118ms 20.7758μs 48.1330 KOps/s 48.6436 KOps/s $\color{#d91a1a}-1.05\%$
test_getitem[range] 0.1396ms 39.9930μs 25.0043 KOps/s 27.2042 KOps/s $\textbf{\color{#d91a1a}-8.09\%}$
test_getitem[tuple] 0.1097ms 18.0139μs 55.5128 KOps/s 56.5907 KOps/s $\color{#d91a1a}-1.90\%$
test_getitem[list] 0.2657ms 34.4405μs 29.0356 KOps/s 31.6191 KOps/s $\textbf{\color{#d91a1a}-8.17\%}$
test_setitem_dim[int] 37.6410μs 18.2941μs 54.6623 KOps/s 57.9996 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_setitem_dim[slice_int] 60.9910μs 38.5559μs 25.9364 KOps/s 27.1606 KOps/s $\color{#d91a1a}-4.51\%$
test_setitem_dim[range] 81.3410μs 53.8685μs 18.5637 KOps/s 19.6446 KOps/s $\textbf{\color{#d91a1a}-5.50\%}$
test_setitem_dim[tuple] 53.9710μs 31.5050μs 31.7410 KOps/s 32.7069 KOps/s $\color{#d91a1a}-2.95\%$
test_setitem 81.6710μs 15.0081μs 66.6308 KOps/s 72.7976 KOps/s $\textbf{\color{#d91a1a}-8.47\%}$
test_set 87.3520μs 14.6000μs 68.4930 KOps/s 75.1873 KOps/s $\textbf{\color{#d91a1a}-8.90\%}$
test_set_shared 1.6383ms 0.1476ms 6.7769 KOps/s 6.7838 KOps/s $\color{#d91a1a}-0.10\%$
test_update 0.5241ms 17.2870μs 57.8471 KOps/s 64.6149 KOps/s $\textbf{\color{#d91a1a}-10.47\%}$
test_update_nested 84.5310μs 22.0144μs 45.4249 KOps/s 50.0059 KOps/s $\textbf{\color{#d91a1a}-9.16\%}$
test_update__nested 0.7816ms 24.4504μs 40.8991 KOps/s 41.4518 KOps/s $\color{#d91a1a}-1.33\%$
test_set_nested 0.1561ms 15.7515μs 63.4862 KOps/s 68.6103 KOps/s $\textbf{\color{#d91a1a}-7.47\%}$
test_set_nested_new 0.1250ms 17.9211μs 55.8001 KOps/s 61.0164 KOps/s $\textbf{\color{#d91a1a}-8.55\%}$
test_select 0.1010ms 29.9617μs 33.3760 KOps/s 35.4283 KOps/s $\textbf{\color{#d91a1a}-5.79\%}$
test_select_nested 0.1248ms 41.6982μs 23.9819 KOps/s 23.6483 KOps/s $\color{#35bf28}+1.41\%$
test_exclude_nested 0.1266ms 60.9422μs 16.4090 KOps/s 16.2085 KOps/s $\color{#35bf28}+1.24\%$
test_empty[True] 0.3038ms 0.2724ms 3.6707 KOps/s 3.6076 KOps/s $\color{#35bf28}+1.75\%$
test_empty[False] 4.2861μs 0.7419μs 1.3480 MOps/s 1.3433 MOps/s $\color{#35bf28}+0.34\%$
test_to 87.9410μs 55.1710μs 18.1255 KOps/s 17.6355 KOps/s $\color{#35bf28}+2.78\%$
test_to_nonblocking 0.1952ms 45.0849μs 22.1804 KOps/s 22.4339 KOps/s $\color{#d91a1a}-1.13\%$
test_unbind_speed 0.3278ms 0.2315ms 4.3205 KOps/s 4.4947 KOps/s $\color{#d91a1a}-3.88\%$
test_unbind_speed_stack0 0.3872ms 0.2303ms 4.3422 KOps/s 4.4781 KOps/s $\color{#d91a1a}-3.03\%$
test_unbind_speed_stack1 0.7247ms 0.5815ms 1.7196 KOps/s 1.5613 KOps/s $\textbf{\color{#35bf28}+10.14\%}$
test_split 98.5020ms 1.5924ms 628.0020 Ops/s 646.3993 Ops/s $\color{#d91a1a}-2.85\%$
test_chunk 99.4652ms 1.5908ms 628.6222 Ops/s 590.6075 Ops/s $\textbf{\color{#35bf28}+6.44\%}$
test_consolidate[False-None] 99.8331ms 2.7959ms 357.6625 Ops/s 391.6761 Ops/s $\textbf{\color{#d91a1a}-8.68\%}$
test_consolidate[default-None] 1.8360ms 1.6586ms 602.9195 Ops/s 604.4135 Ops/s $\color{#d91a1a}-0.25\%$
test_consolidate[reduce-overhead-None] 1.8547ms 1.7033ms 587.1013 Ops/s 598.9677 Ops/s $\color{#d91a1a}-1.98\%$
test_consolidate_njt[False-None] 6.7723ms 6.3690ms 157.0096 Ops/s 158.5023 Ops/s $\color{#d91a1a}-0.94\%$
test_to[False-False-None] 1.8039ms 1.6143ms 619.4444 Ops/s 609.2602 Ops/s $\color{#35bf28}+1.67\%$
test_to[True-False-None] 1.5079ms 1.2362ms 808.9507 Ops/s 805.7162 Ops/s $\color{#35bf28}+0.40\%$
test_to[within-False-None] 4.0889ms 3.8924ms 256.9084 Ops/s 257.0963 Ops/s $\color{#d91a1a}-0.07\%$
test_to[True-default-None] 5.3475ms 5.0026ms 199.8941 Ops/s 198.2639 Ops/s $\color{#35bf28}+0.82\%$
test_to_njt[False-False-None] 7.1007ms 6.7874ms 147.3322 Ops/s 147.0606 Ops/s $\color{#35bf28}+0.18\%$
test_to_njt[True-False-None] 5.5353ms 5.2474ms 190.5695 Ops/s 190.1452 Ops/s $\color{#35bf28}+0.22\%$
test_to_njt[within-False-None] 12.3544ms 11.7431ms 85.1564 Ops/s 84.9260 Ops/s $\color{#35bf28}+0.27\%$
test_creation[device0] 0.5407ms 78.3762μs 12.7590 KOps/s 12.3100 KOps/s $\color{#35bf28}+3.65\%$
test_creation_from_tensor 0.7059ms 82.7142μs 12.0898 KOps/s 11.7933 KOps/s $\color{#35bf28}+2.51\%$
test_add_one[memmap_tensor0] 0.3984ms 6.9513μs 143.8589 KOps/s 153.1211 KOps/s $\textbf{\color{#d91a1a}-6.05\%}$
test_contiguous[memmap_tensor0] 1.8065μs 0.4240μs 2.3584 MOps/s 2.5264 MOps/s $\textbf{\color{#d91a1a}-6.65\%}$
test_stack[memmap_tensor0] 0.1445ms 4.4882μs 222.8074 KOps/s 224.9039 KOps/s $\color{#d91a1a}-0.93\%$
test_memmaptd_index 1.5755ms 0.2497ms 4.0041 KOps/s 4.0255 KOps/s $\color{#d91a1a}-0.53\%$
test_memmaptd_index_astensor 0.8806ms 0.3065ms 3.2622 KOps/s 3.2552 KOps/s $\color{#35bf28}+0.21\%$
test_memmaptd_index_op 1.0273ms 0.5706ms 1.7525 KOps/s 1.8174 KOps/s $\color{#d91a1a}-3.57\%$
test_serialize_model 0.1331s 0.1306s 7.6543 Ops/s 7.6174 Ops/s $\color{#35bf28}+0.48\%$
test_serialize_model_pickle 1.3477s 1.2122s 0.8249 Ops/s 0.8430 Ops/s $\color{#d91a1a}-2.14\%$
test_serialize_weights 0.4308s 0.1729s 5.7831 Ops/s 7.6940 Ops/s $\textbf{\color{#d91a1a}-24.84\%}$
test_serialize_weights_returnearly 0.3459s 53.8704ms 18.5631 Ops/s 13.8139 Ops/s $\textbf{\color{#35bf28}+34.38\%}$
test_serialize_weights_pickle 1.3770s 1.2169s 0.8218 Ops/s 0.8032 Ops/s $\color{#35bf28}+2.32\%$
test_reshape_pytree 0.1326ms 21.7302μs 46.0189 KOps/s 44.7245 KOps/s $\color{#35bf28}+2.89\%$
test_reshape_td 0.1635ms 26.2905μs 38.0365 KOps/s 34.5580 KOps/s $\textbf{\color{#35bf28}+10.07\%}$
test_view_pytree 0.1632ms 21.6373μs 46.2164 KOps/s 45.7474 KOps/s $\color{#35bf28}+1.03\%$
test_view_td 0.1033ms 29.1857μs 34.2634 KOps/s 31.7693 KOps/s $\textbf{\color{#35bf28}+7.85\%}$
test_unbind_pytree 0.1786ms 28.0977μs 35.5901 KOps/s 34.6475 KOps/s $\color{#35bf28}+2.72\%$
test_unbind_td 0.7151ms 35.1098μs 28.4821 KOps/s 28.7120 KOps/s $\color{#d91a1a}-0.80\%$
test_split_pytree 0.1755ms 28.9460μs 34.5471 KOps/s 33.2123 KOps/s $\color{#35bf28}+4.02\%$
test_split_td 0.8877ms 38.0847μs 26.2573 KOps/s 25.4665 KOps/s $\color{#35bf28}+3.10\%$
test_add_pytree 0.1538ms 34.8164μs 28.7221 KOps/s 30.0325 KOps/s $\color{#d91a1a}-4.36\%$
test_add_td 79.6110μs 46.8475μs 21.3459 KOps/s 23.5866 KOps/s $\textbf{\color{#d91a1a}-9.50\%}$
test_compile_add_one_nested[tensordict-compile] 0.2685ms 0.1203ms 8.3104 KOps/s 8.0704 KOps/s $\color{#35bf28}+2.97\%$
test_compile_add_one_nested[tensordict-eager] 0.3056ms 0.1223ms 8.1799 KOps/s 7.8523 KOps/s $\color{#35bf28}+4.17\%$
test_compile_add_one_nested[pytree-compile] 0.2432ms 94.4999μs 10.5820 KOps/s 10.2227 KOps/s $\color{#35bf28}+3.51\%$
test_compile_add_one_nested[pytree-eager] 1.4348ms 0.1452ms 6.8864 KOps/s 6.8343 KOps/s $\color{#35bf28}+0.76\%$
test_compile_copy_nested[tensordict-compile] 0.2020ms 21.4206μs 46.6840 KOps/s 47.6607 KOps/s $\color{#d91a1a}-2.05\%$
test_compile_copy_nested[tensordict-eager] 0.2245ms 26.6089μs 37.5814 KOps/s 37.4659 KOps/s $\color{#35bf28}+0.31\%$
test_compile_copy_nested[pytree-compile] 0.2770ms 64.2078μs 15.5744 KOps/s 15.4787 KOps/s $\color{#35bf28}+0.62\%$
test_compile_copy_nested[pytree-eager] 0.1710ms 49.0034μs 20.4067 KOps/s 20.0612 KOps/s $\color{#35bf28}+1.72\%$
test_compile_add_one_flat[tensordict-compile] 0.3168ms 0.1413ms 7.0759 KOps/s 6.9710 KOps/s $\color{#35bf28}+1.50\%$
test_compile_add_one_flat[tensordict-eager] 0.3525ms 0.2064ms 4.8443 KOps/s 4.7901 KOps/s $\color{#35bf28}+1.13\%$
test_compile_add_one_flat[tensorclass-compile] 0.2473ms 97.1759μs 10.2906 KOps/s 9.8658 KOps/s $\color{#35bf28}+4.31\%$
test_compile_add_one_flat[tensorclass-eager] 0.2440ms 50.2648μs 19.8946 KOps/s 19.1750 KOps/s $\color{#35bf28}+3.75\%$
test_compile_add_one_flat[pytree-compile] 0.2438ms 0.1358ms 7.3635 KOps/s 7.3379 KOps/s $\color{#35bf28}+0.35\%$
test_compile_add_one_flat[pytree-eager] 0.6616ms 0.4670ms 2.1415 KOps/s 2.1339 KOps/s $\color{#35bf28}+0.36\%$
test_compile_add_self_flat[tensordict-eager] 0.3932ms 0.2477ms 4.0367 KOps/s 4.0143 KOps/s $\color{#35bf28}+0.56\%$
test_compile_add_self_flat[tensordict-compile] 0.2719ms 0.1426ms 7.0150 KOps/s 6.7532 KOps/s $\color{#35bf28}+3.88\%$
test_compile_add_self_flat[tensorclass-eager] 0.2095ms 59.9878μs 16.6701 KOps/s 15.8181 KOps/s $\textbf{\color{#35bf28}+5.39\%}$
test_compile_add_self_flat[tensorclass-compile] 0.2584ms 98.6961μs 10.1321 KOps/s 9.6436 KOps/s $\textbf{\color{#35bf28}+5.07\%}$
test_compile_add_self_flat[pytree-eager] 0.5504ms 0.3903ms 2.5618 KOps/s 2.5273 KOps/s $\color{#35bf28}+1.37\%$
test_compile_add_self_flat[pytree-compile] 0.2775ms 0.1368ms 7.3092 KOps/s 7.1257 KOps/s $\color{#35bf28}+2.57\%$
test_compile_copy_flat[tensordict-compile] 0.1602ms 18.4458μs 54.2130 KOps/s 56.5806 KOps/s $\color{#d91a1a}-4.18\%$
test_compile_copy_flat[tensordict-eager] 61.5510μs 26.8222μs 37.2826 KOps/s 36.6051 KOps/s $\color{#35bf28}+1.85\%$
test_compile_copy_flat[pytree-compile] 0.1650ms 69.0668μs 14.4787 KOps/s 14.3989 KOps/s $\color{#35bf28}+0.55\%$
test_compile_copy_flat[pytree-eager] 0.1081ms 51.2620μs 19.5076 KOps/s 19.6826 KOps/s $\color{#d91a1a}-0.89\%$
test_compile_assign_and_add[tensordict-compile] 1.6407ms 0.3942ms 2.5369 KOps/s 2.2159 KOps/s $\textbf{\color{#35bf28}+14.49\%}$
test_compile_assign_and_add[tensordict-eager] 3.0534ms 2.7025ms 370.0276 Ops/s 395.7843 Ops/s $\textbf{\color{#d91a1a}-6.51\%}$
test_compile_assign_and_add[pytree-compile] 1.5617ms 0.4244ms 2.3565 KOps/s 2.2504 KOps/s $\color{#35bf28}+4.71\%$
test_compile_assign_and_add[pytree-eager] 2.8924ms 2.5940ms 385.5024 Ops/s 396.1485 Ops/s $\color{#d91a1a}-2.69\%$
test_compile_indexing[tensor-tensordict-compile] 0.2785ms 0.1157ms 8.6411 KOps/s 8.8315 KOps/s $\color{#d91a1a}-2.16\%$
test_compile_indexing[tensor-tensordict-eager] 0.5557ms 77.1140μs 12.9678 KOps/s 12.5167 KOps/s $\color{#35bf28}+3.60\%$
test_compile_indexing[tensor-tensorclass-compile] 0.4889ms 0.1040ms 9.6194 KOps/s 9.4478 KOps/s $\color{#35bf28}+1.82\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2860ms 66.4457μs 15.0499 KOps/s 14.6928 KOps/s $\color{#35bf28}+2.43\%$
test_compile_indexing[tensor-pytree-compile] 0.3123ms 0.1093ms 9.1463 KOps/s 9.5114 KOps/s $\color{#d91a1a}-3.84\%$
test_compile_indexing[tensor-pytree-eager] 0.2752ms 69.3051μs 14.4289 KOps/s 14.6950 KOps/s $\color{#d91a1a}-1.81\%$
test_compile_indexing[slice-tensordict-compile] 0.2717ms 99.6963μs 10.0305 KOps/s 9.9854 KOps/s $\color{#35bf28}+0.45\%$
test_compile_indexing[slice-tensordict-eager] 0.1825ms 16.9182μs 59.1080 KOps/s 58.2897 KOps/s $\color{#35bf28}+1.40\%$
test_compile_indexing[slice-tensorclass-compile] 0.2420ms 95.2271μs 10.5012 KOps/s 10.3487 KOps/s $\color{#35bf28}+1.47\%$
test_compile_indexing[slice-tensorclass-eager] 0.1512ms 15.8869μs 62.9448 KOps/s 63.6647 KOps/s $\color{#d91a1a}-1.13\%$
test_compile_indexing[slice-pytree-compile] 0.2438ms 96.2703μs 10.3874 KOps/s 10.3670 KOps/s $\color{#35bf28}+0.20\%$
test_compile_indexing[slice-pytree-eager] 0.1425ms 15.7489μs 63.4965 KOps/s 63.8661 KOps/s $\color{#d91a1a}-0.58\%$
test_compile_indexing[int-tensordict-compile] 0.2710ms 99.8874μs 10.0113 KOps/s 9.9032 KOps/s $\color{#35bf28}+1.09\%$
test_compile_indexing[int-tensordict-eager] 0.7024ms 16.7772μs 59.6047 KOps/s 48.4301 KOps/s $\textbf{\color{#35bf28}+23.07\%}$
test_compile_indexing[int-tensorclass-compile] 0.2768ms 96.5712μs 10.3551 KOps/s 10.3191 KOps/s $\color{#35bf28}+0.35\%$
test_compile_indexing[int-tensorclass-eager] 0.1210ms 15.5829μs 64.1728 KOps/s 63.9034 KOps/s $\color{#35bf28}+0.42\%$
test_compile_indexing[int-pytree-compile] 0.2685ms 96.2300μs 10.3918 KOps/s 10.3471 KOps/s $\color{#35bf28}+0.43\%$
test_compile_indexing[int-pytree-eager] 48.1310μs 15.5997μs 64.1036 KOps/s 63.9766 KOps/s $\color{#35bf28}+0.20\%$
test_mod_add[eager] 0.2004ms 38.0707μs 26.2669 KOps/s 28.2361 KOps/s $\textbf{\color{#d91a1a}-6.97\%}$
test_mod_add[compile] 0.4143ms 78.4355μs 12.7493 KOps/s 12.5871 KOps/s $\color{#35bf28}+1.29\%$
test_mod_add[compile-overhead] 0.3233ms 0.1710ms 5.8475 KOps/s 5.7406 KOps/s $\color{#35bf28}+1.86\%$
test_mod_wrap[eager] 0.4140ms 0.2500ms 3.9996 KOps/s 3.9012 KOps/s $\color{#35bf28}+2.52\%$
test_mod_wrap[compile] 0.4340ms 0.2835ms 3.5270 KOps/s 3.5040 KOps/s $\color{#35bf28}+0.66\%$
test_mod_wrap[compile-overhead] 7.1485ms 3.7889ms 263.9293 Ops/s 267.1695 Ops/s $\color{#d91a1a}-1.21\%$
test_mod_wrap_and_backward[eager] 1.9299ms 1.3442ms 743.9109 Ops/s 691.3393 Ops/s $\textbf{\color{#35bf28}+7.60\%}$
test_mod_wrap_and_backward[compile] 1.4679ms 1.2397ms 806.6299 Ops/s 735.8102 Ops/s $\textbf{\color{#35bf28}+9.62\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3667ms 0.9113ms 1.0974 KOps/s 901.8095 Ops/s $\textbf{\color{#35bf28}+21.69\%}$
test_seq_add[eager] 0.2913ms 0.1092ms 9.1615 KOps/s 9.1745 KOps/s $\color{#d91a1a}-0.14\%$
test_seq_add[compile] 0.2519ms 87.3157μs 11.4527 KOps/s 11.2181 KOps/s $\color{#35bf28}+2.09\%$
test_seq_add[compile-overhead] 0.2829ms 0.1346ms 7.4290 KOps/s 7.7623 KOps/s $\color{#d91a1a}-4.29\%$
test_seq_wrap[eager] 0.6748ms 0.4289ms 2.3317 KOps/s 2.4161 KOps/s $\color{#d91a1a}-3.49\%$
test_seq_wrap[compile] 0.5200ms 0.3085ms 3.2419 KOps/s 3.3234 KOps/s $\color{#d91a1a}-2.45\%$
test_seq_wrap[compile-overhead] 0.4154ms 0.2206ms 4.5330 KOps/s 4.4553 KOps/s $\color{#35bf28}+1.74\%$
test_func_call_runtime[False-eager] 0.9248ms 0.7360ms 1.3587 KOps/s 1.3590 KOps/s $\color{#d91a1a}-0.03\%$
test_func_call_runtime[False-compile] 0.9288ms 0.7248ms 1.3797 KOps/s 1.3615 KOps/s $\color{#35bf28}+1.34\%$
test_func_call_runtime[False-compile-overhead] 0.5013ms 0.3562ms 2.8074 KOps/s 2.7760 KOps/s $\color{#35bf28}+1.13\%$
test_func_call_runtime[True-eager] 1.0884ms 0.8864ms 1.1282 KOps/s 1.1005 KOps/s $\color{#35bf28}+2.52\%$
test_func_call_runtime[True-compile] 0.9289ms 0.7451ms 1.3421 KOps/s 1.3371 KOps/s $\color{#35bf28}+0.37\%$
test_func_call_runtime[True-compile-overhead] 0.5215ms 0.3786ms 2.6413 KOps/s 2.6343 KOps/s $\color{#35bf28}+0.26\%$
test_func_call_cm_runtime[False-eager] 0.9210ms 0.7507ms 1.3320 KOps/s 1.3579 KOps/s $\color{#d91a1a}-1.90\%$
test_func_call_cm_runtime[False-compile] 0.8819ms 0.7301ms 1.3696 KOps/s 1.3645 KOps/s $\color{#35bf28}+0.37\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5196ms 0.3599ms 2.7785 KOps/s 2.7760 KOps/s $\color{#35bf28}+0.09\%$
test_func_call_cm_runtime[True-eager] 1.1248ms 0.9827ms 1.0176 KOps/s 992.5009 Ops/s $\color{#35bf28}+2.53\%$
test_func_call_cm_runtime[True-compile] 0.9372ms 0.7750ms 1.2904 KOps/s 1.2726 KOps/s $\color{#35bf28}+1.40\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5408ms 0.4030ms 2.4816 KOps/s 2.4609 KOps/s $\color{#35bf28}+0.84\%$
test_vmap_func_call_cm_runtime[eager] 2.5373ms 2.0588ms 485.7312 Ops/s 479.4608 Ops/s $\color{#35bf28}+1.31\%$
test_vmap_func_call_cm_runtime[compile] 0.9641ms 0.7877ms 1.2696 KOps/s 1.2580 KOps/s $\color{#35bf28}+0.92\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5312ms 0.4059ms 2.4634 KOps/s 2.4427 KOps/s $\color{#35bf28}+0.85\%$
test_distributed 3.2999ms 0.1820ms 5.4934 KOps/s 8.3948 KOps/s $\textbf{\color{#d91a1a}-34.56\%}$
test_tdmodule 0.1049ms 19.4738μs 51.3511 KOps/s 54.8887 KOps/s $\textbf{\color{#d91a1a}-6.45\%}$
test_tdmodule_dispatch 0.1891ms 34.5935μs 28.9072 KOps/s 30.7915 KOps/s $\textbf{\color{#d91a1a}-6.12\%}$
test_tdseq 39.7610μs 19.1804μs 52.1366 KOps/s 55.0504 KOps/s $\textbf{\color{#d91a1a}-5.29\%}$
test_tdseq_dispatch 54.3110μs 34.6858μs 28.8302 KOps/s 28.5696 KOps/s $\color{#35bf28}+0.91\%$
test_instantiation_functorch 1.6492ms 1.5239ms 656.1972 Ops/s 652.6599 Ops/s $\color{#35bf28}+0.54\%$
test_exec_functorch 0.2810ms 0.1450ms 6.8961 KOps/s 7.0842 KOps/s $\color{#d91a1a}-2.66\%$
test_exec_functional_call 0.2574ms 0.1374ms 7.2805 KOps/s 7.5342 KOps/s $\color{#d91a1a}-3.37\%$
test_exec_td_decorator 0.4043ms 0.1838ms 5.4406 KOps/s 5.6321 KOps/s $\color{#d91a1a}-3.40\%$
test_vmap_mlp_speed_decorator[True-True] 0.8527ms 0.6788ms 1.4732 KOps/s 1.4782 KOps/s $\color{#d91a1a}-0.34\%$
test_vmap_mlp_speed_decorator[True-False] 0.8818ms 0.6870ms 1.4557 KOps/s 1.4798 KOps/s $\color{#d91a1a}-1.63\%$
test_vmap_mlp_speed_decorator[False-True] 0.7779ms 0.6128ms 1.6318 KOps/s 1.6873 KOps/s $\color{#d91a1a}-3.29\%$
test_vmap_mlp_speed_decorator[False-False] 0.7651ms 0.6135ms 1.6299 KOps/s 1.6882 KOps/s $\color{#d91a1a}-3.46\%$
test_vmap_transformer_speed_decorator[True-True] 19.2285ms 18.9907ms 52.6573 Ops/s 52.3764 Ops/s $\color{#35bf28}+0.54\%$
test_vmap_transformer_speed_decorator[True-False] 19.6848ms 19.0209ms 52.5738 Ops/s 52.2499 Ops/s $\color{#35bf28}+0.62\%$
test_vmap_transformer_speed_decorator[False-True] 19.3425ms 18.9093ms 52.8840 Ops/s 52.7720 Ops/s $\color{#35bf28}+0.21\%$
test_vmap_transformer_speed_decorator[False-False] 19.6596ms 18.8987ms 52.9136 Ops/s 52.5705 Ops/s $\color{#35bf28}+0.65\%$
test_to_module_speed[True] 1.0246ms 0.9250ms 1.0811 KOps/s 1.0722 KOps/s $\color{#35bf28}+0.83\%$
test_to_module_speed[False] 1.2441ms 0.9059ms 1.1039 KOps/s 1.0985 KOps/s $\color{#35bf28}+0.50\%$
test_tc_init 75.5010μs 33.6730μs 29.6974 KOps/s 30.3461 KOps/s $\color{#d91a1a}-2.14\%$
test_tc_init_nested 0.1158ms 69.4435μs 14.4002 KOps/s 15.1854 KOps/s $\textbf{\color{#d91a1a}-5.17\%}$
test_tc_first_layer_tensor 4.6714μs 0.6993μs 1.4300 MOps/s 1.4346 MOps/s $\color{#d91a1a}-0.32\%$
test_tc_first_layer_nontensor 27.9600μs 2.3111μs 432.6934 KOps/s 432.7084 KOps/s $-0.00\%$
test_tc_second_layer_tensor 9.9403μs 1.4188μs 704.8406 KOps/s 702.4801 KOps/s $\color{#35bf28}+0.34\%$
test_tc_second_layer_nontensor 36.7110μs 3.0023μs 333.0762 KOps/s 328.9032 KOps/s $\color{#35bf28}+1.27\%$
test_unbind 0.2389s 9.9505ms 100.4975 Ops/s 152.0667 Ops/s $\textbf{\color{#d91a1a}-33.91\%}$
test_full_like 10.2126ms 9.6432ms 103.7005 Ops/s 102.7222 Ops/s $\color{#35bf28}+0.95\%$
test_zeros_like 4.9620ms 4.4171ms 226.3930 Ops/s 230.0867 Ops/s $\color{#d91a1a}-1.61\%$
test_ones_like 4.9986ms 4.4412ms 225.1625 Ops/s 226.2782 Ops/s $\color{#d91a1a}-0.49\%$
test_clone 7.4527ms 6.9028ms 144.8687 Ops/s 144.7483 Ops/s $\color{#35bf28}+0.08\%$
test_squeeze 0.1896ms 9.3956μs 106.4323 KOps/s 110.9623 KOps/s $\color{#d91a1a}-4.08\%$
test_unsqueeze 0.1933ms 70.7597μs 14.1323 KOps/s 14.4830 KOps/s $\color{#d91a1a}-2.42\%$
test_split 0.3919ms 0.1569ms 6.3747 KOps/s 6.5004 KOps/s $\color{#d91a1a}-1.93\%$
test_permute 0.3265ms 0.1737ms 5.7578 KOps/s 5.8015 KOps/s $\color{#d91a1a}-0.75\%$
test_stack 52.1096ms 51.6567ms 19.3586 Ops/s 19.1713 Ops/s $\color{#35bf28}+0.98\%$
test_cat 52.4260ms 51.4571ms 19.4337 Ops/s 19.2628 Ops/s $\color{#35bf28}+0.89\%$
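The percentage in the last column of each row can be checked by hand. The sketch below assumes that the two throughput columns are, in order, the value measured for this PR and the baseline value, and that the final column is the relative change between them; the column headers are not visible here, so that reading is an inference from the numbers. The helper names `parse_ops` and `relative_change` are hypothetical and are not part of any benchmark tooling.

```python
# Unit multipliers for the throughput strings used in the table rows.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a string such as '5.4934 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(current: str, baseline: str) -> float:
    """Relative change of `current` with respect to `baseline`, in percent."""
    cur, base = parse_ops(current), parse_ops(baseline)
    return 100.0 * (cur - base) / base


# (assumed PR throughput, assumed baseline throughput), copied from the rows above.
rows = {
    "test_distributed": ("5.4934 KOps/s", "8.3948 KOps/s"),
    "test_unbind": ("100.4975 Ops/s", "152.0667 Ops/s"),
    "test_mod_wrap_and_backward[compile-overhead]": ("1.0974 KOps/s", "901.8095 Ops/s"),
}

for name, (cur, base) in rows.items():
    print(f"{name}: {relative_change(cur, base):+.2f}%")
```

Under that assumption the three rows reproduce the reported -34.56%, -33.91% and +21.69%. The third row also shows why the units must be normalized before comparing, since its two values are given in KOps/s and Ops/s respectively.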

Collaborator

kurtamohler commented Nov 22, 2024

@kurtamohler I'm investigating some issues with compile and I'd appreciate some help if you have time (ofc!)

I found a fix for the first broken case (plain TensorDict). I'll push a PR to PyTorch by tomorrow; I just need to figure out how to write a test that doesn't depend on tensordict. There are already a few tests in PyTorch that I can base it on.

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 25, 2024
…vious

ghstack-source-id: 089f6d745257b142b28e1005dc9adf82ed3b394b
Pull Request resolved: #1100
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 25, 2024
…vious

ghstack-source-id: b716dab9a20137b68587f5b3b08fa735b43d6aec
Pull Request resolved: #1100
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 25, 2024
…vious

ghstack-source-id: 81cec096a6a7921b21521d696eb216ca0443a3a9
Pull Request resolved: #1100
@vmoens vmoens added the Refactor Refactoring code - not a new feature label Nov 25, 2024
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 25, 2024
…vious

ghstack-source-id: 87e1ae8af75ae3833c1e984dbbf9f69c1831ad1c
Pull Request resolved: #1100
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 25, 2024
…vious

ghstack-source-id: bd701ecfaf68605801a215d3cd9d49268b888bb3
Pull Request resolved: #1100
@vmoens vmoens merged commit a3bbdbd into gh/vmoens/36/base Nov 25, 2024
33 of 37 checks passed
@vmoens vmoens deleted the gh/vmoens/36/head branch November 25, 2024 21:36
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Refactor Refactoring code - not a new feature
3 participants