[Refactor] Avoid TDParams parameters and buffers construction when obvious + new constructor #1100
…vious ghstack-source-id: 6c833eb5b6144174e733bc7eedae435a6e9fce18 Pull Request resolved: #1100
@kurtamohler This works:

    from tensordict import from_module, TensorDictParams, TensorDict
    import torch.nn

    module = torch.nn.Module()
    module.params = torch.nn.Parameter(torch.randn(3))
    params2 = from_module(module).data.clone()
    params2 *= 0
    params2 = TensorDictParams(params2)

    @torch.compile(fullgraph=True)
    def func(z, params2):
        with params2.to_module(module):
            out = z + module.params
        return out

    print(func(torch.zeros(()), params2))

None of the following do:

    from tensordict import from_module, TensorDictParams, TensorDict
    import torch.nn

    module = torch.nn.Module()
    module.params = torch.nn.Parameter(torch.randn(3))
    params2 = from_module(module).data.clone()
    params2 *= 0
    params2 = TensorDictParams(params2)
    # Isolate the inner tensordict
    params2 = params2._param_td

    @torch.compile(fullgraph=True)
    def func(z, params2):
        with params2.to_module(module):
            out = z + module.params
        return out

    print(func(torch.zeros(()), params2))

Nor does this variant, where the module attribute is itself a `TensorDictParams`:

    from tensordict import from_module, TensorDictParams, TensorDict
    import torch.nn

    module = torch.nn.Module()
    module.params = TensorDictParams(
        # string="a string!",
        TensorDict(a=0.0)
    )
    params2 = from_module(module).data.clone()
    params2 *= 0
    params2 = TensorDictParams(params2)

    @torch.compile(fullgraph=True)
    def func(z, params2):
        with params2.to_module(module):
            out = z + module.params["a"]
        return out

    print(func(torch.zeros(()), params2))

The use case where a non-tensor is defined in the tensordict (see the commented-out `string` key above) is also important, because we may have a `TensorDictParams` with non-tensors somewhere in the module. It's crucial that `_dynamo` works with these kinds of ops. I suspect that just handling (commenting on this PR as it's part of the effort to rationalize `TensorDictParams`, see also pytorch/pytorch#141118 for a related issue)
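For readers unfamiliar with the `with params2.to_module(module):` pattern used in the snippets above, here is a plain-Python sketch of the general idea: values are swapped onto an object for the duration of a block and the originals are restored on exit. The helper `swap_attrs` and the `SimpleNamespace` stand-in for a module are hypothetical names made up for this illustration. This is not tensordict's implementation, and it says nothing about how Dynamo traces the real call.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def swap_attrs(obj, new_values):
    """Temporarily replace attributes on `obj`, restoring the originals on exit."""
    # Remember the current values so they can be put back afterwards.
    saved = {name: getattr(obj, name) for name in new_values}
    try:
        for name, value in new_values.items():
            setattr(obj, name, value)
        yield obj
    finally:
        # Restore even if the body of the `with` block raised.
        for name, value in saved.items():
            setattr(obj, name, value)


# Hypothetical stand-in for a module holding one numeric entry and one
# non-tensor entry (the "string" case mentioned in the comment above).
module = SimpleNamespace(params=[1.0, 2.0, 3.0], label="a string!")
zeroed = {"params": [0.0, 0.0, 0.0], "label": "a string!"}

with swap_attrs(module, zeroed):
    # Inside the block the module sees the swapped-in values.
    inside = [0.5 + p for p in module.params]

# After the block the original values are back.
after = module.params
```

The failing cases in the comment use the same calling pattern; what differs is the type of the object doing the swap and of the attribute being swapped, which is where the compiled path reportedly breaks.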

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 37.5200μs | 17.7718μs | 56.2690 KOps/s | 60.5652 KOps/s | |
test_plain_set_stack_nested | 46.6470μs | 18.0200μs | 55.4939 KOps/s | 60.3031 KOps/s | |
test_plain_set_nested_inplace | 63.7390μs | 19.6052μs | 51.0068 KOps/s | 53.2813 KOps/s | |
test_plain_set_stack_nested_inplace | 62.8770μs | 19.6325μs | 50.9360 KOps/s | 54.4696 KOps/s | |
test_items | 41.9280μs | 4.1561μs | 240.6095 KOps/s | 242.7487 KOps/s | |
test_items_nested | 0.6360ms | 0.3995ms | 2.5033 KOps/s | 2.5069 KOps/s | |
test_items_nested_locked | 0.5435ms | 0.3958ms | 2.5265 KOps/s | 2.5228 KOps/s | |
test_items_nested_leaf | 0.1341ms | 71.0420μs | 14.0762 KOps/s | 13.9968 KOps/s | |
test_items_stack_nested | 0.5159ms | 0.4012ms | 2.4923 KOps/s | 2.4942 KOps/s | |
test_items_stack_nested_leaf | 0.1708ms | 73.6840μs | 13.5715 KOps/s | 13.3787 KOps/s | |
test_items_stack_nested_locked | 0.5504ms | 0.3984ms | 2.5097 KOps/s | 2.5151 KOps/s | |
test_keys | 42.9000μs | 3.4675μs | 288.3888 KOps/s | 286.6262 KOps/s | |
test_keys_nested | 0.2310ms | 0.1359ms | 7.3611 KOps/s | 7.3130 KOps/s | |
test_keys_nested_locked | 1.7814ms | 0.1398ms | 7.1518 KOps/s | 6.9696 KOps/s | |
test_keys_nested_leaf | 0.1961ms | 0.1158ms | 8.6356 KOps/s | 8.3802 KOps/s | |
test_keys_stack_nested | 0.2261ms | 0.1356ms | 7.3723 KOps/s | 7.3073 KOps/s | |
test_keys_stack_nested_leaf | 0.2512ms | 0.1166ms | 8.5734 KOps/s | 8.4847 KOps/s | |
test_keys_stack_nested_locked | 0.2379ms | 0.1420ms | 7.0443 KOps/s | 7.0577 KOps/s | |
test_values | 8.6160μs | 1.0246μs | 975.9560 KOps/s | 950.0466 KOps/s | |
test_values_nested | 0.1044ms | 55.0898μs | 18.1522 KOps/s | 17.6642 KOps/s | |
test_values_nested_locked | 0.1068ms | 55.0887μs | 18.1525 KOps/s | 17.0400 KOps/s | |
test_values_nested_leaf | 0.1061ms | 59.7719μs | 16.7303 KOps/s | 16.4579 KOps/s | |
test_values_stack_nested | 0.1060ms | 57.0571μs | 17.5263 KOps/s | 17.7627 KOps/s | |
test_values_stack_nested_leaf | 0.1459ms | 60.5929μs | 16.5036 KOps/s | 16.1652 KOps/s | |
test_values_stack_nested_locked | 0.1108ms | 56.2071μs | 17.7913 KOps/s | 17.7906 KOps/s | |
test_membership | 38.7420μs | 0.8889μs | 1.1250 MOps/s | 1.1613 MOps/s | |
test_membership_nested | 31.0180μs | 2.9269μs | 341.6584 KOps/s | 344.0245 KOps/s | |
test_membership_nested_leaf | 44.0820μs | 2.9372μs | 340.4555 KOps/s | 333.8411 KOps/s | |
test_membership_stacked_nested | 25.2570μs | 2.8999μs | 344.8426 KOps/s | 350.2106 KOps/s | |
test_membership_stacked_nested_leaf | 44.8440μs | 2.8973μs | 345.1486 KOps/s | 349.0789 KOps/s | |
test_membership_nested_last | 32.8680μs | 4.1853μs | 238.9312 KOps/s | 240.1866 KOps/s | |
test_membership_nested_leaf_last | 35.4160μs | 4.2498μs | 235.3028 KOps/s | 241.0408 KOps/s | |
test_membership_stacked_nested_last | 24.9770μs | 4.2179μs | 237.0838 KOps/s | 210.1122 KOps/s | |
test_membership_stacked_nested_leaf_last | 22.6530μs | 4.1856μs | 238.9168 KOps/s | 210.2320 KOps/s | |
test_nested_getleaf | 33.3320μs | 10.6844μs | 93.5948 KOps/s | 93.9898 KOps/s | |
test_nested_get | 38.5720μs | 10.0721μs | 99.2844 KOps/s | 98.6059 KOps/s | |
test_stacked_getleaf | 37.5900μs | 10.6499μs | 93.8973 KOps/s | 93.6311 KOps/s | |
test_stacked_get | 36.0370μs | 10.0965μs | 99.0446 KOps/s | 97.5684 KOps/s | |
test_nested_getitemleaf | 37.8110μs | 11.0705μs | 90.3305 KOps/s | 90.9144 KOps/s | |
test_nested_getitem | 44.5670μs | 10.3433μs | 96.6806 KOps/s | 96.9396 KOps/s | |
test_stacked_getitemleaf | 37.9910μs | 11.1043μs | 90.0556 KOps/s | 90.8437 KOps/s | |
test_stacked_getitem | 57.4570μs | 10.4879μs | 95.3479 KOps/s | 96.6725 KOps/s | |
test_lock_nested | 3.3389ms | 0.4459ms | 2.2425 KOps/s | 2.2569 KOps/s | |
test_lock_stack_nested | 0.7865ms | 0.4129ms | 2.4219 KOps/s | 2.4157 KOps/s | |
test_unlock_nested | 0.6716ms | 0.3581ms | 2.7929 KOps/s | 2.7550 KOps/s | |
test_unlock_stack_nested | 1.2056ms | 0.3318ms | 3.0139 KOps/s | 3.0551 KOps/s | |
test_flatten_speed | 0.5941ms | 98.1445μs | 10.1891 KOps/s | 10.5127 KOps/s | |
test_unflatten_speed | 0.6165ms | 0.4900ms | 2.0408 KOps/s | 2.0309 KOps/s | |
test_common_ops | 4.7274ms | 0.7899ms | 1.2660 KOps/s | 1.3880 KOps/s | |
test_creation | 22.3610μs | 2.1098μs | 473.9829 KOps/s | 491.2926 KOps/s | |
test_creation_empty | 45.3150μs | 10.9852μs | 91.0312 KOps/s | 114.3637 KOps/s | |
test_creation_nested_1 | 50.5950μs | 13.6014μs | 73.5221 KOps/s | 88.0584 KOps/s | |
test_creation_nested_2 | 63.0880μs | 17.7648μs | 56.2912 KOps/s | 63.2149 KOps/s | |
test_clone | 0.2059ms | 13.0927μs | 76.3787 KOps/s | 77.7749 KOps/s | |
test_getitem[int] | 1.0644ms | 12.8644μs | 77.7342 KOps/s | 80.4661 KOps/s | |
test_getitem[slice_int] | 0.1545ms | 25.6610μs | 38.9696 KOps/s | 41.1266 KOps/s | |
test_getitem[range] | 0.1852ms | 50.8388μs | 19.6700 KOps/s | 21.9900 KOps/s | |
test_getitem[tuple] | 0.1588ms | 20.5853μs | 48.5783 KOps/s | 50.4903 KOps/s | |
test_getitem[list] | 0.3423ms | 46.5229μs | 21.4948 KOps/s | 23.7212 KOps/s | |
test_setitem_dim[int] | 54.6420μs | 26.0998μs | 38.3145 KOps/s | 31.6433 KOps/s | |
test_setitem_dim[slice_int] | 99.4760μs | 53.2522μs | 18.7786 KOps/s | 18.7787 KOps/s | |
test_setitem_dim[range] | 0.1331ms | 77.5365μs | 12.8971 KOps/s | 13.8328 KOps/s | |
test_setitem_dim[tuple] | 0.1143ms | 41.5637μs | 24.0595 KOps/s | 23.4739 KOps/s | |
test_setitem | 91.1300μs | 19.9476μs | 50.1313 KOps/s | 52.5229 KOps/s | |
test_set | 0.1429ms | 19.6318μs | 50.9378 KOps/s | 55.0237 KOps/s | |
test_set_shared | 3.6787ms | 0.1701ms | 5.8776 KOps/s | 5.9883 KOps/s | |
test_update | 0.1500ms | 22.7302μs | 43.9943 KOps/s | 51.2418 KOps/s | |
test_update_nested | 93.0140μs | 32.1860μs | 31.0694 KOps/s | 31.8976 KOps/s | |
test_update__nested | 0.6807ms | 32.2360μs | 31.0212 KOps/s | 31.7999 KOps/s | |
test_set_nested | 89.1460μs | 21.9397μs | 45.5795 KOps/s | 49.4529 KOps/s | |
test_set_nested_new | 0.4547ms | 28.5111μs | 35.0741 KOps/s | 39.6675 KOps/s | |
test_select | 0.1102ms | 43.2832μs | 23.1036 KOps/s | 24.3586 KOps/s | |
test_select_nested | 0.1469ms | 59.2179μs | 16.8868 KOps/s | 16.9315 KOps/s | |
test_exclude_nested | 0.1492ms | 77.5872μs | 12.8887 KOps/s | 12.9206 KOps/s | |
test_empty[True] | 0.6890ms | 0.3797ms | 2.6336 KOps/s | 2.6364 KOps/s | |
test_empty[False] | 11.7317μs | 1.2588μs | 794.3869 KOps/s | 820.5689 KOps/s | |
test_unbind_speed | 0.5439ms | 0.2631ms | 3.8008 KOps/s | 3.8175 KOps/s | |
test_unbind_speed_stack0 | 0.7690ms | 0.2586ms | 3.8671 KOps/s | 3.8759 KOps/s | |
test_unbind_speed_stack1 | 0.1038s | 0.7745ms | 1.2911 KOps/s | 1.4237 KOps/s | |
test_split | 0.1017s | 1.7512ms | 571.0257 Ops/s | 568.7830 Ops/s | |
test_chunk | 0.1022s | 1.7554ms | 569.6702 Ops/s | 565.4031 Ops/s | |
test_consolidate_njt[False-None] | 8.4264ms | 8.2149ms | 121.7296 Ops/s | 120.0505 Ops/s | |
test_creation[device0] | 0.2185ms | 92.6805μs | 10.7898 KOps/s | 10.7893 KOps/s | |
test_creation_from_tensor | 3.9865ms | 95.5234μs | 10.4686 KOps/s | 10.3484 KOps/s | |
test_add_one[memmap_tensor0] | 0.1574ms | 4.8192μs | 207.5039 KOps/s | 203.0677 KOps/s | |
test_contiguous[memmap_tensor0] | 26.5700μs | 0.5163μs | 1.9367 MOps/s | 1.9824 MOps/s | |
test_stack[memmap_tensor0] | 29.5150μs | 3.3374μs | 299.6360 KOps/s | 294.8212 KOps/s | |
test_memmaptd_index | 0.8364ms | 0.2429ms | 4.1170 KOps/s | 4.2266 KOps/s | |
test_memmaptd_index_astensor | 1.3755ms | 0.3292ms | 3.0375 KOps/s | 3.1916 KOps/s | |
test_memmaptd_index_op | 0.9583ms | 0.5783ms | 1.7293 KOps/s | 1.8326 KOps/s | |
test_serialize_model | 0.1253s | 0.1135s | 8.8144 Ops/s | 7.3362 Ops/s | |
test_serialize_model_pickle | 0.4457s | 0.3907s | 2.5594 Ops/s | 2.5295 Ops/s | |
test_serialize_weights | 0.2162s | 0.1273s | 7.8554 Ops/s | 8.6245 Ops/s | |
test_serialize_weights_returnearly | 0.1743s | 0.1584s | 6.3120 Ops/s | 6.3093 Ops/s | |
test_serialize_weights_pickle | 0.6024s | 0.4488s | 2.2283 Ops/s | 2.3864 Ops/s | |
test_serialize_weights_filesystem | 0.1465s | 0.1410s | 7.0925 Ops/s | 7.0290 Ops/s | |
test_serialize_model_filesystem | 0.2750s | 0.1685s | 5.9350 Ops/s | 6.5525 Ops/s | |
test_reshape_pytree | 67.3160μs | 27.0471μs | 36.9725 KOps/s | 37.3610 KOps/s | |
test_reshape_td | 79.2180μs | 31.9825μs | 31.2671 KOps/s | 31.1081 KOps/s | |
test_view_pytree | 86.0240μs | 27.0911μs | 36.9125 KOps/s | 37.7652 KOps/s | |
test_view_td | 74.3190μs | 38.5364μs | 25.9495 KOps/s | 26.5413 KOps/s | |
test_unbind_pytree | 61.5650μs | 29.8761μs | 33.4716 KOps/s | 33.7374 KOps/s | |
test_unbind_td | 0.3220ms | 38.8036μs | 25.7708 KOps/s | 26.6517 KOps/s | |
test_split_pytree | 64.2500μs | 29.7305μs | 33.6355 KOps/s | 33.5452 KOps/s | |
test_split_td | 0.4950ms | 44.3371μs | 22.5545 KOps/s | 22.8858 KOps/s | |
test_add_pytree | 82.7140μs | 36.0168μs | 27.7648 KOps/s | 25.9122 KOps/s | |
test_add_td | 0.1222ms | 56.9907μs | 17.5467 KOps/s | 19.1445 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1289ms | 61.2723μs | 16.3206 KOps/s | 16.3860 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 1.3991ms | 0.1603ms | 6.2389 KOps/s | 6.2866 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1057ms | 45.6893μs | 21.8869 KOps/s | 21.8378 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2707ms | 0.1178ms | 8.4896 KOps/s | 8.3059 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 60.0920μs | 25.8561μs | 38.6757 KOps/s | 38.9260 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1184ms | 53.0457μs | 18.8517 KOps/s | 18.5055 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1667ms | 78.3625μs | 12.7612 KOps/s | 12.8349 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1295ms | 67.7222μs | 14.7662 KOps/s | 14.8267 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.1830ms | 0.1056ms | 9.4681 KOps/s | 9.6577 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.4185ms | 0.1965ms | 5.0903 KOps/s | 5.0473 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 96.0000μs | 44.6646μs | 22.3891 KOps/s | 22.7047 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.5132ms | 61.0907μs | 16.3691 KOps/s | 16.4554 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.2339ms | 0.1038ms | 9.6333 KOps/s | 9.9685 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.3696ms | 0.2018ms | 4.9557 KOps/s | 4.9278 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3899ms | 0.2081ms | 4.8045 KOps/s | 4.7944 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.1838ms | 0.1071ms | 9.3328 KOps/s | 9.5650 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.2162ms | 54.1135μs | 18.4797 KOps/s | 18.7843 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1035ms | 47.4682μs | 21.0667 KOps/s | 22.3377 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.6323ms | 0.1600ms | 6.2513 KOps/s | 6.2507 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.1956ms | 0.1043ms | 9.5875 KOps/s | 9.8230 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 72.4050μs | 20.7560μs | 48.1789 KOps/s | 48.2453 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1568ms | 60.5011μs | 16.5286 KOps/s | 16.8786 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1570ms | 81.9717μs | 12.1993 KOps/s | 12.5200 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1333ms | 70.1679μs | 14.2515 KOps/s | 14.3859 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.3080ms | 0.2088ms | 4.7900 KOps/s | 4.9392 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.4650ms | 1.2749ms | 784.3582 Ops/s | 786.7951 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.2976ms | 0.2029ms | 4.9274 KOps/s | 5.0518 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 0.9752ms | 0.7726ms | 1.2944 KOps/s | 1.2946 KOps/s | |
test_compile_assign_and_add_stack[compile] | 0.8092ms | 0.4602ms | 2.1728 KOps/s | 2.2720 KOps/s | |
test_compile_assign_and_add_stack[eager] | 3.7653ms | 2.6694ms | 374.6115 Ops/s | 403.5179 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1071ms | 36.6510μs | 27.2844 KOps/s | 29.2727 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.5195ms | 33.6921μs | 29.6805 KOps/s | 30.6632 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 80.0300μs | 28.9048μs | 34.5964 KOps/s | 35.2370 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 75.3800μs | 23.3987μs | 42.7375 KOps/s | 43.1994 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 79.9490μs | 29.5253μs | 33.8692 KOps/s | 34.0809 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 70.2710μs | 23.2727μs | 42.9689 KOps/s | 42.7391 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1252ms | 53.0416μs | 18.8531 KOps/s | 19.4881 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.5973ms | 20.6311μs | 48.4704 KOps/s | 49.7367 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.2688ms | 44.6806μs | 22.3811 KOps/s | 22.6535 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 51.9280μs | 19.1465μs | 52.2290 KOps/s | 52.9354 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1077ms | 45.1204μs | 22.1629 KOps/s | 22.2743 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 0.3895ms | 19.7492μs | 50.6349 KOps/s | 52.9362 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1102ms | 53.4855μs | 18.6967 KOps/s | 19.3953 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.9422ms | 20.6925μs | 48.3267 KOps/s | 51.3132 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.2836ms | 45.2140μs | 22.1170 KOps/s | 22.3694 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 58.5100μs | 19.2102μs | 52.0556 KOps/s | 53.0178 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1017ms | 44.8239μs | 22.3095 KOps/s | 22.0816 KOps/s | |
test_compile_indexing[int-pytree-eager] | 58.3090μs | 18.9932μs | 52.6503 KOps/s | 52.6361 KOps/s | |
test_mod_add[eager] | 87.5640μs | 34.0142μs | 29.3995 KOps/s | 29.3914 KOps/s | |
test_mod_add[compile] | 0.1025ms | 49.2650μs | 20.2984 KOps/s | 20.7642 KOps/s | |
test_mod_add[compile-overhead] | 0.1397ms | 47.7289μs | 20.9517 KOps/s | 21.0539 KOps/s | |
test_mod_wrap[eager] | 0.3920ms | 0.2270ms | 4.4047 KOps/s | 4.4791 KOps/s | |
test_mod_wrap[compile] | 0.3031ms | 0.2067ms | 4.8374 KOps/s | 4.8289 KOps/s | |
test_mod_wrap[compile-overhead] | 0.3624ms | 0.2047ms | 4.8843 KOps/s | 4.9404 KOps/s | |
test_mod_wrap_and_backward[eager] | 12.2917ms | 11.1636ms | 89.5771 Ops/s | 92.5519 Ops/s | |
test_mod_wrap_and_backward[compile] | 12.2613ms | 11.1039ms | 90.0588 Ops/s | 79.8660 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 12.0727ms | 11.1135ms | 89.9807 Ops/s | 79.9735 Ops/s | |
test_seq_add[eager] | 0.2324ms | 0.1128ms | 8.8635 KOps/s | 8.9034 KOps/s | |
test_seq_add[compile] | 0.1287ms | 62.7777μs | 15.9292 KOps/s | 16.3092 KOps/s | |
test_seq_add[compile-overhead] | 0.1292ms | 59.6980μs | 16.7510 KOps/s | 16.5432 KOps/s | |
test_seq_wrap[eager] | 0.7195ms | 0.4463ms | 2.2408 KOps/s | 2.3191 KOps/s | |
test_seq_wrap[compile] | 0.3464ms | 0.2259ms | 4.4269 KOps/s | 4.4813 KOps/s | |
test_seq_wrap[compile-overhead] | 0.3454ms | 0.2271ms | 4.4040 KOps/s | 4.5036 KOps/s | |
test_func_call_runtime[False-eager] | 0.8425ms | 0.5472ms | 1.8276 KOps/s | 1.8378 KOps/s | |
test_func_call_runtime[False-compile] | 0.7622ms | 0.4278ms | 2.3375 KOps/s | 2.3734 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.6721ms | 0.4323ms | 2.3131 KOps/s | 2.3171 KOps/s | |
test_func_call_runtime[True-eager] | 1.0132ms | 0.7591ms | 1.3173 KOps/s | 1.3440 KOps/s | |
test_func_call_runtime[True-compile] | 0.6221ms | 0.4699ms | 2.1283 KOps/s | 2.1828 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.6116ms | 0.4707ms | 2.1243 KOps/s | 2.1634 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8440ms | 0.5430ms | 1.8416 KOps/s | 1.8226 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.8076ms | 0.4314ms | 2.3181 KOps/s | 2.3825 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.5622ms | 0.4283ms | 2.3346 KOps/s | 2.3667 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.0355ms | 0.9065ms | 1.1031 KOps/s | 1.1270 KOps/s | |
test_func_call_cm_runtime[True-compile] | 0.6967ms | 0.4973ms | 2.0110 KOps/s | 2.0740 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.7547ms | 0.4971ms | 2.0118 KOps/s | 2.0594 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.5078ms | 1.8854ms | 530.4018 Ops/s | 528.0010 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.8691ms | 0.5250ms | 1.9047 KOps/s | 1.9540 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 1.0097ms | 0.5243ms | 1.9072 KOps/s | 1.9283 KOps/s | |
test_distributed | 0.3248ms | 0.1265ms | 7.9064 KOps/s | 7.8791 KOps/s | |
test_tdmodule | 58.1790μs | 26.5360μs | 37.6846 KOps/s | 40.4123 KOps/s | |
test_tdmodule_dispatch | 80.9710μs | 48.1567μs | 20.7655 KOps/s | 21.8165 KOps/s | |
test_tdseq | 48.2000μs | 26.1955μs | 38.1745 KOps/s | 39.6970 KOps/s | |
test_tdseq_dispatch | 99.8760μs | 50.6693μs | 19.7358 KOps/s | 20.4946 KOps/s | |
test_instantiation_functorch | 2.2863ms | 1.5160ms | 659.6262 Ops/s | 643.7796 Ops/s | |
test_exec_functorch | 0.3226ms | 0.1773ms | 5.6396 KOps/s | 5.4686 KOps/s | |
test_exec_functional_call | 0.3135ms | 0.1731ms | 5.7776 KOps/s | 5.7591 KOps/s | |
test_exec_td_decorator | 0.4913ms | 0.2317ms | 4.3159 KOps/s | 4.2847 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.9153ms | 0.6564ms | 1.5234 KOps/s | 1.5487 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.3172ms | 0.6787ms | 1.4733 KOps/s | 1.5512 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7977ms | 0.5223ms | 1.9147 KOps/s | 1.9070 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.8181ms | 0.5235ms | 1.9103 KOps/s | 1.9030 KOps/s | |
test_to_module_speed[True] | 2.0904ms | 1.2927ms | 773.5484 Ops/s | 775.7532 Ops/s | |
test_to_module_speed[False] | 2.0047ms | 1.2623ms | 792.2227 Ops/s | 785.2880 Ops/s | |
test_tc_init | 94.3460μs | 45.7334μs | 21.8658 KOps/s | 23.0979 KOps/s | |
test_tc_init_nested | 0.1680ms | 90.6991μs | 11.0255 KOps/s | 11.5223 KOps/s | |
test_tc_first_layer_tensor | 28.4330μs | 1.5177μs | 658.8964 KOps/s | 639.6708 KOps/s | |
test_tc_first_layer_nontensor | 25.3270μs | 4.7540μs | 210.3483 KOps/s | 206.7197 KOps/s | |
test_tc_second_layer_tensor | 32.7810μs | 2.8300μs | 353.3576 KOps/s | 347.7769 KOps/s | |
test_tc_second_layer_nontensor | 40.3580μs | 6.0546μs | 165.1629 KOps/s | 161.2841 KOps/s | |
test_unbind | 0.2257s | 12.5676ms | 79.5700 Ops/s | 81.1200 Ops/s | |
test_full_like | 17.2218ms | 11.8555ms | 84.3492 Ops/s | 84.0951 Ops/s | |
test_zeros_like | 10.7106ms | 7.2331ms | 138.2539 Ops/s | 138.3671 Ops/s | |
test_ones_like | 16.1535ms | 7.8974ms | 126.6234 Ops/s | 121.0681 Ops/s | |
test_clone | 12.8547ms | 9.6884ms | 103.2163 Ops/s | 100.0379 Ops/s | |
test_squeeze | 59.5320μs | 11.9572μs | 83.6317 KOps/s | 83.4679 KOps/s | |
test_unsqueeze | 0.1895ms | 90.7052μs | 11.0247 KOps/s | 11.1871 KOps/s | |
test_split | 0.5069ms | 0.1943ms | 5.1461 KOps/s | 5.1697 KOps/s | |
test_permute | 0.3219ms | 0.2203ms | 4.5384 KOps/s | 4.5361 KOps/s | |
test_stack | 31.8853ms | 24.9190ms | 40.1300 Ops/s | 39.6509 Ops/s | |
test_cat | 29.6519ms | 24.5538ms | 40.7269 Ops/s | 40.4676 Ops/s |
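
The "Change" column is empty in the table above. As a rough sketch, the relative change between the "Ops" and "Ops on Repo HEAD" columns can be recomputed from the row text. This assumes "Ops" is the run being compared against HEAD, and that only the `Ops/s`, `KOps/s` and `MOps/s` units occur. The helper names `parse_ops` and `relative_change` are made up for this example and are not part of any benchmark tooling.

```python
import re

# Multipliers for the unit prefixes that appear in the "Ops" columns.
_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '56.2690 KOps/s' to operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([KM]?Ops/s)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised throughput cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def relative_change(row: str) -> tuple[str, float]:
    """Return (test name, percent change of Ops relative to Ops on Repo HEAD)."""
    cells = [c.strip() for c in row.split("|")]
    name, ops, head = cells[0], parse_ops(cells[3]), parse_ops(cells[4])
    return name, 100.0 * (ops - head) / head


# Two rows copied from the table above.
rows = [
    "test_creation_empty | 45.3150μs | 10.9852μs | 91.0312 KOps/s | 114.3637 KOps/s | |",
    "test_membership_stacked_nested_last | 24.9770μs | 4.2179μs | 237.0838 KOps/s | 210.1122 KOps/s | |",
]
for row in rows:
    name, change = relative_change(row)
    print(f"{name}: {change:+.1f}%")
```

A negative value means fewer operations per second than on HEAD. Single-run differences of a few percent in microbenchmarks like these are often noise, so only the larger gaps are likely to be worth a second look.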

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 30.3110μs | 10.5243μs | 95.0185 KOps/s | 98.4916 KOps/s | |
test_plain_set_stack_nested | 44.3800μs | 10.5713μs | 94.5960 KOps/s | 97.7609 KOps/s | |
test_plain_set_nested_inplace | 37.0310μs | 11.3783μs | 87.8867 KOps/s | 90.6325 KOps/s | |
test_plain_set_stack_nested_inplace | 47.6010μs | 11.4420μs | 87.3976 KOps/s | 90.2947 KOps/s | |
test_items | 30.5810μs | 2.8859μs | 346.5121 KOps/s | 339.9630 KOps/s | |
test_items_nested | 0.4138ms | 0.3491ms | 2.8648 KOps/s | 2.8037 KOps/s | |
test_items_nested_locked | 0.4166ms | 0.3507ms | 2.8512 KOps/s | 2.7920 KOps/s | |
test_items_nested_leaf | 0.1405ms | 58.4545μs | 17.1073 KOps/s | 17.2500 KOps/s | |
test_items_stack_nested | 0.3921ms | 0.3512ms | 2.8470 KOps/s | 2.7965 KOps/s | |
test_items_stack_nested_leaf | 84.0620μs | 59.6644μs | 16.7604 KOps/s | 17.3847 KOps/s | |
test_items_stack_nested_locked | 0.4132ms | 0.3523ms | 2.8387 KOps/s | 2.7751 KOps/s | |
test_keys | 27.5500μs | 3.4408μs | 290.6310 KOps/s | 286.2105 KOps/s | |
test_keys_nested | 0.2584ms | 70.4304μs | 14.1984 KOps/s | 14.2138 KOps/s | |
test_keys_nested_locked | 0.7081ms | 75.5220μs | 13.2412 KOps/s | 13.1525 KOps/s | |
test_keys_nested_leaf | 0.2449ms | 61.1582μs | 16.3510 KOps/s | 16.2390 KOps/s | |
test_keys_stack_nested | 0.2640ms | 71.1068μs | 14.0634 KOps/s | 14.1932 KOps/s | |
test_keys_stack_nested_leaf | 91.9820μs | 62.0235μs | 16.1229 KOps/s | 16.3589 KOps/s | |
test_keys_stack_nested_locked | 0.1252ms | 76.4041μs | 13.0883 KOps/s | 13.2127 KOps/s | |
test_values | 6.1168μs | 0.8437μs | 1.1852 MOps/s | 1.1844 MOps/s | |
test_values_nested | 0.1897ms | 31.1968μs | 32.0546 KOps/s | 32.2884 KOps/s | |
test_values_nested_locked | 61.9710μs | 32.8663μs | 30.4263 KOps/s | 30.8446 KOps/s | |
test_values_nested_leaf | 89.8520μs | 33.5065μs | 29.8449 KOps/s | 30.0127 KOps/s | |
test_values_stack_nested | 68.3510μs | 31.8361μs | 31.4109 KOps/s | 31.9896 KOps/s | |
test_values_stack_nested_leaf | 99.0510μs | 34.0912μs | 29.3331 KOps/s | 29.8416 KOps/s | |
test_values_stack_nested_locked | 95.1620μs | 33.2666μs | 30.0602 KOps/s | 30.6668 KOps/s | |
test_membership | 2.4716μs | 0.5110μs | 1.9569 MOps/s | 1.9839 MOps/s | |
test_membership_nested | 15.6355μs | 2.0301μs | 492.5803 KOps/s | 506.5291 KOps/s | |
test_membership_nested_leaf | 21.7005μs | 2.0220μs | 494.5525 KOps/s | 490.5059 KOps/s | |
test_membership_stacked_nested | 28.3210μs | 2.1431μs | 466.6125 KOps/s | 480.4990 KOps/s | |
test_membership_stacked_nested_leaf | 57.6110μs | 2.1461μs | 465.9671 KOps/s | 481.1816 KOps/s | |
test_membership_nested_last | 64.7810μs | 2.9797μs | 335.6071 KOps/s | 341.9729 KOps/s | |
test_membership_nested_leaf_last | 38.4510μs | 2.9974μs | 333.6213 KOps/s | 342.7576 KOps/s | |
test_membership_stacked_nested_last | 45.4510μs | 4.4617μs | 224.1307 KOps/s | 345.0212 KOps/s | |
test_membership_stacked_nested_leaf_last | 0.1456ms | 4.4559μs | 224.4204 KOps/s | 343.8837 KOps/s | |
test_nested_getleaf | 34.0100μs | 6.1823μs | 161.7526 KOps/s | 162.7630 KOps/s | |
test_nested_get | 47.0110μs | 5.8702μs | 170.3532 KOps/s | 171.7547 KOps/s | |
test_stacked_getleaf | 39.7410μs | 6.1641μs | 162.2293 KOps/s | 163.0706 KOps/s | |
test_stacked_get | 41.0810μs | 5.8604μs | 170.6361 KOps/s | 170.2058 KOps/s | |
test_nested_getitemleaf | 35.9500μs | 6.2281μs | 160.5619 KOps/s | 157.8922 KOps/s | |
test_nested_getitem | 48.1010μs | 5.9562μs | 167.8922 KOps/s | 168.0920 KOps/s | |
test_stacked_getitemleaf | 31.4400μs | 6.2707μs | 159.4714 KOps/s | 159.5180 KOps/s | |
test_stacked_getitem | 38.4210μs | 5.9273μs | 168.7112 KOps/s | 168.1465 KOps/s | |
test_lock_nested | 9.2430ms | 0.3729ms | 2.6818 KOps/s | 2.7208 KOps/s | |
test_lock_stack_nested | 0.3721ms | 0.3320ms | 3.0118 KOps/s | 3.0242 KOps/s | |
test_unlock_nested | 0.6521ms | 0.3066ms | 3.2617 KOps/s | 3.3248 KOps/s | |
test_unlock_stack_nested | 0.3606ms | 0.2725ms | 3.6698 KOps/s | 3.7169 KOps/s | |
test_flatten_speed | 0.1095ms | 74.1411μs | 13.4878 KOps/s | 13.5119 KOps/s | |
test_unflatten_speed | 0.3386ms | 0.3014ms | 3.3178 KOps/s | 3.2740 KOps/s | |
test_common_ops | 1.6665ms | 0.5966ms | 1.6761 KOps/s | 1.8007 KOps/s | |
test_creation | 0.1147ms | 1.4567μs | 686.4974 KOps/s | 688.2564 KOps/s | |
test_creation_empty | 0.1570ms | 7.1304μs | 140.2444 KOps/s | 154.4362 KOps/s | |
test_creation_nested_1 | 0.1716ms | 8.6149μs | 116.0776 KOps/s | 125.1023 KOps/s | |
test_creation_nested_2 | 0.1791ms | 11.1082μs | 90.0235 KOps/s | 95.6558 KOps/s | |
test_clone | 0.1107ms | 11.0919μs | 90.1560 KOps/s | 98.9329 KOps/s | |
test_getitem[int] | 1.5114ms | 10.7005μs | 93.4531 KOps/s | 94.9045 KOps/s | |
test_getitem[slice_int] | 0.1118ms | 20.7758μs | 48.1330 KOps/s | 48.6436 KOps/s | |
test_getitem[range] | 0.1396ms | 39.9930μs | 25.0043 KOps/s | 27.2042 KOps/s | |
test_getitem[tuple] | 0.1097ms | 18.0139μs | 55.5128 KOps/s | 56.5907 KOps/s | |
test_getitem[list] | 0.2657ms | 34.4405μs | 29.0356 KOps/s | 31.6191 KOps/s | |
test_setitem_dim[int] | 37.6410μs | 18.2941μs | 54.6623 KOps/s | 57.9996 KOps/s | |
test_setitem_dim[slice_int] | 60.9910μs | 38.5559μs | 25.9364 KOps/s | 27.1606 KOps/s | |
test_setitem_dim[range] | 81.3410μs | 53.8685μs | 18.5637 KOps/s | 19.6446 KOps/s | |
test_setitem_dim[tuple] | 53.9710μs | 31.5050μs | 31.7410 KOps/s | 32.7069 KOps/s | |
test_setitem | 81.6710μs | 15.0081μs | 66.6308 KOps/s | 72.7976 KOps/s | |
test_set | 87.3520μs | 14.6000μs | 68.4930 KOps/s | 75.1873 KOps/s | |
test_set_shared | 1.6383ms | 0.1476ms | 6.7769 KOps/s | 6.7838 KOps/s | |
test_update | 0.5241ms | 17.2870μs | 57.8471 KOps/s | 64.6149 KOps/s | |
test_update_nested | 84.5310μs | 22.0144μs | 45.4249 KOps/s | 50.0059 KOps/s | |
test_update__nested | 0.7816ms | 24.4504μs | 40.8991 KOps/s | 41.4518 KOps/s | |
test_set_nested | 0.1561ms | 15.7515μs | 63.4862 KOps/s | 68.6103 KOps/s | |
test_set_nested_new | 0.1250ms | 17.9211μs | 55.8001 KOps/s | 61.0164 KOps/s | |
test_select | 0.1010ms | 29.9617μs | 33.3760 KOps/s | 35.4283 KOps/s | |
test_select_nested | 0.1248ms | 41.6982μs | 23.9819 KOps/s | 23.6483 KOps/s | |
test_exclude_nested | 0.1266ms | 60.9422μs | 16.4090 KOps/s | 16.2085 KOps/s | |
test_empty[True] | 0.3038ms | 0.2724ms | 3.6707 KOps/s | 3.6076 KOps/s | |
test_empty[False] | 4.2861μs | 0.7419μs | 1.3480 MOps/s | 1.3433 MOps/s | |
test_to | 87.9410μs | 55.1710μs | 18.1255 KOps/s | 17.6355 KOps/s | |
test_to_nonblocking | 0.1952ms | 45.0849μs | 22.1804 KOps/s | 22.4339 KOps/s | |
test_unbind_speed | 0.3278ms | 0.2315ms | 4.3205 KOps/s | 4.4947 KOps/s | |
test_unbind_speed_stack0 | 0.3872ms | 0.2303ms | 4.3422 KOps/s | 4.4781 KOps/s | |
test_unbind_speed_stack1 | 0.7247ms | 0.5815ms | 1.7196 KOps/s | 1.5613 KOps/s | |
test_split | 98.5020ms | 1.5924ms | 628.0020 Ops/s | 646.3993 Ops/s | |
test_chunk | 99.4652ms | 1.5908ms | 628.6222 Ops/s | 590.6075 Ops/s | |
test_consolidate[False-None] | 99.8331ms | 2.7959ms | 357.6625 Ops/s | 391.6761 Ops/s | |
test_consolidate[default-None] | 1.8360ms | 1.6586ms | 602.9195 Ops/s | 604.4135 Ops/s | |
test_consolidate[reduce-overhead-None] | 1.8547ms | 1.7033ms | 587.1013 Ops/s | 598.9677 Ops/s | |
test_consolidate_njt[False-None] | 6.7723ms | 6.3690ms | 157.0096 Ops/s | 158.5023 Ops/s | |
test_to[False-False-None] | 1.8039ms | 1.6143ms | 619.4444 Ops/s | 609.2602 Ops/s | |
test_to[True-False-None] | 1.5079ms | 1.2362ms | 808.9507 Ops/s | 805.7162 Ops/s | |
test_to[within-False-None] | 4.0889ms | 3.8924ms | 256.9084 Ops/s | 257.0963 Ops/s | |
test_to[True-default-None] | 5.3475ms | 5.0026ms | 199.8941 Ops/s | 198.2639 Ops/s | |
test_to_njt[False-False-None] | 7.1007ms | 6.7874ms | 147.3322 Ops/s | 147.0606 Ops/s | |
test_to_njt[True-False-None] | 5.5353ms | 5.2474ms | 190.5695 Ops/s | 190.1452 Ops/s | |
test_to_njt[within-False-None] | 12.3544ms | 11.7431ms | 85.1564 Ops/s | 84.9260 Ops/s | |
test_creation[device0] | 0.5407ms | 78.3762μs | 12.7590 KOps/s | 12.3100 KOps/s | |
test_creation_from_tensor | 0.7059ms | 82.7142μs | 12.0898 KOps/s | 11.7933 KOps/s | |
test_add_one[memmap_tensor0] | 0.3984ms | 6.9513μs | 143.8589 KOps/s | 153.1211 KOps/s | |
test_contiguous[memmap_tensor0] | 1.8065μs | 0.4240μs | 2.3584 MOps/s | 2.5264 MOps/s | |
test_stack[memmap_tensor0] | 0.1445ms | 4.4882μs | 222.8074 KOps/s | 224.9039 KOps/s | |
test_memmaptd_index | 1.5755ms | 0.2497ms | 4.0041 KOps/s | 4.0255 KOps/s | |
test_memmaptd_index_astensor | 0.8806ms | 0.3065ms | 3.2622 KOps/s | 3.2552 KOps/s | |
test_memmaptd_index_op | 1.0273ms | 0.5706ms | 1.7525 KOps/s | 1.8174 KOps/s | |
test_serialize_model | 0.1331s | 0.1306s | 7.6543 Ops/s | 7.6174 Ops/s | |
test_serialize_model_pickle | 1.3477s | 1.2122s | 0.8249 Ops/s | 0.8430 Ops/s | |
test_serialize_weights | 0.4308s | 0.1729s | 5.7831 Ops/s | 7.6940 Ops/s | |
test_serialize_weights_returnearly | 0.3459s | 53.8704ms | 18.5631 Ops/s | 13.8139 Ops/s | |
test_serialize_weights_pickle | 1.3770s | 1.2169s | 0.8218 Ops/s | 0.8032 Ops/s | |
test_reshape_pytree | 0.1326ms | 21.7302μs | 46.0189 KOps/s | 44.7245 KOps/s | |
test_reshape_td | 0.1635ms | 26.2905μs | 38.0365 KOps/s | 34.5580 KOps/s | |
test_view_pytree | 0.1632ms | 21.6373μs | 46.2164 KOps/s | 45.7474 KOps/s | |
test_view_td | 0.1033ms | 29.1857μs | 34.2634 KOps/s | 31.7693 KOps/s | |
test_unbind_pytree | 0.1786ms | 28.0977μs | 35.5901 KOps/s | 34.6475 KOps/s | |
test_unbind_td | 0.7151ms | 35.1098μs | 28.4821 KOps/s | 28.7120 KOps/s | |
test_split_pytree | 0.1755ms | 28.9460μs | 34.5471 KOps/s | 33.2123 KOps/s | |
test_split_td | 0.8877ms | 38.0847μs | 26.2573 KOps/s | 25.4665 KOps/s | |
test_add_pytree | 0.1538ms | 34.8164μs | 28.7221 KOps/s | 30.0325 KOps/s | |
test_add_td | 79.6110μs | 46.8475μs | 21.3459 KOps/s | 23.5866 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.2685ms | 0.1203ms | 8.3104 KOps/s | 8.0704 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.3056ms | 0.1223ms | 8.1799 KOps/s | 7.8523 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.2432ms | 94.4999μs | 10.5820 KOps/s | 10.2227 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 1.4348ms | 0.1452ms | 6.8864 KOps/s | 6.8343 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 0.2020ms | 21.4206μs | 46.6840 KOps/s | 47.6607 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.2245ms | 26.6089μs | 37.5814 KOps/s | 37.4659 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.2770ms | 64.2078μs | 15.5744 KOps/s | 15.4787 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1710ms | 49.0034μs | 20.4067 KOps/s | 20.0612 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.3168ms | 0.1413ms | 7.0759 KOps/s | 6.9710 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3525ms | 0.2064ms | 4.8443 KOps/s | 4.7901 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.2473ms | 97.1759μs | 10.2906 KOps/s | 9.8658 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.2440ms | 50.2648μs | 19.8946 KOps/s | 19.1750 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.2438ms | 0.1358ms | 7.3635 KOps/s | 7.3379 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.6616ms | 0.4670ms | 2.1415 KOps/s | 2.1339 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3932ms | 0.2477ms | 4.0367 KOps/s | 4.0143 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.2719ms | 0.1426ms | 7.0150 KOps/s | 6.7532 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.2095ms | 59.9878μs | 16.6701 KOps/s | 15.8181 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.2584ms | 98.6961μs | 10.1321 KOps/s | 9.6436 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.5504ms | 0.3903ms | 2.5618 KOps/s | 2.5273 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.2775ms | 0.1368ms | 7.3092 KOps/s | 7.1257 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.1602ms | 18.4458μs | 54.2130 KOps/s | 56.5806 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 61.5510μs | 26.8222μs | 37.2826 KOps/s | 36.6051 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1650ms | 69.0668μs | 14.4787 KOps/s | 14.3989 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1081ms | 51.2620μs | 19.5076 KOps/s | 19.6826 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 1.6407ms | 0.3942ms | 2.5369 KOps/s | 2.2159 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 3.0534ms | 2.7025ms | 370.0276 Ops/s | 395.7843 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 1.5617ms | 0.4244ms | 2.3565 KOps/s | 2.2504 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 2.8924ms | 2.5940ms | 385.5024 Ops/s | 396.1485 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.2785ms | 0.1157ms | 8.6411 KOps/s | 8.8315 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.5557ms | 77.1140μs | 12.9678 KOps/s | 12.5167 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.4889ms | 0.1040ms | 9.6194 KOps/s | 9.4478 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.2860ms | 66.4457μs | 15.0499 KOps/s | 14.6928 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.3123ms | 0.1093ms | 9.1463 KOps/s | 9.5114 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.2752ms | 69.3051μs | 14.4289 KOps/s | 14.6950 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.2717ms | 99.6963μs | 10.0305 KOps/s | 9.9854 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.1825ms | 16.9182μs | 59.1080 KOps/s | 58.2897 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.2420ms | 95.2271μs | 10.5012 KOps/s | 10.3487 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 0.1512ms | 15.8869μs | 62.9448 KOps/s | 63.6647 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.2438ms | 96.2703μs | 10.3874 KOps/s | 10.3670 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 0.1425ms | 15.7489μs | 63.4965 KOps/s | 63.8661 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.2710ms | 99.8874μs | 10.0113 KOps/s | 9.9032 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.7024ms | 16.7772μs | 59.6047 KOps/s | 48.4301 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.2768ms | 96.5712μs | 10.3551 KOps/s | 10.3191 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 0.1210ms | 15.5829μs | 64.1728 KOps/s | 63.9034 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.2685ms | 96.2300μs | 10.3918 KOps/s | 10.3471 KOps/s | |
test_compile_indexing[int-pytree-eager] | 48.1310μs | 15.5997μs | 64.1036 KOps/s | 63.9766 KOps/s | |
test_mod_add[eager] | 0.2004ms | 38.0707μs | 26.2669 KOps/s | 28.2361 KOps/s | |
test_mod_add[compile] | 0.4143ms | 78.4355μs | 12.7493 KOps/s | 12.5871 KOps/s | |
test_mod_add[compile-overhead] | 0.3233ms | 0.1710ms | 5.8475 KOps/s | 5.7406 KOps/s | |
test_mod_wrap[eager] | 0.4140ms | 0.2500ms | 3.9996 KOps/s | 3.9012 KOps/s | |
test_mod_wrap[compile] | 0.4340ms | 0.2835ms | 3.5270 KOps/s | 3.5040 KOps/s | |
test_mod_wrap[compile-overhead] | 7.1485ms | 3.7889ms | 263.9293 Ops/s | 267.1695 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.9299ms | 1.3442ms | 743.9109 Ops/s | 691.3393 Ops/s | |
test_mod_wrap_and_backward[compile] | 1.4679ms | 1.2397ms | 806.6299 Ops/s | 735.8102 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.3667ms | 0.9113ms | 1.0974 KOps/s | 901.8095 Ops/s | |
test_seq_add[eager] | 0.2913ms | 0.1092ms | 9.1615 KOps/s | 9.1745 KOps/s | |
test_seq_add[compile] | 0.2519ms | 87.3157μs | 11.4527 KOps/s | 11.2181 KOps/s | |
test_seq_add[compile-overhead] | 0.2829ms | 0.1346ms | 7.4290 KOps/s | 7.7623 KOps/s | |
test_seq_wrap[eager] | 0.6748ms | 0.4289ms | 2.3317 KOps/s | 2.4161 KOps/s | |
test_seq_wrap[compile] | 0.5200ms | 0.3085ms | 3.2419 KOps/s | 3.3234 KOps/s | |
test_seq_wrap[compile-overhead] | 0.4154ms | 0.2206ms | 4.5330 KOps/s | 4.4553 KOps/s | |
test_func_call_runtime[False-eager] | 0.9248ms | 0.7360ms | 1.3587 KOps/s | 1.3590 KOps/s | |
test_func_call_runtime[False-compile] | 0.9288ms | 0.7248ms | 1.3797 KOps/s | 1.3615 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.5013ms | 0.3562ms | 2.8074 KOps/s | 2.7760 KOps/s | |
test_func_call_runtime[True-eager] | 1.0884ms | 0.8864ms | 1.1282 KOps/s | 1.1005 KOps/s | |
test_func_call_runtime[True-compile] | 0.9289ms | 0.7451ms | 1.3421 KOps/s | 1.3371 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.5215ms | 0.3786ms | 2.6413 KOps/s | 2.6343 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.9210ms | 0.7507ms | 1.3320 KOps/s | 1.3579 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.8819ms | 0.7301ms | 1.3696 KOps/s | 1.3645 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.5196ms | 0.3599ms | 2.7785 KOps/s | 2.7760 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.1248ms | 0.9827ms | 1.0176 KOps/s | 992.5009 Ops/s | |
test_func_call_cm_runtime[True-compile] | 0.9372ms | 0.7750ms | 1.2904 KOps/s | 1.2726 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.5408ms | 0.4030ms | 2.4816 KOps/s | 2.4609 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.5373ms | 2.0588ms | 485.7312 Ops/s | 479.4608 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9641ms | 0.7877ms | 1.2696 KOps/s | 1.2580 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.5312ms | 0.4059ms | 2.4634 KOps/s | 2.4427 KOps/s | |
test_distributed | 3.2999ms | 0.1820ms | 5.4934 KOps/s | 8.3948 KOps/s | |
test_tdmodule | 0.1049ms | 19.4738μs | 51.3511 KOps/s | 54.8887 KOps/s | |
test_tdmodule_dispatch | 0.1891ms | 34.5935μs | 28.9072 KOps/s | 30.7915 KOps/s | |
test_tdseq | 39.7610μs | 19.1804μs | 52.1366 KOps/s | 55.0504 KOps/s | |
test_tdseq_dispatch | 54.3110μs | 34.6858μs | 28.8302 KOps/s | 28.5696 KOps/s | |
test_instantiation_functorch | 1.6492ms | 1.5239ms | 656.1972 Ops/s | 652.6599 Ops/s | |
test_exec_functorch | 0.2810ms | 0.1450ms | 6.8961 KOps/s | 7.0842 KOps/s | |
test_exec_functional_call | 0.2574ms | 0.1374ms | 7.2805 KOps/s | 7.5342 KOps/s | |
test_exec_td_decorator | 0.4043ms | 0.1838ms | 5.4406 KOps/s | 5.6321 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.8527ms | 0.6788ms | 1.4732 KOps/s | 1.4782 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8818ms | 0.6870ms | 1.4557 KOps/s | 1.4798 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7779ms | 0.6128ms | 1.6318 KOps/s | 1.6873 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7651ms | 0.6135ms | 1.6299 KOps/s | 1.6882 KOps/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.2285ms | 18.9907ms | 52.6573 Ops/s | 52.3764 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.6848ms | 19.0209ms | 52.5738 Ops/s | 52.2499 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.3425ms | 18.9093ms | 52.8840 Ops/s | 52.7720 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.6596ms | 18.8987ms | 52.9136 Ops/s | 52.5705 Ops/s | |
test_to_module_speed[True] | 1.0246ms | 0.9250ms | 1.0811 KOps/s | 1.0722 KOps/s | |
test_to_module_speed[False] | 1.2441ms | 0.9059ms | 1.1039 KOps/s | 1.0985 KOps/s | |
test_tc_init | 75.5010μs | 33.6730μs | 29.6974 KOps/s | 30.3461 KOps/s | |
test_tc_init_nested | 0.1158ms | 69.4435μs | 14.4002 KOps/s | 15.1854 KOps/s | |
test_tc_first_layer_tensor | 4.6714μs | 0.6993μs | 1.4300 MOps/s | 1.4346 MOps/s | |
test_tc_first_layer_nontensor | 27.9600μs | 2.3111μs | 432.6934 KOps/s | 432.7084 KOps/s | |
test_tc_second_layer_tensor | 9.9403μs | 1.4188μs | 704.8406 KOps/s | 702.4801 KOps/s | |
test_tc_second_layer_nontensor | 36.7110μs | 3.0023μs | 333.0762 KOps/s | 328.9032 KOps/s | |
test_unbind | 0.2389s | 9.9505ms | 100.4975 Ops/s | 152.0667 Ops/s | |
test_full_like | 10.2126ms | 9.6432ms | 103.7005 Ops/s | 102.7222 Ops/s | |
test_zeros_like | 4.9620ms | 4.4171ms | 226.3930 Ops/s | 230.0867 Ops/s | |
test_ones_like | 4.9986ms | 4.4412ms | 225.1625 Ops/s | 226.2782 Ops/s | |
test_clone | 7.4527ms | 6.9028ms | 144.8687 Ops/s | 144.7483 Ops/s | |
test_squeeze | 0.1896ms | 9.3956μs | 106.4323 KOps/s | 110.9623 KOps/s | |
test_unsqueeze | 0.1933ms | 70.7597μs | 14.1323 KOps/s | 14.4830 KOps/s | |
test_split | 0.3919ms | 0.1569ms | 6.3747 KOps/s | 6.5004 KOps/s | |
test_permute | 0.3265ms | 0.1737ms | 5.7578 KOps/s | 5.8015 KOps/s | |
test_stack | 52.1096ms | 51.6567ms | 19.3586 Ops/s | 19.1713 Ops/s | |
test_cat | 52.4260ms | 51.4571ms | 19.4337 Ops/s | 19.2628 Ops/s |
I found a fix for the first broken case (plain TensorDict). I'll push a PR to PyTorch by tomorrow; I just need to figure out how to write a test that doesn't depend on tensordict. There are already a few tests in PyTorch that I can base it on.
…vious ghstack-source-id: 089f6d745257b142b28e1005dc9adf82ed3b394b Pull Request resolved: #1100
…vious ghstack-source-id: b716dab9a20137b68587f5b3b08fa735b43d6aec Pull Request resolved: #1100
…vious ghstack-source-id: 81cec096a6a7921b21521d696eb216ca0443a3a9 Pull Request resolved: #1100
…vious ghstack-source-id: 87e1ae8af75ae3833c1e984dbbf9f69c1831ad1c Pull Request resolved: #1100
…vious ghstack-source-id: bd701ecfaf68605801a215d3cd9d49268b888bb3 Pull Request resolved: #1100
Stack from ghstack (oldest at bottom):