[BugFix] Fix error for incongruent devices in load_state_dict #529

Merged: 1 commit, merged Sep 14, 2023


vmoens (Contributor) commented Sep 14, 2023

No description provided.

facebook-github-bot added the CLA Signed label Sep 14, 2023
vmoens added the bug label Sep 14, 2023
vmoens marked this pull request as ready for review September 14, 2023 19:53
vmoens merged commit a721766 into main Sep 14, 2023
vmoens deleted the fix_error_device_sd branch September 14, 2023 19:56
github-actions

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 109. Improved: 3. Worsened: 2.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 45.0020μs | 23.0495μs | 43.3848 KOps/s | 44.1458 KOps/s | -1.72% |
| test_plain_set_stack_nested | 0.2586ms | 0.2157ms | 4.6370 KOps/s | 4.7528 KOps/s | -2.44% |
| test_plain_set_nested_inplace | 0.1544ms | 26.9052μs | 37.1675 KOps/s | 37.6965 KOps/s | -1.40% |
| test_plain_set_stack_nested_inplace | 0.2908ms | 0.2523ms | 3.9643 KOps/s | 3.9860 KOps/s | -0.54% |
| test_items | 0.1660ms | 4.1970μs | 238.2638 KOps/s | 238.0113 KOps/s | +0.11% |
| test_items_nested | 2.8173ms | 0.4183ms | 2.3905 KOps/s | 2.4149 KOps/s | -1.01% |
| test_items_nested_locked | 0.4650ms | 0.4152ms | 2.4083 KOps/s | 2.4071 KOps/s | +0.05% |
| test_items_nested_leaf | 0.3446ms | 0.2525ms | 3.9597 KOps/s | 3.9605 KOps/s | -0.02% |
| test_items_stack_nested | 2.3943ms | 2.3029ms | 434.2405 Ops/s | 429.4598 Ops/s | +1.11% |
| test_items_stack_nested_leaf | 2.1759ms | 2.1013ms | 475.9000 Ops/s | 471.2449 Ops/s | +0.99% |
| test_items_stack_nested_locked | 1.4312ms | 1.1249ms | 888.9525 Ops/s | 874.8152 Ops/s | +1.62% |
| test_keys | 0.2180ms | 6.0780μs | 164.5279 KOps/s | 163.6806 KOps/s | +0.52% |
| test_keys_nested | 2.0612ms | 0.2147ms | 4.6584 KOps/s | 4.6612 KOps/s | -0.06% |
| test_keys_nested_locked | 0.2679ms | 0.2111ms | 4.7361 KOps/s | 4.6820 KOps/s | +1.15% |
| test_keys_nested_leaf | 0.8426ms | 0.2049ms | 4.8802 KOps/s | 4.5021 KOps/s | **+8.40%** |
| test_keys_stack_nested | 2.2307ms | 2.1140ms | 473.0405 Ops/s | 465.3794 Ops/s | +1.65% |
| test_keys_stack_nested_leaf | 2.2060ms | 2.1138ms | 473.0777 Ops/s | 465.3413 Ops/s | +1.66% |
| test_keys_stack_nested_locked | 1.0115ms | 0.9297ms | 1.0757 KOps/s | 1.0394 KOps/s | +3.49% |
| test_values | 62.2030μs | 1.8926μs | 528.3758 KOps/s | 509.5081 KOps/s | +3.70% |
| test_values_nested | 0.1487ms | 72.8776μs | 13.7216 KOps/s | 13.5503 KOps/s | +1.26% |
| test_values_nested_locked | 0.1061ms | 73.1970μs | 13.6618 KOps/s | 13.4973 KOps/s | +1.22% |
| test_values_nested_leaf | 0.3323ms | 65.7071μs | 15.2191 KOps/s | 15.1201 KOps/s | +0.65% |
| test_values_stack_nested | 1.9184ms | 1.8475ms | 541.2696 Ops/s | 531.8885 Ops/s | +1.76% |
| test_values_stack_nested_leaf | 1.9084ms | 1.8343ms | 545.1580 Ops/s | 536.7215 Ops/s | +1.57% |
| test_values_stack_nested_locked | 0.8182ms | 0.7408ms | 1.3499 KOps/s | 1.3254 KOps/s | +1.85% |
| test_membership | 30.0020μs | 2.1622μs | 462.5012 KOps/s | 461.7507 KOps/s | +0.16% |
| test_membership_nested | 52.6030μs | 4.0861μs | 244.7336 KOps/s | 239.8698 KOps/s | +2.03% |
| test_membership_nested_leaf | 35.8020μs | 4.0964μs | 244.1192 KOps/s | 238.0152 KOps/s | +2.56% |
| test_membership_stacked_nested | 50.3030μs | 16.9858μs | 58.8728 KOps/s | 58.8421 KOps/s | +0.05% |
| test_membership_stacked_nested_leaf | 47.3030μs | 16.9547μs | 58.9807 KOps/s | 58.9254 KOps/s | +0.09% |
| test_membership_nested_last | 39.0020μs | 8.7773μs | 113.9302 KOps/s | 114.5600 KOps/s | -0.55% |
| test_membership_nested_leaf_last | 31.3010μs | 8.6827μs | 115.1720 KOps/s | 115.0287 KOps/s | +0.12% |
| test_membership_stacked_nested_last | 0.2988ms | 0.2602ms | 3.8427 KOps/s | 3.7799 KOps/s | +1.66% |
| test_membership_stacked_nested_leaf_last | 50.5020μs | 19.8654μs | 50.3387 KOps/s | 50.3448 KOps/s | -0.01% |
| test_nested_getleaf | 85.4050μs | 17.8406μs | 56.0520 KOps/s | 55.2435 KOps/s | +1.46% |
| test_nested_get | 46.8030μs | 16.9117μs | 59.1307 KOps/s | 57.7132 KOps/s | +2.46% |
| test_stacked_getleaf | 1.1671ms | 1.0179ms | 982.4039 Ops/s | 973.1798 Ops/s | +0.95% |
| test_stacked_get | 1.0481ms | 0.9745ms | 1.0262 KOps/s | 1.0131 KOps/s | +1.29% |
| test_nested_getitemleaf | 47.4030μs | 17.9197μs | 55.8044 KOps/s | 54.7831 KOps/s | +1.86% |
| test_nested_getitem | 45.7030μs | 16.9894μs | 58.8602 KOps/s | 58.0504 KOps/s | +1.40% |
| test_stacked_getitemleaf | 1.1625ms | 1.0172ms | 983.0774 Ops/s | 976.3913 Ops/s | +0.68% |
| test_stacked_getitem | 1.0615ms | 0.9748ms | 1.0259 KOps/s | 1.0252 KOps/s | +0.06% |
| test_lock_nested | 80.0257ms | 1.7488ms | 571.8313 Ops/s | 599.0030 Ops/s | -4.54% |
| test_lock_stack_nested | 0.1057s | 23.2786ms | 42.9579 Ops/s | 46.4561 Ops/s | **-7.53%** |
| test_unlock_nested | 2.1150ms | 1.6829ms | 594.1997 Ops/s | 565.2347 Ops/s | **+5.12%** |
| test_unlock_stack_nested | 0.1039s | 22.0838ms | 45.2820 Ops/s | 45.3297 Ops/s | -0.11% |
| test_flatten_speed | 1.2555ms | 1.1696ms | 854.9869 Ops/s | 858.8428 Ops/s | -0.45% |
| test_unflatten_speed | 3.0445ms | 2.1088ms | 474.2030 Ops/s | 476.2028 Ops/s | -0.42% |
| test_common_ops | 4.4640ms | 1.2692ms | 787.8883 Ops/s | 787.3901 Ops/s | +0.06% |
| test_creation | 1.0508ms | 7.4399μs | 134.4111 KOps/s | 137.1781 KOps/s | -2.02% |
| test_creation_empty | 59.4040μs | 15.8529μs | 63.0800 KOps/s | 63.7438 KOps/s | -1.04% |
| test_creation_nested_1 | 64.9030μs | 28.7479μs | 34.7851 KOps/s | 35.6660 KOps/s | -2.47% |
| test_creation_nested_2 | 61.0030μs | 31.1796μs | 32.0723 KOps/s | 32.4375 KOps/s | -1.13% |
| test_clone | 0.1930ms | 27.9994μs | 35.7150 KOps/s | 34.7910 KOps/s | +2.66% |
| test_getitem[int] | 55.0030μs | 31.4482μs | 31.7983 KOps/s | 31.1185 KOps/s | +2.18% |
| test_getitem[slice_int] | 0.1298ms | 62.6476μs | 15.9623 KOps/s | 16.0851 KOps/s | -0.76% |
| test_getitem[range] | 0.1321ms | 94.6155μs | 10.5691 KOps/s | 10.6270 KOps/s | -0.54% |
| test_getitem[tuple] | 0.1003ms | 52.0700μs | 19.2049 KOps/s | 19.4854 KOps/s | -1.44% |
| test_getitem[list] | 0.3940ms | 89.2232μs | 11.2079 KOps/s | 11.1947 KOps/s | +0.12% |
| test_setitem_dim[int] | 81.3040μs | 37.7497μs | 26.4903 KOps/s | 26.2864 KOps/s | +0.78% |
| test_setitem_dim[slice_int] | 0.1089ms | 66.9583μs | 14.9347 KOps/s | 14.6403 KOps/s | +2.01% |
| test_setitem_dim[range] | 0.1204ms | 90.8993μs | 11.0012 KOps/s | 10.8070 KOps/s | +1.80% |
| test_setitem_dim[tuple] | 88.5040μs | 55.0862μs | 18.1534 KOps/s | 17.9007 KOps/s | +1.41% |
| test_setitem | 0.1812ms | 37.2510μs | 26.8449 KOps/s | 26.8327 KOps/s | +0.05% |
| test_set | 0.1992ms | 35.5712μs | 28.1126 KOps/s | 27.8132 KOps/s | +1.08% |
| test_set_shared | 4.1377ms | 0.2234ms | 4.4767 KOps/s | 4.8119 KOps/s | **-6.97%** |
| test_update | 0.2271ms | 40.0433μs | 24.9730 KOps/s | 24.6456 KOps/s | +1.33% |
| test_update_nested | 0.2576ms | 61.0419μs | 16.3822 KOps/s | 16.8677 KOps/s | -2.88% |
| test_set_nested | 0.4589ms | 39.3370μs | 25.4214 KOps/s | 25.4828 KOps/s | -0.24% |
| test_set_nested_new | 0.2094ms | 60.9467μs | 16.4078 KOps/s | 16.3356 KOps/s | +0.44% |
| test_select | 0.3102ms | 0.1134ms | 8.8204 KOps/s | 8.8583 KOps/s | -0.43% |
| test_unbind_speed | 0.8877ms | 0.7597ms | 1.3163 KOps/s | 1.3169 KOps/s | -0.05% |
| test_unbind_speed_stack0 | 89.9474ms | 10.1297ms | 98.7197 Ops/s | 94.4127 Ops/s | +4.56% |
| test_unbind_speed_stack1 | 65.1040μs | 1.3258μs | 754.2401 KOps/s | 742.8277 KOps/s | +1.54% |
| test_creation[device0] | 0.6542ms | 0.5329ms | 1.8766 KOps/s | 1.8998 KOps/s | -1.22% |
| test_creation_from_tensor | 3.2590ms | 0.5982ms | 1.6716 KOps/s | 1.6948 KOps/s | -1.37% |
| test_add_one[memmap_tensor0] | 2.0440ms | 37.4521μs | 26.7007 KOps/s | 26.7348 KOps/s | -0.13% |
| test_contiguous[memmap_tensor0] | 38.6020μs | 10.0739μs | 99.2666 KOps/s | 98.5086 KOps/s | +0.77% |
| test_stack[memmap_tensor0] | 0.1084ms | 30.8451μs | 32.4201 KOps/s | 32.1379 KOps/s | +0.88% |
| test_memmaptd_index | 0.3957ms | 0.3557ms | 2.8114 KOps/s | 2.7885 KOps/s | +0.82% |
| test_memmaptd_index_astensor | 1.6554ms | 1.5387ms | 649.9028 Ops/s | 628.8021 Ops/s | +3.36% |
| test_memmaptd_index_op | 3.1664ms | 3.0770ms | 324.9870 Ops/s | 325.1571 Ops/s | -0.05% |
| test_reshape_pytree | 0.1152ms | 43.8971μs | 22.7805 KOps/s | 23.3772 KOps/s | -2.55% |
| test_reshape_td | 95.8050μs | 53.2095μs | 18.7936 KOps/s | 19.0766 KOps/s | -1.48% |
| test_view_pytree | 0.1136ms | 40.6615μs | 24.5933 KOps/s | 24.8859 KOps/s | -1.18% |
| test_view_td | 44.3020μs | 10.3083μs | 97.0094 KOps/s | 94.8785 KOps/s | +2.25% |
| test_unbind_pytree | 90.5050μs | 43.9358μs | 22.7605 KOps/s | 22.3752 KOps/s | +1.72% |
| test_unbind_td | 0.2120ms | 0.1119ms | 8.9353 KOps/s | 8.6993 KOps/s | +2.71% |
| test_split_pytree | 0.1032ms | 51.1334μs | 19.5567 KOps/s | 19.6074 KOps/s | -0.26% |
| test_split_td | 0.8744ms | 0.1327ms | 7.5362 KOps/s | 7.4375 KOps/s | +1.33% |
| test_add_pytree | 0.1022ms | 53.6324μs | 18.6455 KOps/s | 18.6907 KOps/s | -0.24% |
| test_add_td | 0.2226ms | 88.3086μs | 11.3239 KOps/s | 11.7482 KOps/s | -3.61% |
| test_distributed | 30.6020μs | 10.7842μs | 92.7282 KOps/s | 91.9091 KOps/s | +0.89% |
| test_tdmodule | 0.2437ms | 34.2725μs | 29.1779 KOps/s | 29.1944 KOps/s | -0.06% |
| test_tdmodule_dispatch | 0.3220ms | 66.1936μs | 15.1072 KOps/s | 15.4186 KOps/s | -2.02% |
| test_tdseq | 0.6258ms | 36.9557μs | 27.0594 KOps/s | 26.3952 KOps/s | +2.52% |
| test_tdseq_dispatch | 0.2512ms | 79.3622μs | 12.6005 KOps/s | 12.7412 KOps/s | -1.10% |
| test_instantiation_functorch | 2.0355ms | 1.8941ms | 527.9662 Ops/s | 519.6814 Ops/s | +1.59% |
| test_instantiation_td | 2.3724ms | 1.5814ms | 632.3625 Ops/s | 628.4234 Ops/s | +0.63% |
| test_exec_functorch | 0.2872ms | 0.2204ms | 4.5367 KOps/s | 4.5408 KOps/s | -0.09% |
| test_exec_td | 0.2568ms | 0.2060ms | 4.8554 KOps/s | 4.7973 KOps/s | +1.21% |
| test_vmap_mlp_speed[True-True] | 7.7727ms | 1.4112ms | 708.6087 Ops/s | 666.8419 Ops/s | **+6.26%** |
| test_vmap_mlp_speed[True-False] | 3.5301ms | 0.7291ms | 1.3716 KOps/s | 1.3779 KOps/s | -0.46% |
| test_vmap_mlp_speed[False-True] | 7.1178ms | 1.1734ms | 852.2384 Ops/s | 840.3489 Ops/s | +1.41% |
| test_vmap_mlp_speed[False-False] | 3.6553ms | 0.5406ms | 1.8497 KOps/s | 1.8232 KOps/s | +1.46% |
| test_vmap_transformer_speed[True-True] | 22.7771ms | 16.6291ms | 60.1357 Ops/s | 58.0827 Ops/s | +3.53% |
| test_vmap_transformer_speed[True-False] | 17.2549ms | 10.6403ms | 93.9821 Ops/s | 94.4267 Ops/s | -0.47% |
| test_vmap_transformer_speed[False-True] | 21.2921ms | 16.0851ms | 62.1694 Ops/s | 62.6164 Ops/s | -0.71% |
| test_vmap_transformer_speed[False-False] | 17.6441ms | 10.6320ms | 94.0556 Ops/s | 95.4953 Ops/s | -1.51% |
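The Improved/Worsened counts in the summary can be cross-checked against the table. The following is a minimal Python sketch, assuming that Change is (Ops - Ops on Repo HEAD) / Ops on Repo HEAD and that a benchmark is flagged only when |Change| is at least 5%. The report does not state the threshold; it is inferred from the bolded rows (+5.12% is flagged, while -4.54% and +4.56% are not). The names `THRESHOLD_PCT`, `change_pct` and `classify` are illustrative, not part of the benchmark tooling.

```python
# Sketch: recompute the "Change" column from the two throughput columns.
# Assumptions (not stated in the benchmark report): Change is the relative
# difference of Ops against Ops on Repo HEAD, and a benchmark counts as
# improved/worsened when |Change| >= 5%.

THRESHOLD_PCT = 5.0  # assumed cut-off, inferred from which rows are bolded

# (name, Ops on this PR, Ops on repo HEAD), copied from the table.
# Both values in a row share a unit (KOps/s or Ops/s), so the unit cancels.
ROWS = [
    ("test_plain_set_nested", 43.3848, 44.1458),
    ("test_keys_nested_leaf", 4.8802, 4.5021),
    ("test_lock_nested", 571.8313, 599.0030),
    ("test_lock_stack_nested", 42.9579, 46.4561),
    ("test_unlock_nested", 594.1997, 565.2347),
    ("test_set_shared", 4.4767, 4.8119),
    ("test_unbind_speed_stack0", 98.7197, 94.4127),
    ("test_vmap_mlp_speed[True-True]", 708.6087, 666.8419),
]


def change_pct(ops: float, ops_head: float) -> float:
    """Relative throughput change versus HEAD, in percent (positive = faster)."""
    return (ops - ops_head) / ops_head * 100.0


def classify(pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a change as improved, worsened or unchanged under the assumed threshold."""
    if pct >= threshold:
        return "improved"
    if pct <= -threshold:
        return "worsened"
    return "unchanged"


# Insertion order of ROWS is preserved in the dict.
results = {name: change_pct(ops, head) for name, ops, head in ROWS}

if __name__ == "__main__":
    for name, pct in results.items():
        print(f"{name:35s} {pct:+.2f}%  {classify(pct)}")
```

Applied to the five bolded rows plus a few near-threshold ones, the computed percentages match the Change column to two decimals, and only the bolded rows cross the assumed 5% line.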

Labels: bug (Something isn't working), CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)

2 participants