[BugFix] Pass type directly during reduction #1223
Open: vmoens wants to merge 4 commits into gh/vmoens/47/base from gh/vmoens/47/head
vmoens added a commit that referenced this pull request on Feb 19, 2025
ghstack-source-id: 6e10ea39a5e74e66f052f06d7709044e32ae01dd
Pull Request resolved: #1223
vmoens added a commit that referenced this pull request on Feb 19, 2025
ghstack-source-id: 2a0f011758991f07958b2b1742d3d2136b6e9fb8
Pull Request resolved: #1223
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 39.8850μs | 20.5048μs | 48.7690 KOps/s | 47.7345 KOps/s | |
test_plain_set_stack_nested | 47.9800μs | 20.4873μs | 48.8106 KOps/s | 47.5427 KOps/s | |
test_plain_set_nested_inplace | 60.6610μs | 22.4488μs | 44.5459 KOps/s | 44.1894 KOps/s | |
test_plain_set_stack_nested_inplace | 77.0040μs | 22.2510μs | 44.9419 KOps/s | 43.0229 KOps/s | |
test_items | 18.6850μs | 4.1943μs | 238.4177 KOps/s | 234.6094 KOps/s | |
test_items_nested | 0.5353ms | 0.4044ms | 2.4731 KOps/s | 2.4532 KOps/s | |
test_items_nested_locked | 0.5807ms | 0.4052ms | 2.4678 KOps/s | 2.4770 KOps/s | |
test_items_nested_leaf | 0.1337ms | 75.3616μs | 13.2694 KOps/s | 12.8418 KOps/s | |
test_items_stack_nested | 0.7252ms | 0.4078ms | 2.4521 KOps/s | 2.4292 KOps/s | |
test_items_stack_nested_leaf | 0.1379ms | 78.8650μs | 12.6799 KOps/s | 12.8380 KOps/s | |
test_items_stack_nested_locked | 0.7015ms | 0.4085ms | 2.4478 KOps/s | 2.4416 KOps/s | |
test_keys | 15.4890μs | 4.1186μs | 242.8023 KOps/s | 286.1688 KOps/s | |
test_keys_nested | 0.2660ms | 0.1618ms | 6.1810 KOps/s | 6.1381 KOps/s | |
test_keys_nested_locked | 1.6958ms | 0.1686ms | 5.9315 KOps/s | 5.9164 KOps/s | |
test_keys_nested_leaf | 0.2366ms | 0.1418ms | 7.0521 KOps/s | 7.0685 KOps/s | |
test_keys_stack_nested | 0.2650ms | 0.1637ms | 6.1082 KOps/s | 6.0949 KOps/s | |
test_keys_stack_nested_leaf | 0.2606ms | 0.1428ms | 7.0022 KOps/s | 6.9933 KOps/s | |
test_keys_stack_nested_locked | 0.2433ms | 0.1697ms | 5.8940 KOps/s | 5.8850 KOps/s | |
test_values | 4.9912μs | 1.0635μs | 940.2531 KOps/s | 640.5968 KOps/s | |
test_values_nested | 0.1083ms | 61.6920μs | 16.2096 KOps/s | 16.3225 KOps/s | |
test_values_nested_locked | 0.1114ms | 61.3355μs | 16.3038 KOps/s | 16.2468 KOps/s | |
test_values_nested_leaf | 0.1375ms | 70.5952μs | 14.1653 KOps/s | 14.2925 KOps/s | |
test_values_stack_nested | 0.1193ms | 62.4272μs | 16.0187 KOps/s | 16.3552 KOps/s | |
test_values_stack_nested_leaf | 0.1245ms | 70.0448μs | 14.2766 KOps/s | 13.6116 KOps/s | |
test_values_stack_nested_locked | 0.1121ms | 62.4228μs | 16.0198 KOps/s | 16.3534 KOps/s | |
test_membership | 22.4620μs | 0.9017μs | 1.1090 MOps/s | 1.1534 MOps/s | |
test_membership_nested | 27.9020μs | 2.9280μs | 341.5293 KOps/s | 345.2147 KOps/s | |
test_membership_nested_leaf | 41.8480μs | 2.9282μs | 341.5123 KOps/s | 346.5580 KOps/s | |
test_membership_stacked_nested | 17.4430μs | 2.9595μs | 337.8905 KOps/s | 345.8242 KOps/s | |
test_membership_stacked_nested_leaf | 22.8030μs | 2.9589μs | 337.9690 KOps/s | 344.3473 KOps/s | |
test_membership_nested_last | 49.1820μs | 4.3672μs | 228.9773 KOps/s | 232.2639 KOps/s | |
test_membership_nested_leaf_last | 46.5870μs | 4.3846μs | 228.0731 KOps/s | 233.1887 KOps/s | |
test_membership_stacked_nested_last | 24.8560μs | 4.3764μs | 228.4992 KOps/s | 228.2309 KOps/s | |
test_membership_stacked_nested_leaf_last | 30.8280μs | 4.4269μs | 225.8896 KOps/s | 232.3913 KOps/s | |
test_nested_getleaf | 47.8800μs | 10.6021μs | 94.3209 KOps/s | 93.8512 KOps/s | |
test_nested_get | 46.6070μs | 10.1372μs | 98.6466 KOps/s | 100.5741 KOps/s | |
test_stacked_getleaf | 40.2460μs | 10.5398μs | 94.8785 KOps/s | 95.1630 KOps/s | |
test_stacked_get | 52.9990μs | 9.9516μs | 100.4863 KOps/s | 100.1030 KOps/s | |
test_nested_getitemleaf | 51.8370μs | 11.1267μs | 89.8743 KOps/s | 91.0583 KOps/s | |
test_nested_getitem | 0.1469ms | 10.7018μs | 93.4423 KOps/s | 95.0640 KOps/s | |
test_stacked_getitemleaf | 0.3815ms | 11.3572μs | 88.0502 KOps/s | 90.8626 KOps/s | |
test_stacked_getitem | 52.4380μs | 10.6809μs | 93.6251 KOps/s | 95.0921 KOps/s | |
test_lock_nested | 0.6633ms | 0.4047ms | 2.4707 KOps/s | 2.4389 KOps/s | |
test_lock_stack_nested | 0.9342ms | 0.4141ms | 2.4152 KOps/s | 2.3462 KOps/s | |
test_unlock_nested | 0.5425ms | 0.3274ms | 3.0548 KOps/s | 2.9764 KOps/s | |
test_unlock_stack_nested | 0.5196ms | 0.3339ms | 2.9951 KOps/s | 2.9095 KOps/s | |
test_flatten_speed | 0.1923ms | 98.5486μs | 10.1473 KOps/s | 9.8732 KOps/s | |
test_unflatten_speed | 0.8536ms | 0.5190ms | 1.9267 KOps/s | 1.9408 KOps/s | |
test_common_ops | 4.8152ms | 0.7831ms | 1.2771 KOps/s | 1.2657 KOps/s | |
test_creation | 27.7920μs | 2.5141μs | 397.7565 KOps/s | 394.4056 KOps/s | |
test_creation_empty | 52.6250μs | 11.8516μs | 84.3765 KOps/s | 79.2883 KOps/s | |
test_creation_nested_1 | 42.4490μs | 14.5569μs | 68.6959 KOps/s | 64.8173 KOps/s | |
test_creation_nested_2 | 43.4310μs | 19.0819μs | 52.4057 KOps/s | 46.4253 KOps/s | |
test_clone | 70.7920μs | 13.3816μs | 74.7293 KOps/s | 75.0554 KOps/s | |
test_getitem[int] | 0.8463ms | 12.7611μs | 78.3629 KOps/s | 78.3668 KOps/s | |
test_getitem[slice_int] | 0.1275ms | 24.1345μs | 41.4345 KOps/s | 42.1752 KOps/s | |
test_getitem[range] | 0.1664ms | 50.5470μs | 19.7836 KOps/s | 20.1707 KOps/s | |
test_getitem[tuple] | 0.1612ms | 20.4325μs | 48.9416 KOps/s | 48.6161 KOps/s | |
test_getitem[list] | 0.1801ms | 45.0474μs | 22.1989 KOps/s | 22.0837 KOps/s | |
test_setitem_dim[int] | 56.9760μs | 24.9631μs | 40.0591 KOps/s | 39.5264 KOps/s | |
test_setitem_dim[slice_int] | 0.1022ms | 50.0825μs | 19.9671 KOps/s | 19.8976 KOps/s | |
test_setitem_dim[range] | 0.1275ms | 75.6612μs | 13.2168 KOps/s | 13.1740 KOps/s | |
test_setitem_dim[tuple] | 71.2020μs | 39.6721μs | 25.2066 KOps/s | 24.5678 KOps/s | |
test_setitem | 78.5560μs | 20.1849μs | 49.5419 KOps/s | 47.1589 KOps/s | |
test_set | 68.1780μs | 19.5965μs | 51.0295 KOps/s | 48.6505 KOps/s | |
test_set_shared | 4.2196ms | 0.1792ms | 5.5808 KOps/s | 5.5045 KOps/s | |
test_update | 0.1152ms | 22.8413μs | 43.7803 KOps/s | 41.1430 KOps/s | |
test_update_nested | 84.0970μs | 33.2093μs | 30.1121 KOps/s | 28.3662 KOps/s | |
test_update__nested | 0.5037ms | 33.7272μs | 29.6496 KOps/s | 30.3214 KOps/s | |
test_set_nested | 77.7120μs | 21.5919μs | 46.3137 KOps/s | 40.4833 KOps/s | |
test_set_nested_new | 78.9780μs | 26.3499μs | 37.9508 KOps/s | 36.8924 KOps/s | |
test_select | 98.7050μs | 42.7016μs | 23.4183 KOps/s | 23.6909 KOps/s | |
test_select_nested | 0.1261ms | 63.3203μs | 15.7927 KOps/s | 16.0330 KOps/s | |
test_exclude_nested | 0.1375ms | 81.5693μs | 12.2595 KOps/s | 12.5335 KOps/s | |
test_empty[True] | 0.7392ms | 0.4123ms | 2.4256 KOps/s | 2.5035 KOps/s | |
test_empty[False] | 11.6540μs | 1.3610μs | 734.7709 KOps/s | 728.5810 KOps/s | |
test_unbind_speed | 0.3539ms | 0.2698ms | 3.7068 KOps/s | 3.6723 KOps/s | |
test_unbind_speed_stack0 | 0.5228ms | 0.2671ms | 3.7437 KOps/s | 3.6828 KOps/s | |
test_unbind_speed_stack1 | 0.1011s | 0.7221ms | 1.3849 KOps/s | 1.2197 KOps/s | |
test_split | 0.1006s | 1.7536ms | 570.2457 Ops/s | 574.4426 Ops/s | |
test_chunk | 0.1021s | 1.7507ms | 571.1974 Ops/s | 631.7694 Ops/s | |
test_consolidate_njt[False-None] | 8.4555ms | 8.2012ms | 121.9327 Ops/s | 109.4120 Ops/s | |
test_creation[device0] | 0.2223ms | 91.4560μs | 10.9342 KOps/s | 10.7784 KOps/s | |
test_creation_from_tensor | 3.6087ms | 94.8949μs | 10.5380 KOps/s | 10.3299 KOps/s | |
test_add_one[memmap_tensor0] | 0.1132ms | 5.0658μs | 197.4009 KOps/s | 188.1869 KOps/s | |
test_contiguous[memmap_tensor0] | 10.8500μs | 0.5149μs | 1.9423 MOps/s | 1.9459 MOps/s | |
test_stack[memmap_tensor0] | 25.6570μs | 3.4219μs | 292.2347 KOps/s | 284.6706 KOps/s | |
test_memmaptd_index | 0.9541ms | 0.2273ms | 4.3993 KOps/s | 4.3199 KOps/s | |
test_memmaptd_index_astensor | 0.4549ms | 0.3123ms | 3.2017 KOps/s | 3.1796 KOps/s | |
test_memmaptd_index_op | 1.3660ms | 0.5825ms | 1.7167 KOps/s | 1.6608 KOps/s | |
test_serialize_model | 0.1287s | 0.1155s | 8.6610 Ops/s | 8.6532 Ops/s | |
test_serialize_model_pickle | 0.4465s | 0.3913s | 2.5558 Ops/s | 2.5180 Ops/s | |
test_serialize_weights | 0.1248s | 0.1146s | 8.7260 Ops/s | 8.8421 Ops/s | |
test_serialize_weights_returnearly | 0.1745s | 0.1590s | 6.2882 Ops/s | 6.1184 Ops/s | |
test_serialize_weights_pickle | 0.5389s | 0.4752s | 2.1045 Ops/s | 1.2207 Ops/s | |
test_serialize_weights_filesystem | 0.2393s | 0.1545s | 6.4711 Ops/s | 7.0373 Ops/s | |
test_serialize_model_filesystem | 0.1582s | 0.1477s | 6.7715 Ops/s | 6.8512 Ops/s | |
test_reshape_pytree | 0.2958ms | 27.8503μs | 35.9062 KOps/s | 38.5249 KOps/s | |
test_reshape_td | 76.2120μs | 32.4102μs | 30.8545 KOps/s | 31.1467 KOps/s | |
test_view_pytree | 78.0440μs | 25.4872μs | 39.2354 KOps/s | 38.3536 KOps/s | |
test_view_td | 90.7100μs | 40.1389μs | 24.9135 KOps/s | 24.7819 KOps/s | |
test_unbind_pytree | 66.3540μs | 29.2150μs | 34.2289 KOps/s | 34.0076 KOps/s | |
test_unbind_td | 0.3017ms | 39.5112μs | 25.3093 KOps/s | 24.0675 KOps/s | |
test_split_pytree | 86.9130μs | 28.7654μs | 34.7639 KOps/s | 32.9528 KOps/s | |
test_split_td | 0.1969ms | 45.1837μs | 22.1319 KOps/s | 22.6602 KOps/s | |
test_add_pytree | 77.6150μs | 35.4837μs | 28.1820 KOps/s | 28.0182 KOps/s | |
test_add_td | 0.2189ms | 57.8149μs | 17.2966 KOps/s | 17.5886 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1639ms | 67.1738μs | 14.8867 KOps/s | 15.1355 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.3321ms | 0.1695ms | 5.9011 KOps/s | 5.8407 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1430ms | 46.0206μs | 21.7294 KOps/s | 21.6704 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2602ms | 0.1179ms | 8.4784 KOps/s | 8.4505 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 83.0350μs | 28.2003μs | 35.4606 KOps/s | 35.8259 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1128ms | 57.5370μs | 17.3801 KOps/s | 17.2363 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1606ms | 80.2090μs | 12.4674 KOps/s | 12.3672 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1259ms | 65.5538μs | 15.2546 KOps/s | 14.9079 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.1786ms | 0.1077ms | 9.2831 KOps/s | 9.3678 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.6671ms | 0.2184ms | 4.5789 KOps/s | 4.4129 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1100ms | 47.6021μs | 21.0075 KOps/s | 21.7331 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.7277ms | 68.6080μs | 14.5756 KOps/s | 14.6421 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.1782ms | 0.1008ms | 9.9186 KOps/s | 9.9629 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 1.1920ms | 0.2062ms | 4.8485 KOps/s | 4.9660 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.6111ms | 0.2357ms | 4.2430 KOps/s | 4.3106 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.1950ms | 0.1079ms | 9.2693 KOps/s | 9.1988 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.6001ms | 62.2908μs | 16.0537 KOps/s | 16.1418 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1009ms | 48.8834μs | 20.4568 KOps/s | 20.3219 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.2406ms | 0.1577ms | 6.3418 KOps/s | 6.3404 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.2623ms | 0.1018ms | 9.8217 KOps/s | 9.9723 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 53.8110μs | 21.5170μs | 46.4749 KOps/s | 46.2204 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1302ms | 66.5840μs | 15.0186 KOps/s | 15.0179 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.5049ms | 86.1777μs | 11.6039 KOps/s | 12.1669 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1398ms | 67.6328μs | 14.7857 KOps/s | 14.2558 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.7922ms | 0.2143ms | 4.6668 KOps/s | 4.6176 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 1.8737ms | 1.3583ms | 736.2081 Ops/s | 727.1543 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.3062ms | 0.2119ms | 4.7191 KOps/s | 4.7600 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 0.8908ms | 0.8135ms | 1.2293 KOps/s | 1.2231 KOps/s | |
test_compile_assign_and_add_stack[compile] | 0.6383ms | 0.4578ms | 2.1842 KOps/s | 2.2034 KOps/s | |
test_compile_assign_and_add_stack[eager] | 2.9344ms | 2.7140ms | 368.4594 Ops/s | 356.3522 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1051ms | 38.3125μs | 26.1012 KOps/s | 26.8075 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.5736ms | 33.5731μs | 29.7858 KOps/s | 29.2009 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 79.5690μs | 32.2906μs | 30.9688 KOps/s | 32.6874 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.3034ms | 23.2104μs | 43.0840 KOps/s | 43.8223 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1009ms | 31.9755μs | 31.2739 KOps/s | 32.0431 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 84.1670μs | 23.7056μs | 42.1842 KOps/s | 43.6599 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1304ms | 54.5326μs | 18.3377 KOps/s | 19.5877 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.3654ms | 20.2705μs | 49.3327 KOps/s | 49.6106 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1086ms | 46.8924μs | 21.3254 KOps/s | 22.4418 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 60.4520μs | 18.6948μs | 53.4909 KOps/s | 54.4442 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1089ms | 47.5922μs | 21.0118 KOps/s | 22.0481 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 59.8310μs | 18.7344μs | 53.3779 KOps/s | 53.7702 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1398ms | 54.9686μs | 18.1922 KOps/s | 18.4957 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.8875ms | 20.1624μs | 49.5972 KOps/s | 50.2575 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1080ms | 47.5647μs | 21.0240 KOps/s | 22.0219 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 54.1210μs | 18.6478μs | 53.6256 KOps/s | 54.2379 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.5884ms | 47.3604μs | 21.1147 KOps/s | 21.3672 KOps/s | |
test_compile_indexing[int-pytree-eager] | 74.6190μs | 18.7988μs | 53.1949 KOps/s | 54.7519 KOps/s | |
test_mod_add[eager] | 0.1045ms | 34.6082μs | 28.8949 KOps/s | 29.1898 KOps/s | |
test_mod_add[compile] | 0.1532ms | 63.8922μs | 15.6514 KOps/s | 15.6641 KOps/s | |
test_mod_add[compile-overhead] | 0.1346ms | 63.1456μs | 15.8364 KOps/s | 16.0312 KOps/s | |
test_mod_wrap[eager] | 0.4522ms | 0.2205ms | 4.5342 KOps/s | 4.3958 KOps/s | |
test_mod_wrap[compile] | 1.6467ms | 0.2227ms | 4.4904 KOps/s | 4.2943 KOps/s | |
test_mod_wrap[compile-overhead] | 0.4411ms | 0.2216ms | 4.5124 KOps/s | 4.1871 KOps/s | |
test_mod_wrap_and_backward[eager] | 15.2552ms | 11.4430ms | 87.3894 Ops/s | 90.6309 Ops/s | |
test_mod_wrap_and_backward[compile] | 13.8527ms | 11.2881ms | 88.5887 Ops/s | 91.9739 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 15.8753ms | 11.3926ms | 87.7760 Ops/s | 90.6443 Ops/s | |
test_seq_add[eager] | 0.1999ms | 0.1170ms | 8.5465 KOps/s | 8.6379 KOps/s | |
test_seq_add[compile] | 0.1768ms | 75.9183μs | 13.1720 KOps/s | 13.5876 KOps/s | |
test_seq_add[compile-overhead] | 0.3011ms | 71.7022μs | 13.9466 KOps/s | 13.8431 KOps/s | |
test_seq_wrap[eager] | 0.6083ms | 0.4401ms | 2.2725 KOps/s | 2.2243 KOps/s | |
test_seq_wrap[compile] | 0.4655ms | 0.2377ms | 4.2074 KOps/s | 4.1371 KOps/s | |
test_seq_wrap[compile-overhead] | 0.3355ms | 0.2363ms | 4.2314 KOps/s | 4.1749 KOps/s | |
test_func_call_runtime[False-eager] | 0.9513ms | 0.5441ms | 1.8378 KOps/s | 1.7666 KOps/s | |
test_func_call_runtime[False-compile] | 0.5980ms | 0.4383ms | 2.2813 KOps/s | 2.2526 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.5205ms | 0.4379ms | 2.2836 KOps/s | 2.2593 KOps/s | |
test_func_call_runtime[True-eager] | 0.8961ms | 0.7570ms | 1.3210 KOps/s | 1.2957 KOps/s | |
test_func_call_runtime[True-compile] | 2.1094ms | 0.4628ms | 2.1607 KOps/s | 2.1418 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.8775ms | 0.4604ms | 2.1718 KOps/s | 2.1334 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.9403ms | 0.5410ms | 1.8483 KOps/s | 1.8098 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.5755ms | 0.4370ms | 2.2885 KOps/s | 2.2558 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.7459ms | 0.4375ms | 2.2858 KOps/s | 2.2508 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.4318ms | 0.8946ms | 1.1178 KOps/s | 1.0896 KOps/s | |
test_func_call_cm_runtime[True-compile] | 0.9547ms | 0.7962ms | 1.2560 KOps/s | 1.2076 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 1.0039ms | 0.7990ms | 1.2515 KOps/s | 1.2029 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 3.0207ms | 1.8909ms | 528.8469 Ops/s | 515.5253 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9374ms | 0.5334ms | 1.8749 KOps/s | 1.8646 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.8625ms | 0.5313ms | 1.8821 KOps/s | 1.8679 KOps/s | |
test_distributed | 0.2245ms | 0.1232ms | 8.1140 KOps/s | 7.8014 KOps/s | |
test_tdmodule | 0.5294ms | 26.7239μs | 37.4197 KOps/s | 36.5522 KOps/s | |
test_tdmodule_dispatch | 74.0280μs | 47.8735μs | 20.8884 KOps/s | 20.2603 KOps/s | |
test_tdseq | 45.2850μs | 27.8031μs | 35.9672 KOps/s | 33.6787 KOps/s | |
test_tdseq_dispatch | 0.1255ms | 54.5873μs | 18.3193 KOps/s | 18.4617 KOps/s | |
test_instantiation_functorch | 2.3550ms | 1.5107ms | 661.9292 Ops/s | 660.5450 Ops/s | |
test_exec_functorch | 0.3438ms | 0.1783ms | 5.6073 KOps/s | 5.5380 KOps/s | |
test_exec_functional_call | 0.3164ms | 0.1737ms | 5.7570 KOps/s | 5.7275 KOps/s | |
test_exec_td_decorator | 0.4943ms | 0.2351ms | 4.2538 KOps/s | 4.2530 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.8579ms | 0.6548ms | 1.5272 KOps/s | 1.5093 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.0240ms | 0.6633ms | 1.5076 KOps/s | 1.4806 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7429ms | 0.5251ms | 1.9045 KOps/s | 1.8674 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.9551ms | 0.5288ms | 1.8912 KOps/s | 1.8654 KOps/s | |
test_to_module_speed[True] | 1.8557ms | 1.3050ms | 766.3047 Ops/s | 764.5533 Ops/s | |
test_to_module_speed[False] | 1.4532ms | 1.2788ms | 781.9869 Ops/s | 775.6743 Ops/s | |
test_tc_init | 82.9350μs | 45.3374μs | 22.0568 KOps/s | 21.6296 KOps/s | |
test_tc_init_nested | 0.1551ms | 89.1290μs | 11.2197 KOps/s | 10.8892 KOps/s | |
test_tc_first_layer_tensor | 18.3040μs | 1.5699μs | 636.9640 KOps/s | 645.6686 KOps/s | |
test_tc_first_layer_nontensor | 25.1870μs | 4.7424μs | 210.8631 KOps/s | 213.8098 KOps/s | |
test_tc_second_layer_tensor | 42.5200μs | 2.9598μs | 337.8656 KOps/s | 350.0351 KOps/s | |
test_tc_second_layer_nontensor | 27.5920μs | 6.1993μs | 161.3088 KOps/s | 166.2020 KOps/s | |
test_unbind | 0.2332s | 12.8928ms | 77.5630 Ops/s | 68.1804 Ops/s | |
test_full_like | 8.8565ms | 6.5159ms | 153.4701 Ops/s | 133.3698 Ops/s | |
test_zeros_like | 5.0391ms | 2.6270ms | 380.6604 Ops/s | 222.6627 Ops/s | |
test_ones_like | 4.6784ms | 3.2041ms | 312.0983 Ops/s | 320.7666 Ops/s | |
test_clone | 6.0001ms | 4.7928ms | 208.6451 Ops/s | 205.8061 Ops/s | |
test_squeeze | 59.7510μs | 12.1628μs | 82.2181 KOps/s | 78.8345 KOps/s | |
test_unsqueeze | 0.1650ms | 90.4950μs | 11.0503 KOps/s | 10.6375 KOps/s | |
test_split | 0.4795ms | 0.1933ms | 5.1745 KOps/s | 5.2276 KOps/s | |
test_permute | 0.4128ms | 0.1975ms | 5.0635 KOps/s | 4.9725 KOps/s | |
test_stack | 26.8243ms | 24.4554ms | 40.8908 Ops/s | 40.0737 Ops/s | |
test_cat | 44.8406ms | 25.1761ms | 39.7203 Ops/s | 40.5086 Ops/s |
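
The Change column in the table above is empty, so the difference between the Ops and Ops on Repo HEAD columns has to be read off by hand. The sketch below shows one way it could be computed. It is not taken from the benchmark tooling: the helper names `parse_ops` and `relative_change`, and the choice of a percent change relative to repo HEAD, are assumptions made for illustration. The sample values are copied from three rows of the table.

```python
# Multipliers for the throughput units that appear in the table.
# Treating K as 1e3 and M as 1e6 is an assumption about the report format.
UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '940.2531 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Percent change of the PR throughput relative to repo HEAD (hypothetical metric)."""
    new, base = parse_ops(ops), parse_ops(ops_head)
    return (new - base) / base * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the table above.
rows = {
    "test_values": ("940.2531 KOps/s", "640.5968 KOps/s"),
    "test_keys": ("242.8023 KOps/s", "286.1688 KOps/s"),
    "test_serialize_weights_pickle": ("2.1045 Ops/s", "1.2207 Ops/s"),
}

for name, (ops, head) in rows.items():
    print(f"{name}: {relative_change(ops, head):+.1f}%")
```

Most rows differ from HEAD by only a few percent, which may well be run-to-run noise on a shared CI machine, so a single large value from this calculation should be confirmed with a rerun before it is read as a real regression or speed-up.
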
vmoens added a commit that referenced this pull request on Feb 19, 2025
ghstack-source-id: 9ca5bf2a5bc1f3fd88c29360fb088836ce35e8a7
Pull Request resolved: #1223
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 33.1300μs | 12.5570μs | 79.6366 KOps/s | 79.1701 KOps/s | |
test_plain_set_stack_nested | 45.4200μs | 12.6355μs | 79.1420 KOps/s | 78.5162 KOps/s | |
test_plain_set_nested_inplace | 38.6700μs | 13.6957μs | 73.0155 KOps/s | 72.6685 KOps/s | |
test_plain_set_stack_nested_inplace | 42.0900μs | 13.5604μs | 73.7442 KOps/s | 73.2678 KOps/s | |
test_items | 25.1000μs | 2.8982μs | 345.0403 KOps/s | 330.5454 KOps/s | |
test_items_nested | 0.4142ms | 0.3652ms | 2.7384 KOps/s | 2.7136 KOps/s | |
test_items_nested_locked | 0.4066ms | 0.3665ms | 2.7285 KOps/s | 2.7319 KOps/s | |
test_items_nested_leaf | 89.7210μs | 60.4545μs | 16.5414 KOps/s | 16.5910 KOps/s | |
test_items_stack_nested | 0.4035ms | 0.3650ms | 2.7396 KOps/s | 2.7501 KOps/s | |
test_items_stack_nested_leaf | 88.0410μs | 62.2274μs | 16.0701 KOps/s | 15.9832 KOps/s | |
test_items_stack_nested_locked | 0.4184ms | 0.3637ms | 2.7493 KOps/s | 2.7532 KOps/s | |
test_keys | 29.0100μs | 3.4499μs | 289.8641 KOps/s | 290.4680 KOps/s | |
test_keys_nested | 0.1247ms | 88.1331μs | 11.3465 KOps/s | 11.3732 KOps/s | |
test_keys_nested_locked | 0.7403ms | 94.0278μs | 10.6351 KOps/s | 10.7402 KOps/s | |
test_keys_nested_leaf | 0.1022ms | 79.5002μs | 12.5786 KOps/s | 12.6613 KOps/s | |
test_keys_stack_nested | 0.1184ms | 89.4472μs | 11.1798 KOps/s | 11.3032 KOps/s | |
test_keys_stack_nested_leaf | 0.1076ms | 80.6979μs | 12.3919 KOps/s | 12.5131 KOps/s | |
test_keys_stack_nested_locked | 0.1217ms | 95.0318μs | 10.5228 KOps/s | 10.5205 KOps/s | |
test_values | 6.0683μs | 0.8604μs | 1.1622 MOps/s | 1.1699 MOps/s | |
test_values_nested | 61.5910μs | 37.2829μs | 26.8219 KOps/s | 26.6680 KOps/s | |
test_values_nested_locked | 64.1610μs | 39.0512μs | 25.6074 KOps/s | 25.4169 KOps/s | |
test_values_nested_leaf | 67.8510μs | 42.2505μs | 23.6684 KOps/s | 23.5142 KOps/s | |
test_values_stack_nested | 65.9000μs | 38.2393μs | 26.1511 KOps/s | 26.4199 KOps/s | |
test_values_stack_nested_leaf | 65.2110μs | 42.7629μs | 23.3848 KOps/s | 23.3657 KOps/s | |
test_values_stack_nested_locked | 66.5010μs | 39.8273μs | 25.1084 KOps/s | 25.2497 KOps/s | |
test_membership | 2.0295μs | 0.5263μs | 1.9000 MOps/s | 1.8530 MOps/s | |
test_membership_nested | 37.9600μs | 2.1031μs | 475.4785 KOps/s | 473.2621 KOps/s | |
test_membership_nested_leaf | 19.5600μs | 2.0648μs | 484.3063 KOps/s | 491.6508 KOps/s | |
test_membership_stacked_nested | 34.8300μs | 2.1225μs | 471.1487 KOps/s | 467.1690 KOps/s | |
test_membership_stacked_nested_leaf | 32.0600μs | 2.1245μs | 470.6892 KOps/s | 467.5773 KOps/s | |
test_membership_nested_last | 26.7910μs | 3.1389μs | 318.5842 KOps/s | 320.0244 KOps/s | |
test_membership_nested_leaf_last | 41.8800μs | 3.1386μs | 318.6179 KOps/s | 319.9374 KOps/s | |
test_membership_stacked_nested_last | 31.2200μs | 4.1182μs | 242.8248 KOps/s | 242.4356 KOps/s | |
test_membership_stacked_nested_leaf_last | 32.1000μs | 4.1326μs | 241.9756 KOps/s | 243.7340 KOps/s | |
test_nested_getleaf | 33.1000μs | 6.2424μs | 160.1949 KOps/s | 162.3213 KOps/s | |
test_nested_get | 33.3300μs | 5.9340μs | 168.5207 KOps/s | 170.4022 KOps/s | |
test_stacked_getleaf | 49.0910μs | 6.3298μs | 157.9827 KOps/s | 164.0702 KOps/s | |
test_stacked_get | 86.6010μs | 5.7952μs | 172.5558 KOps/s | 171.2912 KOps/s | |
test_nested_getitemleaf | 34.0700μs | 6.3767μs | 156.8206 KOps/s | 154.2281 KOps/s | |
test_nested_getitem | 29.9910μs | 6.1189μs | 163.4291 KOps/s | 165.1074 KOps/s | |
test_stacked_getitemleaf | 37.6910μs | 6.3736μs | 156.8961 KOps/s | 155.4861 KOps/s | |
test_stacked_getitem | 37.4500μs | 6.0214μs | 166.0747 KOps/s | 166.7937 KOps/s | |
test_lock_nested | 0.4010ms | 0.3367ms | 2.9702 KOps/s | 2.9679 KOps/s | |
test_lock_stack_nested | 0.4189ms | 0.3420ms | 2.9242 KOps/s | 2.8863 KOps/s | |
test_unlock_nested | 0.3493ms | 0.2816ms | 3.5507 KOps/s | 3.5313 KOps/s | |
test_unlock_stack_nested | 0.3111ms | 0.2809ms | 3.5598 KOps/s | 3.5172 KOps/s | |
test_flatten_speed | 0.1129ms | 77.2906μs | 12.9382 KOps/s | 12.8772 KOps/s | |
test_unflatten_speed | 0.3673ms | 0.3231ms | 3.0948 KOps/s | 3.1082 KOps/s | |
test_common_ops | 0.7480ms | 0.6189ms | 1.6159 KOps/s | 1.6137 KOps/s | |
test_creation | 0.1267ms | 1.7607μs | 567.9610 KOps/s | 566.9679 KOps/s | |
test_creation_empty | 38.2300μs | 8.4816μs | 117.9026 KOps/s | 115.0093 KOps/s | |
test_creation_nested_1 | 37.1700μs | 10.1111μs | 98.9014 KOps/s | 96.1851 KOps/s | |
test_creation_nested_2 | 48.7710μs | 13.0141μs | 76.8397 KOps/s | 76.1703 KOps/s | |
test_clone | 51.2910μs | 10.5303μs | 94.9644 KOps/s | 92.5007 KOps/s | |
test_getitem[int] | 1.1259ms | 10.4895μs | 95.3333 KOps/s | 94.4960 KOps/s | |
test_getitem[slice_int] | 0.1076ms | 20.6850μs | 48.3443 KOps/s | 48.3081 KOps/s | |
test_getitem[range] | 0.1319ms | 38.3790μs | 26.0559 KOps/s | 26.2403 KOps/s | |
test_getitem[tuple] | 0.1048ms | 17.9670μs | 55.6576 KOps/s | 55.4325 KOps/s | |
test_getitem[list] | 0.1545ms | 35.6608μs | 28.0420 KOps/s | 27.9282 KOps/s | |
test_setitem_dim[int] | 46.9400μs | 19.2545μs | 51.9358 KOps/s | 48.4558 KOps/s | |
test_setitem_dim[slice_int] | 69.4410μs | 38.8755μs | 25.7231 KOps/s | 24.8987 KOps/s | |
test_setitem_dim[range] | 76.3110μs | 54.2038μs | 18.4489 KOps/s | 17.6594 KOps/s | |
test_setitem_dim[tuple] | 53.5410μs | 33.1225μs | 30.1910 KOps/s | 29.7849 KOps/s | |
test_setitem | 60.1300μs | 15.2768μs | 65.4586 KOps/s | 64.6335 KOps/s | |
test_set | 79.0910μs | 14.6560μs | 68.2315 KOps/s | 66.2499 KOps/s | |
test_set_shared | 0.5071ms | 0.1572ms | 6.3610 KOps/s | 6.2850 KOps/s | |
test_update | 0.3195ms | 17.7533μs | 56.3275 KOps/s | 54.8607 KOps/s | |
test_update_nested | 70.3900μs | 23.4074μs | 42.7215 KOps/s | 41.6193 KOps/s | |
test_update__nested | 0.5006ms | 25.8303μs | 38.7143 KOps/s | 38.6343 KOps/s | |
test_set_nested | 76.8300μs | 16.0207μs | 62.4193 KOps/s | 61.1013 KOps/s | |
test_set_nested_new | 79.9810μs | 18.2057μs | 54.9280 KOps/s | 53.7992 KOps/s | |
test_select | 78.9510μs | 30.2554μs | 33.0519 KOps/s | 32.2510 KOps/s | |
test_select_nested | 73.6810μs | 43.7972μs | 22.8325 KOps/s | 22.6436 KOps/s | |
test_exclude_nested | 88.4910μs | 63.0202μs | 15.8679 KOps/s | 15.7881 KOps/s | |
test_empty[True] | 0.3350ms | 0.2952ms | 3.3881 KOps/s | 3.3974 KOps/s | |
test_empty[False] | 2.1315μs | 0.8444μs | 1.1842 MOps/s | 1.1652 MOps/s | |
test_to | 86.6310μs | 56.1034μs | 17.8242 KOps/s | 16.5078 KOps/s | |
test_to_nonblocking | 77.7000μs | 47.0734μs | 21.2434 KOps/s | 21.4596 KOps/s | |
test_unbind_speed | 0.2998ms | 0.2409ms | 4.1506 KOps/s | 4.1543 KOps/s | |
test_unbind_speed_stack0 | 0.3064ms | 0.2393ms | 4.1787 KOps/s | 4.1166 KOps/s | |
test_unbind_speed_stack1 | 92.9789ms | 0.7306ms | 1.3687 KOps/s | 1.3563 KOps/s | |
test_split | 1.5723ms | 1.4528ms | 688.3268 Ops/s | 626.6267 Ops/s | |
test_chunk | 94.7964ms | 1.7451ms | 573.0374 Ops/s | 624.4282 Ops/s | |
test_consolidate[False-None] | 2.8562ms | 2.7772ms | 360.0725 Ops/s | 365.1427 Ops/s | |
test_consolidate[default-None] | 1.8133ms | 1.7286ms | 578.5015 Ops/s | 590.5399 Ops/s | |
test_consolidate[reduce-overhead-None] | 1.8570ms | 1.7768ms | 562.8189 Ops/s | 576.4568 Ops/s | |
test_consolidate_njt[False-None] | 6.9526ms | 6.5928ms | 151.6813 Ops/s | 152.0302 Ops/s | |
test_to[False-False-None] | 1.8097ms | 1.7298ms | 578.0873 Ops/s | 587.5137 Ops/s | |
test_to[True-False-None] | 1.6087ms | 1.3876ms | 720.6923 Ops/s | 731.1790 Ops/s | |
test_to[within-False-None] | 0.2951s | 5.4883ms | 182.2062 Ops/s | 237.8970 Ops/s | |
test_to[True-default-None] | 5.7136ms | 5.3258ms | 187.7655 Ops/s | 188.2238 Ops/s | |
test_to_njt[False-False-None] | 7.1418ms | 6.9989ms | 142.8804 Ops/s | 143.9772 Ops/s | |
test_to_njt[True-False-None] | 5.7181ms | 5.5740ms | 179.4031 Ops/s | 178.5933 Ops/s | |
test_to_njt[within-False-None] | 13.2585ms | 12.4269ms | 80.4709 Ops/s | 81.8274 Ops/s | |
test_creation[device0] | 0.4637ms | 80.1888μs | 12.4706 KOps/s | 12.5774 KOps/s | |
test_creation_from_tensor | 0.4783ms | 86.7402μs | 11.5287 KOps/s | 11.5858 KOps/s | |
test_add_one[memmap_tensor0] | 0.4686ms | 6.7563μs | 148.0091 KOps/s | 148.0512 KOps/s | |
test_contiguous[memmap_tensor0] | 1.9370μs | 0.4188μs | 2.3875 MOps/s | 2.3119 MOps/s | |
test_stack[memmap_tensor0] | 41.1810μs | 4.2818μs | 233.5457 KOps/s | 231.6412 KOps/s | |
test_memmaptd_index | 1.7834ms | 0.2469ms | 4.0498 KOps/s | 4.1237 KOps/s | |
test_memmaptd_index_astensor | 0.4394ms | 0.3084ms | 3.2431 KOps/s | 3.3087 KOps/s | |
test_memmaptd_index_op | 0.7409ms | 0.5884ms | 1.6996 KOps/s | 1.7032 KOps/s | |
test_serialize_model | 0.1307s | 0.1301s | 7.6875 Ops/s | 7.6686 Ops/s | |
test_serialize_model_pickle | 1.3494s | 1.2096s | 0.8267 Ops/s | 0.8263 Ops/s | |
test_serialize_weights | 0.2782s | 0.1507s | 6.6376 Ops/s | 7.7093 Ops/s | |
test_serialize_weights_returnearly | 0.3337s | 54.1093ms | 18.4811 Ops/s | 15.6169 Ops/s | |
test_serialize_weights_pickle | 1.3483s | 1.1830s | 0.8453 Ops/s | 0.8224 Ops/s | |
test_reshape_pytree | 58.5300μs | 22.5463μs | 44.3532 KOps/s | 44.0217 KOps/s | |
test_reshape_td | 63.5510μs | 27.2206μs | 36.7369 KOps/s | 36.3202 KOps/s | |
test_view_pytree | 48.9500μs | 21.8831μs | 45.6973 KOps/s | 44.7474 KOps/s | |
test_view_td | 65.1700μs | 32.7944μs | 30.4930 KOps/s | 28.1435 KOps/s | |
test_unbind_pytree | 53.5600μs | 27.7152μs | 36.0813 KOps/s | 35.2475 KOps/s | |
test_unbind_td | 0.5329ms | 37.3236μs | 26.7927 KOps/s | 26.4049 KOps/s | |
test_split_pytree | 55.8200μs | 29.7693μs | 33.5917 KOps/s | 33.7084 KOps/s | |
test_split_td | 0.7255ms | 38.6030μs | 25.9048 KOps/s | 25.2477 KOps/s | |
test_add_pytree | 72.0210μs | 34.6800μs | 28.8351 KOps/s | 28.0797 KOps/s | |
test_add_td | 87.6810μs | 48.4438μs | 20.6425 KOps/s | 17.5370 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1803ms | 0.1221ms | 8.1899 KOps/s | 7.8578 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.2296ms | 0.1338ms | 7.4757 KOps/s | 7.4044 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1379ms | 95.9642μs | 10.4206 KOps/s | 10.2090 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.9772ms | 0.1516ms | 6.5981 KOps/s | 6.6628 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 77.6610μs | 24.5493μs | 40.7344 KOps/s | 40.0518 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 62.9500μs | 29.4923μs | 33.9072 KOps/s | 33.5207 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.3793ms | 63.8359μs | 15.6652 KOps/s | 15.2968 KOps/s | |
test_compile_copy_nested[pytree-eager] | 79.7900μs | 48.9212μs | 20.4411 KOps/s | 20.2183 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.1874ms | 0.1447ms | 6.9129 KOps/s | 6.9668 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3060ms | 0.2169ms | 4.6108 KOps/s | 4.6596 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1371ms | 99.5618μs | 10.0440 KOps/s | 10.1670 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1140ms | 55.5123μs | 18.0140 KOps/s | 17.7005 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.1821ms | 0.1369ms | 7.3047 KOps/s | 7.3466 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.5877ms | 0.4879ms | 2.0496 KOps/s | 2.0730 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3875ms | 0.2602ms | 3.8434 KOps/s | 3.8411 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.1884ms | 0.1462ms | 6.8394 KOps/s | 6.9587 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1802ms | 71.2530μs | 14.0345 KOps/s | 14.6954 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1614ms | 98.8583μs | 10.1155 KOps/s | 10.0645 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.4499ms | 0.4117ms | 2.4292 KOps/s | 2.4748 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.1739ms | 0.1352ms | 7.3951 KOps/s | 7.3896 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 49.0610μs | 18.7013μs | 53.4723 KOps/s | 53.2675 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 65.1500μs | 31.5165μs | 31.7294 KOps/s | 31.8655 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1145ms | 70.5343μs | 14.1775 KOps/s | 14.3641 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.2387ms | 52.5451μs | 19.0313 KOps/s | 19.1129 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 1.6235ms | 0.3922ms | 2.5500 KOps/s | 2.1611 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.8735ms | 2.6373ms | 379.1824 Ops/s | 379.1402 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 1.5977ms | 0.4323ms | 2.3134 KOps/s | 2.2594 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 3.0276ms | 2.6388ms | 378.9533 Ops/s | 381.9982 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.5496ms | 0.1201ms | 8.3253 KOps/s | 8.5128 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.5637ms | 84.6004μs | 11.8203 KOps/s | 11.6824 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.6545ms | 0.1138ms | 8.7840 KOps/s | 8.8105 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.4692ms | 71.5230μs | 13.9815 KOps/s | 13.8635 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1743ms | 0.1148ms | 8.7119 KOps/s | 8.7277 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.4907ms | 72.0632μs | 13.8767 KOps/s | 13.8734 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1664ms | 0.1041ms | 9.6100 KOps/s | 9.9534 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.4158ms | 17.2604μs | 57.9361 KOps/s | 57.5758 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.5048ms | 95.7785μs | 10.4408 KOps/s | 10.3908 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 71.8100μs | 15.6984μs | 63.7010 KOps/s | 63.6035 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.5041ms | 99.5899μs | 10.0412 KOps/s | 10.1192 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 0.4070ms | 15.6538μs | 63.8824 KOps/s | 64.7475 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.5152ms | 0.1056ms | 9.4694 KOps/s | 9.8573 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.5593ms | 17.2024μs | 58.1315 KOps/s | 58.7122 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.4967ms | 95.9531μs | 10.4218 KOps/s | 10.3226 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 51.0100μs | 15.7284μs | 63.5793 KOps/s | 64.2241 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.4896ms | 96.4307μs | 10.3701 KOps/s | 10.3038 KOps/s | |
test_compile_indexing[int-pytree-eager] | 0.4037ms | 15.5472μs | 64.3202 KOps/s | 64.5883 KOps/s | |
test_mod_add[eager] | 0.4384ms | 40.3971μs | 24.7542 KOps/s | 26.2611 KOps/s | |
test_mod_add[compile] | 0.4833ms | 81.3039μs | 12.2995 KOps/s | 11.5096 KOps/s | |
test_mod_add[compile-overhead] | 0.3310ms | 0.1689ms | 5.9214 KOps/s | 5.1342 KOps/s | |
test_mod_wrap[eager] | 0.6642ms | 0.2492ms | 4.0125 KOps/s | 3.9414 KOps/s | |
test_mod_wrap[compile] | 0.5871ms | 0.2975ms | 3.3613 KOps/s | 3.4744 KOps/s | |
test_mod_wrap[compile-overhead] | 7.4457ms | 3.8536ms | 259.5002 Ops/s | 264.0500 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.5086ms | 1.3583ms | 736.1985 Ops/s | 685.3425 Ops/s | |
test_mod_wrap_and_backward[compile] | 1.3666ms | 1.2746ms | 784.5444 Ops/s | 777.7880 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.3693ms | 0.9329ms | 1.0719 KOps/s | 1.0712 KOps/s | |
test_seq_add[eager] | 0.1879ms | 0.1197ms | 8.3514 KOps/s | 8.1025 KOps/s | |
test_seq_add[compile] | 0.1789ms | 93.6966μs | 10.6727 KOps/s | 10.6812 KOps/s | |
test_seq_add[compile-overhead] | 0.2102ms | 0.1301ms | 7.6873 KOps/s | 7.6153 KOps/s | |
test_seq_wrap[eager] | 0.5161ms | 0.4365ms | 2.2908 KOps/s | 2.3005 KOps/s | |
test_seq_wrap[compile] | 0.3685ms | 0.3058ms | 3.2705 KOps/s | 3.2507 KOps/s | |
test_seq_wrap[compile-overhead] | 0.2869ms | 0.2260ms | 4.4241 KOps/s | 4.3604 KOps/s | |
test_func_call_runtime[False-eager] | 0.8687ms | 0.7897ms | 1.2662 KOps/s | 1.3346 KOps/s | |
test_func_call_runtime[False-compile] | 0.9588ms | 0.7538ms | 1.3266 KOps/s | 1.3269 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.4257ms | 0.3669ms | 2.7254 KOps/s | 2.7041 KOps/s | |
test_func_call_runtime[True-eager] | 0.9717ms | 0.9073ms | 1.1021 KOps/s | 1.1009 KOps/s | |
test_func_call_runtime[True-compile] | 0.8871ms | 0.7829ms | 1.2773 KOps/s | 1.2504 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.4410ms | 0.3870ms | 2.5839 KOps/s | 2.5688 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8454ms | 0.7795ms | 1.2829 KOps/s | 1.3456 KOps/s | |
test_func_call_cm_runtime[False-compile] | 1.1070ms | 0.7567ms | 1.3216 KOps/s | 1.3282 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.4189ms | 0.3683ms | 2.7153 KOps/s | 2.7073 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.1044ms | 1.0048ms | 995.2661 Ops/s | 976.3317 Ops/s | |
test_func_call_cm_runtime[True-compile] | 1.1076ms | 1.0060ms | 994.0393 Ops/s | 989.6385 Ops/s | |
test_func_call_cm_runtime[True-compile-overhead] | 1.0689ms | 1.0028ms | 997.2021 Ops/s | 983.9413 Ops/s | |
test_vmap_func_call_cm_runtime[eager] | 2.5282ms | 2.1035ms | 475.4059 Ops/s | 467.4250 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9862ms | 0.8214ms | 1.2174 KOps/s | 1.2034 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.5702ms | 0.4183ms | 2.3908 KOps/s | 2.3660 KOps/s | |
test_distributed | 3.0196ms | 0.1933ms | 5.1727 KOps/s | 8.4654 KOps/s | |
test_tdmodule | 57.0110μs | 20.7861μs | 48.1090 KOps/s | 48.2111 KOps/s | |
test_tdmodule_dispatch | 63.8900μs | 37.1617μs | 26.9094 KOps/s | 27.2221 KOps/s | |
test_tdseq | 42.5500μs | 21.8762μs | 45.7119 KOps/s | 48.0281 KOps/s | |
test_tdseq_dispatch | 78.0010μs | 41.0342μs | 24.3699 KOps/s | 25.6122 KOps/s | |
test_instantiation_functorch | 1.6785ms | 1.5456ms | 646.9935 Ops/s | 637.6680 Ops/s | |
test_exec_functorch | 0.2011ms | 0.1444ms | 6.9256 KOps/s | 6.9161 KOps/s | |
test_exec_functional_call | 0.2124ms | 0.1375ms | 7.2724 KOps/s | 7.1023 KOps/s | |
test_exec_td_decorator | 0.3781ms | 0.1901ms | 5.2605 KOps/s | 5.2424 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.8389ms | 0.6886ms | 1.4521 KOps/s | 1.4430 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8189ms | 0.6913ms | 1.4465 KOps/s | 1.4423 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7160ms | 0.6004ms | 1.6655 KOps/s | 1.6592 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7423ms | 0.6000ms | 1.6666 KOps/s | 1.6549 KOps/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.9385ms | 19.3135ms | 51.7772 Ops/s | 51.7390 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.4224ms | 19.3268ms | 51.7416 Ops/s | 51.8224 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.2109ms | 19.1082ms | 52.3334 Ops/s | 52.2726 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.5489ms | 19.1547ms | 52.2065 Ops/s | 52.2559 Ops/s | |
test_to_module_speed[True] | 1.4609ms | 0.9881ms | 1.0120 KOps/s | 1.0155 KOps/s | |
test_to_module_speed[False] | 1.3852ms | 0.9677ms | 1.0334 KOps/s | 1.0376 KOps/s | |
test_tc_init | 58.5100μs | 36.1058μs | 27.6964 KOps/s | 26.9386 KOps/s | |
test_tc_init_nested | 0.1667ms | 71.9176μs | 13.9048 KOps/s | 13.2733 KOps/s | |
test_tc_first_layer_tensor | 21.3300μs | 0.7912μs | 1.2639 MOps/s | 1.2587 MOps/s | |
test_tc_first_layer_nontensor | 22.3110μs | 2.2220μs | 450.0496 KOps/s | 447.0886 KOps/s | |
test_tc_second_layer_tensor | 13.1852μs | 1.4022μs | 713.1722 KOps/s | 713.5022 KOps/s | |
test_tc_second_layer_nontensor | 34.8800μs | 2.9491μs | 339.0857 KOps/s | 338.0095 KOps/s | |
test_unbind | 0.2135s | 12.1936ms | 82.0102 Ops/s | 139.3735 Ops/s | |
test_full_like | 9.4743ms | 9.2124ms | 108.5488 Ops/s | 108.2303 Ops/s | |
test_zeros_like | 4.7086ms | 4.1993ms | 238.1324 Ops/s | 231.3950 Ops/s | |
test_ones_like | 5.0022ms | 4.3329ms | 230.7925 Ops/s | 231.1030 Ops/s | |
test_clone | 11.3711ms | 9.1209ms | 109.6377 Ops/s | 69.0717 Ops/s | |
test_squeeze | 59.2300μs | 10.0215μs | 99.7857 KOps/s | 99.7274 KOps/s | |
test_unsqueeze | 0.1217ms | 77.0992μs | 12.9703 KOps/s | 13.0711 KOps/s | |
test_split | 0.3825ms | 0.1696ms | 5.8959 KOps/s | 6.0956 KOps/s | |
test_permute | 0.2494ms | 0.1962ms | 5.0967 KOps/s | 5.1224 KOps/s | |
test_stack | 50.8269ms | 50.4611ms | 19.8173 Ops/s | 19.7246 Ops/s | |
test_cat | 50.9674ms | 50.4291ms | 19.8298 Ops/s | 19.7985 Ops/s |
vmoens
added a commit
that referenced
this pull request
Feb 20, 2025
ghstack-source-id: 526f9ce8202fc48bc64ed7c8094c9c72f3bc4a71 Pull Request resolved: #1223
Labels: bug, CLA Signed
Stack from ghstack (oldest at bottom):