Use unique cache locations in backward for pipeline prefetching #2151
This pull request was exported from Phabricator. Differential Revision: D51339208
Summary:

When pipeline prefetching is enabled (`prefetch_pipeline=True`) for `EmbeddingLocation.MANAGED_CACHING`, TBE has to update `lxu_cache_locations` to ensure cache consistency. Prior to this diff, TBE performed a full cache lookup when updating `lxu_cache_locations`. In other words, it looked up every index, including duplicates. The cost of this lookup grows linearly with the number of indices, so it becomes expensive when the number of indices is large.

This diff addresses the problem by looking up only the unique indices, of which there are normally far fewer than the full set of indices. The number of unique indices tends to stay roughly constant even as the total number of indices grows. Restricting the lookup to unique indices therefore reduces its cost effectively.

The unique `lxu_cache_locations` are fed directly to TBE backward. There is therefore no decompression cost: the unique locations never have to be expanded back to one entry per index.

Reviewed By: jianyuh

Differential Revision: D51339208
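The sketch below is a minimal pure-Python illustration of the cost argument. It is not FBGEMM code, and it rests on several assumptions:

- The LXU cache is modeled as a dict from linear cache index to cache slot (`cache_slots`).
- A miss is reported as a hypothetical `CACHE_MISS = -1`.
- Embeddings are scalars.

The three functions are hypothetical:

- `full_lookup` probes the cache once per index.
- `unique_lookup` probes once per distinct index and keeps an `inverse` map.
- `backward_update` is a simplified SGD-style stand-in for a backward pass. It sums gradients per unique index and addresses cached rows through the unique locations directly, so no per-index location array is rebuilt.

```python
# Minimal sketch, not FBGEMM code. Assumptions: the LXU cache is modeled as a
# dict mapping a linear cache index to a cache slot, embeddings are scalars,
# and all names below are hypothetical.

CACHE_MISS = -1  # hypothetical sentinel for "row not in cache"


def full_lookup(indices, cache_slots):
    """Probe the cache once per index, duplicates included."""
    return [cache_slots.get(i, CACHE_MISS) for i in indices]


def unique_lookup(indices, cache_slots):
    """Probe the cache once per distinct index.

    Returns (unique_indices, unique_locations, inverse), where inverse[k]
    is the position of indices[k] within unique_indices.
    """
    unique_indices = sorted(set(indices))
    position = {idx: pos for pos, idx in enumerate(unique_indices)}
    inverse = [position[i] for i in indices]
    unique_locations = [cache_slots.get(i, CACHE_MISS) for i in unique_indices]
    return unique_indices, unique_locations, inverse


def backward_update(grads, inverse, unique_locations, cache_rows, lr=0.1):
    """Hypothetical SGD-style backward that consumes unique locations directly.

    Gradients are summed per unique index, then each cached row is updated
    once through its unique location. Rows that miss the cache would be
    updated in the backing table instead; that path is omitted here.
    """
    summed = [0.0] * len(unique_locations)
    for g, j in zip(grads, inverse):
        summed[j] += g
    for j, slot in enumerate(unique_locations):
        if slot != CACHE_MISS:
            cache_rows[slot] -= lr * summed[j]
    return cache_rows


if __name__ == "__main__":
    cache_slots = {3: 10, 7: 11, 42: 12}  # linear index -> cache slot
    indices = [3, 7, 42, 99] * 250        # 1000 indices, only 4 distinct
    full = full_lookup(indices, cache_slots)
    _, unique_locs, inverse = unique_lookup(indices, cache_slots)
    print(len(full), len(unique_locs))    # 1000 4
    # Both lookups carry the same information.
    assert full == [unique_locs[j] for j in inverse]
```

The sketch deduplicates with its own pass over `indices`. It therefore only shows why the number of cache probes shrinks. It does not model where TBE obtains its unique indices.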
9ba3e7a to d5a6b53
d5a6b53 to 1b49fbd
1b49fbd to 5363d8c
Summary: This diff adds or updates LXU cache APIs for pipeline prefetching:
- Update `lxu_cache_lookup` to support looking up unique linear cache indices and to accept an external output tensor
- Update `lxu_cache_locations_update` to support updating unique cache locations
- Add a Python binding for `get_unique_indices`

Reviewed By: levythu

Differential Revision: D51532548
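The shape of these API changes can be pictured with the hypothetical pure-Python stand-ins below:

- a lookup restricted to unique indices,
- an output buffer supplied by the caller,
- an update over unique locations only.

These are not the FBGEMM ops and do not reproduce their signatures. Several details are assumptions made for illustration:

- every name ending in `_sketch`;
- the padded-buffer-plus-count return shape;
- the `PAD` filler;
- the update rule, which overwrites a location only where the fresh lookup is a cache hit.

```python
# Illustrative sketch only: hypothetical stand-ins that mirror the shape of
# the updated LXU cache APIs. They are not the FBGEMM ops.

CACHE_MISS = -1  # hypothetical "row not in cache" sentinel
PAD = -1         # hypothetical filler for unused entries of the unique buffer


def get_unique_indices_sketch(linear_indices):
    """Return unique indices padded to len(linear_indices), plus the number
    of valid entries (fixed-size outputs are convenient for GPU kernels)."""
    unique = sorted(set(linear_indices))
    padded = unique + [PAD] * (len(linear_indices) - len(unique))
    return padded, len(unique)


def lxu_cache_lookup_sketch(unique_buf, num_unique, cache_slots, out):
    """Look up only the first `num_unique` entries and write the results into
    the caller-provided buffer `out` instead of allocating a new one."""
    for k in range(num_unique):
        out[k] = cache_slots.get(unique_buf[k], CACHE_MISS)
    return out


def lxu_cache_locations_update_sketch(locations, new_locations, num_unique):
    """Assumed rule: within the first `num_unique` entries, take the new
    location wherever the fresh lookup found the row in the cache."""
    for k in range(num_unique):
        if new_locations[k] != CACHE_MISS:
            locations[k] = new_locations[k]
    return locations


if __name__ == "__main__":
    buf, n = get_unique_indices_sketch([8, 2, 8, 5, 2])  # [2, 5, 8, -1, -1], 3
    locs = lxu_cache_lookup_sketch(buf, n, {2: 0}, [CACHE_MISS] * len(buf))
    # Suppose row 8 has since been inserted into cache slot 4.
    fresh = lxu_cache_lookup_sketch(buf, n, {2: 0, 8: 4}, [CACHE_MISS] * len(buf))
    print(lxu_cache_locations_update_sketch(locs, fresh, n))  # [0, -1, 4, -1, -1]
```

Passing `out` in lets a caller reuse one preallocated buffer across iterations. Only the first `num_unique` entries are ever read or written.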
Use unique cache locations in backward for pipeline prefetching: same summary as above. Reviewed By: ehsanardestani, jianyuh Differential Revision: D51339208
5363d8c to 600b56a
This pull request has been merged in 035ed1f.