
Use unique cache locations in backward for pipeline prefetching #2151

Closed

sryap wants to merge 2 commits into pytorch:main from sryap:export-D51339208

Contributor

sryap commented Nov 21, 2023

Summary:
When pipeline prefetching is enabled (`prefetch_pipeline=True`) for
`EmbeddingLocation.MANAGED_CACHING`, TBE has to update
`lxu_cache_locations` to ensure cache consistency. Prior to this
diff, TBE performed a full cache lookup when updating
`lxu_cache_locations`, i.e., it looked up every index, including
duplicates. The cost of this lookup grows with the total number of
indices, so it can be expensive when that number is large. This diff
addresses the problem by looking up only the unique indices, which are
normally far fewer than the full set of indices. The number of unique
indices tends to stay roughly the same even as the total number of
indices grows, so looking up only the unique indices can effectively
reduce the cost of the cache lookup. The unique `lxu_cache_locations`
are fed directly to the TBE backward pass, so there is no
decompression cost (no step that expands them back to one location
per index).
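To make the cost argument concrete, here is a minimal pure-Python sketch. It is not FBGEMM code: the dict-based cache, the `-1` miss sentinel, and the helper names `full_lookup` and `unique_lookup` are illustrative assumptions. It counts cache probes for a full lookup versus a lookup over deduplicated indices.

```python
from typing import Dict, List, Tuple

# Hypothetical cache model: linear cache index -> slot ("cache location").
# A miss is reported as -1 (assumed sentinel for this sketch).
Cache = Dict[int, int]
MISS = -1


def full_lookup(cache: Cache, indices: List[int]) -> Tuple[List[int], int]:
    """Probe the cache once per index, duplicates included."""
    probes = 0
    locations = []
    for idx in indices:
        probes += 1
        locations.append(cache.get(idx, MISS))
    return locations, probes


def unique_lookup(cache: Cache, indices: List[int]) -> Tuple[List[int], List[int], int]:
    """Deduplicate first, then probe the cache once per unique index.

    The unique indices are sorted only to make the output deterministic.
    """
    unique = sorted(set(indices))
    probes = 0
    unique_locations = []
    for idx in unique:
        probes += 1
        unique_locations.append(cache.get(idx, MISS))
    return unique, unique_locations, probes


if __name__ == "__main__":
    cache = {10: 0, 20: 1, 30: 2}  # index -> cache slot
    indices = [10, 20, 10, 40, 20, 10, 30, 10]
    _, full_probes = full_lookup(cache, indices)
    unique, unique_locs, unique_probes = unique_lookup(cache, indices)
    print(full_probes, unique_probes)  # 8 4
    print(unique, unique_locs)  # [10, 20, 30, 40] [0, 1, 2, -1]
```

With 8 indices but only 4 distinct ones, the full lookup makes 8 probes while the unique lookup makes 4; the gap widens as duplicates accumulate while the number of distinct indices stays flat.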

Differential Revision: D51339208


facebook-github-bot added the cla signed label

Contributor

facebook-github-bot commented Nov 21, 2023

This pull request was exported from Phabricator. Differential Revision: D51339208

facebook-github-bot added the fb-exported label

sryap pushed a commit to sryap/FBGEMM that referenced this pull request


Use unique cache locations in backward for pipeline prefetching (pytorch#2151)

d5a6b53


Reviewed By: jianyuh

Differential Revision: D51339208

sryap force-pushed the export-D51339208 branch from 9ba3e7a to d5a6b53 on November 22, 2023 18:58


sryap pushed a commit to sryap/FBGEMM that referenced this pull request


Use unique cache locations in backward for pipeline prefetching (pytorch#2151)

1b49fbd


Reviewed By: jianyuh

Differential Revision: D51339208

sryap force-pushed the export-D51339208 branch from d5a6b53 to 1b49fbd on November 22, 2023 21:09


sryap added a commit to sryap/FBGEMM that referenced this pull request


Use unique cache locations in backward for pipeline prefetching (pytorch#2151)

5363d8c

Summary:
Pull Request resolved: pytorch#2151


Reviewed By: jianyuh

Differential Revision: D51339208

fbshipit-source-id: a89b54a529a3628eb37e983b7f210d3a29ea315c

sryap force-pushed the export-D51339208 branch from 1b49fbd to 5363d8c on November 22, 2023 21:38

Sarunya Pumma added 2 commits on November 27, 2023 16:36


Add/modify LXU cache lookup ops for pipeline prefetching (pytorch#2154)

57163d4

Summary:

This diff adds/updates LXU cache APIs for pipeline prefetching:

- Update `lxu_cache_lookup` to support looking up unique linear cache
indices and to accept an external output tensor
- Update `lxu_cache_locations_update` to support updating unique cache
locations
- Add a Python binding for `get_unique_indices`

Reviewed By: levythu

Differential Revision: D51532548


Use unique cache locations in backward for pipeline prefetching (pytorch#2151)

600b56a


Reviewed By: ehsanardestani, jianyuh

Differential Revision: D51339208

sryap force-pushed the export-D51339208 branch from 5363d8c to 600b56a on November 28, 2023 00:37


sryap pushed a commit to sryap/FBGEMM that referenced this pull request


Use unique cache locations in backward for pipeline prefetching (pytorch#2151)

4508edc


Reviewed By: ehsanardestani, jianyuh

Differential Revision: D51339208

facebook-github-bot closed this in 035ed1f

facebook-github-bot added the Merged label

Contributor

facebook-github-bot commented Nov 28, 2023

This pull request has been merged in 035ed1f.


Labels

cla signed fb-exported Merged