Add cache conflict miss support (backend) #2596
Conversation
This pull request was exported from Phabricator. Differential Revision: D55998215
✅ Deploy Preview for pytorch-fbgemm-docs ready!
Summary: Pull Request resolved: pytorch#2596

Prior to this diff, SSD TBE lacked support for the conflict cache miss scenario. It operated under the assumption that the cache, located in GPU memory, was large enough to hold all data prefetched from SSD. In the event of a conflict cache miss, the behavior of SSD TBE was unpredictable: it could either fail or access illegal memory. Note that a conflict cache miss happens when an embedding row is absent from the cache and, after being fetched from SSD, cannot be inserted into the cache due to capacity constraints or associativity limitations.

This diff introduces support for conflict cache misses by storing rows that cannot be inserted into the cache in a scratch pad, a temporary GPU tensor. When rows miss the cache, TBE kernels can access them in the scratch pad instead.

Prior to this diff, during the SSD prefetch stage, any row that missed the cache and required fetching from SSD was first fetched into a CPU scratch pad and then transferred to GPU. Rows that could be inserted into the cache were subsequently copied from the GPU scratch pad into the cache. If conflict misses occurred, the prefetch behavior was unpredictable.

With this diff, conflict-missed rows are retained in the scratch pad, which is kept alive until the current iteration completes. Throughout the forward and backward + optimizer stages of TBE, the cache and the scratch pad are used equivalently. However, after the backward + optimizer step completes, rows in the scratch pad are flushed back to SSD, unlike rows residing in the cache, which are kept (not evicted) for future use (see the diagram below for more details).

{F1645878181}

Differential Revision: D55998215
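For illustration only, here is a minimal PyTorch sketch of the prefetch flow described above: a set-associative GPU cache backed by a simulated SSD store, where insertable misses fill a free slot in their set, conflict misses land in a per-iteration scratch pad, and the scratch pad is flushed back to SSD after the backward + optimizer step. All class and method names here are hypothetical and do not reflect the actual FBGEMM implementation or API.

```python
import torch

class ScratchPadCacheSketch:
    """Toy model of an SSD-backed embedding cache that keeps conflict-missed
    rows in a per-iteration scratch pad. Not the FBGEMM implementation."""

    def __init__(self, num_sets, assoc, dim, num_rows):
        self.num_sets, self.assoc, self.dim = num_sets, assoc, dim
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        # Cache storage plus the embedding row held in each slot (-1 = empty).
        self.cache = torch.zeros(num_sets * assoc, dim, device=self.device)
        self.slot_to_row = torch.full(
            (num_sets * assoc,), -1, dtype=torch.long, device=self.device
        )
        # "SSD" backing store, modeled here as a CPU tensor.
        self.ssd = torch.randn(num_rows, dim)
        self.scratch_pad = torch.empty(0, dim, device=self.device)
        self.scratch_pad_rows = []

    def prefetch(self, rows):
        """Ensure every requested row is in the cache or the scratch pad."""
        sp_values = []
        for r in rows.tolist():
            if (self.slot_to_row == r).any() or r in self.scratch_pad_rows:
                continue                                # already resident
            value = self.ssd[r].to(self.device)         # miss: fetch from "SSD"
            set_base = (r % self.num_sets) * self.assoc
            slots = self.slot_to_row[set_base:set_base + self.assoc]
            free = (slots == -1).nonzero()
            if free.numel() > 0:                        # insertable miss -> cache
                slot = set_base + int(free[0, 0])
                self.cache[slot] = value
                self.slot_to_row[slot] = r
            else:                                       # conflict miss -> scratch pad
                self.scratch_pad_rows.append(r)
                sp_values.append(value)
        # The scratch pad stays alive until this iteration's backward + optimizer finish.
        if sp_values:
            self.scratch_pad = torch.cat([self.scratch_pad, torch.stack(sp_values)])

    def flush_scratch_pad(self):
        """After backward + optimizer, write conflict-missed rows back to SSD."""
        for j, r in enumerate(self.scratch_pad_rows):
            self.ssd[r] = self.scratch_pad[j].to("cpu")
        self.scratch_pad = torch.empty(0, self.dim, device=self.device)
        self.scratch_pad_rows = []
```

In this sketch, a training iteration would call `prefetch(indices)` before the forward pass and `flush_scratch_pad()` after the optimizer step, mirroring the lifetime of the scratch pad described in the summary.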
This pull request has been merged in db4d379.