Add cache conflict miss support (backend) #2596

Closed
wants to merge 1 commit into from

Commits on May 22, 2024

  1. Add cache conflict miss support (pytorch#2596)

    Summary:
    Pull Request resolved: pytorch#2596
    
    Prior to this diff, SSD TBE lacked support for the conflict cache miss
    scenario. It operated under the assumption that the cache, located in
    GPU memory, was sufficiently large to hold all prefetched data from
    SSD. In the event of a conflict cache miss, the behavior of SSD TBE
    would be unpredictable (it could either fail or potentially access
    illegal memory). Note that a conflict cache miss happens when an
    embedding row is absent in the cache, and after being fetched from
    SSD, it cannot be inserted into the cache due to capacity constraints
    or associativity limitations.
    
    This diff introduces support for conflict cache misses by storing rows
    that cannot be inserted into the cache due to conflicts in a scratch
    pad, which is a temporary GPU tensor. When a row is missing from the
    cache, TBE kernels read it from the scratch pad instead.
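
As a rough illustration of the idea (not the FBGEMM implementation), the sketch below models the GPU cache as a small set-associative table in plain Python. The names `ToyCache`, `prefetch`, and `read_row` are hypothetical, eviction is omitted, and a row that finds its set full is simply placed in a scratch pad dictionary.

```python
class ToyCache:
    """Hypothetical set-associative cache; eviction is omitted for brevity."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # sets[s] maps row_id -> row data for the rows held in set s.
        self.sets = [dict() for _ in range(num_sets)]

    def lookup(self, row_id):
        """Return the cached row, or None on a cache miss."""
        return self.sets[row_id % self.num_sets].get(row_id)

    def try_insert(self, row_id, row):
        """Insert a row; return False if its set is full (a conflict miss)."""
        cache_set = self.sets[row_id % self.num_sets]
        if len(cache_set) >= self.ways:
            return False
        cache_set[row_id] = row
        return True


def prefetch(cache, ssd, row_ids):
    """Fetch missing rows from `ssd`; spill conflict misses to a scratch pad."""
    scratch_pad = {}
    for row_id in row_ids:
        if cache.lookup(row_id) is not None:
            continue  # cache hit, nothing to fetch
        row = ssd[row_id]  # stands in for the SSD read
        if not cache.try_insert(row_id, row):
            scratch_pad[row_id] = row  # conflict miss: keep the row here
    return scratch_pad


def read_row(cache, scratch_pad, row_id):
    """What a kernel would do: read from the cache, else from the scratch pad."""
    row = cache.lookup(row_id)
    return row if row is not None else scratch_pad[row_id]
```

For example, with `ToyCache(num_sets=2, ways=1)` and row IDs `[0, 2, 1]`, rows 0 and 2 map to the same set, so row 2 ends up in the scratch pad while rows 0 and 1 are cached.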
    
    Prior to this diff, during the SSD prefetch stage, any row that
    missed the cache and required fetching from SSD would first be fetched
    into a CPU scratch pad and then transferred to GPU. Rows that could be
    inserted into the cache would subsequently be copied from the GPU
    scratch pad into the cache. If conflict misses occurred, the prefetch
    behavior would be unpredictable. With this diff, conflict-missed rows
    are now retained in the scratch pad, which is kept alive until the
    current iteration completes. Throughout the forward and backward +
    optimizer stages of TBE, the cache and the scratch pad are used
    interchangeably. However, once the backward + optimizer step
    completes, rows in the scratch pad are flushed back to SSD, unlike
    rows residing in the cache, which remain there for future use
    (see the diagram below for more details).
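
The per-iteration lifecycle described above can be sketched in plain Python as follows. This is a toy model under stated assumptions, not the actual TBE code: `run_iteration` is a hypothetical name, the cache is limited only by a row count (`capacity`) rather than by associativity, and the "optimizer" subtracts a constant placeholder gradient of 1.0 scaled by `lr`.

```python
def run_iteration(ssd, cache, capacity, row_ids, lr):
    """Toy model of one prefetch / forward / backward + optimizer / flush cycle.

    `ssd` and `cache` are dicts mapping row_id -> list of floats.
    Returns the sorted IDs of rows that went through the scratch pad.
    """
    # Prefetch: rows missing from the cache are fetched from SSD. Rows that
    # do not fit in the cache are kept in a scratch pad for this iteration.
    scratch_pad = {}
    for row_id in row_ids:
        if row_id in cache or row_id in scratch_pad:
            continue
        row = list(ssd[row_id])  # copy, standing in for the SSD read
        if len(cache) < capacity:
            cache[row_id] = row
        else:
            scratch_pad[row_id] = row

    def storage(row_id):
        # Cache and scratch pad are interchangeable during the iteration.
        return cache if row_id in cache else scratch_pad

    # Forward: read every requested row from wherever it lives.
    _outputs = [sum(storage(r)[r]) for r in row_ids]

    # Backward + optimizer: update rows in place (placeholder gradient 1.0).
    for row_id in set(row_ids):
        row = storage(row_id)[row_id]
        for i in range(len(row)):
            row[i] -= lr * 1.0

    # After the iteration: scratch pad rows are flushed back to SSD, while
    # cached rows stay in the cache for future iterations.
    for row_id, row in scratch_pad.items():
        ssd[row_id] = row
    return sorted(scratch_pad)
```

In this model, an updated row that lived in the scratch pad is visible on SSD right after the iteration, whereas an updated cached row is visible only in the cache until it is eventually written back (write-back of cached rows is outside the scope of the sketch).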
    
     {F1645878181}
    
    Differential Revision: D55998215
    sryap authored and facebook-github-bot committed May 22, 2024