Fix on prefetch list issues (thread safe access on forward function) (#810)

Summary:
Pull Request resolved: #810

## Root cause:
In inference, the TBE class is the same for all sparse streams (different GPU sparse workers) and is used from multiple threads. This means the variables in the class are global (e.g., https://fburl.com/phabricator/r5m9sq38 ). When handling multiple requests in the predictor, the `forward` function is invoked from multiple threads. If the class holds mutable values (e.g., `self.timestep`, `self.timesteps_prefetched`, etc.), this causes race conditions and produces the errors reported in the following post.

## Solution:
- Create a torch custom class `AtomicCounter` (thread safe) for the counter used for timesteps.
- Remove the `timesteps_prefetched` list and use the atomic counter to record the number of prefetch steps.
- Create a torch custom class `TensorQueue` (thread safe) for `lxu_cache_locations_list`.

Reviewed By: yinghai

Differential Revision: D32954954

fbshipit-source-id: fc9cdd394c50832d4bdf455eb0336308eaff49fd
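The idea behind the fix can be illustrated with a minimal Python sketch. This is not the code from the commit, which implements `AtomicCounter` and `TensorQueue` as torch custom classes. The class `LockedQueue`, the helper `run_workers`, and all method names below are hypothetical and chosen only for illustration. The sketch assumes that guarding each read-modify-write with a lock is an acceptable stand-in for the thread-safe behavior the commit describes.

```python
import threading
from collections import deque


class AtomicCounter:
    """Lock-protected integer counter (illustrative stand-in, not the torch class)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def increment(self, delta: int = 1) -> int:
        # The read-modify-write happens under the lock, so concurrent
        # callers cannot lose updates.
        with self._lock:
            self._value += delta
            return self._value

    def decrement(self, delta: int = 1) -> int:
        with self._lock:
            self._value -= delta
            return self._value

    def get(self) -> int:
        with self._lock:
            return self._value


class LockedQueue:
    """Lock-protected FIFO queue (hypothetical analogue of a thread-safe queue)."""

    def __init__(self) -> None:
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, item) -> None:
        with self._lock:
            self._items.append(item)

    def pop(self):
        # Returns None when the queue is empty instead of raising.
        with self._lock:
            return self._items.popleft() if self._items else None

    def size(self) -> int:
        with self._lock:
            return len(self._items)


def run_workers(num_threads: int, increments_per_thread: int) -> int:
    """Increment one shared counter from several threads and return the total."""
    counter = AtomicCounter()

    def work() -> None:
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.get()
```

With the lock in place, `run_workers(8, 1000)` returns 8000 because no increment is lost. A plain attribute update such as `self.timestep += 1` shared across threads gives no such guarantee.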
1 parent ff93b93, commit 60be17f
Showing 2 changed files with 156 additions and 27 deletions.