forked from pytorch/FBGEMM
Commit
Allocate a big output tensor and split in group_index_select_dim0_backward (pytorch#1764)

Summary:
Pull Request resolved: pytorch#1764

Before this diff, the `group_index_select_dim0` backward pass calls `at::zeros` `group_size` times, which launches `group_size` elementwise kernels. Since `group_size` can be large (up to 55), this can be costly.

This diff fixes the problem by allocating one big tensor and splitting it into smaller tensors, so only one elementwise kernel is launched per group. However, this can cause higher overhead on the host side.

Reviewed By: jspark1105

Differential Revision: D45823864

fbshipit-source-id: 92939fbd3801c599c475f45609c55dcc23cedbfc
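The allocate-once-and-split pattern described in the summary can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not the code from this diff: `View` and `zeros_split` are hypothetical names, a `std::vector<float>` stands in for the big output tensor, and raw pointer/size pairs stand in for the smaller tensors produced by the split.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical non-owning view into a shared buffer (a stand-in for one of
// the smaller tensors obtained by splitting the big one).
struct View {
  float* data;
  std::size_t size;
};

// Zero-fill one buffer sized for all outputs, then carve it into per-output
// views. A single fill replaces sizes.size() separate zero-fills.
std::vector<View> zeros_split(const std::vector<std::size_t>& sizes,
                              std::vector<float>& storage) {
  const std::size_t total =
      std::accumulate(sizes.begin(), sizes.end(), std::size_t{0});
  storage.assign(total, 0.0f);  // one zero-fill over the whole buffer

  std::vector<View> views;
  views.reserve(sizes.size());
  std::size_t offset = 0;
  // The splitting loop runs on the host; this is the kind of bookkeeping
  // that trades fewer fills for more host-side work.
  for (std::size_t n : sizes) {
    views.push_back({storage.data() + offset, n});
    offset += n;
  }
  return views;
}
```

The views alias `storage`, so they stay valid only while `storage` is alive and not resized.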
1 parent 5cef9fc, commit 1fc2382
Showing 1 changed file with 33 additions and 11 deletions.