Ensure explicit output dtype for pad_across_processes #3219

Closed
2 changes: 1 addition & 1 deletion src/accelerate/utils/operations.py
@@ -669,7 +669,7 @@ def _pad_across_processes(tensor, dim=0, pad_index=0, pad_first=False):
  old_size = tensor.shape
  new_size = list(old_size)
  new_size[dim] = max_size
- new_tensor = tensor.new_zeros(tuple(new_size)) + pad_index
+ new_tensor = (tensor.new_zeros(tuple(new_size)) + pad_index).to(tensor.dtype)
Member

Hmm, I'm wondering how safe this is in case the tensor dtype cannot represent the new data. E.g., when pad_index is not 0 or 1, casting the result to bool will result in a loss of information.
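To make the concern concrete, here is a minimal Python sketch (not part of the patch) of how a bool cast collapses distinct pad values:

```python
# Any non-zero value becomes True under a bool cast, so distinct
# pad values (e.g. -100 vs. 1) become indistinguishable afterwards.
pad_values = [0, 1, -100, 7]
as_bool = [bool(v) for v in pad_values]
print(as_bool)  # [False, True, True, True]
```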

Member

+1, pad_index in the context of LLMs is usually -100.

Contributor Author

@mariusarvinte Nov 5, 2024

I don't think there's any loss of useful information per se, given that the original data is retained downstream:

new_tensor[indices] = tensor

What this does change, though, is the actual pad value: any non-zero pad_index (e.g., -100) will result in the padding being True for bool tensors.

Does it make sense to always pad with False for bool? In our use case, we directly manipulated bool tensors across devices and left pad_index = 0 by default. I'm not sure whether bool tensors actually appear in LLMs.
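For illustration, a standalone sketch of the patched line's behavior on a bool tensor (assuming PyTorch's usual type promotion; this is not the test added by this PR) showing that a non-zero pad_index ends up as True padding:

```python
import torch

tensor = torch.tensor([True, False])
pad_index = -100
new_size = (4,)  # pretend another process holds a longer tensor

# Mirrors the patched line: bool + int promotes to int64, and the
# final cast back to bool turns the -100 pad slots into True.
new_tensor = (tensor.new_zeros(new_size) + pad_index).to(tensor.dtype)
new_tensor[: tensor.shape[0]] = tensor  # original data is retained
print(new_tensor.dtype)  # torch.bool; the two pad slots hold True, not False
```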

if pad_first:
indices = tuple(
slice(max_size - old_size[dim], max_size) if i == dim else slice(None) for i in range(len(new_size))
8 changes: 8 additions & 0 deletions tests/test_utils.py
@@ -302,6 +302,8 @@ def forward(self, x):
def test_pad_across_processes(self):
from torch.nested import nested_tensor

+ from accelerate import Accelerator

nt = nested_tensor([[1, 2, 3], [1], [1, 2]])
with self.assertWarns(CannotPadNestedTensorWarning):
nt2 = pad_across_processes(nt)
@@ -316,6 +318,12 @@ def test_pad_across_processes(self):
padded_tensor = pad_across_processes(tensor, dim=-4)
assert padded_tensor is tensor

+ # Booleans should be returned with the correct type
+ accelerator = Accelerator()
+ tensor = (torch.randn(4, 3, 100 * (accelerator.process_index + 1)) > 0).to(torch.bool)
+ padded_tensor = pad_across_processes(tensor, dim=-1)
+ assert padded_tensor.dtype == torch.bool

def test_slice_and_concatenate(self):
# First base case: 2 processes, batch size of 1
num_processes = 2