Batch size affecting output when using GPT2Model #14743

wade3han · 2021-12-13T10:35:51Z

Environment info

transformers version: 4.12.5
Platform:
Python version: Python 3.8.12
PyTorch version (GPU?): 1.10.0 (GPU)
Tensorflow version (GPU?): X
Using GPU in script?: Yes
Using distributed or parallel set-up in script?: No

Information

Model I am using (Bert, XLNet ...): GPT2 (gpt2-large)

The problem arises when using:

the official example scripts: (give details below)
my own modified scripts: (give details below)

The task I am working on is:

an official GLUE/SQUaD task: (give the name)
my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

import torch
from transformers import AutoModel, AutoTokenizer

def get_device_from_arg(device_id):
    if (device_id is not None and
            torch.cuda.is_available() and
            0 <= device_id < torch.cuda.device_count()):
        return torch.device(f'cuda:{device_id}')
    else:
        return torch.device('cpu')  # fall back to CPU when no valid GPU id is given

def get_model(model_name, tokenizer, device_id):
    device = get_device_from_arg(device_id)
    model = AutoModel.from_pretrained(model_name, pad_token_id=tokenizer.eos_token_id).to(device)
    model = model.eval()
    return model

def get_tokenizer(model_name='gpt2'):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return tokenizer

TOKENIZER = get_tokenizer('gpt2-large')
MODEL = get_model('gpt2-large', TOKENIZER, 0)
human_texts = ["Hello World!", "What is huggingface?"]
tokenized_texts = [
    TOKENIZER.encode(sen, return_tensors='pt', truncation=True, max_length=1024)
    for sen in human_texts
]
device = next(MODEL.parameters()).device
padded_chunk = torch.nn.utils.rnn.pad_sequence([t.view(-1) for t in tokenized_texts],
                                               batch_first=True,
                                               padding_value=0).to(device)
attention_mask = torch.nn.utils.rnn.pad_sequence(
            [torch.ones(len(t.view(-1))).long() for t in tokenized_texts],
            batch_first=True,
            padding_value=0).to(device)

outs = MODEL(input_ids=padded_chunk,
             attention_mask=attention_mask,
             past_key_values=None,
             output_hidden_states=True,
             return_dict=True,
             output_attentions=True)

outs2 = MODEL(input_ids=padded_chunk[:1],
              attention_mask=attention_mask[:1],
              past_key_values=None,
              output_hidden_states=True,
              return_dict=True,
              output_attentions=True)

print(outs.hidden_states[0][0] - outs2.hidden_states[0][0])
print(outs.hidden_states[-1][0] - outs2.hidden_states[-1][0])

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0',
       grad_fn=<SubBackward0>)
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [-2.9385e-04, -1.7121e-05, -3.2863e-04,  ..., -1.3408e-04,
         -1.4349e-04, -9.2506e-05],
        [ 9.9063e-05, -3.7980e-04,  2.1064e-04,  ...,  5.2011e-04,
          1.3547e-04, -4.0713e-04],
        [-1.8436e-04,  4.5538e-05, -7.6592e-06,  ...,  1.5700e-04,
         -4.7076e-05, -2.0326e-04],
        [-2.0707e-04, -6.7145e-05, -1.3128e-04,  ...,  6.8665e-05,
         -2.5548e-04, -1.2420e-04]], device='cuda:0', grad_fn=<SubBackward0>)

The hidden states at the first layer are identical between the two outputs, but small differences (on the order of 1e-4) appear at the last layer.
#2401 reported the same issue, but it was not resolved.

Expected behavior

The model outputs should be exactly the same.

LysandreJik · 2022-01-11T09:25:18Z

Hello! I ran your code sample on CPU and got the following results:

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], grad_fn=<SubBackward0>)
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], grad_fn=<SubBackward0>)

Do you also get the same when running on CPU?

wade3han · 2022-02-03T15:01:08Z

I got the results below when I reran on CPU. The error is much smaller, but it is still not exactly zero!

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], grad_fn=<SubBackward0>)
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [-3.5763e-07, -9.2387e-07, -2.9802e-07,  ..., -1.1921e-07,
         -3.0920e-07,  0.0000e+00],
        [ 0.0000e+00, -5.3644e-07, -7.7486e-07,  ...,  3.5763e-07,
         -2.0117e-07,  2.6450e-07],
        [-2.3842e-07,  8.9407e-08, -5.9605e-08,  ...,  5.9605e-07,
         -7.8231e-08,  1.4901e-08],
        [-5.9605e-08,  2.0862e-07, -1.9073e-06,  ...,  1.1921e-06,
          5.1036e-07, -8.7544e-08]], grad_fn=<SubBackward0>)
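
One possible reading of these numbers, which nobody in this thread confirms: floating-point addition is not associative, so if the backend accumulates sums in a different order for a batch of two than for a batch of one, results can differ in the last few bits, and the effect can compound across layers. The following is a minimal sketch in plain Python (float64, not the GPT-2 code path; the function names are made up for illustration) showing that summation order alone changes a result, and that a tolerance-based comparison is a more robust check than exact equality.

```python
import math


def sum_left_to_right(values):
    """Accumulate the values in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_right_to_left(values):
    """Accumulate the same values in reverse order."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


# Hypothetical inputs chosen only to make the rounding effect visible.
values = [0.1, 0.2, 0.3]

a = sum_left_to_right(values)   # (0.1 + 0.2) + 0.3
b = sum_right_to_left(values)   # (0.3 + 0.2) + 0.1

# Exact equality fails even though both sums are mathematically identical.
print("exactly equal:", a == b)

# A tolerance-based check treats the two results as the same value.
print("close within rel_tol=1e-9:", math.isclose(a, b, rel_tol=1e-9))
print("absolute difference:", abs(a - b))
```

With PyTorch tensors, `torch.allclose` plays the same role as `math.isclose` does here. Suitable tolerances depend on dtype and device and are not established in this thread.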

github-actions · 2022-02-27T15:02:14Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

wade3han mentioned this issue Dec 14, 2021

Add batched implementation of mauve krishnap25/mauve#1

Merged

github-actions bot closed this as completed Mar 7, 2022

infinitylogesh mentioned this issue Feb 19, 2023

Add batch evaluation support when batch_size > 1 bigcode-project/bigcode-evaluation-harness#36

Open
