fix random attention for pytorch's bigbird/pegasus_bigbird #23056
Conversation
The documentation is not available anymore as the PR was closed or merged.
Hi @Bearnardd Thank you for the PR. I have one question: why
Very cool @Bearnardd - thanks for jumping on fixing the PyTorch model so quickly! Just a few small suggestions / fixes from me!
@@ -1054,7 +1060,7 @@ def _get_rand_attn_plan(from_seq_length, from_block_size, num_rand_blocks):

     @staticmethod
     def _bigbird_block_rand_mask(
-        from_seq_length, to_seq_length, from_block_size, to_block_size, num_rand_blocks, last_idx=-1
+        from_seq_length, to_seq_length, from_block_size, to_block_size, num_rand_blocks, deterministic, last_idx=-1
Wonder if it's better to make this a non-static method? E.g. so that we can do self.training from within the method and not have to pass deterministic as an argument? Just a suggestion, fine either way for me.
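For illustration, a rough sketch of what the non-static variant could look like. The class name comes from the modeling file, but the sampling loop below is a simplified stand-in for the real block-sampling logic, not the actual implementation:

```python
import numpy as np
import torch.nn as nn


class BigBirdBlockSparseAttention(nn.Module):
    # Sketch: only the training/eval gating is shown; the real method also
    # builds the random block indices used by block-sparse attention.
    def _bigbird_block_rand_mask(
        self, from_seq_length, to_seq_length, from_block_size, to_block_size, num_rand_blocks, last_idx=-1
    ):
        rand_attn = np.zeros((from_seq_length // from_block_size - 2, num_rand_blocks), dtype=np.int32)
        if not self.training:
            # inference/eval: introduce no randomness, return a deterministic (empty) block map
            return rand_attn
        for i in range(1, from_seq_length // from_block_size - 1):
            # illustrative stand-in for the original per-block sampling logic
            rand_attn[i - 1, :] = np.random.permutation(to_seq_length // to_block_size)[:num_rand_blocks]
        return rand_attn
```

With this shape, callers no longer need to thread a deterministic flag through; the method reads the module's own train/eval state.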
@sanchit-gandhi yeah I was thinking about it and that is why I asked about @staticmethod for this method :)
Hi @sanchit-gandhi! I have removed the static method as I think it is the best approach.
Thanks for the comment! To be honest I am not sure if I understand you correctly, since from what I can see this function is updated. Could you elaborate on what exactly is missing?
Sorry, my bad. You are right :-)
@Bearnardd Do you know why, before this PR, a test like test_inference_block_sparse_pretraining could get deterministic outputs? It seems strange to me, since the goal of this PR is to fix the issue that random attention is used no matter whether we are in training/eval mode. The correct behaviour is that during inference (eval) we should not introduce any randomness, hence random attention should not be used.
@ydshieh Yeah, sure. It has deterministic output because the random seed for the random numpy operations is hardcoded, which results in the same random attention indices every time you run the tests.
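As a side note, a minimal standalone illustration of that effect; the helper name and seed value here are made up for the example and are not taken from the test suite:

```python
import numpy as np


def sample_rand_blocks(num_blocks, num_rand_blocks, seed=None):
    # with a fixed seed, every call yields the same "random" block indices
    if seed is not None:
        np.random.seed(seed)
    return np.random.permutation(num_blocks)[:num_rand_blocks]


a = sample_rand_blocks(16, 3, seed=0)
b = sample_rand_blocks(16, 3, seed=0)
assert (a == b).all()  # identical draws, hence the deterministic test outputs
```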
Thanks for the explanation!
Awesome - thanks for fixing the big bird models @Bearnardd!
cc @sgugger
Thanks for your PR! Just two small nits and it should be good to merge.
-            self.max_seqlen, self.max_seqlen, from_block_size, to_block_size, n_rand_blocks, last_idx=1024
+            self.max_seqlen,
+            self.max_seqlen,
+            from_block_size,
+            to_block_size,
+            n_rand_blocks,
+            last_idx=1024,
There is no need for the restyle anymore here, might be some leftover from when deterministic was passed?
Yup, exactly :)
+            self.max_seqlen,
+            self.max_seqlen,
+            from_block_size,
+            to_block_size,
+            n_rand_blocks,
+            last_idx=1024,
Same here
…to fix_pytorch_big_bird_rand_attn
I have pushed the changes @sgugger :)
…ce#23056) * fix random attention usage for bigbird and pegasus_bigbird * remove staticmethod, update tests target valus * revert style changes
Fixes #23055

What does this PR do?

Add control over the usage of BigBird's random attention based on the current mode (training/eval).

Who can review?
@sanchit-gandhi @ydshieh
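For reference, a quick sketch of the behaviour this PR targets; the checkpoint name and sequence length are illustrative, not prescribed by the PR:

```python
import torch
from transformers import BigBirdModel

# any BigBird checkpoint using block-sparse attention works; this one is illustrative
model = BigBirdModel.from_pretrained("google/bigbird-roberta-base", attention_type="block_sparse")
input_ids = torch.randint(0, model.config.vocab_size, (1, 1024))

# eval mode: random attention (and dropout) is off, so repeated forward passes agree
model.eval()
with torch.no_grad():
    out1 = model(input_ids).last_hidden_state
    out2 = model(input_ids).last_hidden_state
print(torch.allclose(out1, out2))  # expected: True once random attention is gated on training mode

# train mode: random blocks are re-sampled, so the attention pattern may differ between calls
model.train()
```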