Falcon: Add RoPE scaling #25878
Conversation
if self.rope_scaling is None:
    return

if self.rotary:
This function is a copy/paste from #24653, except for this if block.
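For readers without #24653 open, that validation helper has roughly the following shape. This is a hedged sketch following the {"type", "factor"} rope_scaling convention from that PR, not a verbatim copy of either PR:

```python
# Rough sketch of a rope_scaling validation helper on the config class,
# following the {"type": ..., "factor": ...} convention from #24653.
# Not verbatim code; the Falcon-specific `if` block mentioned above is the
# part that differs from the Llama version.
def _rope_scaling_validation(self):
    if self.rope_scaling is None:
        return

    if not isinstance(self.rope_scaling, dict) or len(self.rope_scaling) != 2:
        raise ValueError(
            f"`rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {self.rope_scaling}"
        )
    rope_scaling_type = self.rope_scaling.get("type", None)
    rope_scaling_factor = self.rope_scaling.get("factor", None)
    if rope_scaling_type is None or rope_scaling_type not in ["linear", "dynamic"]:
        raise ValueError(
            f"`rope_scaling`'s type field must be one of ['linear', 'dynamic'], got {rope_scaling_type}"
        )
    if rope_scaling_factor is None or not isinstance(rope_scaling_factor, float) or rope_scaling_factor <= 1.0:
        raise ValueError(f"`rope_scaling`'s factor field must be a float > 1, got {rope_scaling_factor}")
```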
t = torch.arange(total_length, device=device, dtype=self.inv_freq.dtype)
freqs = torch.einsum("i,j->ij", t, self.inv_freq)
emb = torch.cat((freqs, freqs), dim=-1).to(device)
def _set_cos_sin_cache(self, seq_len, device, dtype):
Separates the logic for the creation of self.cos_cached and self.sin_cached, since these are the only bits the other scaling variants need to overwrite.
Makes sense to me!
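As an illustration of the refactor being discussed, here is a minimal sketch (not the exact PR code) of a base rotary embedding whose scaling subclass only overrides _set_cos_sin_cache:

```python
import torch

# Hedged sketch: the base class builds the cos/sin caches in
# _set_cos_sin_cache, and a scaling variant only has to override that method.
class RotaryEmbeddingSketch(torch.nn.Module):
    def __init__(self, head_dim, base=10000.0, max_position_embeddings=2048):
        super().__init__()
        self.head_dim = head_dim
        self.base = base
        self.max_position_embeddings = max_position_embeddings
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        self._set_cos_sin_cache(max_position_embeddings, inv_freq.device, torch.float32)

    def _set_cos_sin_cache(self, seq_len, device, dtype):
        self.seq_len_cached = seq_len
        t = torch.arange(seq_len, device=device, dtype=self.inv_freq.dtype)
        freqs = torch.outer(t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.cos_cached = emb.cos().to(dtype)
        self.sin_cached = emb.sin().to(dtype)


class LinearScalingRotaryEmbeddingSketch(RotaryEmbeddingSketch):
    def __init__(self, head_dim, base=10000.0, max_position_embeddings=2048, scaling_factor=1.0):
        self.scaling_factor = scaling_factor
        super().__init__(head_dim, base, max_position_embeddings)

    def _set_cos_sin_cache(self, seq_len, device, dtype):
        self.seq_len_cached = seq_len
        t = torch.arange(seq_len, device=device, dtype=self.inv_freq.dtype)
        t = t / self.scaling_factor  # the only difference from the base class
        freqs = torch.outer(t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.cos_cached = emb.cos().to(dtype)
        self.sin_cached = emb.sin().to(dtype)
```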
The documentation is not available anymore as the PR was closed or merged.
LGTM, the details seemed correct when I went through it, and it shouldn't have any backward compatibility implications. Still, please check with the authors before merging it!
Thanks for adding this! 🤗
Just a few implementation questions / requests.
rope_theta (`float`, *optional*, defaults to 10000.0):
    The base period of the RoPE embeddings.
Why isn't this part of the rope scaling dict?
Technically this isn't a RoPE scaling parameter; it is a constant used to compute the embeddings. As such, I agree with the original decision in the CodeLlama PR to keep it separate :)
Coincidentally, it can also be used for scaling (i.e. by increasing it and then fine-tuning the resulting model).
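A minimal sketch of how the base period enters the computation, assuming the standard RoPE inverse-frequency formula; rotary_inv_freq is a hypothetical helper, not transformers code:

```python
import torch

# rope_theta is the base of the geometric progression of rotary frequencies.
# Increasing it makes every frequency smaller (longer wavelengths), which is
# why raising it and fine-tuning can also act as a form of context extension.
def rotary_inv_freq(head_dim: int, rope_theta: float = 10000.0) -> torch.Tensor:
    return 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

print(rotary_inv_freq(64, rope_theta=10000.0)[:4])
print(rotary_inv_freq(64, rope_theta=1000000.0)[:4])  # larger base -> lower frequencies
```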
if self.config.rope_scaling is None:
    rotary_emb = FalconRotaryEmbedding(
        self.head_dim,
        base=self.config.rope_theta,
I realise this is copied from elsewhere and not from this PR, but this is an indication to me that the parameter is misnamed: if no RoPE scaling is happening, then a RoPE scaling parameter shouldn't be used here.
(see comment above, I believe the two questions are related)
def _set_cos_sin_cache(self, seq_len, device, dtype):
    self.seq_len_cached = seq_len
    t = torch.arange(seq_len, device=device, dtype=self.inv_freq.dtype)
    freqs = torch.einsum("i,j->ij", t, self.inv_freq)
Could you re-write this with classic matrix operations? Unfortunately einsum creates issues when using traced models atm :(
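For reference, the "i,j->ij" einsum is just an outer product, so it can be swapped for plain matrix operations. A small sketch of the equivalence, using standalone toy values rather than the PR code:

```python
import torch

# The three expressions below produce identical (seq_len, dim/2) angle tables;
# torch.outer and matmul trace cleanly where einsum may not.
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, 64, 2).float() / 64))
t = torch.arange(2048, dtype=inv_freq.dtype)

freqs_einsum = torch.einsum("i,j->ij", t, inv_freq)
freqs_outer = torch.outer(t, inv_freq)          # equivalent, trace-friendly
freqs_matmul = t[:, None] @ inv_freq[None, :]   # same thing with a matmul

assert torch.allclose(freqs_einsum, freqs_outer)
assert torch.allclose(freqs_einsum, freqs_matmul)
```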
def _set_cos_sin_cache(self, seq_len, device, dtype):
    self.seq_len_cached = seq_len
    t = torch.arange(seq_len, device=device, dtype=self.inv_freq.dtype)
Same here re the parameter 't'
# This line is the only difference from FalconRotaryEmbedding._set_cos_sin_cache
t = t / self.scaling_factor

freqs = torch.einsum("i,j->ij", t, self.inv_freq)
And here re einsum
base = self.base * (
    (self.scaling_factor * seq_len / self.max_position_embeddings) - (self.scaling_factor - 1)
) ** (self.head_dim / (self.head_dim - 2))
inv_freq = 1.0 / (base ** (torch.arange(0, self.head_dim, 2).float().to(device) / self.head_dim))
Do we want to conditionally call .float here too?
Yes, we want to keep these auxiliary calculations in fp32 and then cast down the final results (sin_cache and cos_cache) if needed :)
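A small sketch of that recipe, keeping the angle computation in fp32 and casting only the final caches; build_cos_sin_cache is a hypothetical standalone helper, not the transformers implementation:

```python
import torch

# Do the frequency/angle math in fp32 for numerical stability, then cast
# only the cached sin/cos tables down to the model dtype.
def build_cos_sin_cache(seq_len, head_dim, base=10000.0, dtype=torch.bfloat16, device="cpu"):
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=device).float() / head_dim))
    t = torch.arange(seq_len, device=device, dtype=torch.float32)  # keep fp32 here
    freqs = torch.outer(t, inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    # cast down only at the very end
    return emb.cos().to(dtype), emb.sin().to(dtype)

cos_cache, sin_cache = build_cos_sin_cache(seq_len=2048, head_dim=64)
print(cos_cache.dtype, sin_cache.dtype)  # torch.bfloat16 torch.bfloat16
```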
t = torch.arange(seq_len, device=device, dtype=self.inv_freq.dtype)
freqs = torch.einsum("i,j->ij", t, self.inv_freq)
Same comments about 't' and einsum
I've also tested this PR against the thing I wanted to test; it works correctly with and without RoPE scaling! Merging :)
What does this PR do?
In the same spirit as #24653, this PR adds RoPE scaling to Falcon. It also borrows a few changes from #25740, to allow for codellama-style scaling.

In addition to the changes above, it adds max_position_embeddings to the config attributes, which is needed for one of the scaling strategies.

Python script to validate these changes: https://pastebin.com/SJmUpDU1
Before this PR 👉 outputs gibberish
After this PR 👉 recognizes that the super large prompt is about llama 2
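For illustration, a hedged sketch of how the new options could be exercised from user code, assuming the same rope_scaling dict interface as #24653; the model id and scaling factor below are placeholders:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical usage sketch; model id and factor are placeholders.
config = AutoConfig.from_pretrained(
    "tiiuae/falcon-7b",
    rope_scaling={"type": "dynamic", "factor": 2.0},  # or {"type": "linear", "factor": 2.0}
    max_position_embeddings=2048,  # used by the dynamic scaling strategy
)

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b", config=config)
```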