
Infer step size for Embeddings #1647

Merged: 31 commits into main on Nov 12, 2024
Conversation

@KuuCi (Contributor) commented Nov 7, 2024:

There are two cases:

  1. For in-batch negatives, the total number of negatives is batch size - 1, and pos_step_size is 2.
  2. For hard negatives, each sample has a list of negatives that can be arbitrarily long. In this case you have to change pos_step_size to match.

We can remove the step_size setting and auto-infer it without needing pos_step_size. We need to look at the first batch to determine how many hard negatives there are. If there are hard negatives, we use that minus 1 as the step size; otherwise we defer to in-batch negatives and set step_size = 2.
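
A minimal sketch of that inference, assuming a dict-style batch with a hypothetical `hard_negatives` field holding per-sample lists of negatives; the arithmetic is read off the runs reported below (20 hard negatives paired with pos_step_size = 21), and none of these names are foundry's actual API:

```python
# Hypothetical sketch: the batch layout and 'hard_negatives' field are
# assumptions, not llm-foundry's actual API. With N hard negatives each
# sample contributes query + positive + N entries, so "that - 1" works
# out to N + 1 (cf. the run below with 20 negatives, pos_step_size = 21).
from typing import Any, Optional

class StepSizeInference:
    def __init__(self) -> None:
        # None doubles as the "no batch seen yet" sentinel, so no
        # separate _first_batch_seen flag is needed.
        self.step_size: Optional[int] = None

    def maybe_infer_step_size(self, batch: dict[str, Any]) -> int:
        if self.step_size is None:
            # Peek at the first batch to count hard negatives per sample.
            hard_negatives = batch.get('hard_negatives') or []
            num_negs = len(hard_negatives[0]) if hard_negatives else 0
            # Hard negatives present: step over positive + negatives.
            # Otherwise fall back to in-batch negatives, step_size = 2.
            self.step_size = num_negs + 1 if num_negs > 0 else 2
        return self.step_size
```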

Testing:

  • embedding-ft-no-pos-zXOtvz (off of this branch, pos_step_size keyword removed) [screenshot]
  • embedding-ft-pos-kl2fPy (off foundry main, pos_step_size = 2) [screenshot]
  • embedding-ft-pos-neg-gMJSGx (off foundry main, pos_step_size = 21) [screenshot]
  • embedding-ft-no-pos-neg-3GoEms (off of this branch, pos_step_size keyword removed) [screenshot]

@jacobfulano (Contributor) left a comment:

Small comments

@mrdrozdov commented:

I really like this idea. I do have a couple of soft suggestions:

  1. Rather than self._first_batch_seen, could you simply set self.step_size = None during init? Then check whether self.step_size is None instead of using first_batch_seen.

  2. It could be tidier to make step_size part of the batch. I think you could do something like batch["step_size"] = step_size so that it is accessible in compute_score. That said, I'm not familiar enough with foundry to know whether this causes complications, and you might need to do something like batch["step_size"] = torch.tensor([step_size] * batch_size, dtype=torch.long). If you go this route, then you can ignore (1). (See the sketch after this comment.)

Anyway, the current approach is fine too, but I wanted to share just in case.
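
A minimal sketch of suggestion (2), assuming a dict-style batch with an 'input_ids' tensor; all names here are illustrative, not foundry's real API:

```python
import torch

def add_step_size_to_batch(batch: dict, step_size: int) -> dict:
    # Broadcast step_size across the batch as a long tensor so it travels
    # with the batch through collation/sharding and into compute_score.
    batch_size = batch['input_ids'].shape[0]  # assumed key
    batch['step_size'] = torch.tensor([step_size] * batch_size,
                                      dtype=torch.long)
    return batch

# Downstream, compute_score (or any hook receiving the batch) can read
# the inferred value straight off the batch:
def read_step_size(batch: dict) -> int:
    return int(batch['step_size'][0].item())
```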

@KuuCi requested a review from mrdrozdov on November 8, 2024 20:35
@KuuCi changed the title from "Infer step size for Embeddings" to "Infer step size and gather_in_batch_negatives for Embeddings" on Nov 8, 2024
@KuuCi marked this pull request as ready for review on November 9, 2024 00:26
@KuuCi requested a review from a team as a code owner on November 9, 2024 00:26
@KuuCi requested a review from mrdrozdov on November 9, 2024 00:26
@mrdrozdov left a comment:

This looks mostly good to me!

A few requests for integration tests:

  • Confirm expected behavior when training data has no hard negatives.
  • Confirm when using hard negatives = 1.
  • Confirm when using hard negatives = 3 (or some number besides 1).

Maybe you've already done this? If so, I'll switch to approve once you address my minor comments. (A sketch of such tests is below.)
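
A sketch of how these checks might look as a parametrized test; `infer_step_size` is a hypothetical stand-in for whatever helper the PR adds, and the expected values follow the runs reported in this thread (0 negatives with step_size 2, 20 negatives with pos_step_size 21), not foundry's actual test suite:

```python
import pytest

def infer_step_size(num_hard_negatives: int) -> int:
    # Stand-in implementation of the rule described above: fall back to
    # step_size = 2 when there are no hard negatives.
    return num_hard_negatives + 1 if num_hard_negatives > 0 else 2

@pytest.mark.parametrize(
    'num_hard_negatives,expected',
    [(0, 2), (1, 2), (3, 4), (20, 21)],
)
def test_infer_step_size(num_hard_negatives: int, expected: int) -> None:
    assert infer_step_size(num_hard_negatives) == expected
```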

(Review thread on llmfoundry/models/llm_embed/modeling_llm_embed.py: outdated, resolved)
@KuuCi (Contributor, Author) commented Nov 12, 2024:

Tested with the following runs:

  • 0 negatives: embedding-ft-no-pos-neg-0-7jyk87 [screenshot]
  • 1 negative: embedding-ft-no-pos-neg-1-zAeAzg [screenshot]
  • 20 negatives: embedding-ft-no-pos-neg-20-EicOP8 [screenshot]

All looks good

@mrdrozdov left a comment:

:shipit:

@KuuCi changed the title from "Infer step size and gather_in_batch_negatives for Embeddings" back to "Infer step size for Embeddings" on Nov 12, 2024
@KuuCi merged commit 415ada2 into main on Nov 12, 2024
10 checks passed