
Fix gather when collecting 'num_input_tokens_seen' #31974

Merged
merged 4 commits into huggingface:main on Jul 16, 2024

Conversation

@CodeCreator (Contributor)

What does this PR do?

Following on from #29099, the allgather for num_input_tokens_seen still gets stuck in distributed training, because the tensors are still on CPU and have not yet been moved to the device in _prepare_inputs. This still results in torch.distributed errors such as:

File ".../site-packages/torch/distributed/distributed_c10d.py", line 2948, in all_gather_into_tensor
     work = group._allgather_base(output_tensor, input_tensor, opts)
ValueError: Tensors must be CUDA and dense

This PR simply moves the token count to self.args.device before gathering, and then moves it back to CPU after gathering.
This problem was also mentioned in the discussion of issue #28791, but the issue was closed when the padding was fixed.
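
For illustration, here is a minimal sketch of the approach (assuming a Trainer-style setup with an Accelerate `accelerator` handle; the function wrapper and argument names are illustrative, not the verbatim diff):

```python
import torch

def gathered_token_count(inputs, main_input_name, accelerator, device):
    # The per-step token count starts life as a plain Python int, so the
    # tensor wrapping it lives on CPU unless placed explicitly on device.
    tokens = torch.tensor(
        inputs[main_input_name].numel(), device=device, dtype=torch.int64
    )
    # Backends like NCCL refuse CPU tensors ("Tensors must be CUDA and
    # dense"), so gather on-device, then sum and move the scalar back to
    # CPU for the running counter.
    return accelerator.gather(tokens).sum().cpu().item()
```

Moving the result back to CPU keeps the running num_input_tokens_seen counter as a plain Python number rather than a device tensor.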

Who can review?

@pacman100
@muellerzr

@muellerzr (Contributor) left a comment


Note that pacman no longer works at HF, so you can ping me in the future :)

This makes sense since we need to eventually gather. cc @SunMarc

@muellerzr (Contributor)

Can you run `pip install -e .[quality]; make style; make quality`?

@muellerzr muellerzr requested review from amyeroberts and SunMarc July 16, 2024 13:20
@SunMarc (Member) left a comment


Nice! Thanks for fixing!

@CodeCreator (Contributor, Author)

@muellerzr Thanks for the pointer! I ran the code formatter as suggested.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts (Collaborator) left a comment


Thanks for fixing!

@amyeroberts amyeroberts merged commit e391706 into huggingface:main Jul 16, 2024
21 checks passed
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Jul 19, 2024
* Move token count to device before gathering

* Run 'make style; make quality'
MHRDYN7 pushed a commit to MHRDYN7/transformers that referenced this pull request Jul 23, 2024
* Move token count to device before gathering

* Run 'make style; make quality'
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jul 24, 2024
* Move token count to device before gathering

* Run 'make style; make quality'
itazap pushed a commit that referenced this pull request Jul 25, 2024
* Move token count to device before gathering

* Run 'make style; make quality'