Add EvaluationDistributedSampler and examples on distributed evaluation #1886
Conversation
Codecov Report
Additional details and impacted files

@@           Coverage Diff           @@
##           master   #1886    +/-  ##
======================================
- Coverage      87%     87%     -0%
======================================
  Files         270     270
  Lines       15581   15592    +11
======================================
+ Hits        13483   13488     +5
- Misses       2098    2104     +6
can we please add tests for validation and training as well? And maybe an fsdp test? Also some notes on caveats might be good to add to the sampler docs
super().__init__(dataset=dataset, num_replicas=num_replicas, rank=rank, shuffle=shuffle, seed=seed)

len_dataset = len(self.dataset)  # type: ignore[arg-type]
if not self.drop_last and len_dataset % self.num_replicas != 0:
The issue with this is that it wouldn't necessarily work with validation, since not all ranks would reach the same distributed function calls and would therefore time out, which would kill the entire process. Also, this would never work with FSDP, since some ranks have one batch more and, for FSDP, not all processes would reach the forward syncing points, also resulting in timeouts.
I agree that in the context of Lightning this wouldn't work well, as it does not support Join (Lightning-AI/pytorch-lightning#3325)
FSDP also doesn't support join afaik (pytorch/pytorch#64683)
But outside Lightning, and taking FSDP out of the equation, I agree this can work and is a good utility to have IMO. It also suits the metric design well, since synchronization is only necessary when all processes have finished collecting their statistics and .compute() can be called.
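A minimal sketch of the update/compute pattern described above (plain torchmetrics usage with dummy data; the per-rank batches are placeholders): each rank only accumulates local statistics in update(), and the single cross-rank synchronization happens in compute(), which every rank reaches even when the per-rank batch counts differ.

import torch
from torchmetrics.classification import BinaryAccuracy

# each rank would iterate over its own shard; a dummy two-batch shard stands in here
local_batches = [
    (torch.rand(8), torch.randint(2, (8,))),
    (torch.rand(8), torch.randint(2, (8,))),
]

metric = BinaryAccuracy()
for preds, target in local_batches:
    metric.update(preds, target)  # purely local, no communication

acc = metric.compute()  # in DDP, the cross-rank all_gather/reduce happens only here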
calling @awaelchli for distributed review :)
You are right that we need to test this feature better to clearly state the limitations.
Converted to draft until better tested.
"""

def __init__(
In Lightning we have a very similar class: https://github.com/Lightning-AI/lightning/blob/fbdbe632c67b05158804b52f4345944781ca4f07/src/lightning/pytorch/overrides/distributed.py#L194
I think the main difference is that yours respects the drop_last setting. I'm not sure why we have __iter__ overridden there, but if you are interested you can compare the two.
        metric_class=metric_class,
    ),
    range(NUM_PROCESSES),
)
In addition, a unit test for just the sampler alone could be useful, one that doesn't launch processes (not needed) but rather just asserts that the indices returned on each rank match the expectation, e.g.:

sampler = EvaluationDistributedSampler(dataset, rank=0, num_replicas=3, drop_last=...)
assert list(iter(sampler)) == ...

sampler = EvaluationDistributedSampler(dataset, rank=2, num_replicas=3, drop_last=...)
assert list(iter(sampler)) == ...
and so on to test all edge cases.
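For concreteness, a hedged sketch of such a standalone test (the import path and the exact index-splitting strategy of EvaluationDistributedSampler are assumptions). Rather than hard-coding per-rank indices, it asserts the property the sampler is meant to guarantee: all ranks together cover every sample exactly once, with no padding for an uneven dataset size.

import torch
from torch.utils.data import TensorDataset

# import path is an assumption; the class is added in this PR
from torchmetrics.utilities.distributed import EvaluationDistributedSampler


def test_no_extra_samples_for_uneven_dataset():
    dataset = TensorDataset(torch.arange(10))  # 10 samples, not divisible by 3 replicas
    all_indices = []
    for rank in range(3):
        sampler = EvaluationDistributedSampler(dataset, num_replicas=3, rank=rank, shuffle=False)
        all_indices.extend(iter(sampler))
    # every sample appears exactly once across ranks, i.e. no duplicated/padded indices
    assert sorted(all_indices) == list(range(10))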
Co-authored-by: Adrian Wälchli <[email protected]>
@SkafteNicki, what is missing here to make it land? 🐿️
@SkafteNicki this seems to be pending for a while, shall we continue? 🐰
Closing: while this would be nice to have, it is probably too limited in scope to provide enough value, and it is not a core feature of torchmetrics.
What does this PR do?
Fixes #1338
The original issue asks whether we should implement a join context so that metrics can be evaluated on an uneven number of samples in distributed settings. As a reminder, we normally discourage users from evaluating in distributed mode because the default distributed sampler from PyTorch adds extra samples to make all processes do even work, which messes with the results.
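To make the padding behaviour concrete, here is a small illustration in plain PyTorch (no torchmetrics involved): with 10 samples split across 3 replicas, the default DistributedSampler pads to 12 indices, so two samples are evaluated twice.

import torch
from torch.utils.data import DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(10))

indices = []
for rank in range(3):
    # passing num_replicas/rank explicitly avoids needing an initialized process group
    sampler = DistributedSampler(dataset, num_replicas=3, rank=rank, shuffle=False)
    indices.extend(iter(sampler))

print(len(indices))     # 12, two more than the dataset contains
print(sorted(indices))  # [0, 0, 1, 1, 2, 3, ..., 9], so samples 0 and 1 are counted twice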
After investigating this issue, it seems that we do not need a join context at all due to the custom synchronization we have for metrics. To understand this we need to look at the two different states we can have: tensor state and list of tensor states.
For the list-of-tensors case, assume the state on rank 0 is a list of two tensors [t_01, t_02] and the state on rank 1 is a list of one tensor [t_11] (rank 0 has seen one more batch than rank 1). When list states are encountered internally, we make sure to concatenate the list into a single tensor so we do not need to call allgather for each tensor in the list (see torchmetrics/src/torchmetrics/metric.py, lines 418 to 419 at 879595d). After this, each state is a single tensor, t_0 and t_1, but clearly t_0.shape != t_1.shape. Again, we handle this internally by padding the tensors to the same size and then doing an all gather (see torchmetrics/src/torchmetrics/utilities/distributed.py, lines 136 to 148 at 879595d).
Thus in both cases, even if one rank sees more samples/batches, we still do the same number of distributed operations per rank, which should mean that everything works.
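A simplified sketch of the pad-then-gather pattern that the linked utility implements (illustrative only, not the actual torchmetrics code): the key point is that every rank performs the same two collective calls, no matter how many samples it has seen.

import torch
import torch.distributed as dist


def gather_uneven_tensors(local_tensor: torch.Tensor) -> list:
    """Gather tensors whose first dimension differs across ranks."""
    world_size = dist.get_world_size()

    # 1) exchange local sizes so every rank knows how much padding to trim later
    local_size = torch.tensor([local_tensor.shape[0]], device=local_tensor.device)
    sizes = [torch.zeros_like(local_size) for _ in range(world_size)]
    dist.all_gather(sizes, local_size)
    max_size = int(torch.stack(sizes).max())

    # 2) pad the local tensor to the maximum size found on any rank
    pad = max_size - local_tensor.shape[0]
    padded = torch.cat([local_tensor, local_tensor.new_zeros(pad, *local_tensor.shape[1:])])

    # 3) all_gather the now equally sized tensors (same number of collectives on every rank)
    gathered = [torch.zeros_like(padded) for _ in range(world_size)]
    dist.all_gather(gathered, padded)

    # 4) trim the padding away using the exchanged sizes
    return [t[: int(s)] for t, s in zip(gathered, sizes)]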
To highlight this feature of TM, this PR does a couple of things:
- Adds a new EvaluationDistributedSampler that does not add extra samples. Users can use it as a drop-in replacement for any DistributedSampler if they want to do proper distributed evaluation (otherwise they just need to make sure that the number of samples is evenly divisible by the number of processes); a usage sketch follows below.
- Adds examples on distributed evaluation.
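A hypothetical usage sketch (the import path and exact constructor signature are assumptions, modeled on torch's DistributedSampler): the sampler drops into the evaluation DataLoader exactly where DistributedSampler would normally go.

import torch.distributed as dist
from torch.utils.data import DataLoader

# import path assumed; the class is added in this PR
from torchmetrics.utilities.distributed import EvaluationDistributedSampler

sampler = EvaluationDistributedSampler(
    val_dataset,                       # placeholder for your evaluation dataset
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
    shuffle=False,                     # keep the original order for evaluation
)
val_loader = DataLoader(val_dataset, batch_size=32, sampler=sampler)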
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃
📚 Documentation preview 📚: https://torchmetrics--1886.org.readthedocs.build/en/1886/