Silently failing DDP syncing when initializing Metric with jsonargparse #1651

Merged: 7 commits, Mar 28, 2023

@basveeling basveeling commented Mar 24, 2023

Metrics silently fail to sync across GPUs when initialized with jsonargparse (as used in the LightningCLI).

This PR fixes this issue.

This happens due to the following combination of factors:

  • jsonargparse enables docstring parsing when installed with pip install jsonargparse[signatures].
  • torchmetrics.Metric has a docstring that mentions an optional distributed_available_fn.
  • If distributed_available_fn is not set in **kwargs, Metric.__init__ falls back to a default PyTorch function, which enables DDP syncing of metrics.
  • When a metric is initialized from a YAML config, jsonargparse recognizes the distributed_available_fn field in the docstring of Metric.__init__ and passes distributed_available_fn=None as a default value when initializing the metric object.
  • Hence, any subclass of Metric initialized by jsonargparse has distributed_available_fn = None, so the default is never applied. These metrics silently fail to sync across GPUs.
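
The chain of factors above can be reproduced without torchmetrics or jsonargparse. The sketch below is a toy stand-in, not the library's code: `ToyMetric`, `default_distributed_available`, and `needs_sync` are hypothetical names, and the `kwargs.pop(key, default)` pattern is an assumption about how such a default is typically applied.

```python
from typing import Any, Callable, Optional


def default_distributed_available() -> bool:
    """Hypothetical stand-in for the PyTorch check that enables syncing."""
    return True


class ToyMetric:
    """Toy metric whose default is applied only when the key is absent."""

    def __init__(self, **kwargs: Any) -> None:
        # pop() returns the default only if the key is missing from kwargs.
        # An explicit None counts as a present key, so it replaces the default.
        self.distributed_available_fn: Optional[Callable[[], bool]] = kwargs.pop(
            "distributed_available_fn", default_distributed_available
        )

    def needs_sync(self) -> bool:
        # With no function set, syncing is skipped and no error is raised.
        fn = self.distributed_available_fn
        return fn is not None and fn()


# Constructed directly in code: the key is absent, so the default is used.
direct = ToyMetric()

# Constructed the way a config parser might do it (assumption): explicit None.
from_config = ToyMetric(distributed_available_fn=None)

print(direct.needs_sync())
print(from_config.needs_sync())
```

The two objects differ only in whether the keyword was omitted or passed as `None`, which is why the failure is easy to miss: nothing raises, the sync step is simply skipped.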

What does this PR do?

Fixes broken DDP syncing of metrics.
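
One way a fix of this kind could look is to treat an explicit `None` the same as a missing key. This is a minimal sketch under that assumption and is not claimed to be the exact change made in this PR; `resolve_distributed_available_fn` and `default_distributed_available` are hypothetical names.

```python
from typing import Any, Callable, Dict


def default_distributed_available() -> bool:
    """Hypothetical stand-in for the default PyTorch availability check."""
    return True


def resolve_distributed_available_fn(kwargs: Dict[str, Any]) -> Callable[[], bool]:
    """Return the user-supplied function, or the default if it is missing or None."""
    fn = kwargs.pop("distributed_available_fn", None)
    # Fall back to the default for both an absent key and an explicit None.
    return default_distributed_available if fn is None else fn


def always_false() -> bool:
    """Example of a user-supplied override."""
    return False


absent = resolve_distributed_available_fn({})
explicit_none = resolve_distributed_available_fn({"distributed_available_fn": None})
custom = resolve_distributed_available_fn({"distributed_available_fn": always_false})
```

With this guard, a config-driven initialization that injects `None` ends up with the same function as a direct call, while a deliberately supplied function is still respected.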

Did you have fun?

I had a fun two days debugging this issue :-).

@SkafteNicki SkafteNicki added the bug / fix Something isn't working label Mar 27, 2023
@SkafteNicki SkafteNicki added this to the v0.12 milestone Mar 27, 2023
codecov bot commented Mar 27, 2023

Codecov Report

Merging #1651 (fea4709) into master (82a5f6d) will decrease coverage by 1%.
The diff coverage is 100%.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #1651   +/-   ##
======================================
- Coverage      88%     88%   -1%     
======================================
  Files         228     228           
  Lines       12448   12448           
======================================
- Hits        10960   10897   -63     
- Misses       1488    1551   +63     

@mergify mergify bot added the ready label Mar 27, 2023
@SkafteNicki SkafteNicki enabled auto-merge (squash) March 28, 2023 09:39
@SkafteNicki SkafteNicki disabled auto-merge March 28, 2023 09:39
@SkafteNicki SkafteNicki enabled auto-merge (squash) March 28, 2023 09:41
@mergify mergify bot removed the has conflicts label Mar 28, 2023
@Borda Borda disabled auto-merge March 28, 2023 12:47
@Borda Borda merged commit 4d5147a into Lightning-AI:master Mar 28, 2023