
bug to save log dict #8887

Closed
qqueing opened this issue Aug 13, 2021 · 4 comments
Assignees
Labels
logging Related to the `LoggerConnector` and `log()` waiting on author Waiting on user action, correction, or update working as intended Working as intended

Comments

@qqueing
Contributor

qqueing commented Aug 13, 2021

🐛 Bug

To Reproduce

    def validation_epoch_end(self, metrics):
        total_metrics = {}
        total_metrics["val_imp_sum"] = 198487
        self.log_dict(total_metrics)

But the returned result is 'val_imp_sum': tensor(198487.0156, device='cuda:0').

The internal result class seems to track a cumulated batch size, but its values do not match what was logged.

results
Out[9]: {False, device(type='cuda', index=0), {'validation_epoch_end.val_imp_sum': ResultMetric('val_imp_sum', value=372957088.0, cumulated_batch_size=1879.0)}}

Expected behavior

The returned result should be 'val_imp_sum': tensor(198487.000, device='cuda:0').

Environment

  • PyTorch Lightning Version (e.g., 1.3.0): 1.4.2
  • PyTorch Version (e.g., 1.8) 1.9
  • Python version:
  • OS (e.g., Linux): linux
  • CUDA/cuDNN version: 10.2
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source):
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information:

Additional context

Version 1.3.8 works fine, but version 1.4.2 does not.

@qqueing qqueing added bug Something isn't working help wanted Open to be worked on labels Aug 13, 2021
@Borda Borda added the logging Related to the `LoggerConnector` and `log()` label Aug 13, 2021
@carmocca carmocca added working as intended Working as intended and removed bug Something isn't working help wanted Open to be worked on labels Aug 14, 2021
@carmocca
Contributor

Hi @qqueing. Can you elaborate on exactly what issue you are seeing?

The results class you are observing is entirely internal and you shouldn't need to do anything with it. The internal values look right, as you are using the mean reduction and 372957088 / 1879 ≈ 198487.

If you were to access trainer.callback_metrics right after, you would see:

{'val_imp_sum': tensor(198487.)}

which is what you expect.
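To make the arithmetic above concrete, here is a hypothetical sketch (class and attribute names assumed from the ResultMetric repr in the report, not taken from Lightning's actual implementation) of how a mean-reduced logged metric is accumulated: a running weighted sum plus a cumulated batch size, divided at the end.

```python
import torch


class ResultMetricSketch:
    """Hypothetical sketch of a mean-reduced logged metric."""

    def __init__(self):
        self.value = torch.tensor(0.0)  # running weighted sum
        self.cumulated_batch_size = torch.tensor(0.0)

    def update(self, value, batch_size=1):
        # each call adds value * batch_size to the running sum
        self.value = self.value + value * batch_size
        self.cumulated_batch_size = self.cumulated_batch_size + batch_size

    def compute(self):
        # the user-facing value is the weighted mean
        return self.value / self.cumulated_batch_size


m = ResultMetricSketch()
for _ in range(4):
    m.update(torch.tensor(198487.0), batch_size=2)
print(m.compute().item())  # 198487.0 here, since the running sum stays small
```

With large running sums, as in the report (372957088 over 1879 samples), the same division recovers the mean, up to float32 rounding.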

@carmocca carmocca added the waiting on author Waiting on user action, correction, or update label Aug 14, 2021
@qqueing
Contributor Author

qqueing commented Aug 15, 2021

I logged val_imp_sum as 198487, but the returned value is 198487.0156.
If the value really were 198487, the cumulated sum would be 372957073 (198487 × 1879), not 372957088.
I don't understand this mechanism...
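The mismatch is consistent with float32 rounding during accumulation: 198487 × 1879 = 372957073 is not representable in float32 (adjacent float32 values at that magnitude are 32 apart), so a float32 running sum must land on a nearby representable value instead. A minimal demonstration, independent of Lightning:

```python
import torch

exact = 198487 * 1879  # 372957073, exact in Python integers

# accumulate 198487.0 a total of 1879 times in float32,
# as a per-sample running sum would
s = torch.tensor(0.0)  # float32 by default
for _ in range(1879):
    s = s + torch.tensor(198487.0)

print(exact)              # 372957073
print(s.item())           # a nearby float32 value, not exactly 372957073
print((s / 1879).item())  # the recovered "mean" is no longer exactly 198487.0
```

Dividing the drifted sum by the batch count yields a value slightly off from 198487.0, which is exactly the 198487.0156 seen in the report.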

@carmocca
Contributor

Seems like it's a floating-point precision issue. Would you mind trying to reproduce it?

This does print 198487.0 on my machine.

import os

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, Trainer


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        self.log("a", 198487)

    def validation_step(self, *args, **kwargs):
        self.log("b", 198487)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2)

    model = BoringModel()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=3,
        limit_val_batches=3,
        num_sanity_val_steps=0,
        max_epochs=1,
        weights_summary=None,
        progress_bar_refresh_rate=0,
    )
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)

    print({k: v.item() for k, v in trainer.callback_metrics.items()})


if __name__ == "__main__":
    run()

@tchaton
Contributor

tchaton commented Aug 27, 2021

Dear @qqueing,

Closing this issue, as it does not appear to be related to Lightning, and @carmocca showed a working example.

Best,
T.C
