bug to save log dict #8887
Comments
Hi @qqueing. Can you elaborate on exactly what issue you are seeing? The If you were to access
Which is what you expect |
I registered val_imp_sum as 198487, but the returned value is 198487.0156. |
Seems like it's a floating point precision issue. Would you mind trying to reproduce it? This does print
import os

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, Trainer


class RandomDataset(Dataset):
    """Dataset of `length` random vectors, each with `size` features."""

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        # Log the same integer constant that was reported in the issue.
        self.log("a", 198487)

    def validation_step(self, *args, **kwargs):
        self.log("b", 198487)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2)

    model = BoringModel()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=3,
        limit_val_batches=3,
        num_sanity_val_steps=0,
        max_epochs=1,
        weights_summary=None,
        progress_bar_refresh_rate=0,
    )
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
    # Print the logged metrics as plain Python numbers.
    print({k: v.item() for k, v in trainer.callback_metrics.items()})


if __name__ == "__main__":
    run() |
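For context on the floating point point above, here is a minimal sketch using only the Python standard library. It assumes the metric is held as a float32 tensor, which is the default torch dtype but is not confirmed for this case in the thread. The helper names `to_float32` and `next_float32` are made up for illustration. It shows that 198487.0156 is the float32 value immediately above 198487, so the reported difference is a single rounding step.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def next_float32(x: float) -> float:
    """Return the next float32 above a positive, finite float32 value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


logged = to_float32(198487.0)    # exactly representable in float32
neighbor = next_float32(logged)  # closest float32 above the logged value
spacing = neighbor - logged      # gap between adjacent float32 values here

print(f"logged   = {logged:.4f}")
print(f"neighbor = {neighbor:.4f}")
print(f"spacing  = {spacing}")
```

The spacing near 198487 is 2**-6 = 0.015625, so any float32 arithmetic on the logged value (for example, an averaging step) that rounds up once would land exactly on the value reported in the issue. Which operation caused the rounding is not established here.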
Dear @qqueing, Closing this issue as it doesn't seem related to Lightning and carmocca showed a working example. Best, |
🐛 Bug
To Reproduce
but the returned result is
'val_imp_sum': tensor(198487.0156, device='cuda:0'),
The internal result class may keep a cumulative batch size, but that information does not seem to match.
Expected behavior
The returned result should be
'val_imp_sum': tensor(198487.000, device='cuda:0'),
Environment
Additional context
Version 1.3.8 works fine, but version 1.4.2 does not.