pre_train ValueError: Expected parameter logits #165

Open
Conyboy opened this issue Dec 30, 2024 · 1 comment
Labels: bug (Something isn't working)

Comments

Conyboy commented Dec 30, 2024

I am using the LOTSA dataset along with my own dataset (around 3GB) for pre-training. After training for a certain number of epochs, I encounter the following error.

File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/share/home/defaultTenant/caiyx/python_workspace/uni2ts/src/uni2ts/model/moirai/module.py", line 177, in forward
distr = self.distr_output.distribution(distr_param, loc=loc, scale=scale)
File "/share/home/defaultTenant/caiyx/python_workspace/uni2ts/src/uni2ts/distribution/_base.py", line 171, in distribution
distr = self._distribution(distr_params, validate_args=validate_args)
File "/share/home/defaultTenant/caiyx/python_workspace/uni2ts/src/uni2ts/distribution/mixture.py", line 182, in _distribution
weights=Categorical(
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/torch/distributions/categorical.py", line 72, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/torch/distributions/distribution.py", line 71, in __init__
raise ValueError(
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 574, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 153, in step
step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 270, in optimizer_step
optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 238, in optimizer_step
return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/precision.py", line 122, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
return func.__get__(opt, opt.__class__)(*args, **kwargs)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/torch/optim/optimizer.py", line 487, in wrapper
out = func(*args, **kwargs)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/torch/optim/optimizer.py", line 91, in _use_grad
ret = func(self, *args, **kwargs)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/torch/optim/adamw.py", line 197, in step
loss = closure()
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/precision.py", line 108, in _wrap_closure
closure_result = closure()
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/precision.py", line 122, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
return func.__get__(opt, opt.__class__)(*args, **kwargs)
File "/share/home/defaultTenant/caiyx/.conda/envs/uni2ts/lib/python3.10/site-packages/torch/optim/optimizer.py", line 487, in wrapper
out = func(*args, **kwargs)
ValueError: Expected parameter logits (Tensor of shape (32, 512, 128, 4)) of distribution Categorical(logits: torch.Size([32, 512, 128, 4])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[[[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
...,
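
For context, the final ValueError is raised by argument validation in torch.distributions: Categorical checks that its logits are real numbers, and NaN fails that check. The sketch below reproduces the same kind of error outside uni2ts. `logits_error` is a hypothetical helper name, and the exact message text may differ between PyTorch versions. It only shows where the error is raised, not why the logits became NaN during training.

```python
import torch
from torch.distributions import Categorical


def logits_error(logits: torch.Tensor):
    """Return the validation error message for these logits, or None if they are accepted.

    Hypothetical helper for illustration; it is not part of uni2ts.
    """
    try:
        Categorical(logits=logits, validate_args=True)
    except ValueError as err:
        return str(err)
    return None


# NaN logits are rejected by the Real() constraint; finite logits are accepted.
nan_message = logits_error(torch.full((2, 4), float("nan")))
ok_message = logits_error(torch.zeros(2, 4))
print(nan_message)
print(ok_message)
```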

I think the issue is with my own dataset, but I'm not sure what the specific problem is. Could you help me identify the exact issue with the dataset?

Below is the code I used to build the dataset.

import os
from pathlib import Path

import pandas as pd

from uni2ts.data.builder.simple import SimpleDatasetBuilder


def get_file_list(path="/mnt/c/Users/Administrator/PycharmProjects/uni2ts/mywork/finetune/build_data/data"):
    file_obj_list = []
    # Loop over every subfolder of the directory and collect all of the files inside it
    for root, dirs, files in os.walk(path):
        for dir_name in dirs:
            if dir_name == "extreme_data":
                continue
            dir_path = os.path.join(root, dir_name)
            for file in os.listdir(dir_path):
                file_obj_list.append((file,
                                      os.path.join(dir_path, file)))
    return file_obj_list


if __name__ == '__main__':
    storage_path = "/mnt/c/Users/Administrator/PycharmProjects/uni2ts/mywork/finetune/build_data/arrow_data"
    file_list = get_file_list()
    for file_name, file_path in file_list:
        # Each CSV is read as one table indexed by its "datetime" column.
        df = pd.read_csv(file_path)
        df.set_index("datetime", inplace=True)
        # Drop these columns where present; every remaining column is passed to the builder.
        drop_columns = [
            col
            for col in ["code_x", "code_y", "code", "grid_connection_time", "forecast_id", "type_data"]
            if col in df.columns
        ]
        df.drop(columns=drop_columns, inplace=True)
        # Name the dataset after the file, minus its last 4 characters (e.g. ".csv").
        SimpleDatasetBuilder(dataset=file_name[:-4], storage_path=Path(storage_path)).build_dataset(
            df=df,
            dataset_type="wide",
            offset=None,
            date_offset=None,
            freq="15min",
        )
        print(f"{file_name} done")
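
One way to narrow down a dataset problem is to scan each CSV for cells that are empty, non-numeric, or non-finite before building the Arrow data. Below is a minimal standard-library sketch. `find_bad_columns` is a hypothetical helper, and the "datetime" index column is taken from the script above. Whether such cells are what actually produces the NaN logits here is an assumption that has not been verified.

```python
import csv
import io
import math


def find_bad_columns(csv_file, index_col="datetime"):
    """Count empty, non-numeric, or non-finite cells per column of a wide CSV.

    Hypothetical helper for illustration; returns {column_name: bad_cell_count}.
    """
    bad = {}
    for row in csv.DictReader(csv_file):
        for col, raw in row.items():
            if col == index_col:
                continue
            try:
                ok = math.isfinite(float(raw))  # rejects "nan", "inf", "-inf"
            except (TypeError, ValueError):
                ok = False  # empty cell, text, or a row with missing/extra fields
            if not ok:
                bad[col] = bad.get(col, 0) + 1
    return bad


# Small in-memory sample standing in for one of the CSV files.
sample = (
    "datetime,a,b\n"
    "2024-01-01 00:00,1.0,2.0\n"
    "2024-01-01 00:15,,inf\n"
    "2024-01-01 00:30,nan,3.0\n"
)
report = find_bad_columns(io.StringIO(sample))
print(report)
```

To check a real file, open it with `open(file_path, newline="")` and pass the file object in place of the `io.StringIO` sample.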
Conyboy added the bug label on Dec 30, 2024
chenghaoliu89 (Contributor) commented

You can check discussion thread #19. I suggest adding the +trainer.detect_anomaly=True flag during pre-training; the resulting stack trace should help locate the root cause.
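
For readers who have not used this flag: assuming the override is forwarded to the Lightning Trainer's detect_anomaly argument, it turns on PyTorch's autograd anomaly detection, which raises an error naming the backward function that first produced NaN. The sketch below shows that mechanism on a toy computation, independent of uni2ts. `nan_backward_error` is a hypothetical helper name, and the toy loss is chosen only because its gradient is NaN at zero.

```python
import torch


def nan_backward_error(x: torch.Tensor) -> str:
    """Run a toy forward/backward pass under anomaly detection.

    Returns the error text if a backward function produced NaN, else "".
    Hypothetical helper for illustration only.
    """
    x = x.clone().requires_grad_(True)
    try:
        with torch.autograd.detect_anomaly():
            y = torch.sqrt(x)      # d/dx sqrt(x) is infinite at x = 0
            loss = (y * 0).sum()   # upstream gradient of 0 gives 0 / 0 = NaN in backward
            loss.backward()
    except RuntimeError as err:
        return str(err)
    return ""


# At zero the sqrt backward pass returns NaN and anomaly detection reports it;
# at one the gradient is finite and nothing is raised.
at_zero = nan_backward_error(torch.zeros(3))
at_one = nan_backward_error(torch.ones(3))
print(at_zero)
```

Anomaly detection slows training noticeably, so it is usually enabled only while debugging.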
