loss render is NaN: the following problem occurred. I did not modify the training parameters. Are the training parameters wrong? #5

Open · manjidada opened this issue Jul 11, 2023 · 4 comments

@manjidada

Traceback (most recent call last):
  File "train.py", line 35, in <module>
    main()
  File "train.py", line 32, in main
    m.train(opt)
  File "G:\work_document\python_work\L2G-NeRF-main\model\nerf.py", line 61, in train
    if self.it%opt.freq.val==0: self.validate(opt,self.it)
  File "G:\work_document\Tensorflow\Miniconda3\envs\L2G-NeRF\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "G:\work_document\python_work\L2G-NeRF-main\model\l2g_nerf.py", line 89, in validate
    super().validate(opt,ep=ep)
  File "G:\work_document\Tensorflow\Miniconda3\envs\L2G-NeRF\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "G:\work_document\python_work\L2G-NeRF-main\model\base.py", line 154, in validate
    loss = self.summarize_loss(opt,var,loss)
  File "G:\work_document\python_work\L2G-NeRF-main\model\base.py", line 139, in summarize_loss
    assert not torch.isnan(loss[key]),"loss {} is NaN".format(key)
AssertionError: loss render is NaN
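
A hedged debugging aid (not from the repo): enabling PyTorch anomaly detection makes backward() raise at the operation that first produced the NaN, rather than failing later at the assert in summarize_loss. The snippet below is a self-contained illustration, not L2G-NeRF code:

import torch

torch.autograd.set_detect_anomaly(True)  # debug only; slows training noticeably

# Minimal demonstration that the anomaly trace names the offending op:
x = torch.tensor([-1.0], requires_grad=True)
loss = torch.sqrt(x).sum()  # forward value is already NaN
loss.backward()             # raises: "Function 'SqrtBackward0' returned nan values ..."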

@manjidada (Author)

python train.py --model=l2g_nerf --yaml=l2g_nerf_blender --group=exp_synthetic --name=l2g_lego --data.scene=lego --data.root=./data/blender/nerf_synthetic --camera.noise_r=0.07 --camera.noise_t=0.5
Process ID: 21456
[train.py] (PyTorch code for training NeRF/BARF/L2G_NeRF)
setting configurations...
loading options/base.yaml...
loading options/nerf_blender.yaml...
loading options/barf_blender.yaml...
loading options/l2g_nerf_blender.yaml...

  • H: 400
  • W: 400
  • arch:
    • density_activ: softplus
    • embedding_dim: 128
    • layers_feat: [None, 256, 256, 256, 256, 256, 256, 256, 256]
    • layers_rgb: [None, 128, 3]
    • layers_warp: [None, 256, 256, 256, 256, 256, 256, 6]
    • posenc:
      • L_3D: 10
      • L_view: 4
    • skip: [4]
    • skip_warp: [4]
    • tf_init: True
  • barf_c2f: [0.1, 0.5]
  • batch_size: None
  • camera:
    • model: perspective
    • ndc: False
    • noise: True
    • noise_r: 0.07
    • noise_t: 0.5
  • cpu: False
  • data:
    • augment:
    • bgcolor: 1
    • center_crop: None
    • dataset: blender
    • image_size: [400, 400]
    • num_workers: 4
    • preload: True
    • root: ./data/blender/nerf_synthetic
    • scene: lego
    • train_sub: None
    • val_on_test: False
    • val_sub: 4
  • device: cuda:0
  • error_map_size: None
  • freq:
    • ckpt: 5000
    • scalar: 200
    • val: 2000
    • vis: 1000
  • gpu: 0
  • group: exp_synthetic
  • load: None
  • loss_weight:
    • global_alignment: 2
    • render: 0
    • render_fine: None
  • max_epoch: None
  • max_iter: 200000
  • model: l2g_nerf
  • name: l2g_lego
  • nerf:
    • density_noise_reg: None
    • depth:
      • param: metric
      • range: [2, 6]
    • fine_sampling: False
    • rand_rays: 1024
    • sample_intvs: 128
    • sample_intvs_fine: None
    • sample_stratified: True
    • setbg_opaque: False
    • view_dep: True
  • optim:
    • algo: Adam
    • lr: 0.0005
    • lr_end: 0.0001
    • lr_pose: 0.001
    • lr_pose_end: 1e-08
    • sched:
      • gamma: None
      • type: ExponentialLR
    • sched_pose:
      • gamma: None
      • type: ExponentialLR
    • test_iter: 100
    • test_photo: True
    • warmup_pose: None
  • output_path: output/exp_synthetic/l2g_lego
  • output_root: output
  • resume: False
  • seed: 0
  • tb:
    • num_images: [4, 8]
  • trimesh:
    • chunk_size: 16384
    • range: [-1.2, 1.2]
    • res: 128
    • thres: 25.0
  • visdom:
    • cam_depth: 0.5
    • port: 8600
    • server: localhost
  • yaml: l2g_nerf_blender
    existing options file found (identical)
    Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
    Loading model from: G:\work_document\Tensorflow\Miniconda3\envs\L2G-NeRF\lib\site-packages\lpips\weights\v0.1\alex.pth
    loading training data...
    number of samples: 100
    loading test data...
    number of samples: 4
    building networks...
    setting up optimizers...
    initializing weights from scratch...
    setting up visualizers...
    visdom port (8600) not open, retry? (y/n) n
    Setting up a new session...

@rover-xingyu (Owner)

Sorry, we cannot reproduce the issue. You could change the random seed to see if it happens again.
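
For reference, the seed appears as a top-level option in the dump above (seed: 0), so it should be overridable in the same dotted-flag style as the original command; --seed=1 below is just an illustrative value:

python train.py --model=l2g_nerf --yaml=l2g_nerf_blender --group=exp_synthetic --name=l2g_lego --data.scene=lego --data.root=./data/blender/nerf_synthetic --camera.noise_r=0.07 --camera.noise_t=0.5 --seed=1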

@Chaphlagical

> Sorry, we cannot reproduce the issue. You could change the random seed to see if it happens again.

@manjidada I got the same issue on the blender dataset. It seems the depth range becomes 0 when entering the iteration loop, as above, which is weird.

Therefore, I force the depth range to be a scalar instead of a torch tensor during validation, and it works for me. Like:

class Graph(nerf.Graph):
    ...
    def forward(self, opt, var, mode=None):
        # rescale the size of the scene conditioned on the optimized poses
        if opt.data.dataset == "blender":
            depth_min, depth_max = opt.nerf.depth.range
            position = camera.Pose().invert(
                self.optimised_training_poses.weight.data.detach().clone().view(-1, 3, 4))[..., -1]
            # scene "diameter": maximum pairwise distance between camera centers
            diameter = ((position[self.idx_grid[..., 0]] -
                        position[self.idx_grid[..., 1]]).norm(dim=-1)).max()
            depth_min_new = (depth_min/(depth_max+depth_min))*diameter
            depth_max_new = (depth_max/(depth_max+depth_min))*diameter
            if mode in ["train"]:
                opt.nerf.depth.range = [
                    depth_min_new, depth_max_new]
            else:
                # force scalars so no tensor leaks into validation
                opt.nerf.depth.range = [
                    depth_min_new.item(), depth_max_new.item()]
        ...
    ...

    @torch.no_grad()
    def validate(self, opt, ep=None):
        pose, pose_GT = self.get_all_training_poses(opt)
        _, self.graph.sim3 = self.prealign_cameras(opt, pose, pose_GT)
        # force scalar
        if torch.is_tensor(opt.nerf.depth.range[0]):
            opt.nerf.depth.range[0] = opt.nerf.depth.range[0].item()
        if torch.is_tensor(opt.nerf.depth.range[1]):
            opt.nerf.depth.range[1] = opt.nerf.depth.range[1].item()
        super().validate(opt, ep=ep)
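
For context, a minimal sketch of the pitfall this works around (stand-in names, not repo code): opt.nerf.depth.range is a plain Python list, so a tensor written into it during training persists into later calls unless converted back with .item():

import torch

depth_range = [2, 6]                      # stands in for opt.nerf.depth.range
depth_range[0] = torch.tensor(2.0) * 0.5  # train-time rescale stores a tensor
print(torch.is_tensor(depth_range[0]))    # True: the tensor would leak into validation
depth_range[0] = depth_range[0].item()    # .item() restores a plain Python float
print(depth_range)                        # [1.0, 6]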

Hoping for an official solution.

@rover-xingyu (Owner)

Thanks for pointing this out. I rescale the size of the blender objects (near/far) conditioned on the optimized poses, as shown here. I guess the depth range becoming 0 is caused by the diameter becoming 0, but that is weird, as the diameter is determined by the maximum distance between two cameras. Does anyone have any idea?
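
A hedged diagnostic sketch (assumed names, not repo code) for checking that hypothesis: compute the maximum pairwise distance between the optimized camera centers right before the failure and see whether it collapses to zero:

import torch

def scene_diameter(position):
    # position: [N, 3] camera centers; returns the max pairwise distance
    diffs = position[:, None, :] - position[None, :, :]  # [N, N, 3]
    return diffs.norm(dim=-1).max()

# If this prints ~0 just before the NaN appears, the optimized camera centers
# have collapsed to a point (or the pose buffer was read before being filled).
# print(scene_diameter(position))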
