Runtime error #14

biaoyanf · 2024-04-09T06:25:12Z

Hi,

I ran the training command:

python train.py --seed 2022 --batch-size 32 \
--num-epoch 3 --devices 0 \
--model-name roberta-large --ckpt-save-path ./ckpt/ \
--data-path ./data/training/ \
--max-samples-per-dataset 500000 --trainin-datasets mnli

and encountered the following error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

How can we solve this?

Thanks in advance!


pradeepiisc · 2024-05-08T03:55:25Z

Were you able to fix the error? @biaoyanf

pradeepiisc · 2024-05-08T06:17:06Z

After going through the code and searching online, I found that the main cause of this error is that gradient computation gets disabled somewhere during the training step.
In the training step, the code uses AdamW, a third-party implementation from transformers; its step() function in optimization.py is decorated with torch.no_grad().
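To illustrate the mechanism, here is a minimal sketch in plain PyTorch. It is not taken from train.py; the tensor `w` and the helper `compute_loss` are made up for the example. It shows that a loss computed under torch.no_grad() has no autograd graph, so calling backward() on it raises the same RuntimeError.

```python
import torch

# Illustrative parameter; not part of the actual training code.
w = torch.ones(3, requires_grad=True)

def compute_loss():
    # Hypothetical stand-in for a forward pass that returns a scalar loss.
    return (w * 2.0).sum()

# Normal case: the loss is attached to the autograd graph, so backward() works.
loss = compute_loss()
loss.backward()
grad_ok = w.grad.clone()

# Under no_grad, the forward pass records no graph.
with torch.no_grad():
    detached_loss = compute_loss()

error_message = ""
try:
    detached_loss.backward()
except RuntimeError as err:
    error_message = str(err)

print(detached_loss.requires_grad)
print(error_message)
```

If the forward pass of the training step ends up running inside such a no_grad context, the loss it returns behaves like `detached_loss` here.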

Some links to support my arguments:

https://github.com/Lightning-AI/pytorch-lightning/issues/18222
https://github.com/Lightning-AI/pytorch-lightning/issues/18254

The resolution is to explicitly enable gradients, using torch.enable_grad() either as a decorator or as a context manager.
https://github.com/Lightning-AI/pytorch-lightning/pull/18268/files

I referred to this bug-fix PR from Lightning and added the following to optimizer_loop.py myself instead of upgrading the torch/Lightning version.

@torch.enable_grad()
def closure(self, *args: Any, **kwargs: Any) -> ClosureResult:
    step_output = self._step_fn()

With this change, the RuntimeError no longer occurs.
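A small standalone sketch of why this works, assuming only plain PyTorch: `step` below is a hypothetical stand-in for an optimizer step decorated with torch.no_grad(), and the two closures are illustrative, not Lightning's actual code. Decorating the closure with torch.enable_grad() turns gradient tracking back on inside the no_grad region.

```python
import torch

# Illustrative parameter; not part of the actual training code.
w = torch.ones(3, requires_grad=True)

@torch.no_grad()
def step(closure):
    # Hypothetical stand-in for an optimizer step() that runs the
    # closure while gradient tracking is disabled.
    return closure()

def plain_closure():
    # No graph is recorded here because the caller disabled gradients.
    return (w * 2.0).sum()

@torch.enable_grad()
def grad_closure():
    # enable_grad overrides the surrounding no_grad, so the forward pass
    # is tracked and backward() succeeds.
    loss = (w * 2.0).sum()
    loss.backward()
    return loss

plain_loss = step(plain_closure)
fixed_loss = step(grad_closure)

print(plain_loss.requires_grad, fixed_loss.requires_grad)
print(w.grad)
```

Calling backward() on `plain_loss` would fail with the error from the original report, while `grad_closure` completes and populates `w.grad`.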
