Hi there. I fixed a similar problem by matching the versions of torch, torchvision, and torchaudio to the combinations listed on the official PyTorch release page. One combination that worked for me:
torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0
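Before pinning anything, it can help to see which versions are actually installed in the active environment. The following is a minimal sketch, not part of the original fix: `installed_versions` is a hypothetical helper name, and it only reads package metadata through the standard library, so it works even when `import torch` itself fails.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "torchaudio")):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is missing from this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the printed versions against the combination above, or against whichever row of the release table matches your CUDA build.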
I ran into the problem again. I no longer think the real fix is matching the versions of torch, torchvision, and torchaudio. What worked instead:
echo $LD_LIBRARY_PATH;
Go to the directory it prints.
Rename the problematic libcudnn_cnn_train.so.8 (or whichever file the error message names) so that it is kept only as a backup copy.
Now the system no longer picks up the CUDA/cuDNN libraries from that env var. The underlying reason is that torch ships its own CUDA/cuDNN libraries, and we need to make sure those are the ones that get loaded.
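To see which files would be affected before renaming anything, the steps above can be sketched as a small read-only check. This is an illustration under assumptions: `find_cudnn_on_library_path` is a hypothetical helper, the `libcudnn*` pattern is a guess at how the conflicting files are named (based on the `libcudnn_cnn_train.so.8` example), and the script only lists files, it does not rename them.

```python
import glob
import os


def find_cudnn_on_library_path(env=None):
    """List libcudnn* files in each directory named by LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    hits = []
    for directory in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing separator
        # Escape the directory so special characters in it are not
        # treated as glob wildcards; only the file name is a pattern.
        pattern = os.path.join(glob.escape(directory), "libcudnn*")
        hits.extend(sorted(glob.glob(pattern)))
    return hits


if __name__ == "__main__":
    for path in find_cudnn_on_library_path():
        print(path)
```

Any path it prints is a system-level cuDNN library that may be loaded ahead of the copy bundled with torch. If torch imports cleanly, `torch.backends.cudnn.version()` reports the cuDNN version that was actually loaded.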
Hello everyone,
I'm using PyTorch 2.2.1 with CUDA 12.1 and Python 3.12.2, and I'm getting the following error:
'RuntimeError: RuntimeError Traceback (most recent call last)
Cell In[16], line 47
45 num_epochs = 10
46 for epoch in range(num_epochs):
---> 47 train_loss, train_time = train(model, train_loader, criterion, optimizer)
48 val_loss, val_accuracy, val_time = validate(model, val_loader, criterion)
49 print(f'Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Time: {train_time:.2f}s, '
50 f'Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.4f}, Val Time: {val_time:.2f}s')
Cell In[16], line 13, in train(model, train_loader, criterion, optimizer)
11 outputs = model(inputs)
12 loss = criterion(outputs, labels) # Calculate loss between model outputs and ground truth
---> 13 loss.backward()
14 optimizer.step()
15 running_loss += loss.item() * inputs.size(0) # Update running loss
File ~/.conda/envs/torchTest1/lib/python3.12/site-packages/torch/_tensor.py:522, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
512 if has_torch_function_unary(self):
513 return handle_torch_function(
514 Tensor.backward,
515 (self,),
(...)
520 inputs=inputs,
521 )
--> 522 torch.autograd.backward(
523 self, gradient, retain_graph, create_graph, inputs=inputs
524 )
File ~/.conda/envs/torchTest1/lib/python3.12/site-packages/torch/autograd/__init__.py:266, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
261 retain_graph = create_graph
263 # The reason we repeat the same comment below is that
264 # some Python versions print out the first line of a multi-line function
265 # calls in the traceback and some print out the last line
--> 266 Variable.execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
267 tensors,
268 grad_tensors,
269 retain_graph,
270 create_graph,
271 inputs,
272 allow_unreachable=True,
273 accumulate_grad=True,
274 )
RuntimeError: GET was unable to find an engine to execute this computation'
Originally posted by @VikasAmaraneni in ultralytics/ultralytics#4060 (comment)