Federated Learning | Concept 24 FL for MNIST #403

LeonMac · 2022-03-04T10:41:10Z

Description

When the DS launches a remote training run, the DO side reports: "TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first."

How to Reproduce

Run the code line by line. Everything works until PART 3: Training (I have a GPU with CUDA).
Training stops at epoch 1 and makes no further progress.
On the DO side I can see the error quoted above: "TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first."

Expected Behavior

This is a classic issue in general ML and I can find the solution for that case, but how should it be handled when using the FL library, where the training actually happens on the DO side?
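For reference, the plain-PyTorch workaround I mean looks roughly like the sketch below. It only illustrates the generic fix named in the error message (copy the tensor to host memory before converting it) and is not taken from the FL library. The names `to_host_numpy` and `demo` are hypothetical, and where such a conversion would have to go in the remote training code on the DO side is exactly what I don't know.

```python
def to_host_numpy(tensor):
    """Return a NumPy array for a torch tensor, whatever device it lives on."""
    # detach() drops the autograd graph; cpu() copies a CUDA tensor to host
    # memory and simply returns the same tensor if it is already on the CPU.
    return tensor.detach().cpu().numpy()


def demo():
    # Imported here so the helper above has no hard dependency on torch.
    import torch

    # Use the GPU when one is present, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    values = torch.tensor([0.25, 0.5], device=device, requires_grad=True)

    # Calling values.numpy() directly raises on a CUDA tensor (the TypeError
    # from this issue) and also on any tensor that requires grad.
    return to_host_numpy(values)
```

Calling `demo()` on a machine with PyTorch installed returns a NumPy array whether or not CUDA is available, which is the behaviour I would expect from the training step.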

System Information

OS: Ubuntu 18.04
Language Version: Python 3.7.10, torch 1.8.1, torchvision 0.9.1
Package Manager Version: [e.g. conda 4.11.0, pip 21.2.2 ]

LeonMac added the "Type: Bug 🐛" label Mar 4, 2022

LeonMac changed the title ~~Federated Learning | Concept 24 FL for MINST~~ Federated Learning | Concept 24 FL for MNIST Mar 4, 2022
