An error occurs when I try to train this network #3

Open
datar001 opened this issue Dec 29, 2022 · 3 comments


datar001 commented Dec 29, 2022

I tried to train this network with the command "python ppo_single_large_hiar.py train",
but it fails with the following error:

Failure # 1 (occurred at 2022-12-29_17-59-27)
Traceback (most recent call last):
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/tune/execution/ray_trial_executor.py", line 989, in get_next_executor_event
    future_result = ray.get(ready_future)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/_private/worker.py", line 2277, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=1210644, ip=192.168.124.36, repr=PPO)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 308, in __init__
    super().__init__(config=config, logger_creator=logger_creator, **kwargs)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 157, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 418, in setup
    self.workers = WorkerSet(
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 171, in __init__
    self._local_worker = self._make_worker(
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 661, in _make_worker
    worker = cls(
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 613, in __init__
    self._build_policy_map(
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1784, in _build_policy_map
    self.policy_map.create_policy(
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/policy_map.py", line 123, in create_policy
    self[policy_id] = create_policy_for_framework(
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/utils/policy.py", line 80, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 66, in __init__
    self._initialize_loss_from_dummy_batch()
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/policy.py", line 1050, in _initialize_loss_from_dummy_batch
    actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/torch_policy_v2.py", line 483, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1016, in _compute_action_helper
    dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/modelv2.py", line 259, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/torch/complex_input_net.py", line 207, in forward
    nn_out, _ = self.flatten[i](
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/modelv2.py", line 259, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/torch/fcnet.py", line 146, in forward
    self._features = self._hidden_layers(self._last_flat_in)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/torch/misc.py", line 169, in forward
    return self._model(x)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

It looks like a CUDA error, but I tested the CUDA environment in the Python console and it works fine.

image
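
For reference, a console check such as `torch.cuda.is_available()` only confirms that PyTorch can see the GPU. The traceback fails inside `torch.nn.Linear`, which suggests that some tensors in that call were on the CPU and others on CUDA. A slightly stronger sanity check is to run a small linear layer end to end on the GPU. The following is a minimal sketch: the helper name `check_linear_on_device` and the layer sizes are made up for illustration, and it is not the check shown in the screenshot.

```python
def check_linear_on_device():
    """Run a tiny Linear layer on the GPU and report what happened."""
    try:
        import torch
    except ImportError:
        return "torch not installed"

    if not torch.cuda.is_available():
        return "cuda not available"

    device = torch.device("cuda")
    # Put both the layer parameters and the input on the same device;
    # the error in the traceback appears when they are on different devices.
    layer = torch.nn.Linear(4, 2).to(device)
    x = torch.zeros(1, 4, device=device)
    y = layer(x)
    return f"ok: output on {y.device}"


if __name__ == "__main__":
    print(check_linear_on_device())
```

If this reports `ok` but training still fails, the CUDA install itself is probably fine, and the mismatch more likely comes from how the model or its inputs are placed on devices inside the training setup.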

Owner

lethaiq commented Jan 2, 2023

Can you check whether your PyTorch and Python versions are the same?

Author

datar001 commented Jan 2, 2023 via email

Owner

lethaiq commented Jan 2, 2023

What about your Ray version? Can you also try running the test code with the provided best checkpoint to see whether you get the same error?
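
One way to collect the Python, PyTorch, and Ray versions asked about in this thread in a single step is sketched below. It assumes the packages are installed under the distribution names `torch` and `ray`. The helper `collect_versions` is a hypothetical name, not part of this repository.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "ray")):
    """Return the interpreter version and installed package versions."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            # Reads the installed distribution metadata without importing the package.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Running this inside the same conda environment used for training (here `socialbot`, per the traceback paths) and pasting the output into the issue makes it easier to compare against the environment the checkpoint was produced with.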
