A mistake occurs when I try to train this network? #3

datar001 · 2022-12-29T10:23:18Z

I tried to train this network with the command "python ppo_single_large_hiar.py train", but it fails with the following error:

Failure # 1 (occurred at 2022-12-29_17-59-27)
Traceback (most recent call last):
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/tune/execution/ray_trial_executor.py", line 989, in get_next_executor_event
future_result = ray.get(ready_future)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/_private/worker.py", line 2277, in get
raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.init() (pid=1210644, ip=192.168.124.36, repr=PPO)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 308, in init
super().init(config=config, logger_creator=logger_creator, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 157, in init
self.setup(copy.deepcopy(self.config))
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 418, in setup
self.workers = WorkerSet(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 171, in init
self._local_worker = self._make_worker(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 661, in _make_worker
worker = cls(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 613, in init
self._build_policy_map(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1784, in _build_policy_map
self.policy_map.create_policy(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/policy_map.py", line 123, in create_policy
self[policy_id] = create_policy_for_framework(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/utils/policy.py", line 80, in create_policy_for_framework
return policy_class(observation_space, action_space, merged_config)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 66, in init
self._initialize_loss_from_dummy_batch()
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/policy.py", line 1050, in _initialize_loss_from_dummy_batch
actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/torch_policy_v2.py", line 483, in compute_actions_from_input_dict
return self._compute_action_helper(
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper
return func(self, *a, **k)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1016, in _compute_action_helper
dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/modelv2.py", line 259, in call
res = self.forward(restored, state or [], seq_lens)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/torch/complex_input_net.py", line 207, in forward
nn_out, _ = self.flatten[i](
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/modelv2.py", line 259, in call
res = self.forward(restored, state or [], seq_lens)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/torch/fcnet.py", line 146, in forward
self._features = self._hidden_layers(self._last_flat_in)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/ray/rllib/models/torch/misc.py", line 169, in forward
return self._model(x)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/home/m2i-ubuntu/anaconda3/envs/socialbot/lib/python3.8/site-packages/torch/nn/functional.py", line 1610, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

It looks like a CUDA error, but I tested the CUDA environment in the Python console and it works fine.
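For reference, a console check of this kind might look like the sketch below. The helper name `cuda_report` is hypothetical and not part of this repository. Note that the error above says the tensors passed to `addmm` live on different devices (one on `cuda`, one on `cpu`), so a passing availability check does not by itself rule the problem out.

```python
import importlib.util


def cuda_report():
    """Return basic CUDA facts as seen by PyTorch, or None if torch is missing.

    Hypothetical helper for illustration only.
    """
    if importlib.util.find_spec("torch") is None:
        return None

    import torch

    return {
        "torch": torch.__version__,                    # installed PyTorch version
        "built_with_cuda": torch.version.cuda,         # CUDA version of the build, None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),   # True if a usable GPU and driver are found
        "device_count": torch.cuda.device_count(),     # number of visible GPUs
    }


if __name__ == "__main__":
    print(cuda_report())
```

If `cuda_available` is `True` here, the mismatch is more likely in where the model and its inputs are placed than in the CUDA installation itself.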

lethaiq · 2023-01-02T16:01:25Z

Can you check whether your PyTorch and Python versions are the same?

datar001 · 2023-01-02T16:03:05Z

Yep, python=3.8, pytorch=1.5.1


lethaiq · 2023-01-02T16:13:44Z

How about the version of Ray? Can you also try the test code with the provided best checkpoint to see if you get the same error?
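A minimal sketch for collecting the versions asked about in this thread, using only the standard library. The function name `installed_versions` is hypothetical; the package names `torch` and `ray` are the usual pip distribution names and are assumed to match this environment.

```python
import platform
from importlib import metadata


def installed_versions(packages=("torch", "ray")):
    """Map 'python' and each package name to its installed version (None if absent)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the active environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the same conda environment used for training (`socialbot` in the traceback above), so that the reported versions are the ones the failing run actually used.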
