Test failed #48
Comments
[07/20 19:56:19 ERROR:] [train worker 0] Encountered TypeError, exiting. [engine.py: 1858]
[07/20 19:56:19 ERROR:] Encountered Exception. Terminating runner. [runner.py: 1467]
Unfortunately I have the same problem as the author; I'm also trying to run this in a Docker environment. Based on the error, I re-ran the container after adding
I'm running this container on a headless remote GPU cluster. Could you suggest something I can do?
Did you solve it?
Hi @herveyrobot and @nbqu, the error

[d != torch.device("cpu") and d >= 0 for d in devices]
TypeError: 'NoneType' object is not iterable
[engine.py: 1861]

means that devices is None inside stagewise_task_sampler_args. Can you try changing the parameter

task_sampler_params = TwoPhaseRGBBaseExperimentConfig.stagewise_task_sampler_args(
    stage="train", process_ind=0, total_processes=1,
)

to

task_sampler_params = TwoPhaseRGBBaseExperimentConfig.stagewise_task_sampler_args(
    stage="train", process_ind=0, total_processes=1, devices=[0]
)

and trying again?
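For context, a minimal standalone sketch (not the repo's exact code) of why passing devices=[0] avoids the TypeError: per the traceback below, when the X-display lookup fails and no devices were passed, the fallback path in rearrange_base.py iterates over devices, which is still None.

```python
import torch

# Sketch of the failing check (rearrange_base.py, line 308 in the traceback):
# on a headless machine with no devices passed, `devices` is None.
devices = None
try:
    [d != torch.device("cpu") and d >= 0 for d in devices]
except TypeError as e:
    print(e)  # 'NoneType' object is not iterable

# Passing devices=[0] explicitly (as suggested above) gives the comprehension
# a real list to iterate over, so the check succeeds.
devices = [0]
print([d != torch.device("cpu") and d >= 0 for d in devices])  # [True]
```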
I have the same error on Ubuntu 18.04.
Hi, I have the same problem. It shows that Docker can't find an X display. So I added
Hi @Wallong, thanks for pointing out this fix! Can I ask what your entire
I can't run the test command
ai2thor-rearrangement# allenact -o rearrange_out -b . baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py
Thanks for any answers.
[07/20 19:14:38 INFO:] Running with args Namespace(experiment='baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py', eval=False, config_kwargs=None, extra_tag='', output_dir='rearrange_out', save_dir_fmt=<SaveDirFormat.FLAT: 'FLAT'>, seed=None, experiment_base='.', checkpoint=None, infer_output_dir=False, approx_ckpt_step_interval=None, restart_pipeline=False, deterministic_cudnn=False, max_sampler_processes_per_worker=None, deterministic_agents=False, log_level='info', disable_tensorboard=False, disable_config_saving=False, collect_valid_results=False, valid_on_initial_weights=False, test_expert=False, distributed_ip_and_port='127.0.0.1:0', machine_id=0, callbacks='', enable_crash_recovery=False, test_date=None, approx_ckpt_steps_count=None, skip_checkpoints=0) [main.py: 452]
fatal: not a git repository (or any of the parent directories): .git
[07/20 19:14:39 WARNING:] Failed to get a git diff of the current project. Is it possible that /root/ai2thor-rearrangement is not under version control? [runner.py: 892]
[07/20 19:14:39 INFO:] Config files saved to rearrange_out/used_configs/OnePhaseRGBResNetDagger_40proc/2023-07-20_19-14-39 [runner.py: 935]
[07/20 19:14:39 INFO:] Using 1 train workers on devices (device(type='cpu'),) [runner.py: 317]
[07/20 19:14:39 INFO:] Using local worker ids [0] (total 1 workers in machine 0) [runner.py: 326]
[07/20 19:14:39 INFO:] Started 1 train processes [runner.py: 595]
[07/20 19:14:39 INFO:] No processes allocated to validation, no validation will be run. [runner.py: 626]
[07/20 19:14:41 INFO:] train 0 args {'experiment_name': 'OnePhaseRGBResNetDagger_40proc', 'config': <baseline_configs.one_phase.one_phase_rgb_resnet_dagger.OnePhaseRGBResNetDaggerExperimentConfig object at 0x7f8809b19670>, 'callback_sensors': [], 'results_queue': <multiprocessing.queues.Queue object at 0x7f8809b196d0>, 'checkpoints_queue': None, 'checkpoints_dir': 'rearrange_out/checkpoints/OnePhaseRGBResNetDagger_40proc/2023-07-20_19-14-39', 'seed': 1118467761, 'deterministic_cudnn': False, 'mp_ctx': <multiprocessing.context.SpawnContext object at 0x7f86d3c559a0>, 'num_workers': 1, 'device': device(type='cpu'), 'distributed_ip': '127.0.0.1', 'distributed_port': 0, 'max_sampler_processes_per_worker': None, 'save_ckpt_after_every_pipeline_stage': True, 'initial_model_state_dict': '[SUPPRESSED]', 'first_local_worker_id': 0, 'distributed_preemption_threshold': 0.7, 'try_restart_after_task_error': False, 'mode': 'train', 'worker_id': 0} [runner.py: 416]
[07/20 19:14:41 ERROR:] [train worker 0] Encountered TypeError, exiting. [engine.py: 1858]
[07/20 19:14:41 ERROR:] Traceback (most recent call last):
File "/root/ai2thor-rearrangement/baseline_configs/rearrange_base.py", line 292, in stagewise_task_sampler_args
x_displays = get_open_x_displays(throw_error_if_empty=True)
File "/opt/miniconda3/envs/rearrange/lib/python3.9/site-packages/allenact_plugins/ithor_plugin/ithor_util.py", line 88, in get_open_x_displays
raise IOError(
OSError: Could not find any open X-displays on which to run AI2-THOR processes. Please see the AI2-THOR installation instructions at https://allenact.org/installation/installation-framework/#installation-of-ithor-ithor-plugin for information as to how to start such displays.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/miniconda3/envs/rearrange/lib/python3.9/site-packages/allenact/algorithms/onpolicy_sync/engine.py", line 1850, in train
self.run_pipeline(valid_on_initial_weights=valid_on_initial_weights)
File "/opt/miniconda3/envs/rearrange/lib/python3.9/site-packages/allenact/algorithms/onpolicy_sync/engine.py", line 1506, in run_pipeline
self.initialize_storage_and_viz(
File "/opt/miniconda3/envs/rearrange/lib/python3.9/site-packages/allenact/algorithms/onpolicy_sync/engine.py", line 483, in initialize_storage_and_viz
observations = self.vector_tasks.get_observations()
File "/opt/miniconda3/envs/rearrange/lib/python3.9/site-packages/allenact/algorithms/onpolicy_sync/engine.py", line 327, in vector_tasks
sampler_fn_args=self.get_sampler_fn_args(seeds),
File "/opt/miniconda3/envs/rearrange/lib/python3.9/site-packages/allenact/algorithms/onpolicy_sync/engine.py", line 382, in get_sampler_fn_args
return [
File "/opt/miniconda3/envs/rearrange/lib/python3.9/site-packages/allenact/algorithms/onpolicy_sync/engine.py", line 383, in
fn(
File "/root/ai2thor-rearrangement/baseline_configs/rearrange_base.py", line 374, in train_task_sampler_args
**cls.stagewise_task_sampler_args(
File "/root/ai2thor-rearrangement/baseline_configs/rearrange_base.py", line 308, in stagewise_task_sampler_args
[d != torch.device("cpu") and d >= 0 for d in devices]
TypeError: 'NoneType' object is not iterable
[engine.py: 1861]
[07/20 19:14:41 ERROR:] Encountered Exception. Terminating runner. [runner.py: 1467]
[07/20 19:14:41 ERROR:] Traceback (most recent call last):
File "/opt/miniconda3/envs/rearrange/lib/python3.9/site-packages/allenact/algorithms/onpolicy_sync/runner.py", line 1434, in log_and_close
raise Exception(
Exception: Train worker 0 abnormally terminated
[runner.py: 1468]
Traceback (most recent call last):
File "/opt/miniconda3/envs/rearrange/lib/python3.9/site-packages/allenact/algorithms/onpolicy_sync/runner.py", line 1434, in log_and_close
raise Exception(
Exception: Train worker 0 abnormally terminated
[07/20 19:14:41 INFO:] Terminating train 0 [runner.py: 1543]
[07/20 19:14:41 INFO:] Joining train 0 [runner.py: 1543]
[07/20 19:14:41 INFO:] Closed train 0 [runner.py: 1543]
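As a quick check for the root cause above (the OSError about missing X displays), the same helper that raises in the traceback can be called directly. This is only a diagnostic sketch for confirming the environment, not a fix:

```python
# Check whether any X display is visible to the process; this is the same
# helper (with the same keyword argument) that raises in the traceback above.
from allenact_plugins.ithor_plugin.ithor_util import get_open_x_displays

try:
    displays = get_open_x_displays(throw_error_if_empty=True)
    print(f"Open X displays: {displays}")
except IOError as e:
    # Same condition that sends stagewise_task_sampler_args down the failing path.
    print(f"No open X display found: {e}")
```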