Ansible Error while trying to Reattach OpenBach Agents #2

Open
godelc7 opened this issue Jun 5, 2023 · 4 comments

Comments

@godelc7

godelc7 commented Jun 5, 2023

I got an Ansible error (see below) when trying to reattach agents after detaching them using the auditorium scripts. Detaching works fine, but reattaching always fails.

[My OpenBach Topology]:
- Entity name: controller, associated agent: Controller, IP address: 192.168.1.210
- Entity name: TrafficGenerator1, associated agent: test_central, IP address: 192.168.1.211 (the one I'm trying to reattach below)
- Entity name: TrafficGenerator2, associated agent: test_remote, IP address: 192.168.1.212

[CLI Command]:
python3 /usr/local/lib/python3.8/dist-packages/auditorium_scripts/install_agent.py 192.168.1.211 192.168.1.210 TrafficGenerator1 --username <MY_LOGIN_TO_OPENBACH> --password <MY_PASSWORD_TO_OPENBACH> --controller 192.168.1.210 --reattach

[ERROR]:
{'response': {'192.168.1.211': [{'_ansible_no_log': False,
'msg': "The conditional check '{{ item.remove }}' failed. The error was: error while evaluating conditional ({{ item.remove }}): 'dict object' has no attribute "
"'remove'\n"
'\n'
"The error appears to be in '/opt/openbach/controller/ansible/push_files.yml': line 26, column 7, but may\n"
'be elsewhere in the file depending on the exact syntax problem.\n'
'\n'
'The offending line appears to be:\n'
'\n'
'\n'
' - name: Remove file on source\n'
' ^ here\n'}],
'error': 'Ansible playbook execution failed'},
'returncode': 422}
{'response': {'192.168.1.211': [{'msg': "The conditional check '{{ item.remove }}' failed. The error was: error while evaluating conditional ({{ item.remove }}): 'dict object' has no attribute 'remove'\n\nThe error appears to be in '/opt/openbach/controller/ansible/push_files.yml': line 26, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Remove file on source\n ^ here\n", '_ansible_no_log': False}], 'error': 'Ansible playbook execution failed'}, 'returncode': 422}

@Kniyl

Kniyl commented Jun 6, 2023

Hi,

Are you able to edit the /opt/openbach/controller/ansible/{push,pull}_files.yml files on your controller and change line 32 (resp. 23) from when: "{{ item.remove }}" to when: "{{ item.remove | default(False) }}"?

If so, does it fix your issue?
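
For anyone who prefers to script the edit suggested above, here is a minimal sketch rather than an official OpenBach tool. It assumes the conditional appears in the two playbooks exactly as quoted in this thread (same quoting and spacing); the helper names `patch_conditional` and `patch_file` are made up for illustration. It keeps a `.bak` copy of each file before writing.

```python
import shutil
from pathlib import Path

# Strings quoted in this thread; the exact spacing in the real files is an assumption.
OLD = 'when: "{{ item.remove }}"'
NEW = 'when: "{{ item.remove | default(False) }}"'


def patch_conditional(text: str) -> str:
    """Return the playbook text with the bare conditional given a default."""
    return text.replace(OLD, NEW)


def patch_file(path: Path) -> bool:
    """Patch one playbook in place, keeping a .bak copy. Return True if it changed."""
    original = path.read_text()
    patched = patch_conditional(original)
    if patched == original:
        return False  # already patched, or the conditional is written differently
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(patched)
    return True


if __name__ == "__main__":
    # Directory taken from the error message above; run on the controller.
    for name in ("push_files.yml", "pull_files.yml"):
        target = Path("/opt/openbach/controller/ansible") / name
        if target.exists():
            print(name, "patched" if patch_file(target) else "unchanged")
```

Running it a second time leaves the files unchanged, since the patched line no longer matches the original string.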

@godelc7
Author

godelc7 commented Jun 6, 2023

It seems to work fine. At least, the agents are now present in the database, even though I still need to add them manually to the project topology. I will run my scenarios to see if everything still works as expected.

What puzzles me is that, even though the reattach procedure itself seems to be successful, the CLI prints a message saying that the Ansible playbook execution failed. Here it is:

[ERROR]
{'assign_collector': None,
'install': {'last_operation_date': '2023-06-06T17:49:43.802Z', 'response': None, 'returncode': 204},
'log_severity': None,
'uninstall': {'last_operation_date': '2023-04-25T13:02:35.073Z',
'response': {'response': {'192.168.1.211': [{'_ansible_item_label': 'deb https://raw.githubusercontent.com/CNES/net4sat-packages/master/focal/ focal stable',
'_ansible_no_log': False,
'ansible_loop_var': 'item',
'changed': False,
'invocation': {'module_args': {'codename': None,
'filename': None,
'install_python_apt': True,
'mode': None,
'repo': 'deb https://raw.githubusercontent.com/CNES/net4sat-packages/master/focal/ focal stable',
'state': 'absent',
'update_cache': True,
'update_cache_retries': 5,
'update_cache_retry_max_delay': 12,
'validate_certs': True}},
'item': 'deb https://raw.githubusercontent.com/CNES/net4sat-packages/master/focal/ focal stable',
'msg': 'Failed to update apt cache: unknown reason'},
{'changed': False,
'msg': 'One or more items failed',
'results': [{'_ansible_item_label': 'deb https://raw.githubusercontent.com/CNES/net4sat-packages/master/focal/ focal stable', '_ansible_no_log': False,
'ansible_loop_var': 'item',
'changed': False,
'failed': True,
'invocation': {'module_args': {'codename': None,
'filename': None,
'install_python_apt': True,
'mode': None,
'repo': 'deb https://raw.githubusercontent.com/CNES/net4sat-packages/master/focal/ focal '
'stable',
'state': 'absent',
'update_cache': True,
'update_cache_retries': 5,
'update_cache_retry_max_delay': 12,
'validate_certs': True}},
'item': 'deb https://raw.githubusercontent.com/CNES/net4sat-packages/master/focal/ focal stable',
'msg': 'Failed to update apt cache: unknown reason'}]}],
'error': 'Ansible playbook execution failed'},
'returncode': 422},
'returncode': 422}}
Operation successfull

@Kniyl

Kniyl commented Jun 7, 2023

Yes, that's odd, there is clearly an error, as indicated by the 422 return code. Operation successfull is misleading here and I will have a look into it.

However, the culprit here is 'msg': 'Failed to update apt cache: unknown reason'. You should have a look into your machine to try and fix it, otherwise it might prevent you from doing other actions as well, such as, for instance, installing new jobs into the agent.
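
To spot this kind of failure without reading the whole dump, the nested result can be walked for error return codes. The sketch below is only an illustration: `find_failures` and `collect_messages` are hypothetical helpers, not part of the auditorium scripts, and the sample dictionary is a hand-abridged version of the output quoted above.

```python
def collect_messages(node):
    """Yield every string stored under a 'msg' key anywhere below node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "msg" and isinstance(value, str):
                yield value
            else:
                yield from collect_messages(value)
    elif isinstance(node, list):
        for value in node:
            yield from collect_messages(value)


def find_failures(node, path=()):
    """Yield (path, returncode, messages) for each nested dict with returncode >= 400."""
    if isinstance(node, dict):
        code = node.get("returncode")
        if isinstance(code, int) and code >= 400:
            yield path, code, sorted(set(collect_messages(node)))
        for key, value in node.items():
            yield from find_failures(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_failures(value, path + (index,))


# Abridged by hand from the output in this thread.
result = {
    "install": {"response": None, "returncode": 204},
    "uninstall": {
        "response": {
            "response": {
                "192.168.1.211": [
                    {"msg": "Failed to update apt cache: unknown reason"}
                ]
            },
            "error": "Ansible playbook execution failed",
        },
        "returncode": 422,
    },
}

for where, code, messages in find_failures(result):
    print("/".join(map(str, where)), code, messages)
```

On the abridged sample this reports only the `uninstall` entry, which matches the reading above: the install step returned 204 and the 422 comes from the earlier uninstall.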

@godelc7
Author

godelc7 commented Jun 23, 2023

> Yes, that's odd, there is clearly an error, as indicated by the 422 return code. Operation successful is misleading here and I will have a look into it.

Operation successful is indeed misleading here, but only partly. The agents did get reattached: I can rebuild my entire topology with them and run other tests as well. However, some critical services, such as time synchronization (NTP) with the controller, are not restored after reattachment. Ever since I detached and reattached the agents, my tests have produced very strange results. I investigated the cause in many different directions, and only today did I figure out that the missing time synchronization with the controller is the root cause of the wrong results. In sum, detaching agents works properly and reliably, but reattaching only works partly, and the part that does not work is, in my opinion, very critical. I therefore propose that this be treated as a critical bug.

> However, the culprit here is 'msg': 'Failed to update apt cache: unknown reason'. You should have a look into your machine to try and fix it, otherwise it might prevent you from doing other actions as well, such as, for instance, installing new jobs into the agent.

I'm aware of this problem with APT on my machines. The reason is that I'm working in a very restricted environment: many APT mirrors are blocked by the company. The few mirrors that are allowed are sufficient for my daily work, though, and up to now I have been able to install every job I needed on the agents.
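
For others who hit the time synchronization problem described above, one quick check after reattaching an agent is to ask systemd whether the clock is reported as synchronized. This is a rough sketch under assumptions: it presumes the agent has `timedatectl` (systemd) available, and it does not tell you which time server the agent is following, so it cannot confirm that the controller specifically is the source. The function names are made up for illustration.

```python
import subprocess


def parse_ntp_synchronized(output: str) -> bool:
    """Interpret the 'yes'/'no' value printed by timedatectl."""
    return output.strip().lower() == "yes"


def is_clock_synchronized() -> bool:
    """Return True if systemd reports the system clock as NTP-synchronized."""
    completed = subprocess.run(
        ["timedatectl", "show", "--property=NTPSynchronized", "--value"],
        capture_output=True,
        text=True,
        check=True,  # raise if timedatectl is missing or fails
    )
    return parse_ntp_synchronized(completed.stdout)


if __name__ == "__main__":
    print("clock synchronized:", is_clock_synchronized())
```

Running this on each agent before starting a scenario would at least flag an unsynchronized clock early, instead of it showing up later as odd test results.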
