[BUG] Issues while running the ansible playbook #171

Open
viniciusdc opened this issue Sep 6, 2024 · 1 comment

@viniciusdc
Collaborator

Context

The new Redis addition to the cluster seems to be missing some validation checks in the current role. Our final service also seems to race against a default service with the same name, which is started automatically when the package is first installed. That default instance ends up holding the Redis port, so our service cannot bind to it, and conda-store goes down as a result.
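
A validation step of the kind the role seems to be missing could look like the sketch below. These are hypothetical tasks, not part of the current role: they gather systemd unit states with `ansible.builtin.service_facts` and fail the play if the custom `redis.service` is not running, or if the package's default `redis-server.service` is still running. The unit names are the ones used in this issue and are an assumption for other distributions; the tasks are meant to run after the role starts Redis.

```yaml
# Hypothetical validation tasks (not in the current role).
- name: Gather systemd service states
  ansible.builtin.service_facts:

- name: Check that only the custom Redis unit is running
  ansible.builtin.assert:
    that:
      # The custom unit must exist and be up...
      - "'redis.service' in ansible_facts.services"
      - "ansible_facts.services['redis.service'].state == 'running'"
      # ...and the package's default unit must not be competing for port 6379.
      - "'redis-server.service' not in ansible_facts.services or ansible_facts.services['redis-server.service'].state != 'running'"
    fail_msg: "redis.service is not running, or redis-server.service is still running and may hold the Redis port"
```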

The screenshot below shows the Ansible task that currently fails. This one has a quick fix:

[Screenshot from 2024-08-30 17 22 29]

This is the troublesome one; the redis.service log shows:

```
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:C 06 Sep 2024 14:37:08.677 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:C 06 Sep 2024 14:37:08.677 * Redis version=7.4.0, bits=64, commit=00000000, modified=0, pid=11781, just started
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:C 06 Sep 2024 14:37:08.677 * Configuration loaded
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:M 06 Sep 2024 14:37:08.677 * Increased maximum number of open files to 10032 (it was originally set to 1024).
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:M 06 Sep 2024 14:37:08.677 * monotonic clock: POSIX clock_gettime
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:M 06 Sep 2024 14:37:08.678 * Running mode=standalone, port=6379.
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:M 06 Sep 2024 14:37:08.678 # Warning: Could not create server TCP listening socket 127.0.0.1:6379: bind: Address already in use
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:M 06 Sep 2024 14:37:08.678 # Failed listening on port 6379 (tcp), aborting.
Sep 06 14:37:08 hpc01-test systemd[1]: redis.service: Main process exited, code=exited, status=1/FAILURE
Sep 06 14:37:08 hpc01-test systemd[1]: redis.service: Failed with result 'exit-code'.
```
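
The bind error means another process already owns 127.0.0.1:6379 when redis.service starts. A hypothetical diagnostic task like the one below, assuming `ss` from iproute2 is available on the host, shows which process that is; the `users:(...)` column of the output lists the process name and PID holding the port.

```yaml
# Hypothetical diagnostic tasks: show which process is listening on the Redis port.
- name: List listeners on TCP port 6379
  become: true  # root is needed for ss to show processes owned by other users
  ansible.builtin.command: ss -ltnp 'sport = :6379'
  register: _redis_port_owner
  changed_when: false  # read-only check, never reports a change

- name: Show the current owner of port 6379
  ansible.builtin.debug:
    var: _redis_port_owner.stdout_lines
```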

Value and/or benefit

Successful deployment and launch of nebari-slurm.

Anything else?

The root cause may be that redis-server.service starts automatically when the package is installed. The fix might be as simple as disabling that service before we create our custom redis.service with this task:

```yaml
- name: Copy the redis systemd service file
  become: true
  ansible.builtin.copy:
    content: |
      [Unit]
      Description=Redis
      After=syslog.target
      [Service]
      ExecStart=/usr/bin/redis-server /etc/redis/redis.conf
      RestartSec=5s
      Restart=on-success
      [Install]
      WantedBy=multi-user.target
    dest: /etc/systemd/system/redis.service
    owner: root
    group: root
    mode: "0644"
  register: _redis_service
```
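
A minimal sketch of the suggested fix, assuming the package's default unit is named `redis-server.service` as on the test host: stop and disable it before the copy task, then reload systemd and start the custom unit afterwards, restarting it when the copy task reported a change. Both tasks are hypothetical; the second one assumes the role does not already start `redis.service` elsewhere.

```yaml
# Hypothetical task: run BEFORE "Copy the redis systemd service file".
- name: Stop and disable the package's default Redis unit
  become: true
  ansible.builtin.systemd:
    name: redis-server.service
    state: stopped
    enabled: false

# Hypothetical task: run AFTER the copy task, which registers _redis_service.
- name: Start the custom Redis unit
  become: true
  ansible.builtin.systemd:
    name: redis.service
    daemon_reload: true  # pick up the newly written unit file
    enabled: true
    state: "{{ 'restarted' if _redis_service.changed else 'started' }}"
```

Doing the disable step in Ansible rather than by hand keeps it in the same ordered play as the unit file copy, so the default instance is gone before the custom one tries to bind port 6379.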

@viniciusdc
Collaborator Author

In theory, this issue can be worked around by manually disabling the conflicting service with systemctl, but when I tested this, conda-store still failed to connect properly.
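
Before digging further into conda-store, it may help to confirm that Redis itself answers on the address from the log above. The sketch below is a hypothetical check that assumes `redis-cli` is installed alongside the server and that Redis listens on 127.0.0.1:6379 without a password; if the role configures authentication, the command needs the matching credentials. Note that a PONG from the package's default instance would also pass, so this check alone does not rule out the port conflict.

```yaml
# Hypothetical check: Redis must answer PING before conda-store can use it.
- name: Ping Redis on the local port
  ansible.builtin.command: redis-cli -h 127.0.0.1 -p 6379 ping
  register: _redis_ping
  changed_when: false  # read-only check
  retries: 5           # give the service time to come up after a restart
  delay: 2             # seconds between attempts
  until: _redis_ping.stdout == "PONG"
```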
