[BUG] Issues while running the ansible playbook #171

viniciusdc · 2024-09-06T14:42:53Z

Context

The new Redis addition to the cluster seems to be missing some validation checks in the current role. The final service also seems to race against another default service with the same name (most likely one started by default when the package is first installed). That service holds the port, which in turn leaves the conda-store service down.

Below is a snippet of the Ansible task that is currently failing. This is a quick fix:

This is the troublesome one:

The following error appears in the redis.service logs:

Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:C 06 Sep 2024 14:37:08.677 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:C 06 Sep 2024 14:37:08.677 * Redis version=7.4.0, bits=64, commit=00000000, modified=0, pid=11781, just started
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:C 06 Sep 2024 14:37:08.677 * Configuration loaded
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:M 06 Sep 2024 14:37:08.677 * Increased maximum number of open files to 10032 (it was originally set to 1024).
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:M 06 Sep 2024 14:37:08.677 * monotonic clock: POSIX clock_gettime
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:M 06 Sep 2024 14:37:08.678 * Running mode=standalone, port=6379.
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:M 06 Sep 2024 14:37:08.678 # Warning: Could not create server TCP listening socket 127.0.0.1:6379: bind: Address already in use
Sep 06 14:37:08 hpc01-test redis-server[11781]: 11781:M 06 Sep 2024 14:37:08.678 # Failed listening on port 6379 (tcp), aborting.
Sep 06 14:37:08 hpc01-test systemd[1]: redis.service: Main process exited, code=exited, status=1/FAILURE
Sep 06 14:37:08 hpc01-test systemd[1]: redis.service: Failed with result 'exit-code'.
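The "Address already in use" failure in the log above is the kind of condition a validation check in the role could catch before the custom service is started. A minimal sketch, assuming Redis binds to 127.0.0.1:6379 as the log shows; the task name, timeout, and message are illustrative and not part of the current role:

```yaml
# Hypothetical pre-flight check (not in the current role): fail early with a
# clear message if something is still listening on the Redis port.
- name: Check that the Redis port is free before starting redis.service
  ansible.builtin.wait_for:
    host: 127.0.0.1
    port: 6379
    state: stopped   # wait until the port is closed
    timeout: 10      # assumed value; give up after 10 seconds
    msg: "Port 6379 is still in use, another Redis instance may be running"
```

With `state: stopped`, `wait_for` succeeds once nothing accepts connections on the port and fails with `msg` after the timeout, so the conflict surfaces in the playbook output instead of only in the service log.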

Value and/or benefit

Success at deploying and launching nebari-slurm

Anything else?

The main problem might be that redis-server.service starts up on installation. The fix might be as simple as disabling that service before we create our custom redis.service:

nebari-slurm/roles/redis/tasks/redis.yaml

Lines 40 to 59 in 4ff7083

    - name: Copy the redis systemd service file
      become: true
      ansible.builtin.copy:
        content: |
          [Unit]
          Description=Redis
          After=syslog.target
          [Service]
          ExecStart=/usr/bin/redis-server /etc/redis/redis.conf
          RestartSec=5s
          Restart=on-success
          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/redis.service
        owner: root
        group: root
        mode: "0644"
      register: _redis_service
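The suggestion above, disabling the packaged service before the custom unit is written, could look roughly like the tasks below. This is a sketch under assumptions: the unit name `redis-server.service` is taken from this issue, the task names are made up, and the tasks are assumed to run before the "Copy the redis systemd service file" task. It has not been tested against the role.

```yaml
# Hypothetical tasks (not in the current role).
- name: Gather service facts
  ansible.builtin.service_facts:

- name: Stop and disable the packaged redis-server service
  become: true
  ansible.builtin.systemd:
    name: redis-server.service
    state: stopped
    enabled: false
  # Only act when the packaged unit actually exists on this host.
  when: "'redis-server.service' in ansible_facts.services"
```

The `when` guard keeps the play from failing on hosts where the package does not ship that unit.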

viniciusdc · 2024-09-09T13:48:14Z

This issue can, in theory, be worked around by manually disabling the conflicting service using systemctl, though while testing this, I still had issues with conda-store not properly connecting.
