charmed-mysql.mysqlrouter-service daemon must be restarted if MySQL operator is restarted #105

Closed
NucciTheBoss opened this issue Feb 1, 2024 · 11 comments · Fixed by #107
Labels
bug Something isn't working

@NucciTheBoss

Steps to reproduce

  1. juju deploy slurmctld --channel edge
  2. juju deploy slurmd --channel edge
  3. juju deploy slurmdbd --channel edge
  4. juju deploy slurmrestd --channel edge
  5. juju deploy mysql --channel 8.0/edge
  6. juju deploy mysql-router slurmdbd-mysql-router --channel 8.0/edge
  7. juju integrate slurmctld:slurmd slurmd:slurmd
  8. juju integrate slurmctld:slurmdbd slurmdbd:slurmdbd
  9. juju integrate slurmctld:slurmrestd slurmrestd:slurmrestd
  10. juju integrate slurmdbd:database slurmdbd-mysql-router:database
  11. juju integrate slurmdbd-mysql-router:backend-database mysql:database
  12. Wait for Slurm cluster to stabilize...
  13. juju ssh mysql/leader
  14. sudo systemctl reboot
  15. Wait for mysql operator to reach active status again...
  16. juju ssh slurmdbd/leader
  17. cat /var/log/slurm/slurmdbd.log
  18. See that slurmdbd is unable to reach the MySQL database...
  19. snap restart charmed-mysql.mysqlrouter-service
  20. cat /var/log/slurm/slurmdbd.log
  21. See that slurmdbd is now able to successfully reach the MySQL database...
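Steps 1 through 11 above can be scripted roughly as follows. This is a minimal sketch, not an official reproduction script: the `run` and `deploy_repro` helpers and the `DRY_RUN` and `ROUTER_CHANNEL` variables are hypothetical names introduced here for illustration. The mysql-router channel defaults to dpe/edge, the channel this thread later confirms, rather than the 8.0/edge channel shown in step 6.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-11. Nothing runs until deploy_repro is called.
# DRY_RUN (hypothetical) defaults to 1, so commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

deploy_repro() {
    local app
    # Steps 1-4: the Slurm charms
    for app in slurmctld slurmd slurmdbd slurmrestd; do
        run juju deploy "$app" --channel edge
    done
    # Steps 5-6: MySQL and the router; ROUTER_CHANNEL is a hypothetical override
    run juju deploy mysql --channel 8.0/edge
    run juju deploy mysql-router slurmdbd-mysql-router --channel "${ROUTER_CHANNEL:-dpe/edge}"
    # Steps 7-11: integrations
    run juju integrate slurmctld:slurmd slurmd:slurmd
    run juju integrate slurmctld:slurmdbd slurmdbd:slurmdbd
    run juju integrate slurmctld:slurmrestd slurmrestd:slurmrestd
    run juju integrate slurmdbd:database slurmdbd-mysql-router:database
    run juju integrate slurmdbd-mysql-router:backend-database mysql:database
}
```

Calling `deploy_repro` prints the commands; `DRY_RUN=0 deploy_repro` executes them against the current Juju model. The reboot and log inspection in steps 12 through 21 remain manual.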

Expected behavior

We expect the slurmdbd daemon to reach the MySQL database through the MySQL Router Unix socket after a human operator reboots the machine hosting the MySQL operator.

Actual behavior

The MySQL operator now returns to active status after we reboot the machine MySQL is deployed to (this used to not be the case). However, we still need to ssh into the slurmdbd operator and manually restart the MySQL Router service. The slurmdbd log file shows that slurmdbd cannot contact the MySQL database until the MySQL Router service is restarted.

Versions

Operating system: Ubuntu 22.04 LTS on amd64

Juju CLI: 3.1.7-genericlinux-amd64

Juju agent: 3.1.6

mysql charm revision: 207
mysql-router charm revision: 127

LXD: 5.20

Log output

/var/log/slurm/slurmdbd.log contains only messages about failing to establish a connection with the MySQL database. The message that indicates slurmdbd is fully operational is the following:

slurmdbd version 23.02.7 started
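A small check for that startup message might look like the sketch below. The function name `slurmdbd_started` is hypothetical, and the pattern assumes the message keeps the `slurmdbd version X.Y.Z started` shape shown above; other Slurm releases may word it differently.

```shell
# Succeeds if the log contains a startup line such as
# "slurmdbd version 23.02.7 started". The log path defaults to the one in this report.
slurmdbd_started() {
    local log_file="${1:-/var/log/slurm/slurmdbd.log}"
    grep -Eq 'slurmdbd version [0-9]+(\.[0-9]+)* started' "$log_file"
}
```

On the slurmdbd unit this could gate the manual workaround from step 19, for example `slurmdbd_started || snap restart charmed-mysql.mysqlrouter-service`. Note that the check matches any startup line in the file, including one written before the reboot.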

Additional context

We must reboot our machines to load kernel modules that enable specific high-performance networking features. We originally had a similar reboot issue with mysql-operator, but that was fixed by canonical/mysql-operator#380.

@NucciTheBoss added the bug label Feb 1, 2024
@carlcsaposs-canonical
Contributor

juju deploy mysql-router slurmdbd-mysql-router --channel 8.0/edge

@NucciTheBoss The 8.0/edge channel of mysql-router is not maintained by Data Platform. Please use channel dpe/edge and re-open if you encounter the same issue.

@carlcsaposs-canonical closed this as not planned Feb 5, 2024
@carlcsaposs-canonical
Contributor

Able to reproduce with the steps above after replacing juju deploy mysql-router slurmdbd-mysql-router --channel 8.0/edge with juju deploy mysql-router slurmdbd-mysql-router --channel dpe/edge

@carlcsaposs-canonical
Contributor

Able to reproduce with mysqlsh shell.connect via socket, which confirms that the issue occurs on the router, not the client

@NucciTheBoss
Author

The issue is with dpe/edge

I typed up the steps to reproduce from memory. We have the issue with the Data Platform channel; it's the one that we use in our integration tests. Sorry about that.

@carlcsaposs-canonical
Contributor

Unable to reproduce when connecting to mysql-router over TCP; issue appears to be socket related

@carlcsaposs-canonical
Contributor

Appears to be a MySQL Router bug resolved in 8.0.35

Routing on named sockets did not resume after Cluster recovery. Error 2002 was logged. (Bug #35503286)

https://dev.mysql.com/doc/relnotes/mysql-router/8.0/en/news-8-0-35.html
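To tell whether a given MySQL Router version already includes that fix, a version comparison along these lines may help. It is a sketch only: the function name `router_has_socket_fix` is hypothetical, the caller supplies the version string, and the comparison relies on `sort -V` (version sort) from GNU coreutils being available.

```shell
# Succeeds if the given MySQL Router version is 8.0.35 or newer,
# the release whose notes list the named-socket routing fix (Bug #35503286).
router_has_socket_fix() {
    local version="$1"
    local fixed="8.0.35"
    # With version sort, the fixed release sorts first only when version >= fixed.
    [ "$(printf '%s\n%s\n' "$fixed" "$version" | sort -V | head -n 1)" = "$fixed" ]
}
```

For example, `router_has_socket_fix 8.0.34` fails while `router_has_socket_fix 8.0.35` succeeds.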

@carlcsaposs-canonical
Contributor

Should be fixed by #103

@carlcsaposs-canonical
Contributor

carlcsaposs-canonical commented Feb 6, 2024

Followed the steps to reproduce with dpe/edge/8.0.35-noupgrade from #107: no issue

Confirms that #103 should fix this issue

@carlcsaposs-canonical
Contributor

Fix released as rev 135/136. Note that MySQL Router is now 8.0.35 instead of 8.0.34

@NucciTheBoss
Author

NucciTheBoss commented Feb 7, 2024 via email
