Fix mysql query metric collection not recovering after database restarts #10811

alexandre-normand (Contributor) commented Dec 8, 2021

What does this PR do?

This fixes an issue where query metrics collection would never recover after a monitored MySQL instance was restarted. The failure showed up as repeated logging of the following error:

2021-12-07 22:47:52 UTC | CORE | ERROR | (pkg/collector/python/datadog_agent.go:122 in LogMessage) | mysql:e40c6a86aebc5d6f | (statements.py:134) | Unable to collect statement metrics due to an error
Traceback (most recent call last):
  File "/~/dd/integrations-core/mysql/datadog_checks/mysql/statements.py", line 113, in collect_per_statement_metrics
    rows = self._collect_per_statement_metrics()
  File "/~/dd/integrations-core/mysql/datadog_checks/mysql/statements.py", line 138, in _collect_per_statement_metrics
    monotonic_rows = self._query_summary_per_statement()
  File "/~/dd/integrations-core/mysql/datadog_checks/mysql/statements.py", line 176, in _query_summary_per_statement
    cursor.execute(sql_statement_summary)
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/pymysql/connections.py", line 516, in query
    self._execute_command(COMMAND.COM_QUERY, sql)
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/pymysql/connections.py", line 750, in _execute_command
    raise err.InterfaceError("(0, '')")
pymysql.err.InterfaceError: (0, '')

Customers had to restart the Agent to get query metric collection running again.

This PR addresses that failure by catching the InterfaceError and closing the query metrics job's connection so that the next run opens a new connection.
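A minimal sketch of that recovery pattern, assuming a job object that lazily opens and caches a single connection across runs. InterfaceError below is a local stand-in for pymysql.err.InterfaceError so the example runs without pymysql installed, and FakeConnection, StatementMetricsJob, and collect are hypothetical names, not the check's actual code:

```python
class InterfaceError(Exception):
    """Local stand-in for pymysql.err.InterfaceError (assumption: keeps the sketch dependency-free)."""


class FakeConnection:
    """Hypothetical connection; a 'broken' one behaves like a socket lost during a DB restart."""

    def __init__(self, broken=False):
        self.broken = broken
        self.closed = False

    def query(self, sql):
        if self.broken:
            # Mimics the error in the traceback: the connection is no longer usable.
            raise InterfaceError(0, "")
        return [("digest-1", 42)]

    def close(self):
        self.closed = True


class StatementMetricsJob:
    """Hypothetical job that caches one connection between runs."""

    def __init__(self, connect):
        self._connect = connect  # factory that returns a new connection
        self._conn = None

    def _get_connection(self):
        # Lazily (re)open the connection when none is cached.
        if self._conn is None:
            self._conn = self._connect()
        return self._conn

    def collect(self):
        try:
            return self._get_connection().query(
                "SELECT digest, count_star FROM performance_schema.events_statements_summary_by_digest"
            )
        except InterfaceError:
            # The cached connection is dead: close and forget it so the
            # next run opens a fresh one instead of failing forever.
            if self._conn is not None:
                self._conn.close()
            self._conn = None
            return []


# Demo: the first connection is broken, the second one works.
connections = iter([FakeConnection(broken=True), FakeConnection()])
job = StatementMetricsJob(lambda: next(connections))
print(job.collect())  # []
print(job.collect())  # [('digest-1', 42)]
```

Without the except branch, the dead connection would stay cached and every later run would hit the same error, which is the "never recovers" symptom described above.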

Motivation

Improve resiliency of the integration.

Additional Notes

This is tricky to unit test, so to validate that the fix works, I first reproduced the error by restarting my test RDS MySQL instance. I then ran the changes in this PR and confirmed that, when the connection was lost during the database reboot, the statement metrics collection job was shut down and then restarted:

dd-agent  | 2021-12-08 19:16:20 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:37 in CheckStarted) | check:mysql | Running check...
dd-agent  | 2021-12-08 19:16:27 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:124 in LogMessage) | mysql:e40c6a86aebc5d6f | (utils.py:246) | [dbinstanceidentifier:dbm-alex-normand,instanceID:1,server:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,port:3306,hostname:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,region:us-east-1,host:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,job:statement-metrics] Job loop database error: (1053, 'Server shutdown in progress')
dd-agent  | 2021-12-08 19:16:27 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:126 in LogMessage) | mysql:e40c6a86aebc5d6f | (utils.py:267) | [dbinstanceidentifier:dbm-alex-normand,instanceID:1,server:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,port:3306,hostname:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,region:us-east-1,host:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,job:statement-metrics] Shutting down job loop
dd-agent  | 2021-12-08 19:16:31 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:124 in LogMessage) | mysql:e40c6a86aebc5d6f | (utils.py:246) | [dbinstanceidentifier:dbm-alex-normand,instanceID:1,server:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,port:3306,hostname:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,region:us-east-1,host:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,job:statement-samples] Job loop database error: (2013, 'Lost connection to MySQL server during query')
dd-agent  | 2021-12-08 19:16:31 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:126 in LogMessage) | mysql:e40c6a86aebc5d6f | (utils.py:267) | [dbinstanceidentifier:dbm-alex-normand,instanceID:1,server:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,port:3306,hostname:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,region:us-east-1,host:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,job:statement-samples] Shutting down job loop
dd-agent  | 2021-12-08 19:16:37 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:126 in LogMessage) | mysql:e40c6a86aebc5d6f | (utils.py:232) | [dbinstanceidentifier:dbm-alex-normand,instanceID:1,server:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,port:3306,hostname:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,region:us-east-1,host:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,job:statement-metrics] Starting job loop
dd-agent  | 2021-12-08 19:16:37 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:126 in LogMessage) | mysql:e40c6a86aebc5d6f | (utils.py:232) | [dbinstanceidentifier:dbm-alex-normand,instanceID:1,server:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,port:3306,hostname:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,region:us-east-1,host:dbm-alex-normand.c7ug0vvtkhqv.us-east-1.rds.amazonaws.com,job:statement-samples] Starting job loop
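The log sequence above shows the lifecycle being relied on: a database error stops the job loop ("Shutting down job loop"), and a later check run starts it again ("Starting job loop"). A simplified sketch of that lifecycle follows. It assumes the check calls a hypothetical ensure_running() on every run, and it collapses the background job into one synchronous step per check run so it is easy to test; DatabaseError, JobLoop, and check_run are illustrative stand-ins, not the agent's real classes:

```python
class DatabaseError(Exception):
    """Stand-in for a driver error such as a lost connection (assumption)."""


class JobLoop:
    """Hypothetical, simplified job loop: one iteration per call instead of a background thread."""

    def __init__(self, name, run_job):
        self.name = name
        self._run_job = run_job
        self.running = False
        self.messages = []

    def _log(self, msg):
        self.messages.append(f"[job:{self.name}] {msg}")

    def ensure_running(self):
        # Called on every check run: restarts the loop if an error stopped it.
        if not self.running:
            self.running = True
            self._log("Starting job loop")

    def step(self):
        # Run one iteration; the job's exceptions bubble up to this handler.
        if not self.running:
            return
        try:
            self._run_job()
        except DatabaseError as e:
            self._log(f"Job loop database error: {e}")
            self.running = False
            self._log("Shutting down job loop")


def check_run(loop):
    """Hypothetical check entry point."""
    loop.ensure_running()
    loop.step()


# Demo: the second run loses the connection, the third run recovers.
outcomes = iter([None, DatabaseError("(2013, 'Lost connection to MySQL server during query')"), None])


def job():
    outcome = next(outcomes)
    if outcome is not None:
        raise outcome


loop = JobLoop("statement-metrics", job)
for _ in range(3):
    check_run(loop)
for line in loop.messages:
    print(line)
```

Stopping the loop instead of retrying inside it means all reconnect logic lives in one place: the next start builds a fresh job with a fresh connection.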

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry
  • File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached

codecov bot commented Dec 8, 2021

Codecov Report

Merging #10811 (ee24772) into master (6a456b3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Flag Coverage Δ
mysql 87.14% <100.00%> (+0.30%) ⬆️

Flags with carried forward coverage won't be shown.

Review thread on mysql/datadog_checks/mysql/statements.py (outdated, resolved)
alexandre-normand merged commit 57c4cc6 into master Dec 8, 2021
alexandre-normand deleted the alex.normand/fix-mysql-query-metrics-not-collected-after-db-restart branch December 8, 2021 20:43
cswatt pushed a commit that referenced this pull request Jan 5, 2022
…rts (#10811)

* Fix mysql query metric collection not recovering after database restarts

* Revert "Fix mysql query metric collection not recovering after database restarts"

This reverts commit 2e46b8d.

* Let Exceptions Bubble up and Restart Query Metrics Job