Add collector for replication_group_members #459

Merged

Conversation

nirgilboa
Contributor

@nirgilboa nirgilboa commented Mar 31, 2020

Fixes #362

The scraper sends a new Gauge metric with a static value of 1 and labels corresponding to columns of the replication_group_members table. I made this choice because the labels are very much static - if there's an idea for something more elegant I'm quite happy to change it ☺️

Both 5.7 and 8.0 are supported, by way of a different query for each. The first query sent determines whether the additional columns of the 8.0 schema are present; if they are not, the collector falls back to the 5.7 schema.
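
To illustrate the idea (this is a minimal sketch, not the code in this PR): the check below decides between the two schemas by looking at the column names the server returned, and renders one row as a sample with a constant value of 1. The helper names `hasColumns` and `formatSample` are hypothetical, and the label escaping is simplified compared to the real Prometheus text format.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// hasColumns reports whether every wanted column is among the available ones.
// Column names are compared case-insensitively. Hypothetical helper.
func hasColumns(available []string, wanted ...string) bool {
	set := make(map[string]bool, len(available))
	for _, c := range available {
		set[strings.ToUpper(c)] = true
	}
	for _, w := range wanted {
		if !set[strings.ToUpper(w)] {
			return false
		}
	}
	return true
}

// formatSample renders one table row as a sample whose labels are the
// lower-cased column names and whose value is always 1.
func formatSample(metric string, columns, values []string) string {
	pairs := make([]string, 0, len(columns))
	for i, c := range columns {
		pairs = append(pairs, fmt.Sprintf("%s=%q", strings.ToLower(c), values[i]))
	}
	sort.Strings(pairs) // stable label order for readability
	return fmt.Sprintf("%s{%s} 1", metric, strings.Join(pairs, ","))
}

func main() {
	// Columns of replication_group_members on 5.7; 8.0 adds
	// MEMBER_ROLE and MEMBER_VERSION.
	cols57 := []string{"CHANNEL_NAME", "MEMBER_ID", "MEMBER_HOST", "MEMBER_PORT", "MEMBER_STATE"}
	if hasColumns(cols57, "MEMBER_ROLE", "MEMBER_VERSION") {
		fmt.Println("use the 8.0 query")
	} else {
		fmt.Println("fall back to the 5.7 query")
	}
	fmt.Println(formatSample("mysql_perf_schema_replication_group_member",
		[]string{"MEMBER_HOST", "MEMBER_STATE"},
		[]string{"mysqldb1.example.com", "ONLINE"}))
}
```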

Hope this makes sense..

@nirgilboa nirgilboa force-pushed the collector-replication_group_members branch from 36b82cd to a97a009 Compare March 31, 2020 20:21
@nirgilboa
Contributor Author

@ezraroi Be happy to get your feedback on this 😄

@nirgilboa nirgilboa changed the title Add collector for replication_group_members. Fixes #362 Add collector for replication_group_members Apr 1, 2020
@nirgilboa
Contributor Author

@SuperQ pls review 😅

@roman-vynar
Contributor

I will test it as I was about to do the same :)

@roman-vynar
Contributor

It works OK; the other question, however, is what to do with this.

For example, if you have 3 nodes, each reporting 3 states (one per node, including itself), you will get 9 metrics in Prometheus:

mysql_perf_schema_replication_group_member{channel_name="group_replication_applier",instance="mysqldb1",job="mysql",member_host="mysqldb1.example.com",member_id="fd9d60cf-4c1d-11ea-b3b8-061fc32cb02a",member_port="3306",member_role="PRIMARY",member_state="ONLINE",member_version="8.0.19"}	1
mysql_perf_schema_replication_group_member{channel_name="group_replication_applier",instance="mysqldb1",job="mysql",member_host="mysqldb2.example.com",member_id="fdaf41dc-4c1d-11ea-b48b-02779259ef5a",member_port="3306",member_role="SECONDARY",member_state="ONLINE",member_version="8.0.19"}	1
mysql_perf_schema_replication_group_member{channel_name="group_replication_applier",instance="mysqldb1",job="mysql",member_host="mysqldb3.example.com",member_id="38e59035-5f13-11ea-81ab-0a2d32a78e1a",member_port="3306",member_role="SECONDARY",member_state="ONLINE",member_version="8.0.19"}	1
mysql_perf_schema_replication_group_member{channel_name="group_replication_applier",instance="mysqldb2",job="mysql",member_host="mysqldb1.example.com",member_id="fd9d60cf-4c1d-11ea-b3b8-061fc32cb02a",member_port="3306",member_role="PRIMARY",member_state="ONLINE",member_version="8.0.19"}	1
mysql_perf_schema_replication_group_member{channel_name="group_replication_applier",instance="mysqldb2",job="mysql",member_host="mysqldb2.example.com",member_id="fdaf41dc-4c1d-11ea-b48b-02779259ef5a",member_port="3306",member_role="SECONDARY",member_state="ONLINE",member_version="8.0.19"}	1
mysql_perf_schema_replication_group_member{channel_name="group_replication_applier",instance="mysqldb2",job="mysql",member_host="mysqldb3.example.com",member_id="38e59035-5f13-11ea-81ab-0a2d32a78e1a",member_port="3306",member_role="SECONDARY",member_state="ONLINE",member_version="8.0.19"}	1
mysql_perf_schema_replication_group_member{channel_name="group_replication_applier",instance="mysqldb3",job="mysql",member_host="mysqldb1.example.com",member_id="fd9d60cf-4c1d-11ea-b3b8-061fc32cb02a",member_port="3306",member_role="PRIMARY",member_state="ONLINE",member_version="8.0.19"}	1
mysql_perf_schema_replication_group_member{channel_name="group_replication_applier",instance="mysqldb3",job="mysql",member_host="mysqldb2.example.com",member_id="fdaf41dc-4c1d-11ea-b48b-02779259ef5a",member_port="3306",member_role="SECONDARY",member_state="ONLINE",member_version="8.0.19"}	1
mysql_perf_schema_replication_group_member{channel_name="group_replication_applier",instance="mysqldb3",job="mysql",member_host="mysqldb3.example.com",member_id="38e59035-5f13-11ea-81ab-0a2d32a78e1a",member_port="3306",member_role="SECONDARY",member_state="ONLINE",member_version="8.0.19"}	1

Any change to a member's role, state, or MySQL version will then add more metrics like these, because those are label values. Transforming label values into metric values does not look straightforward either.
So you either need a convoluted grouping query to account for such label changes, or you keep it simple: alert if any instance reports fewer than 3 ONLINE members, and figure out the rest yourself:

count(mysql_perf_schema_replication_group_member{member_state="ONLINE"}) by (instance) < count(mysql_perf_schema_replication_group_member) by (instance)
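
For readers less used to PromQL, here is a rough Go sketch of what that comparison computes: per instance, the number of ONLINE members against the number of members overall. The `memberSample` type and `degradedInstances` function are made up for illustration. One difference worth noting: the PromQL expression returns nothing for an instance with zero ONLINE series (its left-hand side is empty), whereas this sketch flags that case too.

```go
package main

import (
	"fmt"
	"sort"
)

// memberSample is a hypothetical stand-in for one scraped series.
type memberSample struct {
	Instance string // exporter that reported the row
	MemberID string // value of the member_id label
	State    string // value of the member_state label
}

// degradedInstances returns, sorted, the instances that see fewer ONLINE
// members than members in total.
func degradedInstances(samples []memberSample) []string {
	total := map[string]int{}
	online := map[string]int{}
	for _, s := range samples {
		total[s.Instance]++
		if s.State == "ONLINE" {
			online[s.Instance]++
		}
	}
	var out []string
	for inst, n := range total {
		if online[inst] < n {
			out = append(out, inst)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	samples := []memberSample{
		{"mysqldb1", "m1", "ONLINE"},
		{"mysqldb1", "m2", "ONLINE"},
		{"mysqldb1", "m3", "RECOVERING"},
		{"mysqldb2", "m1", "ONLINE"},
		{"mysqldb2", "m2", "ONLINE"},
		{"mysqldb2", "m3", "ONLINE"},
	}
	fmt.Println(degradedInstances(samples))
}
```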

Again, the PR looks good :)

@nirgilboa
Contributor Author

@roman-vynar Thanks for taking the time to look at this and test it ! 😄

IMHO the common usage of this is going to be:

  1. Check the number of ONLINE members
  2. Check the number of PRIMARY and ONLINE members

I could filter out all other members, so that each exporter will report only on itself.
This seems to be in line with the usage of the exporter in general.
What do you think ?
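
A minimal sketch of that filtering, assuming the exporter knows the server's own UUID (a later commit message in this thread describes matching MEMBER_ID with @@server_uuid for the stats collector). The `memberRow` type and `ownRows` function are hypothetical names, not part of the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// memberRow is a hypothetical representation of one table row.
type memberRow struct {
	MemberID string
	Host     string
	State    string
}

// ownRows keeps only the rows whose MEMBER_ID equals the server's own UUID,
// so that each exporter reports on itself only.
func ownRows(rows []memberRow, serverUUID string) []memberRow {
	var own []memberRow
	for _, r := range rows {
		if strings.EqualFold(r.MemberID, serverUUID) {
			own = append(own, r)
		}
	}
	return own
}

func main() {
	rows := []memberRow{
		{"fd9d60cf-4c1d-11ea-b3b8-061fc32cb02a", "mysqldb1.example.com", "ONLINE"},
		{"fdaf41dc-4c1d-11ea-b48b-02779259ef5a", "mysqldb2.example.com", "ONLINE"},
	}
	for _, r := range ownRows(rows, "fdaf41dc-4c1d-11ea-b48b-02779259ef5a") {
		fmt.Println(r.Host, r.State)
	}
}
```

The same effect could presumably be had in the SQL query itself; doing it in Go here only keeps the example self-contained.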

@roman-vynar
Contributor

I think it's OK, and it's up to the end users to decide what to filter out and alert on.

@roman-vynar
Contributor

roman-vynar commented Apr 9, 2020

We also need to fix the other scraper, performance_schema.replication_group_member_stats, to include the new fields, especially COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE.

@nirgilboa
Contributor Author

> I think it's OK, and it's up to the end users to decide what to filter out and alert on.

Great, sounds good

> We also need to fix the other scraper, performance_schema.replication_group_member_stats, to include the new fields, especially COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE.

Sure I'll be happy to add it once this is merged

@roman-vynar
Contributor

roman-vynar commented Apr 9, 2020

I have added one for stats #462

@nirgilboa
Contributor Author

> I have added one for stats #462

Really like what you did there 😃 To be consistent with what you did for the stats, I think it would be better to expose only the server's own status. Agree?

@roman-vynar
Contributor

In this case it is okay to have all servers' states exposed, because each server may observe a different picture, e.g. during a network partition when half of the nodes are cut off from the others. I have seen member states reported differently by each node a few times.
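
As a rough illustration of why the full view is useful, the sketch below groups observations by member and lists the members for which the reporting instances disagree on the state. The `observation` type and `disagreements` function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// observation is one instance's view of one group member (hypothetical type).
type observation struct {
	Instance string
	MemberID string
	State    string
}

// disagreements maps each member ID to the sorted distinct states reported
// for it, keeping only members with more than one distinct state.
func disagreements(obs []observation) map[string][]string {
	seen := map[string]map[string]bool{}
	for _, o := range obs {
		if seen[o.MemberID] == nil {
			seen[o.MemberID] = map[string]bool{}
		}
		seen[o.MemberID][o.State] = true
	}
	out := map[string][]string{}
	for id, states := range seen {
		if len(states) < 2 {
			continue // all instances agree on this member
		}
		for s := range states {
			out[id] = append(out[id], s)
		}
		sort.Strings(out[id])
	}
	return out
}

func main() {
	obs := []observation{
		{"mysqldb1", "m3", "ONLINE"},
		{"mysqldb2", "m3", "ONLINE"},
		{"mysqldb3", "m3", "UNREACHABLE"},
		{"mysqldb1", "m1", "ONLINE"},
		{"mysqldb2", "m1", "ONLINE"},
	}
	fmt.Println(disagreements(obs))
}
```

In PromQL, something along the lines of `count by (member_id) (count by (member_id, member_state) (mysql_perf_schema_replication_group_member)) > 1` should express roughly the same check.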

Member

@SuperQ SuperQ left a comment

Thanks. Can you also add to the CHANGELOG?

* [FEATURE] Add collector for replication_group_members #459

collector/perf_schema_replication_group_members.go (outdated, resolved)
collector/perf_schema_replication_group_members_test.go (outdated, resolved)
collector/perf_schema_replication_group_members_test.go (outdated, resolved)
collector/perf_schema_replication_group_members_test.go (outdated, resolved)
@nirgilboa nirgilboa force-pushed the collector-replication_group_members branch from ee6c8c5 to c0dc366 Compare April 13, 2020 13:41
@nirgilboa nirgilboa force-pushed the collector-replication_group_members branch from c0dc366 to d86ab67 Compare April 13, 2020 13:42
Member

@SuperQ SuperQ left a comment

Looks much better. A couple of minor things left.

collector/perf_schema_replication_group_members.go (outdated, resolved)
collector/perf_schema_replication_group_members.go (outdated, resolved)
…ration, remove duplicate comment

Signed-off-by: Nir Gilboa <[email protected]>
@nirgilboa nirgilboa force-pushed the collector-replication_group_members branch from fd9faa5 to 20e683f Compare April 13, 2020 14:18
@SuperQ SuperQ merged commit 5667073 into prometheus:master Apr 13, 2020
@SuperQ
Member

SuperQ commented Apr 13, 2020

Thanks!

@roman-vynar
Contributor

Thanks @SuperQ

@nirgilboa nirgilboa deleted the collector-replication_group_members branch April 13, 2020 14:38
SuperQ added a commit that referenced this pull request Apr 23, 2021
Changes related to `replication_group_member_stats` collector:
* metric "transaction_in_queue" was Counter instead of Gauge
* renamed 3 metrics starting with `mysql_perf_schema_transaction_` to start with `mysql_perf_schema_transactions_` to be consistent with column names
* exposing only server's own stats by matching MEMBER_ID with @@server_uuid resulting "member_id" label to be dropped.

* [CHANGE] Switch to go-kit for logs. #433
* [BUGFIX] Fix binlog metrics on mysql 8.x #419
* [BUGFIX] Fix output value of wsrep_cluster_status #473
* [BUGFIX] Fix collect.info_schema.innodb_metrics for new field names (mariadb 10.5+) #494
* [BUGFIX] Fix log output of collect[] params #505
* [BUGFIX] Fix collect.info_schema.innodb_tablespaces for new table names #516
* [BUGFIX] Fix innodb_metrics for mariadb 10.5+ #523
* [BUGFIX] Allow perf_schema.memory summary current_bytes to be negative #517
* [ENHANCEMENT] Support heartbeats in UTC #471
* [FEATURE] Add `tls.insecure-skip-verify` flag to ignore tls verification errors #417
* [FEATURE] Add collector for AWS Aurora information_schema.replica_host_status #435
* [FEATURE] Add collector for `replication_group_members` #459
* [FEATURE] Add new metrics to `replication_group_member_stats` collector to support MySQL 8.x. #462
* [FEATURE] Add collector for `performance_schema.memory_summary_global_by_event_name` #515
* [FEATURE] Support authenticating using mTLS client cert and no password #539

Signed-off-by: Ben Kochie <[email protected]>
@SuperQ SuperQ mentioned this pull request Apr 23, 2021
SuperQ added a commit that referenced this pull request Apr 25, 2021
BREAKING CHANGES:

Changes related to `replication_group_member_stats` collector:
* metric "transaction_in_queue" was Counter instead of Gauge
* renamed 3 metrics starting with `mysql_perf_schema_transaction_` to start with `mysql_perf_schema_transactions_` to be consistent with column names
* exposing only server's own stats by matching MEMBER_ID with @@server_uuid resulting "member_id" label to be dropped.

Changes:

* [CHANGE] Switch to go-kit for logs. #433
* [FEATURE] Add `tls.insecure-skip-verify` flag to ignore tls verification errors #417
* [FEATURE] Add collector for AWS Aurora information_schema.replica_host_status #435
* [FEATURE] Add collector for `replication_group_members` #459
* [FEATURE] Add new metrics to `replication_group_member_stats` collector to support MySQL 8.x. #462
* [FEATURE] Add collector for `performance_schema.memory_summary_global_by_event_name` #515
* [FEATURE] Support authenticating using mTLS client cert and no password #539
* [ENHANCEMENT] Support heartbeats in UTC #471
* [BUGFIX] Fix binlog metrics on mysql 8.x #419
* [BUGFIX] Fix output value of wsrep_cluster_status #473
* [BUGFIX] Fix collect.info_schema.innodb_metrics for new field names (mariadb 10.5+) #494
* [BUGFIX] Fix log output of collect[] params #505
* [BUGFIX] Fix collect.info_schema.innodb_tablespaces for new table names #516
* [BUGFIX] Fix innodb_metrics for mariadb 10.5+ #523
* [BUGFIX] Allow perf_schema.memory summary current_bytes to be negative #517

Signed-off-by: Ben Kochie <[email protected]>
SuperQ added a commit that referenced this pull request May 18, 2021
BREAKING CHANGES:

Changes related to `replication_group_member_stats` collector:
* metric "transaction_in_queue" was Counter instead of Gauge
* renamed 3 metrics starting with `mysql_perf_schema_transaction_` to start with `mysql_perf_schema_transactions_` to be consistent with column names
* exposing only server's own stats by matching MEMBER_ID with @@server_uuid resulting "member_id" label to be dropped.

Changes:

* [CHANGE] Switch to go-kit for logs. #433
* [FEATURE] Add `tls.insecure-skip-verify` flag to ignore tls verification errors #417
* [FEATURE] Add collector for AWS Aurora information_schema.replica_host_status #435
* [FEATURE] Add collector for `replication_group_members` #459
* [FEATURE] Add new metrics to `replication_group_member_stats` collector to support MySQL 8.x. #462
* [FEATURE] Add collector for `performance_schema.memory_summary_global_by_event_name` #515
* [FEATURE] Support authenticating using mTLS client cert and no password #539
* [FEATURE] Add TLS and basic authentication #522
* [ENHANCEMENT] Support heartbeats in UTC #471
* [ENHANCEMENT] Improve parsing of boolean strings #548
* [BUGFIX] Fix binlog metrics on mysql 8.x #419
* [BUGFIX] Fix output value of wsrep_cluster_status #473
* [BUGFIX] Fix collect.info_schema.innodb_metrics for new field names (mariadb 10.5+) #494
* [BUGFIX] Fix log output of collect[] params #505
* [BUGFIX] Fix collect.info_schema.innodb_tablespaces for new table names #516
* [BUGFIX] Fix innodb_metrics for mariadb 10.5+ #523
* [BUGFIX] Allow perf_schema.memory summary current_bytes to be negative #517

Signed-off-by: Ben Kochie <[email protected]>
@SuperQ SuperQ mentioned this pull request May 18, 2021