Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS Aurora issues while testing PR3515 #3522

Closed
kashak88 opened this issue Jul 12, 2021 · 0 comments
Closed

AWS Aurora issues while testing PR3515 #3522

kashak88 opened this issue Jul 12, 2021 · 0 comments

Comments

@kashak88
Copy link

kashak88 commented Jul 12, 2021

#3515
ec2 ubuntu 20
ProxySQL version 2.2.0-106-gdc22217
aurora mysql 2.10.0
instance-1 = original writer (ip-172-16-0-243)
instance-0 = original reader (ip-172-16-2-114)
new-reader = additional reader (ip-172-16-3-30)

mysql_aws_aurora_hostgroups=
(
 {
  writer_hostgroup=0
  reader_hostgroup=1
  aurora_port=3306
  domain_name=".ZZZZ.ca-central-1.rds.amazonaws.com"
  max_lag_ms=60
  check_interval_ms=100
  check_timeout_ms=80
  writer_is_also_reader=1
  new_reader_weight=1000
 }
)

mysql_query_rules:
(
  {
    rule_id=1,
    match_pattern="^SELECT .* FOR UPDATE$",
    apply=1,
    destination_hostgroup=0
  }
,
  {
    rule_id=2,
    match_pattern="^SELECT",
    apply=1,
    destination_hostgroup=1
  }
)

mysql_servers =
(
  {
    address="instance-1.ZZZZ.ca-central-1.rds.amazonaws.com",
    port=3306,
    hostgroup=0,
    max_connections=1000,
    use_ssl=0,
    weight=1000,
    max_replication_lag=0
  },  {
    address="instance-1.ZZZZ.ca-central-1.rds.amazonaws.com",
    port=3306,
    hostgroup=1,
    max_connections=1000,
    use_ssl=0,
    weight=1000,
    max_replication_lag=0
  },  {
    address="instance-0.ZZZZ.ca-central-1.rds.amazonaws.com",
    port=3306,
    hostgroup=1,
    max_connections=1000,
    use_ssl=0,
    weight=1000,
    max_replication_lag=0
  }
)

Hello, while testing the new fixes on PR 3515, I've noticed a couple of issues:
Scenario 1

  1. Removed .db files and then restarting proxysql (loads config from /etc/proxysql.cnf, starting fresh)
  2. mysql_servers configuration matches initial configuration(HG0 = instance-1, HG0 = both instance-1 and instance-0). runtime_mysql_servers matches. All 3 server entries appear ONLINE in running config.
  3. Tested running SELECTS using this:
    while true; do mysql -h127.0.0.1 -P6033 -uweb_test -ptest-e "select @@hostname"; done
  4. This keeps producing the same result:
    +-----------------+
    | @@hostname |
    +-----------------+
    | ip-172-16-0-243 |
    +-----------------+
    And this in log file:
2021-07-12 19:04:22 MySQL_Session.cpp:4269:handler(): [WARNING] Killing connection instance-1.ZZZZ.ca-central-1.rds.amazonaws.com:3306 because query 'select @@hostname' from client '127.0.0.1':36014 timed out.
2021-07-12 19:04:22 MySQL_Session.cpp:143:kill_query_thread(): [WARNING] KILL QUERY 15553 on instance-1.ZZZZ.ca-central-1.rds.amazonaws.com:3306
2021-07-12 19:04:23 [INFO] Killing query 6068
2021-07-12 19:04:23 mysql_connection.cpp:1027:handler(): [ERROR] Connect timeout on instance-1.ZZZZ.ca-central-1.rds.amazonaws.com:3306 : exceeded by 0us
2021-07-12 19:04:23 [INFO] Killing query 6072
2021-07-12 19:04:23 [INFO] Killing query 6077
2021-07-12 19:04:23 mysql_connection.cpp:1027:handler(): [ERROR] Connect timeout on instance-1.ZZZZ.ca-central-1.rds.amazonaws.com:3306 : exceeded by 0us
  1. Looks like it refuses to use instance-1(current writer) as reader.
  2. Do a failover (let instance-0 be the new writer). runtime_mysql_servers successfully picks up the change. mysql_servers and runtime_mysql_servers table is now different (as expected). HG-1-instance-1 is SHUNNED for a second but then all server entries are ONLINE.
  3. Run the select test again, result:
    +-----------------+
    | @@hostname |
    +-----------------+
    | ip-172-16-0-243 |
    +-----------------+
    +-----------------+
    | @@hostname |
    +-----------------+
    | ip-172-16-2-114 |
    +-----------------+
  4. Proxysql successfully uses both instances for reads.
  5. Restart proxysql. runtime_mysql_servers is still the same and is still mismatching mysql_servers (as expected).
  6. Run select test, result:
    +-----------------+
    | @@hostname |
    +-----------------+
    | ip-172-16-2-114 |
    +-----------------+
  7. Now proxysql is refusing to use instance-0 as reader.
  8. Failover again and it starts using both instances for reading (until I restart proxysql process again)

Scenario 2

  1. Same first step as above, purge db and restart proxysql so it picks up fresh config.
  2. Run same select test as above, confirm that we still see only 1 server in HG-1 serving selects:
    +-----------------+
    | @@hostname |
    +-----------------+
    | ip-172-16-0-243 |
    +-----------------+
  3. Add new reader to the cluster, name it "new-reader". Proxysql successfully pulls it and adds it to runtime_mysql_servers:
    mysql> select * from runtime_mysql_servers;
    +--------------+-------------------------------------------------------------------------------------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
    | hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
    +--------------+-------------------------------------------------------------------------------------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
    | 0 | instance-1.ZZZZ.ca-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1000 | 0 | 1000 | 0 | 0 | 0 | |
    | 1 | new-reader.ZZZZ.ca-central-1.rds.amazonaws.com| 3306 | 0 | ONLINE | 1000 | 0 | 1000 | 0 | 0 | 0 | |
    | 1 | instance-0.ZZZZ.ca-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1000 | 0 | 1000 | 0 | 0 | 0 | |
    | 1 | instance-1.ZZZZ.ca-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1000 | 0 | 1000 | 0 | 0 | 0 | |
    +--------------+-------------------------------------------------------------------------------------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
    Log entry while it's still being added:
2021-07-12 19:29:59 MySQL_Monitor.cpp:4294:monitor_AWS_Aurora_thread_HG(): [ERROR] Error on AWS Aurora check for new-reader.celzqyystx3l.ca-central-1.rds.amazonaws.com:3306 after 10ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout or error in creating new connection: Can't connect to MySQL server on 'new-reader.celzqyystx3l.ca-central-1.rds.amazonaws.com' (115).
2021-07-12 19:29:59 MySQL_Monitor.cpp:4294:monitor_AWS_Aurora_thread_HG(): [ERROR] Error on AWS Aurora check for new-reader.celzqyystx3l.ca-central-1.rds.amazonaws.com:3306 after 1ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout or error in creating new connection: Can't connect to MySQL server on 'new-reader.celzqyystx3l.ca-central-1.rds.amazonaws.com' (115).
2021-07-12 19:29:59 MySQL_Monitor.cpp:4294:monitor_AWS_Aurora_thread_HG(): [ERROR] Error on AWS Aurora check for new-reader.celzqyystx3l.ca-central-1.rds.amazonaws.com:3306 after 1ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout or error in creating new connection: Can't connect to MySQL server on 'new-reader.celzqyystx3l.ca-central-1.rds.amazonaws.com' (115).
  1. Run Select test again:
    +----------------+
    | @@hostname |
    +----------------+
    | ip-172-16-3-30 |
    +----------------+
    +-----------------+
    | @@hostname |
    +-----------------+
    | ip-172-16-0-243 |
    +-----------------+
  2. New reader is being used but proxysql is still refusing to use instance-1(writer) for reading.
  3. Restart proxysql. Observe runtime_mysql_servers - it's the same as before restart and different from mysql_servers (as expected)
  4. Run Select test again:
    +-----------------+
    | @@hostname |
    +-----------------+
    | ip-172-16-0-243 |
    +-----------------+
    +----------------+
    | @@hostname |
    +----------------+
    | ip-172-16-3-30 |
    +----------------+
  5. It's still working as before - instance-1 is still missing...
  6. Failover. Instance-0 becomes writer. Select test results:
    +----------------+
    | @@hostname |
    +----------------+
    | ip-172-16-3-30 |
    +----------------+
  7. Now only the new reader is being used!
  8. Restart proxysql. Run test again:
    +-----------------+
    | @@hostname |
    +-----------------+
    | ip-172-16-2-114 |
    +-----------------+
    +----------------+
    | @@hostname |
    +----------------+
    | ip-172-16-3-30 |
    +----------------+
  9. It's using new-reader and instance-1 as readers, but not the writer.

Afterwards, if I get rid of new-reader by shutting it down/removing:

  1. Keep the Select test running while new-reader is being shut down
  2. new-reader is SHUNNED in runtime_mysql_servers but proxysql is still trying to access it?
2021-07-12 20:08:23 MySQL_Monitor.cpp:4294:monitor_AWS_Aurora_thread_HG(): [ERROR] Error on AWS Aurora check for new-reader.celzqyystx3l.ca-central-1.rds.amazonaws.com:3306 after 2ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout or error in creating new connection: Unknown MySQL server host 'new-reader.celzqyystx3l.ca-central-1.rds.amazonaws.com' (-2).
2021-07-12 20:08:23 MySQL_Monitor.cpp:4294:monitor_AWS_Aurora_thread_HG(): [ERROR] Error on AWS Aurora check for new-reader.celzqyystx3l.ca-central-1.rds.amazonaws.com:3306 after 2ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout or error in creating new connection: Unknown MySQL server host 'new-reader.celzqyystx3l.ca-central-1.rds.amazonaws.com' (-2).
mysql> select * from REPLICA_HOST_STATUS \G
*************************** 1. row ***************************
                             SERVER_ID: instance-0
                            SESSION_ID: MASTER_SESSION_ID
                                  IOPS: 0
                              READ_IOS: 0
                      PENDING_READ_IOS: 0
                                   CPU: 0
                           DURABLE_LSN: 574414829
                            ACTIVE_LSN: 0
                  LAST_TRANSPORT_ERROR: 0
                  LAST_ERROR_TIMESTAMP: 1970-01-01 00:00:00.000000
                 LAST_UPDATE_TIMESTAMP: 2021-07-12 20:09:23.214661
  MASTER_SLAVE_LATENCY_IN_MICROSECONDS: 0
           REPLICA_LAG_IN_MILLISECONDS: 0
    LOG_STREAM_SPEED_IN_KiB_PER_SECOND: 0
            LOG_BUFFER_SEQUENCE_NUMBER: 0
                            IS_CURRENT: 1
               OLDEST_READ_VIEW_TRX_ID: 0
                  OLDEST_READ_VIEW_LSN: 0
                  HIGHEST_LSN_RECEIVED: 0
                    CURRENT_READ_POINT: 0
CURRENT_REPLAY_LATENCY_IN_MICROSECONDS: 0
AVERAGE_REPLAY_LATENCY_IN_MICROSECONDS: 0
    MAX_REPLAY_LATENCY_IN_MICROSECONDS: 0
*************************** 2. row ***************************
                             SERVER_ID: instance-1
                            SESSION_ID: b268936e-2e23-43c2-b605-d11c6f4f5c87
                                  IOPS: 0
                              READ_IOS: 564
                      PENDING_READ_IOS: 0
                                   CPU: 23.85321044921875
                           DURABLE_LSN: 574414820
                            ACTIVE_LSN: 0
                  LAST_TRANSPORT_ERROR: 0
                  LAST_ERROR_TIMESTAMP: 1970-01-01 00:00:01.000000
                 LAST_UPDATE_TIMESTAMP: 2021-07-12 20:09:22.722399
  MASTER_SLAVE_LATENCY_IN_MICROSECONDS: 0
           REPLICA_LAG_IN_MILLISECONDS: 19
    LOG_STREAM_SPEED_IN_KiB_PER_SECOND: 2.5762556081754733
            LOG_BUFFER_SEQUENCE_NUMBER: 0
                            IS_CURRENT: 1
               OLDEST_READ_VIEW_TRX_ID: 48828381
                  OLDEST_READ_VIEW_LSN: 574414813
                  HIGHEST_LSN_RECEIVED: 574414830
                    CURRENT_READ_POINT: 574414813
CURRENT_REPLAY_LATENCY_IN_MICROSECONDS: 0
AVERAGE_REPLAY_LATENCY_IN_MICROSECONDS: 0
    MAX_REPLAY_LATENCY_IN_MICROSECONDS: 0
2 rows in set (0.02 sec)
  1. Of course, restarting proxysql fixes that particular issue...

Please let me know if you need any additional information and I will be happy to provide it.

renecannao added a commit that referenced this issue Aug 26, 2021
This solves the issue described in issue #3522

It is a side effect of a mistake in commit 5336e72
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants