vtgate: regression from v10 to v11 on stream * from queries #8676

Closed
derekperkins opened this issue Aug 24, 2021 · 8 comments · Fixed by #8755 or #8926

Comments

@derekperkins
Member

Overview of the Issue

We use Vitess Messaging via the SQL command stream * from messages_table, which has worked since roughly v2.2 without modification, tested on every released version through v11.0.0. After upgrading vtgate from v10.0.2 to v11.0.0, however, it immediately errors with rpc error: code = Unknown desc = No partition found for tabletType unknown in keyspace. This happens regardless of vttablet or vtctld version and goes away immediately when downgrading vtgate to any prior version, with no changes to the topo or any other config.

Reproduction Steps

Steps to reproduce this issue:

  1. Deploy the following vschema:

    {
      "sharded": false,
      "vindexes": {
        "hash": {
          "type": "hash"
        }
      },
      "tables": {
        "workspaces__es_bq_perm_msgs": {
          "column_vindexes": [
            {
              "column": "keyspace_id",
              "name": "hash"
            }
          ]
        }
      }
    }
  2. Deploy the following schema:

CREATE TABLE `workspaces__es_bq_perm_msgs` (
  `id` bigint NOT NULL,
  `keyspace_id` bigint NOT NULL,
  `priority` tinyint NOT NULL DEFAULT '50',
  `epoch` smallint NOT NULL DEFAULT '0',
  `time_next` bigint DEFAULT NULL,
  `time_acked` bigint DEFAULT NULL,
  `time_scheduled` bigint NOT NULL,
  `time_created` bigint NOT NULL,
  `attributes` json DEFAULT NULL,
  `data` varbinary(1000) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk__workspaces__es_bq_perm_msgs__events` (`keyspace_id`,`id`),
  KEY `ack_idx` (`time_acked`),
  KEY `next_idx` (`time_next`,`priority`),
  CONSTRAINT `fk__workspaces__es_bq_perm_msgs__events` FOREIGN KEY (`keyspace_id`, `id`) REFERENCES `events` (`workspace_id`, `event_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED COMMENT='vitess_message,vt_ack_wait=30,vt_purge_after=86400,vt_batch_size=10,vt_cache_size=10000,vt_poller_interval=30'
  3. Start vtgate with the following config:
eval exec /vt/bin/vtgate $(cat <<END_OF_COMMAND
  -topo_implementation=etcd2
  -topo_global_server_address="etcd-global-client.vitess:2379"
  -topo_global_root=/vitess/global
  -logtostderr=true
  -stderrthreshold=0
  -port=15001
  -grpc_port=15991

  -mysql_server_port=3306

  -mysql_auth_server_impl="static"
  -mysql_auth_server_static_file="/mysqlcreds/creds.json"

  -gate_query_cache_size 100

  -enable_buffer
  -buffer_drain_concurrency 100
  -buffer_max_failover_duration 20s
  -buffer_min_time_between_failovers 1m0s
  -buffer_size 10000
  -buffer_window 10s

  -max_memory_rows 700000
  -grpc_max_message_size 100000000

  -service_map="grpc-vtgateservice"
  -grpc_prometheus
  -cells_to_watch="uscentral1"
  -tablet_types_to_wait="MASTER,REPLICA"
  -cell="uscentral1"
  -grpc_keepalive_time="5s"
  -grpc_keepalive_timeout="60s"
  -grpc_server_keepalive_enforcement_policy_permit_without_stream="true"
  -healthcheck_timeout="60s"
  -mysql_server_version="8.0.25-Vitess"

END_OF_COMMAND
)
  4. View the error:
rpc error: code = Unknown desc = No partition found for tabletType unknown in keyspace

Binary version:

v11.0.0 Docker container
connecting via v11.0.0 Go vitessdriver (grpc)

Operating system and Environment details

GKE 1.20.8-gke900

@mattlord
Contributor

mattlord commented Aug 24, 2021

I'm not able to reproduce on an unsharded keyspace built from main @ f670e07. So it seems to be related to sharding, which makes sense given the error.

You just recently upgraded this cluster? I feel like we've had several reports of similar issues where the topo was a little off until additional steps were taken.

mysql> select @@version;
+------------------------------+
| @@version                    |
+------------------------------+
| 5.7.9-vitess-12.0.0-SNAPSHOT |
+------------------------------+
1 row in set (0.00 sec)

mysql> show create table customer.my_message;
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table      | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| my_message | CREATE TABLE `my_message` (
  `time_scheduled` bigint(20) NOT NULL,
  `id` bigint(20) NOT NULL,
  `time_next` bigint(20) DEFAULT NULL,
  `priority` tinyint(4) NOT NULL DEFAULT '50',
  `epoch` bigint(20) DEFAULT NULL,
  `time_created` bigint(20) DEFAULT NULL,
  `time_acked` bigint(20) DEFAULT NULL,
  `message` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`time_scheduled`,`id`),
  UNIQUE KEY `id_idx` (`id`),
  KEY `next_idx` (`time_next`,`epoch`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='vitess_message,vt_ack_wait=30,vt_purge_after=86400,vt_batch_size=10,vt_cache_size=10000,vt_poller_interval=30' |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> show vitess_tablets;
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
| Cell  | Keyspace | Shard | TabletType | State   | Alias            | Hostname     | PrimaryTermStartTime |
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
| zone1 | commerce | 0     | PRIMARY    | SERVING | zone1-0000000100 | 39e84ecbbf5e | 2021-08-24T04:13:22Z |
| zone1 | commerce | 0     | REPLICA    | SERVING | zone1-0000000101 | 39e84ecbbf5e |                      |
| zone1 | commerce | 0     | RDONLY     | SERVING | zone1-0000000102 | 39e84ecbbf5e |                      |
| zone1 | customer | 0     | PRIMARY    | SERVING | zone1-0000000200 | 39e84ecbbf5e | 2021-08-24T04:14:20Z |
| zone1 | customer | 0     | REPLICA    | SERVING | zone1-0000000201 | 39e84ecbbf5e |                      |
| zone1 | customer | 0     | RDONLY     | SERVING | zone1-0000000202 | 39e84ecbbf5e |                      |
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
6 rows in set (0.00 sec)

mysql> set workload=olap;
Query OK, 0 rows affected (0.00 sec)

mysql> stream * from my_message;

@derekperkins
Member Author

It's happening on an unsharded keyspace, but that has a full vschema in preparation for sharding sometime in the near future. It did happen while we were testing upgrades recently. I haven't tried main, just v11.0.0, using the Docker image here https://hub.docker.com/r/vitess/vtgate/tags?page=1&ordering=last_updated.

I am not setting the workload to olap myself, though the driver might be doing that under the hood. Here's how we are configuring it in Go.

cnf := vitessdriver.Configuration{
	Protocol:  "grpc",
	Address:   address,
	Target:    target,
	Streaming: true,

	// set dial options with max message sizes
	GRPCDialOptions: []grpc.DialOption{
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(ngrpc.MaxMsgSize),
			grpc.MaxCallSendMsgSize(ngrpc.MaxMsgSize),
		),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                300 * time.Second,
			Timeout:             600 * time.Second,
			PermitWithoutStream: true,
		}),
	},
}

db, err := vitessdriver.OpenWithConfiguration(cnf)
if err != nil {
	return err
}

@harshit-gangal
Member

@derekperkins In your VStreamRequest are you setting the tabletType? If not, can you set that and try again.

@derekperkins
Member Author

@systay / @harshit-gangal I actually did not test this correctly, and it does not solve my problem. When I set the target to master, I get this error:
Unknown database 'master' in vschema

@derekperkins derekperkins reopened this Sep 8, 2021
@derekperkins
Member Author

I promise I know how to use Vitess. I was testing target: master instead of target: @master. The attached PR should indeed solve the problem.
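To illustrate why the two spellings behave so differently, here is a minimal Go sketch. It assumes the target string has the form keyspace@tablet_type and is split at the "@"; splitTarget is a hypothetical helper written for this comment, not the parser vtgate actually uses. Under that assumption, "master" without the "@" is read as a keyspace name, which is consistent with the Unknown database 'master' in vschema error above.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTarget is a hypothetical helper, not Vitess's own parser.
// It assumes a target of the form "keyspace@tablet_type" and splits
// at the last "@". A string without "@" is treated as a keyspace name.
func splitTarget(target string) (keyspace, tabletType string) {
	if i := strings.LastIndex(target, "@"); i >= 0 {
		return target[:i], target[i+1:]
	}
	return target, ""
}

func main() {
	// Compare the mistyped target with the intended one.
	for _, t := range []string{"master", "@master"} {
		ks, tt := splitTarget(t)
		fmt.Printf("target=%q keyspace=%q tabletType=%q\n", t, ks, tt)
	}
}
```

In the vitessdriver.Configuration shown earlier in this thread, the workaround corresponds to setting the Target field to "@master".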

@derekperkins
Member Author

I'm back. I deployed a canary v11.0.1 vtgate and this problem persists. I can still work around it by setting target: @master, but if I don't set it, I still get Unknown desc = No partition found for tabletType unknown in keyspace.

@derekperkins derekperkins reopened this Sep 10, 2021
@systay systay assigned systay and unassigned harshit-gangal Sep 29, 2021
@systay
Collaborator

systay commented Sep 29, 2021

I've spent half a day today trying to reproduce this problem and I have not been successful. Any chance you can whip up an end2end test that shows the issue? I just don't know how to make progress without being able to reproduce it.

@systay systay removed their assignment Sep 29, 2021
@derekperkins
Member Author

@systay thanks for looking into this. I don't have bandwidth to do an end2end test for it, but I could grant you access to Vitess in our live environment if that would help.
