Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VTTablet query.Result contains inconsistent field/values #4661

Closed
setassociative opened this issue Feb 22, 2019 · 6 comments
Closed

VTTablet query.Result contains inconsistent field/values #4661

setassociative opened this issue Feb 22, 2019 · 6 comments

Comments

@setassociative
Copy link

VTTablet responding with stale field information causing a VTgate panic.

When a schema change is made to a tablet to add a new field VTTablet executions of select * may return querypb.QueryResult objects that have an outdated Fields attribute. This contradicts the docs which promise ...As returned by Execute, len(fields) is always equal to len(row) (for each row in rows).

The MakeRowTrusted implementation relies on that promise that len(fields) === len(row.Lengths) so when we hit the case described above the vtgate panics trying to construct a query result in MakeRowTrusted. I have a patch which shifts this panic to (I believe) a less common scenario in #4660.

The stack trace that led us to this conclusion (I'd love alternate readings) follows in the "Log Fragments" section.

Reproduction Steps

No concrete repro steps, will update as we get a chance to test locally. The summary is:

  1. Execute an alter across a keyspace that adds a new column with gh-ost or similar
  2. While the alter is completing and table schemas are cutting over the vtgate will panic on select * queries

Binary version

This is on vitess master: 7d3d74790252e0292450b06b2888508835f6a422

Log Fragments

Panic stack trace taken from the vtgate:

go/src/runtime/panic.go:505
go/src/runtime/panic.go:28
src/vitess.io/vitess/go/sqltypes/result.go:194
src/vitess.io/vitess/go/sqltypes/proto3.go:79
src/vitess.io/vitess/go/sqltypes/proto3.go:108
src/vitess.io/vitess/go/vt/vttablet/grpctabletconn/conn.go:113
src/vitess.io/vitess/go/vt/vttablet/queryservice/wrapped.go:170
src/vitess.io/vitess/go/vt/vtgate/gateway/discoverygateway.go:322
src/vitess.io/vitess/go/vt/vtgate/gateway/discoverygateway.go:133
src/vitess.io/vitess/go/vt/vttablet/queryservice/wrapped.go:168
src/vitess.io/vitess/go/vt/vtgate/scatter_conn.go:210
src/vitess.io/vitess/go/vt/vtgate/scatter_conn.go:796
src/vitess.io/vitess/go/vt/vtgate/scatter_conn.go:796
src/vitess.io/vitess/go/vt/vtgate/scatter_conn.go:811
src/vitess.io/vitess/go/vt/vtgate/scatter_conn.go:187
src/vitess.io/vitess/go/vt/vtgate/vcursor_impl.go:162
src/vitess.io/vitess/go/vt/vtgate/engine/route.go:228
src/vitess.io/vitess/go/vt/vtgate/engine/route.go:188
src/vitess.io/vitess/go/vt/vtgate/executor.go:301
src/vitess.io/vitess/go/vt/vtgate/executor.go:175
src/vitess.io/vitess/go/vt/vtgate/executor.go:126
src/vitess.io/vitess/go/vt/vtgate/vtgate.go:288
src/vitess.io/vitess/go/vt/vtgate/plugin_mysql_server.go:154
src/vitess.io/vitess/go/mysql/conn.go:730
src/vitess.io/vitess/go/mysql/server.go:439
go/src/runtime/asm_amd64.s:2361
@setassociative setassociative changed the title VTTablet query.Result contain inconsistent field/values VTTablet query.Result contains inconsistent field/values Feb 22, 2019
@leoxlin
Copy link

leoxlin commented Feb 22, 2019

Did you make the schema change via a DDL against the VTGate or against MySQL directly.

If you ran it against mysql directly, we had a problem very similar to this. The gist is that you need to invoke ReloadSchema on vtctld so the query plan cache is flushed. This needs to happen right after your migration.

@setassociative
Copy link
Author

Yes, we use ghost for schema changes which operates against MySQL directly.

I haven't audited the code path that processes MySQL results into protobuf objects yet but given that it's able to populate the result with a column value I suspect we have accurate field data somewhere so I'm hopeful there is a tablet-based fixed.

@setassociative
Copy link
Author

Took a super cursory look around and I think the solution here is going to be in QueryExecutor::Execute checking the length of fields against the values then returning an error on a mismatch and forcing eviction from the query plan cache in TabletServer.

setassociative pushed a commit to tinyspeck/vitess that referenced this issue Feb 25, 2019
Patch to handle this case while we move towards a permanent fix upstream.
cf. vitessio#4661 vitessio#4669
@pH14
Copy link
Contributor

pH14 commented Dec 27, 2019

Ah hadn't seen this issue here before -- we ran into this again and put up #5572 which tackles this by adding an option to simply remove fields from the query cache.

ajm188 pushed a commit to tinyspeck/vitess that referenced this issue Apr 28, 2020
Patch to handle this case while we move towards a permanent fix upstream.
cf. vitessio#4661 vitessio#4669
ajm188 pushed a commit to tinyspeck/vitess that referenced this issue Jun 26, 2020
Patch to handle this case while we move towards a permanent fix upstream.
cf. vitessio#4661 vitessio#4669
ameetkotian pushed a commit to tinyspeck/vitess that referenced this issue Aug 19, 2020
Patch to handle this case while we move towards a permanent fix upstream.
cf. vitessio#4661 vitessio#4669
ameetkotian pushed a commit to tinyspeck/vitess that referenced this issue Aug 19, 2020
Patch to handle this case while we move towards a permanent fix upstream.
cf. vitessio#4661 vitessio#4669
setassociative pushed a commit to tinyspeck/vitess that referenced this issue Mar 3, 2021
Patch to handle this case while we move towards a permanent fix upstream.
cf. vitessio#4661 vitessio#4669
setassociative pushed a commit to tinyspeck/vitess that referenced this issue Mar 4, 2021
Patch to handle this case while we move towards a permanent fix upstream.
cf. vitessio#4661 vitessio#4669
setassociative pushed a commit to tinyspeck/vitess that referenced this issue Mar 17, 2021
Patch to handle this case while we move towards a permanent fix upstream.
cf. vitessio#4661 vitessio#4669
@indera-shsp
Copy link
Contributor

indera-shsp commented Jul 14, 2021

We are experiencing an issue with MakeRowTrusted as well.

panic: runtime error: index out of range [4] with length 4
[vtstreamer-x-40x-6cb47c455c-wt55x] goroutine 52 [running]:
[vtstreamer-x-40x-6cb47c455c-wt55x] vitess.io/vitess/go/sqltypes.MakeRowTrusted(0xc000afb120, 0x4, 0x4, 0xc0011a5720, 0x8, 0xdd0840, 0x1)
[vtstreamer-x-40x-6cb47c455c-wt55x] 	/go/pkg/mod/vitess.io/[email protected]/go/sqltypes/result.go:170 +0x178
[vtstreamer-x-40x-6cb47c455c-wt55x] vitess.io/vitess/go/sqltypes.proto3ToRows(0xc000afb120, 0x4, 0x4, 0xc000e775c0, 0x1, 0x1, 0xc0002e3820, 0x415c78, 0x8)
[vtstreamer-x-40x-6cb47c455c-wt55x] 	/go/pkg/mod/vitess.io/[email protected]/go/sqltypes/proto3.go:78 +0xa5
[vtstreamer-x-40x-6cb47c455c-wt55x] vitess.io/vitess/go/sqltypes.Proto3ToResult(0xc0002e3c10, 0xc000e775c0)

We tried to enable schema tracking as a solution as described here

https://vitess.io/docs/design-docs/vreplication/vstream/tracker/

but found that we need more RAM.
The tablet RAM was 2GB and we had to increase to 5GB (otherwise it kept OOMing)

@ajm188 ajm188 removed the P3 label Mar 9, 2023
@deepthi
Copy link
Member

deepthi commented May 17, 2024

This issue was fixed partially by providing a flag in #5572 and completely by removing the caching altogether in #10489.

@deepthi deepthi closed this as completed May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants