-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: fix the handling of integer types #28690
Conversation
The PR is not yet ready for review. Will ping. |
93bb755
to
f7dd8b4
Compare
ok I think the review can start now. What's missing:
|
f7dd8b4
to
98a3278
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 39 of 44 files at r1, 20 of 20 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/create_table.go, line 1405 at r1 (raw file):
evalCtx *tree.EvalContext, ) error { if d.HasDefaultExpr() {
This code movement seems unrelated to this PR, but this check should indeed come at the start.
pkg/sql/information_schema.go, line 288 at r1 (raw file):
tree.DNull, // character_set_name dStringPtrOrEmpty(column.ComputeExpr), // generation_expression tree.NewDString(column.Type.SQLString()), // crdb_sql_type
This change needs to be addressed in docs, and should probably be mentioned in the release notes.
pkg/sql/show_columns.go, line 37 at r1 (raw file):
IF(inames[1] IS NULL, ARRAY[]:::STRING[], inames) AS indices FROM (SELECT column_name, crdb_sql_type, data_type, is_nullable, column_default, generation_expression, ordinal_position,
is data type used at all anymore?
pkg/sql/coltypes/aliases.go, line 143 at r1 (raw file):
// NewFloat creates a type alias for FLOAT with the given precision. func NewFloat(prec int64) (*TFloat, error) { if prec < 1 {
just for style, perhaps a switch here would be a bit cleaner
pkg/sql/coltypes/arith.go, line 15 at r1 (raw file):
// permissions and limitations under the License. package coltypes
This is just me personally, but I would really like to see each type get it's own file. This goes for all types. Maybe a future PR.
pkg/sql/coltypes/arith.go, line 30 at r1 (raw file):
func (node *TBool) TypeName() string { return "BOOL" } // PGTypeName implements the ColTypeFormatter interface.
I'm a fan of ensuring an interface is implemented using the dummy declaration. Just makes me feel safer.
var _ foo.RequiredInterface = myType{}
pkg/sql/logictest/testdata/planner_test/distsql_agg, line 126 at r2 (raw file):
SELECT url FROM [EXPLAIN (DISTSQL) SELECT avg(c), sum(c), avg(d), sum(d) FROM data] ---- https://cockroachdb.github.io/distsqlplan/decode.html#eJy8lF-L4jAUxd_3U8h9WiFg80fH6VNkYEHYGRf_PC1lyZpLEZxGkhR2GfzuS1thWrFpwbpPmtuc_E7ODfcDMqPxTb2jg_gnUCDAgAAHAgIITCEhcLJmj84ZW2ypBEv9B-KIwCE75b4oJwT2xiLEH-AP_ogQw1b9PuIalUY7iYCARq8OxxJzsod3Zf9KrbwCAqvcxyPJiRSQnAmY3H-e6rxKEWJ6Jv3JizS1mCpv7GTaBG92r18lHQOBl9XubXv5X1ZZrcrGrUZYq5FPfp4Zq9GibsCTc9gqjdq8bnavv5YXXxe3vFEXxWqNmUYbjySdvCw2xe7F5tv31WI7H5ORpGQk-USK4qf1crxxOdq_v3TY_naQa6HNHttf1j8CNmwEHeRaBE-PjYD3j4APG0EHuRbB_LERiP4RiGEj6CDXInj-f4PuhpE1upPJHF4NvNsnR8UgRJ1iNTWdye0ef1izLzHVclXqyoJG56uvtFoss-pTYbAupkExa4jptZiFyR1oHlSLsFjc43saFM_C5Nk95KegeB4mz-8hP4d7FXU8k_Aju2Yn5y__AgAA___R7uSX
Why did these change?
pkg/sql/opt/exec/execbuilder/testdata/distsql_agg, line 126 at r2 (raw file):
SELECT url FROM [EXPLAIN (DISTSQL) SELECT avg(c), sum(c), avg(d), sum(d) FROM data] ---- https://cockroachdb.github.io/distsqlplan/decode.html#eJy8lF-L4jAUxd_3U8h9WiFg80fH6VNkYEHYGRf_PC1lyZpLEZxGkhR2GfzuS1thWrFpwbpPmtuc_E7ODfcDMqPxTb2jg_gnUCDAgAAHAgIITCEhcLJmj84ZW2ypBEv9B-KIwCE75b4oJwT2xiLEH-AP_ogQw1b9PuIalUY7iYCARq8OxxJzsod3Zf9KrbwCAqvcxyPJiRSQnAmY3H-e6rxKEWJ6Jv3JizS1mCpv7GTaBG92r18lHQOBl9XubXv5X1ZZrcrGrUZYq5FPfp4Zq9GibsCTc9gqjdq8bnavv5YXXxe3vFEXxWqNmUYbjySdvCw2xe7F5tv31WI7H5ORpGQk-USK4qf1crxxOdq_v3TY_naQa6HNHttf1j8CNmwEHeRaBE-PjYD3j4APG0EHuRbB_LERiP4RiGEj6CDXInj-f4PuhpE1upPJHF4NvNsnR8UgRJ1iNTWdye0ef1izLzHVclXqyoJG56uvtFoss-pTYbAupkExa4jptZiFyR1oHlSLsFjc43saFM_C5Nk95KegeB4mz-8hP4d7FXU8k_Aju2Yn5y__AgAA___R7uSX
Why did these change?
pkg/sql/opt/memo/private_storage_test.go, line 373 at r1 (raw file):
nf30, _ := coltypes.NewFloat(16) test(coltypes.Float, nf30, false) test(nf30, nf30, true)
What is this line testing?
pkg/sql/parser/parse_test.go, line 1139 at r2 (raw file):
{`SELECT NUMERIC 'foo'`, `SELECT DECIMAL 'foo'`}, {`SELECT REAL 'foo'`, `SELECT FLOAT4 'foo'`}, {`SELECT DOUBLE PRECISION 'foo'`, `SELECT FLOAT8 'foo'`},
Maybe add an array or two as well?
pkg/sql/parser/sql.y, line 5895 at r2 (raw file):
| const_datetime | const_json | BLOB
There's no way to just collapse all BLOB, BYTES and BYTEA into one is there? It would just look so much nicer.
pkg/sql/sem/tree/col_types_test.go, line 30 at r1 (raw file):
testData := []struct { str string norm string
please add a comment about this being empty means the name matches.
pkg/sql/sem/tree/parse_array.go, line 151 at r1 (raw file):
// string representation of the type. Used by dump. It only // supports those type names that can appear immediately before `[]`. func ArrayElementTypeStringToColType(s string) (coltypes.T, error) {
I think this should be a function in coltypes. Ideally inside arrays.go.
pkg/sql/sem/tree/testdata/pretty/7.ref.golden, line 136 at r2 (raw file):
NOTHING 26:
Why did you remove this test case?
pkg/sql/sqlbase/coltype_conversion.go, line 26 at r2 (raw file):
"github.com/pkg/errors" )
All of the functions in this file should have unit tests.
pkg/sql/sqlbase/coltype_conversion.go, line 31 at r2 (raw file):
// After calling this, the caller must also call PopulateTypeAttrs // below. func DatumTypeToColumnType(ptyp types.T) (ColumnType, error) {
This has a mix of some returns inside the cases and some out.
Can you move them all in and get rid of the ctyp variable entirely.
pkg/sql/sqlbase/coltype_conversion.go, line 92 at r2 (raw file):
case 16: base.VisibleType = ColumnType_SMALLINT case 32:
these two lying cases can be joined together.
pkg/sql/sqlbase/coltype_conversion.go, line 180 at r2 (raw file):
// InfoSchemaColumnType returns the string suitable to populate the data_type column // of information_schema.columns.
I'm confused about information_schema.columns.
Are postgres type names compatible but we're not? Because these look like postgres type names.
For pg_catalog, let's go with postgres' type names, but for information schema, let's either go with our type names unless there's another standard we should match.
pkg/sql/sqlbase/coltype_conversion.go, line 279 at r2 (raw file):
// Pre-2.1 columns: the width is not set yet and instead there // is a precision. Reverse-engineer the width from that. if c.Precision < 0 {
switch here might make it a lot nicer to read
Also, this matches FloatProperties(), any way to DRY it up? Especially considering that this has some extra strange logic in it.
pkg/sql/sqlbase/structured.go, line 2221 at r2 (raw file):
} else if c.Precision == 0 { // Special case of poorly initialized coltypes pre-2.1. width = 64
why not just return here? And for all the cases? It will greatly simplify the logic:
func (c *ColumnType) FloatProperties() (int32, int32) {
if c.width == 0 {
// Pre-2.1 columns: the width is not set yet and instead there
// is a precision. Reverse-engineer the width from that.
if c.Precision < 0 {
panic(fmt.Sprintf("programming error: invalid float precision: %d", c.Precision))
}
if c.Precision == 0 {
// Special case of poorly initialized coltypes pre-2.1.
return int32(64), int32(53)
}
if c.Precision <= 24 {
return int32(32), int32(24)
}
if c.Precision <= 54 {
return int32(64), int32(53)
}
panic(fmt.Sprintf("programming error: invalid float precision: %d", c.Precision))
}
if c.width == 64 {
return int32(64), int32(53)
}
return c.width, int32(24)
}
might even be worth adding a switch into the inner if
pkg/sql/sqlbase/structured.go, line 2240 at r2 (raw file):
// data types. Returns false if the data type is not numeric, or if the precision // of the numeric type is not bounded. func (c *ColumnType) NumericPrecision() (int32, bool) {
Does this ever return true for 0? If not, why not remove the bool return value entirely?
pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):
// Note: this enum is becoming deprecated. Do not add items // to it without consideration. // TODO(knz): remove this.
Please add to this comment as to when you can do so.
pkg/sql/sqlbase/structured.proto, line 80 at r2 (raw file):
optional SemanticType semantic_type = 1 [(gogoproto.nullable) = false]; // BIT, INT, FLOAT, DECIMAL, CHAR, BINARY, FLOAT.
nit: this isn't a sentence, no period at the end
pkg/sql/sqlbase/structured.proto, line 82 at r2 (raw file):
// BIT, INT, FLOAT, DECIMAL, CHAR, BINARY, FLOAT. optional int32 width = 2 [(gogoproto.nullable) = false]; // DECIMAL.
nit: this isn't a sentence, no period at the end
Perhaps
// DECIMAL, pre2.1 FLOAT (incorrectly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to ensure that docs is aware of the changes and I think the release note need to be greatly expanded.
Reviewable status: complete! 0 of 0 LGTMs obtained
Thank you for your initial review. I'll iterate on that today. Regarding the doc work I think I will simply take care of doing all the doc updates myself. There is no way the doc team will have the bandwidth to process such a change themselves. In the meantime you can have a look at the SERIAL PR #28575 |
98a3278
to
2e6968e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some reactions already, will process the rest tomorrow.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/show_columns.go, line 37 at r1 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
is data type used at all anymore?
Removed.
pkg/sql/coltypes/arith.go, line 30 at r1 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
I'm a fan of ensuring an interface is implemented using the dummy declaration. Just makes me feel safer.
var _ foo.RequiredInterface = myType{}
This is already done in interface.go
.
pkg/sql/logictest/testdata/planner_test/distsql_agg, line 126 at r2 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
Why did these change?
FLOAT -> FLOAT8
pkg/sql/opt/exec/execbuilder/testdata/distsql_agg, line 126 at r2 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
Why did these change?
FLOAT -> FLOAT8
pkg/sql/opt/memo/private_storage_test.go, line 373 at r1 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
What is this line testing?
That interning two times the same type yields equal instances. I had botched the test, it is now fixed.
pkg/sql/sem/tree/testdata/pretty/7.ref.golden, line 136 at r2 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
Why did you remove this test case?
This is auto-generated. The test case disappears because there is no difference in output for this column width.
pkg/sql/sqlbase/coltype_conversion.go, line 180 at r2 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
I'm confused about information_schema.columns.
Are postgres type names compatible but we're not? Because these look like postgres type names.
For pg_catalog, let's go with postgres' type names, but for information schema, let's either go with our type names unless there's another standard we should match.
Yes I was confused myself. ACtually there are two completely unrelated things:
- the "pg display name", which is the internal name of the type inside postgresql and which we use in pg_catalog.
- the information_schema type, which is specified by the SQL standard, and which is generated here.
pkg/sql/sqlbase/structured.go, line 2221 at r2 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
why not just return here? And for all the cases? It will greatly simplify the logic:
func (c *ColumnType) FloatProperties() (int32, int32) { if c.width == 0 { // Pre-2.1 columns: the width is not set yet and instead there // is a precision. Reverse-engineer the width from that. if c.Precision < 0 { panic(fmt.Sprintf("programming error: invalid float precision: %d", c.Precision)) } if c.Precision == 0 { // Special case of poorly initialized coltypes pre-2.1. return int32(64), int32(53) } if c.Precision <= 24 { return int32(32), int32(24) } if c.Precision <= 54 { return int32(64), int32(53) } panic(fmt.Sprintf("programming error: invalid float precision: %d", c.Precision)) } if c.width == 64 { return int32(64), int32(53) } return c.width, int32(24) }might even be worth adding a switch into the inner if
Done.
pkg/sql/sqlbase/structured.proto, line 68 at r2 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
Please add to this comment as to when you can do so.
I was mistaken; instead I reverted that change and went in a different direction.
2e6968e
to
65dc19c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/create_table.go, line 1405 at r1 (raw file):
Previously, BramGruneir (Bram Gruneir) wrote…
This code movement seems unrelated to this PR, but this check should indeed come at the start.
Forked as #28753.
28753: sql: pull an assertion earlier in validateComputedColumn() r=knz a=knz Extracted from #28690 This is a minor cleanup. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
28943: sqlbase: add the missing comments and group functions by purpose r=knz a=knz First commits from #28942 and priors. Forked from #28690. The file `table.go` was getting unwieldy and was containing functions from different area of concern. This was making maintenance particularly burdensome. This patch splits the functionality in the following files: - `index_encoding.go`: index key prefixes, differences between primary and secondary encodings, how to organize columns in an index. - `column_type_encoding.go`: encoding of SQL values into bytes towards KV. - `column_type_properties.go`: conversions between various representations of types in the SQL layer; derivation of properties from SQL types. - `datum_alloc.go`: the `DatumAlloc` helper. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
28945: sql: fix the reporting of types in information_schema.columns r=knz a=knz First commits from #28944 and priors. Forked off #28690. Fixes #27601. Largely addresses the concerns that led to issue #26925. Prior to this patch, CockroachDB incorrectly placed the "input syntax" of each SQL type in the column `data_type` of `information_schema.columns`. The input syntax is the one reported in SHOW COLUMNS, SHOW CREATE TABLE and other places, and is suitable to reproduce the exact type of at able. In contrast, `information_schema.columns.data_type` is constrained by compatibility with third party tools and PostgreSQL clients. It must report the name of the type like PostgreSQL does, which in turn is constrained by the SQL standard. A text column must be reported as "text" not "string"; a decimal column as "numeric" not "decimal", a float8 column as "double precision" not "float8", and so on. By reporting the wrong string in that column CockroachDB is confusing ORMs, which subsequently decide that the current on-disk type is not the one expected by the app and then initiate a schema change (ALTER COLUMN SET TYPE). This patch corrects this incompatibility by introducing logic that produces the proper information schema names for column types. This is expected to reduce ORM complaints about insufficient support for ALTER COLUMN SET TYPE (but will be tested/evaluated separately). Release note (bug fix): CockroachDB now populates the `data_type` column of `information_schema.columns` like PostgreSQL for compatibility with 3rd party tools and ORMs. Co-authored-by: Raphael 'kena' Poss <[email protected]>
28949: cli,sql: add another crutch to the handling of column types in `dump` r=knz a=knz All commits but the last part of #28945 and priors. PR forked from #28690. There are major problems in `cockroach dump` described separately in #28948. Part of this is `cockroach dump` trying to reverse-engineer a datum type from the announced column type in `information_schema.columns`, by parsing the type name. Prior to this patch, this parsing was incomplete (it went out of sync with the SQL parser over time, naturally as it was bound to do). This entire approach is flawed, but this is no time to redesign `cockroach dump`. Instead, this patch re-syncs the dump-specific type parser with the general SQL parser. It also ensures this parser is still able to parse pre-2.1 type name aliases. Release note (bug fix): `cockroach dump` is now again better able to operate across multiple CockroachDB versions. Co-authored-by: Raphael 'kena' Poss <[email protected]>
28950: sql: desugar TEXT to STRING r=knz a=knz First commits from #28949 and priors. Forked off from #28690. The distinction between "TEXT" and "STRING" inside CockroachDB is inconsequential. Remove it. The data type is already properly reported in pg_catalog and information_schema as "text". Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
7ffbafd
to
5be8d9d
Compare
5be8d9d
to
e3e21e6
Compare
rebased, RFAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@knz, can you pull this out into a fresh PR? I'm having some issues trying to review only the final commit.
Reviewed 8 of 56 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you pull this out into a fresh PR?
My pleasure: #29020.
Reviewable status: complete! 0 of 0 LGTMs obtained
Prior to this patch, CockroachDB maintained an unnecessary distinction between "INT" and "INTEGER", between "BIGINT" and "INT8", etc. This distinction is unnecessary but also costly, as we were paying the price of a "name" attribute in coltypes.TInt, with a string comparison and hash table lookup on every use of the type. What really matters is that the type shows up properly in introspection; this has already been ensured by various OID-to-pgcatalog mappings and the recently introduced `InformationSchemaTypeName()`. Any distinction beyond that is unnecessary and can be dropped from the implementation. Release note: None
e3e21e6
to
32f9509
Compare
bors r+ |
28690: sql: fix the handling of integer types r=knz a=knz Addresses a large chunk of #26925. Fixes #25098. Informs #24686. Prior to this patch, CockroachDB maintained an unnecessary distinction between "INT" and "INTEGER", between "BIGINT" and "INT8", etc. This distinction is unnecessary but also costly, as we were paying the price of a "name" attribute in coltypes.TInt, with a string comparison and hash table lookup on every use of the type. What really matters is that the type shows up properly in introspection; this has already been ensured by various OID-to-pgcatalog mappings and the recently introduced `InformationSchemaTypeName()`. Any distinction beyond that is unnecessary and can be dropped from the implementation. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
I think you're borsing the wrong PR now. Unless you updated the commit in this PR as well. |
But probably best to close this one and merge the other pr. |
They have the same commit SHAs, just different PR numbers. You'll see, when one merges the other will be merged automatically at the same time. |
Build succeeded |
29006: backport-2.1: fix the handling of SQL types r=knz a=knz Backport 1/1 commits from #28937. Backport 1/1 commits from #28942. Backport 17/17 commits from #28943. Backport 1/1 commits from #28944. Backport 1/1 commits from #28945. Backport 1/1 commits from #28949. Backport 2/2 commits from #28950. Backport 1/1 commits from #28951 (#28959). Backport 1/1 commits from #28952 (#28959). Backport 1/1 commits from #28955 (#28959). Backport 1/1 commits from #28959. Backport 1/1 commits from #28959. Backport 1/1 commits from #28690. cc @cockroachdb/release Co-authored-by: Raphael 'kena' Poss <[email protected]>
29006: backport-2.1: fix the handling of SQL types r=knz a=knz Backport 1/1 commits from #28937. Backport 1/1 commits from #28942. Backport 17/17 commits from #28943. Backport 1/1 commits from #28944. Backport 1/1 commits from #28945. Backport 1/1 commits from #28949. Backport 2/2 commits from #28950. Backport 1/1 commits from #28951 (#28959). Backport 1/1 commits from #28952 (#28959). Backport 1/1 commits from #28955 (#28959). Backport 1/1 commits from #28959. Backport 1/1 commits from #28959. Backport 1/1 commits from #28690. cc @cockroachdb/release Co-authored-by: Raphael 'kena' Poss <[email protected]>
Addresses a large chunk of #26925.
Fixes #25098.
Informs #24686.
Prior to this patch, CockroachDB maintained an unnecessary distinction
between "INT" and "INTEGER", between "BIGINT" and "INT8", etc.
This distinction is unnecessary but also costly, as we were paying the
price of a "name" attribute in coltypes.TInt, with a string comparison
and hash table lookup on every use of the type.
What really matters is that the type shows up properly in
introspection; this has already been ensured by various
OID-to-pgcatalog mappings and the recently introduced
InformationSchemaTypeName()
.Any distinction beyond that is unnecessary and can be dropped from the
implementation.
Release note: None