Specification of type system #1521

leiysky · 2021-08-19T05:57:13Z

It's important to provide a well defined type system.

In ANSI SQL specification, there are many data types, while all of the databases have their own type systems that always have difference fron ANSI SQL.

The most complicated part is the implicit conversion between data types.

For example, there isn't a real BOOLEAN type in MySQL. They represents BOOLEAN with TINYINT(1), thus literal TRUE value is actually 1 with type TINYINT.

Besides, MySQL allows implicit conversion from string type to float, that is, if you write down a SQL query like SELECT SUM(char_column) FROM table, MySQL can still produce a bizzare result(it will implicitly convert char_column into a double_column, and the conversion rule is not well defined).

While PostgreSQL do have BOOLEAN type, and it doesn't allow non-reasonable implicit conversion.

We've discussed about this in #1492, and the detailed discussion about type system will be finished in this issue.

The text was updated successfully, but these errors were encountered:

leiysky · 2021-08-19T05:58:00Z

cc @sundy-li

sundy-li · 2021-08-19T07:11:09Z

Thanks for the issue, I think in the alpha version, the types we must have are:

PrimitiveTypes:

UInt8
UInt16
UInt32
UInt64
Int8
Int16
Int32
Int64
Float32
Float64

UInt8/Int8 aka tinyint, UInt16/Int16 aka smallint, UInt32/Int32 aka int, UInt64/Int64 aka bigint, Float32 aka float, Float64 aka double.

Boolean

Boolean
Some database system takes UInt8/tinyint(1) as Boolean, but In Datafuse I think it's ok to have a native boolean type, which stores one value as a bit. I think it can work as well as UInt8.

String, Binary

Utf8 <---> LargeUtf8 in arrow2
Binary <---> LargeBinary in arrow2

Binary can cover the case of Utf8, seems we can keep Binary type and allow users to add charset constraints to the type.

Complex types

List(Box) <---> LargeList in arrow2
Struct(Vec)

List aka Array, Struct aka tuple.

Date, DateTime Types

Date, DateTime types are logic types, we can map these types into primitive types.
In Arrow's date Specification, they are signed types. I don't thinks we must follow arrow's specification, we can have our own date/datetime types.

Date16 <--> UInt16
Date32 <--> UInt32
Date64 <--> UInt64
DateTime32 <--> UInt32
DateTime64 <--> UInt64

leiysky · 2021-08-19T07:31:46Z

PrimitiveTypes:

Maybe we should involve DECIMAL type as well?

jorgecarleitao · 2021-08-19T08:01:43Z

Note that Utf8 and Binary have a maximum number of characters (over all items) of i32:MAX; this is not a per-item constraint; sum over all characters over all items < i32:MAX.

wrt to not following the arrow spec, the constraint there is that if you need FFI (e.g. UDFs written in python for ML and things like that), it will be more difficult to integrate those types. I am not sure that is a problem per se; just mentioning.

sundy-li · 2021-08-19T08:23:00Z

Maybe we should involve the DECIMAL type as well?

Yes, we can add it later, add Decimal(P, S) after the alpha version.

Note that Utf8 and Binary have a maximum number of characters (over all items) of i32:MAX; this is not a per-item
constraint; sum over all characters over all items < i32:MAX.

Thanks, in Datafuse, we take Utf8 and Binary as LargeUtf8 and LargeBinary in arrow2. Because using i32 to represent the offset is really not enough for large data. I even want to change to offset to be u64 instead of i64, but it will destroy the arrow spec.

the constraint there is that if you need FFI

That's true, still need some considerations and discussions about that.

jorgecarleitao · 2021-08-19T08:34:15Z

out of curiosity, why 65535? I though that that was the max of u16:

fn main() {
    println!("i32::MAX: {}", i32::MAX);
    println!("i16::MAX: {}", i16::MAX);
    println!("u16::MAX: {}", u16::MAX);
}

prints

i32::MAX: 2147483647
i16::MAX: 32767
u16::MAX: 65535

sundy-li · 2021-08-19T08:36:54Z

out of curiosity, why 65535

Oh, that's my bad.

leiysky · 2021-08-20T06:17:50Z

Hi @sundy-li, it seems this specification basically follows the DataType defined here: https://github.com/datafuselabs/datafuse/blob/681efe24288e41adacd9e5ce978f361a6d37ab42/common/datavalues/src/types/data_type.rs#L23-L51

I have need for this in #1538 and #1537 , can I just treat it as a stable API and rely on it?

sundy-li · 2021-08-20T07:33:33Z

Can I just treat it as a stable API and rely on it?

Yes, currently we treat it as our own type.

Xuanwo · 2022-07-02T12:07:44Z

FYI @andylokandy

Xuanwo · 2022-09-16T06:03:24Z

@leiysky, maybe we can close this issue now?

leiysky added the C-improvement Category: improvement label Aug 19, 2021

Xuanwo closed this as completed Oct 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specification of type system #1521

Specification of type system #1521

leiysky commented Aug 19, 2021 •

edited

Loading

leiysky commented Aug 19, 2021

sundy-li commented Aug 19, 2021 •

edited

Loading

leiysky commented Aug 19, 2021

PrimitiveTypes:

jorgecarleitao commented Aug 19, 2021

sundy-li commented Aug 19, 2021 •

edited

Loading

jorgecarleitao commented Aug 19, 2021

sundy-li commented Aug 19, 2021 •

edited

Loading

leiysky commented Aug 20, 2021

sundy-li commented Aug 20, 2021

Xuanwo commented Jul 2, 2022

Xuanwo commented Sep 16, 2022

Specification of type system #1521

Specification of type system #1521

Comments

leiysky commented Aug 19, 2021 • edited Loading

leiysky commented Aug 19, 2021

sundy-li commented Aug 19, 2021 • edited Loading

PrimitiveTypes:

Boolean

String, Binary

Complex types

Date, DateTime Types

leiysky commented Aug 19, 2021

PrimitiveTypes:

jorgecarleitao commented Aug 19, 2021

sundy-li commented Aug 19, 2021 • edited Loading

jorgecarleitao commented Aug 19, 2021

sundy-li commented Aug 19, 2021 • edited Loading

leiysky commented Aug 20, 2021

sundy-li commented Aug 20, 2021

Xuanwo commented Jul 2, 2022

Xuanwo commented Sep 16, 2022

leiysky commented Aug 19, 2021 •

edited

Loading

sundy-li commented Aug 19, 2021 •

edited

Loading

sundy-li commented Aug 19, 2021 •

edited

Loading

sundy-li commented Aug 19, 2021 •

edited

Loading