-
Notifications
You must be signed in to change notification settings - Fork 567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Variant Type #1453
base: main
Are you sure you want to change the base?
Add Variant Type #1453
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1st review
// Interface returns the underlying value as interface{}. Same as Any. | ||
func (v Variant) Interface() interface{} { | ||
return v.value | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point? interface{}
is an alias to any
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put both so users could choose whichever name they prefer within their application. Some apps prefer using any
and others interface{}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only a name. Why have both functions?
// Int returns the value as an int if possible | ||
func (v Variant) Int() (int, bool) { | ||
if i, ok := v.value.(int); ok { | ||
return i, true | ||
} | ||
|
||
return 0, false | ||
} | ||
|
||
// Int64 returns the value as an int64 if possible | ||
func (v Variant) Int64() (int64, bool) { | ||
if i, ok := v.value.(int64); ok { | ||
return i, true | ||
} | ||
|
||
return 0, false | ||
} | ||
|
||
// String returns the value as a string if possible | ||
func (v Variant) String() (string, bool) { | ||
if s, ok := v.value.(string); ok { | ||
return s, true | ||
} | ||
|
||
return "", false | ||
} | ||
|
||
// Bool returns the value as an bool if possible | ||
func (v Variant) Bool() (bool, bool) { | ||
if b, ok := v.value.(bool); ok { | ||
return b, true | ||
} | ||
|
||
return false, false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wonder if we need these functions at all. See:
package main
import "fmt"
type Variant struct {
value any
}
func (v *Variant) Value() any {
return v.value
}
func main() {
v := Variant{}
i, ok := v.Value().(int)
fmt.Println(i, ok)
}
We can just expose value and let the user freely type assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More: If we make value
public, we don't need any value access functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm considering removing Variant
in favor of VariantWithType
as the default. There are many cases where the ClickHouse type needs to be provided. If value
were public then it doesn't make this type any different from a regular any
.
For now I agree it may be best to simply make value
public
|
||
func (c *Variant) AppendRow(v any) error { | ||
var requestedType string | ||
switch v.(type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch v.(type) { | |
switch vv := v.(type) { |
This will give you a type asserted value ready to use inside case blocks without the need for additional type assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I've seen this elsewhere in the code, good point. I can simplify this switch
if err != nil { | ||
return fmt.Errorf("failed to read variant discriminator version: %w", err) | ||
} else if variantSerializationVersion != SupportedVariantSerializationVersion { | ||
return fmt.Errorf("unsupported variant discriminator version: %d", variantSerializationVersion) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
if err != nil { | |
return fmt.Errorf("failed to read variant discriminator version: %w", err) | |
} else if variantSerializationVersion != SupportedVariantSerializationVersion { | |
return fmt.Errorf("unsupported variant discriminator version: %d", variantSerializationVersion) | |
} | |
if err != nil { | |
return fmt.Errorf("failed to read variant discriminator version: %w", err) | |
} | |
if variantSerializationVersion != SupportedVariantSerializationVersion { | |
return fmt.Errorf("unsupported variant discriminator version: %d", variantSerializationVersion) | |
} |
IMO more readable
if actualType != expectedType { | ||
t.Fatalf("case index %d Variant type index %d column type does not match: expected: \"%s\" actual: \"%s\"", i, j, expectedType, actualType) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not testify assert function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I wanted to be able to specify an index via %d
. I can check the testify asserts to see if there's a function for this
return conn | ||
} | ||
|
||
func TestVariant(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need to test Variant:
- stdlib driver?
- with HTTP protocol? Do you expect any differences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For both stdlib and HTTP it should be the same behavior since they all use the same functions, but I agree it would be good to add tests for these cases just to verify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it's really important for stdlib. There is an additional abstraction over types (like Scanner) in stdlib and it might have unexpected behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Let's push ongoing conversations before merging.
PR text is pending update
Summary
Implement
Variant
column type. Partially resolves #1430.closes #1195
Implementation
This implementation adds 3 major types to the module:
ColVariant
- the column implementation for (de)serializationVariant
- a container to hold variant values (optional for (de)serialization)VariantWithType
- an extension ofVariant
, with the ability to provide a preferred type in cases where it is ambiguous to existing column type detection (such asArray(UInt8)
vsString
)ColVariant
Serialization
When values are appended via
col.AppendRow()
, the inputv interface{}
type is checked. If it isnil
, aNull
discriminator is appended. If it is aVariantWithType
, then the specified column type will be appended along with its matching discriminator. The underlying column'sAppendRow
function is re-used so that we don't need to re-implement its logic.As a catch-all, the input value will be tested against each column type until it succeeds. For example,
Variant(Bool, Int64, String)
will try to append asbool
,int64
, thenstring
. If a value does not fit into any column type, it will return an error.Sometimes types will conflict. Due to alphabetical sorting of the type,
Array(UInt8)
would be used beforeString
sinceArray
allows forstring
input. I have researched different solutions to this, including a type priority system, but it would be complex to implement. For now it is easiest to let the user simply inputNewVariantWithType(int64(42), "Int64")
orNewVariant(int64(42)).WithType("Int64")
if they want a specific type within the variant. For complex types like maps, reflection will be used if a type isn't specified.After all rows are appended, the Native format is used to serialize the data into the buffer. First with
serializationVersion
, then theuint8
array fordiscriminators
, then each column'sEncode
function is re-used as usual (similar toTuple
).Deserialization
The Native format deserializes the
discriminators
and builds a set ofoffsets
for each column. This allows for storing multiple columns with mixed lengths. When the user wants to read a row, we can index into the correct row of each column to get the corresponding type.In practice this looks like this:
Or, if you know your types ahead of time, you can also scan directly into it:
This pattern works by simply calling the underlying column's
ScanRow
function. It is safest to scan intoVariant
however.If you need to switch types on
Variant
for your own type detection, you can usevariantRow.Any()
orvariantRow.Interface()
to returnany
/interface{}
respectively (provided both for preferred semantics).Variant
Variant is simply a wrapper around
any
. It implements stdlib sql interfaces such asdriver.Value
andScan
. It also has convenience functions for primitives such asInt64
. If you need to access the underlying value you can useAny()
. This type can be constructed with theNewVariant(v)
function.The
Variant
type should be used in structs and when scanning fromColVariant
. It can also be used for insertion, althoughVariantWithType
may be required if there's overlap between types.VariantWithType
VariantWithType
is the same asVariant
, but with astring
included to specify the preferred type. You can use this for insertion when the Variant column has types that overlap. For example if you hadVariant(Array(UInt8), String)
, a Gostring
would be inserted as anArray(UInt8)
. If you wanted to force this to be a ClickHouseString
, you could useNewVariantWithType(v, "String")
to provide the preferred type. If the preferred type is not present in the Variant, the row will fail to append to the block. Types can be added on an existingVariant
by callingexampleVariant.WithType(t string)
, which will return a newVariantWithType
.Checklist
Delete items not relevant to your PR: