Rework type system #312

asdine · 2020-11-09T19:33:05Z

This PR changes how Genji's type system and indexes handle numbers: unless a field constraint says otherwise, numbers are now stored as doubles.

Background

Currently, Genji parses integers and doubles based on their literal representations:

1 -- integer
2.0 -- double

INSERT INTO foo (a) VALUES 
(1), -- integer
(1.2), -- double
(1.3); -- double

This means that the same table can have both integers and doubles depending on how values have been parsed.
Initially, comparison operators were very strict: only values of the same type could be compared.
However, this led to situations like this:

SELECT * FROM foo WHERE a > 0;
{
  "a": 1
}

Because 0 is parsed as an integer, it was only compared to integers, skipping double values.

We then changed the behavior of comparison operators to support an edge case: if we compare an integer and a double, the integer is converted to a double prior to comparison.
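The promotion rule above can be sketched in Go as follows. This is only an illustration: the `Value` struct, `compare`, and `toDouble` are hypothetical names invented for this example and are not Genji's actual types or functions.

```go
package main

import "fmt"

// Value is a hypothetical tagged number used only for this sketch;
// it is not Genji's document.Value.
type Value struct {
	IsDouble bool
	I        int64
	D        float64
}

// toDouble returns the value as a float64, converting integers.
func toDouble(v Value) float64 {
	if v.IsDouble {
		return v.D
	}
	return float64(v.I)
}

// compare returns -1, 0, or 1. Two integers are compared as integers;
// any pair involving a double promotes the integer operand to float64 first.
func compare(a, b Value) int {
	if !a.IsDouble && !b.IsDouble {
		switch {
		case a.I < b.I:
			return -1
		case a.I > b.I:
			return 1
		}
		return 0
	}
	x, y := toDouble(a), toDouble(b)
	switch {
	case x < y:
		return -1
	case x > y:
		return 1
	}
	return 0
}

func main() {
	zero := Value{I: 0} // the literal 0 is parsed as an integer
	fmt.Println(compare(Value{IsDouble: true, D: 1.2}, zero)) // 1: the double row now matches a > 0
	fmt.Println(compare(Value{I: 1}, zero))                   // 1: integers are still compared as integers
}
```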
Since comparison operators are also used for ordering, we expect the following behavior:

SELECT * FROM foo WHERE a > 0 ORDER BY a DESC;
{
  "a": 1.3
}
{
  "a": 1.2
}
{
  "a": 1
}

However, the same behavior is also expected when a is indexed, which imposes three requirements on the index:

Integers and doubles must be indexed together
Comparing integers together must be lossless (no conversion to double)
Ordering must be respected
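To see why the lossless requirement matters, note that a float64 has a 53-bit significand, so large integers cannot all survive a conversion to double. The short Go sketch below shows two distinct int64 values that collapse to the same double; the `lossless` helper is a hypothetical name used only for this illustration.

```go
package main

import "fmt"

// lossless reports whether an int64 survives a round trip through float64.
// It is only meaningful for values well inside the int64 range.
func lossless(i int64) bool {
	return int64(float64(i)) == i
}

func main() {
	a := int64(1) << 53 // 9007199254740992
	b := a + 1          // 9007199254740993

	fmt.Println(a == b)                   // false: distinct as integers
	fmt.Println(float64(a) == float64(b)) // true: identical once converted to double
	fmt.Println(lossless(b))              // false
}
```

An index that converted every integer to a double before comparing would therefore treat `a` and `b` as equal keys.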

Satisfying all three at once is very complex. As of today, we still don't have proper support for the above requirements.

This led, for example, to the creation of the binarysort package (formerly the key package), which encodes integers and doubles on 16 bytes. While this works, it adds too much complexity, causes performance issues during index iteration, and takes much more space.

Solution

The solution proposed here is to change the rules: numbers are stored as doubles by default.

The parser doesn't change: integers are scanned as integers, doubles as doubles.
Comparison and arithmetic operators keep the same behavior.

SELECT 1 + 1;
{
  "1 + 1": 2 -- 2 here is an integer
}

The only difference is that right before storing the document, every integer field will be converted to double if there is no explicit constraint on that field.

This means that a given field can never be a double in one document and an integer in another:

No field constraint: All numbers stored as doubles
Field constraint: All numbers stored as specified by the field constraint
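The storage rule can be sketched in Go roughly as below. This is an assumption-laden illustration, not the code from this PR: documents are modeled as `map[string]interface{}`, `normalize` and `prepare` are hypothetical names, constraints are matched on top-level field names only, and a constrained field is simply left untouched rather than converted to its declared type.

```go
package main

import "fmt"

// normalize converts every int64 inside v to float64, recursing into
// arrays ([]interface{}) and sub-documents (map[string]interface{}).
func normalize(v interface{}) interface{} {
	switch t := v.(type) {
	case int64:
		return float64(t)
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, e := range t {
			out[i] = normalize(e)
		}
		return out
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, e := range t {
			out[k] = normalize(e)
		}
		return out
	}
	return v
}

// prepare returns the document as it would be stored in this sketch.
// Fields named in constrained (a hypothetical stand-in for field
// constraints) are left as they are; all other fields are normalized.
func prepare(doc map[string]interface{}, constrained map[string]bool) map[string]interface{} {
	out := make(map[string]interface{}, len(doc))
	for k, v := range doc {
		if constrained[k] {
			out[k] = v
			continue
		}
		out[k] = normalize(v)
	}
	return out
}

func main() {
	doc := map[string]interface{}{
		"a":  int64(1),
		"id": int64(7),
		"l":  []interface{}{int64(1), int64(2)},
	}
	stored := prepare(doc, map[string]bool{"id": true})

	fmt.Printf("%T\n", stored["a"])                   // float64
	fmt.Printf("%T\n", stored["id"])                  // int64
	fmt.Printf("%T\n", stored["l"].([]interface{})[0]) // float64
}
```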

This new property allows us to simplify index implementation and to remove all the edge cases.

INSERT INTO foo (a) VALUES 
(1), -- parsed as int, stored as double
(1.5), -- parsed as double, stored as double
([1, 2]); -- array and sub-document values are also stored as double
SELECT * FROM foo WHERE a > 0 ORDER BY a DESC;
{
  "a": [1, 2]
}
{
  "a": 1.5
}
{
  "a": 1
}

codecov-io · 2020-11-09T20:37:58Z

Codecov Report

Merging #312 (5146a0f) into master (a42cf4e) will increase coverage by 0.40%.
The diff coverage is 61.13%.

@@            Coverage Diff             @@
##           master     #312      +/-   ##
==========================================
+ Coverage   61.37%   61.78%   +0.40%     
==========================================
  Files          73       74       +1     
  Lines        6701     6837     +136     
==========================================
+ Hits         4113     4224     +111     
+ Misses       2053     2046       -7     
- Partials      535      567      +32

Impacted Files	Coverage Δ
sql/planner/hash_set.go	`0.00% <0.00%> (ø)`
sql/planner/sort.go	`16.66% <0.00%> (+1.38%)`	⬆️
sql/query/expr/comparison.go	`35.74% <10.00%> (+11.31%)`	⬆️
binarysort/binarysort.go	`39.53% <39.53%> (ø)`
document/encoding/custom/codec.go	`64.15% <42.85%> (-0.60%)`	⬇️
sql/planner/input.go	`43.83% <43.47%> (+5.50%)`	⬆️
database/table.go	`52.91% <44.44%> (-3.37%)`	⬇️
document/document.go	`69.81% <54.28%> (-6.12%)`	⬇️
document/array.go	`69.23% <55.55%> (-4.11%)`	⬇️
document/value_encoding.go	`68.42% <68.42%> (ø)`
... and 15 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a42cf4e...3eae6ea. Read the comment docs.

asdine and others added 18 commits October 30, 2020 12:17

Add nsb package

02f99e4

Replace calls to key package by nsb package

4c55420

Add ValueEncoder

d3da97b

Test ValueEncoder

d2b9fdc

Add value.UnmarshalBinary

04f30ac

Test BinaryMarshaling

5ee1dff

Delete key package

d89b38b

Move pkg/nsb to binarysort

ee81ec0

Merge remote-tracking branch 'origin/master' into merge-key-and-document-packages

4132512

Apply suggestions from code review

e4186ac

Co-authored-by: Ivan Trubach <[email protected]>

Merge branch 'merge-key-and-document-packages' of github.com:genjidb/genji into merge-key-and-document-packages

96c17a3

Merge remote-tracking branch 'origin/master' into merge-key-and-document-packages

279d159

Simplify int encoding

424a6e6

Add Apply and ValidateConstraint

0eefe76

Normalize comparison

7ba1226

Add error checks to binarysort

d8ca24a

Simplify ValueEncoder

5dde0ac

Rework how integers and doubles are managed

5146a0f

asdine mentioned this pull request Nov 9, 2020

Merge key and document packages #306

Closed

asdine marked this pull request as ready for review November 9, 2020 20:09

Convert default value to double

3eae6ea

asdine merged commit 2a9a526 into master Nov 10, 2020

asdine deleted the rework-type-system branch November 10, 2020 17:25

asdine added this to the v0.9.0 milestone Nov 11, 2020
