Rework type system #312

Merged: 19 commits, Nov 10, 2020

asdine (Collaborator) commented Nov 9, 2020

This PR makes an important change in Genji's type system and indexes.

Background

Currently, Genji parses integers and doubles based on their literal representations:

1 -- integer
2.0 -- double

INSERT INTO foo (a) VALUES 
(1), -- integer
(1.2), -- double
(1.3); -- double

This means that the same table can have both integers and doubles depending on how values have been parsed.
Initially, comparison operators were very strict: only values of the same type could be compared with each other.
However, this led to situations like this:

SELECT * FROM foo WHERE a > 0;
{
  "a": 1
}

Because 0 is parsed as an integer, it was only compared to integers, skipping double values.

We then changed the behavior of comparison operators to support an edge case: if we compare an integer and a double, the integer is converted to a double prior to comparison.
Since comparison operators are also used for ordering, we expect the following behavior:

SELECT * FROM foo WHERE a > 0 ORDER BY a DESC;
{
  "a": 1.3
}
{
  "a": 1.2
}
{
  "a": 1
}
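
The comparison rule described above can be sketched in Go as follows. This is only an illustration, not Genji's actual code: the Number type and the Compare and FilterGreaterDesc functions are hypothetical names. Two integers are compared as integers, and an integer is converted to a double only when the other side is a double.

```go
package main

import (
	"fmt"
	"sort"
)

// Number is a hypothetical stand-in for a numeric value:
// either an integer (I) or a double (D), depending on IsDouble.
type Number struct {
	IsDouble bool
	I        int64
	D        float64
}

// asDouble returns the value as a float64, converting integers.
func asDouble(n Number) float64 {
	if n.IsDouble {
		return n.D
	}
	return float64(n.I)
}

// Compare returns -1, 0 or 1. Two integers are compared as integers;
// any comparison involving a double converts the integer side first.
func Compare(a, b Number) int {
	if !a.IsDouble && !b.IsDouble {
		switch {
		case a.I < b.I:
			return -1
		case a.I > b.I:
			return 1
		}
		return 0
	}
	x, y := asDouble(a), asDouble(b)
	switch {
	case x < y:
		return -1
	case x > y:
		return 1
	}
	return 0
}

// FilterGreaterDesc mimics "WHERE a > pivot ORDER BY a DESC".
func FilterGreaterDesc(vals []Number, pivot Number) []Number {
	var out []Number
	for _, v := range vals {
		if Compare(v, pivot) > 0 {
			out = append(out, v)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return Compare(out[i], out[j]) > 0 })
	return out
}

func main() {
	vals := []Number{{I: 1}, {IsDouble: true, D: 1.2}, {IsDouble: true, D: 1.3}}
	for _, v := range FilterGreaterDesc(vals, Number{I: 0}) {
		fmt.Println(asDouble(v)) // prints 1.3, then 1.2, then 1
	}
}
```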

However, the same behavior is also expected when a is indexed, which imposes the following requirements:

  • Integers and doubles must be indexed together
  • Comparing integers together must be lossless (no conversion to double)
  • Ordering must be respected

And this is very complex. As of today, we still don't have proper support for the above requirements.
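
To illustrate why the lossless requirement conflicts with a naive "convert everything to double" comparison, here is a small Go sketch (EqualAsDoubles is a hypothetical helper, not part of Genji). Integers above 2^53 cannot all be represented exactly as 64-bit doubles, so two distinct integers can become equal after conversion.

```go
package main

import "fmt"

// EqualAsDoubles reports whether two integers compare equal
// once both have been converted to float64.
func EqualAsDoubles(a, b int64) bool {
	return float64(a) == float64(b)
}

func main() {
	a := int64(1) << 53 // 9007199254740992
	b := a + 1          // 9007199254740993, not representable as a float64

	fmt.Println(a == b)               // false: the integers differ
	fmt.Println(EqualAsDoubles(a, b)) // true: the difference is lost
}
```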

For example, this led to the creation of the binarysort package (formerly the key package), which encodes integers and doubles on 16 bytes. While this works, it adds too much complexity, causes performance issues during index iteration, and takes up much more space.

Solution

The solution proposed here is to change the rules: numbers are stored as doubles by default.

The parser doesn't change: integers are scanned as integers and doubles as doubles.
Comparison and arithmetic operators keep the same behavior.

SELECT 1 + 1;
{
  "1 + 1": 2 // 2 here is an integer
}

The only difference is that right before storing the document, every integer field will be converted to a double if there is no explicit constraint on that field.

This means that a given field can never be a double in one document and an integer in another:

  • No field constraint: All numbers stored as doubles
  • Field constraint: All numbers stored as specified by the field constraint

This new property allows us to simplify index implementation and to remove all the edge cases.

INSERT INTO foo (a) VALUES 
(1), -- parsed as int, stored as double
(1.5), -- parsed as double, stored as double
([1, 2]); -- array and sub-document values are also stored as doubles
SELECT * FROM foo WHERE a > 0 ORDER BY a DESC;
{
  "a": [1, 2]
}
{
  "a": 1.5
}
{
  "a": 1
}
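
The storage rule can be sketched in Go as below. This is a simplified model, not Genji's document API: documents are plain maps, integers are int64, and NormalizeForStorage, toDoubles and the constrained set are hypothetical names. Fields with an explicit constraint are passed through untouched here; converting them to the constrained type is left out of the sketch.

```go
package main

import "fmt"

// toDoubles recursively converts every int64 found in v to a float64,
// descending into arrays and sub-documents.
func toDoubles(v interface{}) interface{} {
	switch x := v.(type) {
	case int64:
		return float64(x)
	case []interface{}:
		out := make([]interface{}, len(x))
		for i, e := range x {
			out[i] = toDoubles(e)
		}
		return out
	case map[string]interface{}:
		out := make(map[string]interface{}, len(x))
		for k, e := range x {
			out[k] = toDoubles(e)
		}
		return out
	}
	return v
}

// NormalizeForStorage returns a copy of doc in which integers of
// unconstrained top-level fields have been converted to doubles.
// constrained lists the fields that have an explicit field constraint.
func NormalizeForStorage(doc map[string]interface{}, constrained map[string]bool) map[string]interface{} {
	out := make(map[string]interface{}, len(doc))
	for k, v := range doc {
		if constrained[k] {
			out[k] = v // assumed to already match its constraint
			continue
		}
		out[k] = toDoubles(v)
	}
	return out
}

func main() {
	doc := map[string]interface{}{
		"a":  int64(1),
		"b":  []interface{}{int64(1), int64(2)},
		"id": int64(7),
	}
	stored := NormalizeForStorage(doc, map[string]bool{"id": true})
	fmt.Printf("%T %T\n", stored["a"], stored["id"]) // float64 int64
}
```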

@asdine asdine marked this pull request as ready for review November 9, 2020 20:09

codecov-io commented Nov 9, 2020

Codecov Report

Merging #312 (5146a0f) into master (a42cf4e) will increase coverage by 0.40%.
The diff coverage is 61.13%.

@@            Coverage Diff             @@
##           master     #312      +/-   ##
==========================================
+ Coverage   61.37%   61.78%   +0.40%     
==========================================
  Files          73       74       +1     
  Lines        6701     6837     +136     
==========================================
+ Hits         4113     4224     +111     
+ Misses       2053     2046       -7     
- Partials      535      567      +32     
Impacted Files                       Coverage Δ
sql/planner/hash_set.go              0.00% <0.00%> (ø)
sql/planner/sort.go                  16.66% <0.00%> (+1.38%) ⬆️
sql/query/expr/comparison.go         35.74% <10.00%> (+11.31%) ⬆️
binarysort/binarysort.go             39.53% <39.53%> (ø)
document/encoding/custom/codec.go    64.15% <42.85%> (-0.60%) ⬇️
sql/planner/input.go                 43.83% <43.47%> (+5.50%) ⬆️
database/table.go                    52.91% <44.44%> (-3.37%) ⬇️
document/document.go                 69.81% <54.28%> (-6.12%) ⬇️
document/array.go                    69.23% <55.55%> (-4.11%) ⬇️
document/value_encoding.go           68.42% <68.42%> (ø)
... and 15 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a42cf4e...3eae6ea.

@asdine asdine merged commit 2a9a526 into master Nov 10, 2020
@asdine asdine deleted the rework-type-system branch November 10, 2020 17:25
@asdine asdine added this to the v0.9.0 milestone Nov 11, 2020