
Allow primary keys to contain address columns sorted by their resolved values. #8870

Merged: @nicktobey merged 13 commits into main from nicktobey/textindex on Feb 24, 2025.


@nicktobey (Contributor) commented Feb 19, 2025

This is the Dolt part of this change.

GMS PR: dolthub/go-mysql-server#2854
Doltgres PR: dolthub/doltgresql#1214

The goal of the change is to allow an index to use an out-of-line, variable-length type (such as TEXT or BLOB) as a primary key while storing only the value's address in the index, instead of being forced to store a prefix of the value. Keys containing such address columns are sorted by the resolved values, not by the addresses themselves.

As a result of this change, any tuple comparison operation may need to resolve a hash in the NodeStore. This poses two complications:

  1. The tuple logic exists at a much lower level than the NodeStore and can't depend on it without creating a dependency cycle. We get around this with a new ValueStore interface that stores and retrieves variable-length byte strings by their content hash. NodeStore is the only implementation of this interface, but decoupling the interface from the implementation means we don't depend on NodeStore's internals when passing it to lower-level code.

  2. Tuple comparison operations can now perform disk I/O, so they need a context parameter.

@coffeegoddd (Contributor) commented:

@nicktobey DOLT

| comparing_percentages |
| --- |
| 100.000000 to 100.000000 |

| version | result | total |
| --- | --- | --- |
| 6e715ae | ok | 5937457 |

| version | total_tests |
| --- | --- |
| 6e715ae | 5937457 |

| correctness_percentage |
| --- |
| 100.0 |

@zachmu (Member) left a comment

LGTM

@@ -669,3 +670,53 @@ func (td TupleDesc) Equals(other TupleDesc) bool {
}
return true
}

type AddressTypeHandler struct {
@zachmu (Member): Could use a type comment.

"github.com/dolthub/dolt/go/store/hash"
)

type ValueStore interface {
@zachmu (Member): Could use comments for this type and its methods.

}
}

func (handler AddressTypeHandler) SerializedCompare(ctx context.Context, v1 []byte, v2 []byte) (int, error) {
@zachmu (Member):

Not really a criticism, but I'm not sure this is in the spirit of this method: if you have to dereference an address to get the value to compare, this is probably going to be very slow, and the point of this method is to be faster than deserializing the values just to compare them.

Not really sure what to do about it, though.

@nicktobey (Contributor Author):

I suppose that for very large values we only need to load a single chunk at a time, since the values might differ in the very first chunk. But that only works if the child handler has certain properties that we can't guarantee. I added a TODO.

@coffeegoddd (Contributor) commented:

@nicktobey DOLT

| comparing_percentages |
| --- |
| 100.000000 to 100.000000 |

| version | result | total |
| --- | --- | --- |
| 886e684 | ok | 5937457 |

| version | total_tests |
| --- | --- |
| 886e684 | 5937457 |

| correctness_percentage |
| --- |
| 100.0 |

@nicktobey merged commit 4e02d57 into main on Feb 24, 2025 (21 checks passed).
@nicktobey deleted the nicktobey/textindex branch on February 24, 2025 at 22:59.
@coffeegoddd (Contributor) commented:

@coffeegoddd DOLT

| comparing_percentages |
| --- |
| 100.000000 to 100.000000 |

| version | result | total |
| --- | --- | --- |
| 53f404c | ok | 5937457 |

| version | total_tests |
| --- | --- |
| 53f404c | 5937457 |

| correctness_percentage |
| --- |
| 100.0 |


3 participants