Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-numerical key col(s) #8

Closed
metasoarous opened this issue Dec 11, 2015 · 9 comments
Closed

Support non-numerical key col(s) #8

metasoarous opened this issue Dec 11, 2015 · 9 comments

Comments

@metasoarous
Copy link
Contributor

Would be great if non-numerical key columns were supported.

From @lbradstreet over Gitter:

I spent a bit of time thinking about how you could do it for non-integer keys but the solutions I've come up with are non-ideal. You could use rownums for the queries but the queries won't be stable, so that won't work. You could hash / mod the key, but then you'll have to do a full tablescan downstream to get the results

If it's possible, having a unique identifier specified over multiple columns would be valuable as well.

@mushketyk
Copy link
Contributor

@lbradstreet I would like to implement it. Could you give me an insight on how you wanted to implement this?

@sventech
Copy link

Would "non-numerical" include UUIDv4 (random 128bit integers displayed as text), or do those work OK?

@mushketyk
Copy link
Contributor

@sventech Now only numerical ids are supported, but this change should add support for UUIDs.

@sventech
Copy link

@mushketyk Actually UUIDv4 is underlyingly numerical (stored as 128 bit integer in PSQL) but I'm not sure if their implementation will treat it as such, or if the column also needs to be monotonically increasing and represented as an int in JDBC land. But you're right, if you add support for any column type it will cover UUID regardless.

@sventech
Copy link

Sorry, I forgot this is meant to be cross-platform, so PSQL-specific stuff is not relevant. The partition-table function appears to be the main sticking point -- it does appear to assume a monotonically increasing integer field due to the use of min() and max() on the column.

@lbradstreet
Copy link
Member

@sventech, yup, we should be able to do something fairly cross-platform by doing the following:

If we assume we want n-rows-per-partition, split the uuid key space into n-partitions=(count *)/n-rows-per-partition. Then we select between those ranges, as we currently do with the integer keys.

@sventech
Copy link

Hmm, it looks like for several DB engines, using count(*) instead of max() would generally be faster as well:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=71403

http://stackoverflow.com/questions/5891584/maxsomeattribute-or-select-count-from-table-name-in-sql-server

If the behavior you describe became default, would it break backwards compatibility? Also, it sounds like that would solve the single-column case but perhaps not composite keys (e.g., multiple-columns). But maybe I'm now over my head. :-)

@lbradstreet
Copy link
Member

I think we would have to keep max for integer keyed tables, because the IDs are not uniformly distributed over the key space. With UUIDs we can split up the key space like this because we know that the partitions we generate will contain roughly X rows each.

On 27 Jan 2016, at 1:55 AM, Sven Pedersen [email protected] wrote:

Hmm, it looks like for several DB engines, using count(*) instead of max() would generally be faster as well:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=71403

http://stackoverflow.com/questions/5891584/maxsomeattribute-or-select-count-from-table-name-in-sql-server

If the behavior you describe became default, would it break backwards compatibility? Also, it sounds like that would solve the single-column case but perhaps not composite keys (e.g., multiple-columns). But maybe I'm now over my head. :-)


Reply to this email directly or view it on GitHub.

@MichaelDrogalis
Copy link
Contributor

Completed by @mushketyk. Available on Clojars under version 0.8.11.1-SNAPSHOT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants