sql: support JOIN #2970

Closed
petermattis opened this issue Oct 30, 2015 · 10 comments

@petermattis
Collaborator

Support JOIN in all of its wonderful incarnations. The initial implementation should focus on correctness and not worry about optimizing the join order based on table statistics.
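
To make "correctness first" concrete, here is a minimal sketch of a nested-loop join that handles the inner and left outer cases without any join-order or statistics-based optimization. It is an illustration only, not CockroachDB code: `Row`, `Pair`, and `nestedLoopJoin` are hypothetical names, and rows are modeled as plain maps.

```go
package main

import "fmt"

// Row is a hypothetical row representation: column name -> value.
type Row map[string]interface{}

// Pair is one joined result. Right is nil for an unmatched left row
// in a left outer join.
type Pair struct {
	Left, Right Row
}

// nestedLoopJoin compares every left row with every right row and keeps
// the pairs for which on reports true. When leftOuter is set, a left row
// with no match is emitted once with a nil right side.
func nestedLoopJoin(left, right []Row, on func(l, r Row) bool, leftOuter bool) []Pair {
	var out []Pair
	for _, l := range left {
		matched := false
		for _, r := range right {
			if on(l, r) {
				out = append(out, Pair{Left: l, Right: r})
				matched = true
			}
		}
		if leftOuter && !matched {
			out = append(out, Pair{Left: l, Right: nil})
		}
	}
	return out
}

func main() {
	users := []Row{{"id": 1, "name": "a"}, {"id": 2, "name": "b"}}
	orders := []Row{{"user_id": 1, "item": "x"}, {"user_id": 1, "item": "y"}}
	on := func(l, r Row) bool { return l["id"] == r["user_id"] }

	fmt.Println(len(nestedLoopJoin(users, orders, on, false))) // inner join: 2 rows
	fmt.Println(len(nestedLoopJoin(users, orders, on, true)))  // left outer join: 3 rows
}
```

The cost is proportional to the product of the two input sizes, which is the price of ignoring optimization; the upside is that the semantics are easy to check against the SQL definition.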

@petermattis petermattis added this to the 1.0 milestone Oct 30, 2015
@jess-edwards jess-edwards mentioned this issue Oct 30, 2015
78 tasks
@petermattis petermattis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed SQL labels Feb 13, 2016
@Freeaqingme

Has there been any work done on this feature behind the scenes? If not, is there perhaps some design documentation already available?

It could be fun to try to give this one a shot to contribute...

@RaduBerinde
Member

We have been thinking about the more general problem of how to distribute SQL computation across the cluster; there is an RFC at https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/distributed_sql.md

We are aiming to have some limited join support before that, using code that will be reusable later within the distributed SQL framework. But even that will involve quite a bit of code restructuring. TBH, this doesn't seem like a good starter project for someone who isn't familiar with the codebase.

@Freeaqingme

Alright. I will let this one slide then, and see if there's something interesting with the help-wanted label.

Tnx!

@dt
Member

dt commented May 2, 2016

@Freeaqingme
A few options come to mind that might be well suited for getting started with the codebase:

@electrum

electrum commented May 4, 2016

Have you considered writing a Presto connector for CockroachDB? Presto is a full distributed SQL query engine with pluggable connectors (data sources) and supports distributed joins, including joins between different connectors.

We support batch index joins for tables that are indexed on the join key and otherwise support broadcast and distributed hash joins.
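
For readers unfamiliar with the term, the core of a hash join can be sketched in a single process as a build phase followed by a probe phase. This is a simplified illustration, not Presto's or CockroachDB's implementation; `User`, `Order`, `Joined`, and `hashJoin` are hypothetical names. In a broadcast join the build side would be copied to every node, and in a distributed hash join both sides would be partitioned by the join key; neither is shown here.

```go
package main

import "fmt"

// User and Order are hypothetical input rows; Joined is the output row.
type User struct {
	ID   int
	Name string
}

type Order struct {
	UserID int
	Item   string
}

type Joined struct {
	Name, Item string
}

// hashJoin performs an inner equi-join of users and orders on
// users.ID = orders.UserID.
func hashJoin(users []User, orders []Order) []Joined {
	// Build phase: index one side (ideally the smaller one) by join key.
	byID := make(map[int][]User, len(users))
	for _, u := range users {
		byID[u.ID] = append(byID[u.ID], u)
	}

	// Probe phase: look up each row of the other side in the hash table.
	var out []Joined
	for _, o := range orders {
		for _, u := range byID[o.UserID] {
			out = append(out, Joined{Name: u.Name, Item: o.Item})
		}
	}
	return out
}

func main() {
	users := []User{{ID: 1, Name: "ann"}, {ID: 2, Name: "bob"}}
	orders := []Order{{UserID: 1, Item: "x"}, {UserID: 1, Item: "y"}, {UserID: 3, Item: "z"}}
	for _, j := range hashJoin(users, orders) {
		fmt.Println(j.Name, j.Item)
	}
}
```

Each input is scanned once, so the work grows with the sum of the input sizes rather than their product, at the cost of holding the build side in memory.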

@petermattis
Collaborator Author

@electrum Presto appears targeted at analytics, while CockroachDB is targeted at transactional workloads. Beyond that, Presto is written in Java while CockroachDB is written in Go. Calling out to Java for SQL execution doesn't seem good from a performance perspective for transactional workloads.

@electrum

electrum commented May 5, 2016

@petermattis You're correct, Presto is definitely targeted at analytics, although the engine itself is capable of low latency queries. We have an internal connector at Facebook based on a sharded MySQL backend that can do complex, multi-way index join queries for reporting workloads in hundreds of milliseconds: https://www.youtube.com/watch?v=Gf9JqvNNRZg

I'm definitely not suggesting calling out to or trying to use Presto within CockroachDB. As you say, that would be horrible for transactional workloads, nor is it technically feasible. However, it could be a good complement for other workloads like reporting, analytics, ETL, batch pipelines, combining heterogeneous data sources, etc., and might also serve as a stopgap.

@electrum

electrum commented May 5, 2016

Unrelated, I really like that design document and all the rest of the documentation for the project. It's probably the best documented project I've seen and is a model for others to strive towards.

@petermattis
Collaborator Author

CockroachDB speaks the postgres wire protocol and our SQL is similar to the PostgreSQL dialect. The existing presto-postgres connector might work (with some adjustments).

Thanks for the note about the documentation. It is nice to hear those efforts are being recognized and appreciated.

@tamird
Contributor

tamird commented Jul 2, 2016

I'm going to close this now that #7202 is in. There's more work to be done, but the spirit of this issue is implemented.

@tamird tamird closed this as completed Jul 2, 2016