Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design for Same-Table-JOINs #1623

Conversation

acarbonetto
Copy link
Collaborator

Description

Added documentation for a same-table join proposal.

related: https://opensearch.org/docs/latest/field-types/join/

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

* Add opensearch-same-table-join-query.md

Signed-off-by: Andrew Carbonetto <[email protected]>

* Update opensearch-same-table-join-query.md for first-pass review

Signed-off-by: Andrew Carbonetto <[email protected]>

* Add same-table-joins ER relations

Signed-off-by: Andrew Carbonetto <[email protected]>

* Update document for review comments

Signed-off-by: Andrew Carbonetto <[email protected]>

---------

Signed-off-by: Andrew Carbonetto <[email protected]>
@acarbonetto acarbonetto force-pushed the integ-design-same-table-joins branch from 055c21b to f7feaab Compare June 6, 2023 21:52
@codecov
Copy link

codecov bot commented Jun 6, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.27%. Comparing base (691012d) to head (886e076).
Report is 362 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #1623   +/-   ##
=========================================
  Coverage     97.27%   97.27%           
  Complexity     4330     4330           
=========================================
  Files           388      388           
  Lines         10807    10807           
  Branches        761      761           
=========================================
  Hits          10513    10513           
  Misses          287      287           
  Partials          7        7           
Flag Coverage Δ
sql-engine 97.27% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


## 3. Assumptions/Dependencies

* _shard must be defined for any query using a same-table join
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?


* _shard must be defined for any query using a same-table join
* Same-Table joins requires that the following work is complete:
* Table/Indices can be aliased, for example: `SELECT t.fieldA, t.fieldB FROM table t WHERE t.fieldA = t.fieldB`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe give an example with join?


### 3.3. Storage Type

* `reference to opensearch doc`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODOs?


Summary:

Example query (SQL):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Share the sample data to make example easier to understand. Are you using dataset from IT? In that case it is enough to refer a file in IT and rename index name in the query.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you can add this the sample data to the PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it on the very bottom. Please add a link here to the data.


### 4.3 Case SELECT... FROM... JOIN ON... WHERE parent relation

Summary: In this case, we are joining from `child` to `parent` using the `has_parent` relation query.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are doing the same in 4.1, aren't you?


### 6.1 Parser Syntax

```antlrv4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```antlrv4
```antlr

### 6.1 Parser Syntax

```antlrv4
fromClause
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mark by comments to identify new/changed and old syntax rules or show it as a diff

Relation->>DataSourceService: getTable
DataSourceService-->>Relation: Table
Relation->>Table: getFieldTypes
Table->>TypeEnvironment: define
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are missing returns from TypeEnvironment and Table back to Relation

JoinRelation->>TypeEnvironment: getOpenSearchJoinType
TypeEnvironment->>JoinRelation: OpenSearchJoinType
JoinRelation->>JoinRelation: setJoinRelationContext
JoinRelation->>Analyzer: <<LogicalRelation>>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
JoinRelation->>Analyzer: <<LogicalRelation>>
JoinRelation->>Analyzer: LogicalRelation


## Annex B - Data for examples

The following section includes data that has been formatted to be used by the OpenSearch `got/_bulk` ingest endpoint.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention that all data to be added by single index named got (not multiple indices)

@acarbonetto acarbonetto marked this pull request as draft August 8, 2023 17:37
@Swiddis
Copy link
Collaborator

Swiddis commented Dec 27, 2024

Is this still in progress? Should we close it?

@Swiddis
Copy link
Collaborator

Swiddis commented Dec 27, 2024

Closing as stale -- feel free to reopen if work is resumed

@Swiddis Swiddis closed this Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants