Feature Request: Support for Dimension Table Join #820

Open
boer0924 opened this issue Jan 9, 2025 · 3 comments

Comments

boer0924 commented Jan 9, 2025

I would like to request the implementation of dimension table join functionality in Arroyo, similar to the dimension table support in Flink.

Use Case:
Dimension table joins are essential for enriching streaming data by querying external static or slowly changing datasets (e.g., MySQL, PostgreSQL). This feature is particularly useful for scenarios such as:

  • Joining streaming order data with user profiles stored in a database.
  • Enriching event streams with metadata from external sources.

Key Features:

  1. Caching Strategy:
      • Support caching of dimension table data in memory to enhance join performance.
      • Configurable cache expiration and refresh intervals.

  2. Lookup and Join:
      • Allow efficient lookups against dimension tables during streaming data processing.
      • Support for handling missing or unmatched keys with default values.

  3. Data Consistency:
      • Enable periodic or incremental updates to keep dimension table data consistent with the source.
      • Optional support for transactional updates via mechanisms like CDC (Change Data Capture).
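
To make the requested behavior concrete, here is a minimal Python sketch of a cached lookup join. It is only an illustration of the semantics (TTL-based cache expiration, lookup per streaming record, default values for unmatched keys), not Arroyo's or Flink's implementation. The names `CachedLookup`, `lookup_join`, `fetch`, and `ttl_seconds` are hypothetical, and a plain dict stands in for the external database.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


class CachedLookup:
    """In-memory cache with a per-entry TTL in front of a dimension-table fetch."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[dict]],  # hypothetical: queries the external table
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Optional[dict]]] = {}
        self.misses = 0  # number of times the external table was queried

    def get(self, key: str) -> Optional[dict]:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve from cache
        self.misses += 1
        row = self._fetch(key)
        # Missing keys are cached as None too, so repeated misses stay cheap.
        self._cache[key] = (now, row)
        return row


def lookup_join(
    events: Iterable[dict], lookup: CachedLookup, key_field: str, default: dict
) -> Iterator[dict]:
    """Enrich each event with its dimension row, or with `default` if unmatched."""
    for event in events:
        row = lookup.get(event[key_field])
        yield {**event, **(row if row is not None else default)}


# A dict stands in for a MySQL/PostgreSQL user-profile table.
users = {"u1": {"name": "Ada"}}
lookup = CachedLookup(users.get, ttl_seconds=60.0)
orders = [
    {"order_id": 1, "user_id": "u1"},
    {"order_id": 2, "user_id": "u9"},  # no matching profile
    {"order_id": 3, "user_id": "u1"},  # served from cache
]
enriched = list(lookup_join(orders, lookup, "user_id", {"name": "unknown"}))
```

In this sketch the third order reuses the cached profile for `u1`, so the external table is queried only twice. A shorter TTL trades extra lookups for fresher dimension data, which is the knob the "cache expiration" item above asks for.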

It would be great to see a similar feature in Arroyo to enrich its functionality for real-time data processing.

Thank you for considering this request!

boer0924 (Author) commented Jan 9, 2025

I would like to request the implementation of Lookup Join functionality in Arroyo, similar to Flink's, which enables efficient joins between streaming data and dimension tables, allowing real-time enrichment with static or slowly changing data from external sources.
https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/queries/joins/#lookup-join

mwylde mentioned this issue Jan 13, 2025

mwylde (Member) commented Jan 13, 2025

Thanks @boer0924 for the feature request! Lookup joins are now in master (see #821 for details on syntax). For now only Redis is supported, but MySQL and Postgres will be coming next.

boer0924 (Author) commented

It is so timely!
