I would like to request the implementation of dimension table join functionality in Arroyo, similar to the dimension table support in Flink.
Use Case:
Dimension table joins are essential for enriching streaming data with static or slowly changing datasets held in external systems (e.g., tables in MySQL or PostgreSQL). This feature is particularly useful for scenarios such as:
Joining streaming order data with user profiles stored in a database.
Enriching event streams with metadata from external sources.
Key Features:
Caching Strategy:
Support caching of dimension table data in memory to enhance join performance.
Configurable cache expiration and refresh intervals.
Lookup and Join:
Allow efficient lookups against dimension tables during streaming data processing.
Support for handling missing or unmatched keys with default values.
Data Consistency:
Enable periodic or incremental updates to keep dimension table data consistent with the source.
Optional support for transactional updates via mechanisms like CDC (Change Data Capture).
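To make the caching and lookup behaviour above concrete, here is a minimal sketch in plain Python. It is not Arroyo's implementation or API; the names TtlLookupCache, lookup_join, and fetch_user are hypothetical, and the dict named users stands in for an external database such as MySQL or PostgreSQL. It assumes per-key expiration (a TTL) and a caller-supplied default row for unmatched keys.

```python
import time
from typing import Callable, Dict, Hashable, Iterable, Iterator, Optional, Tuple


class TtlLookupCache:
    """Hypothetical in-memory cache of dimension rows with per-entry expiry."""

    def __init__(
        self,
        fetch: Callable[[Hashable], Optional[dict]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch        # queries the external source for one key
        self._ttl = ttl_seconds    # how long a cached row stays valid
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[Hashable, Tuple[float, Optional[dict]]] = {}

    def get(self, key: Hashable) -> Optional[dict]:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]          # still fresh: no external query
        row = self._fetch(key)     # missing or expired: go to the source
        # Misses (None) are cached too, so unknown keys do not hit the
        # source on every event.
        self._entries[key] = (now + self._ttl, row)
        return row


def lookup_join(
    events: Iterable[dict],
    cache: TtlLookupCache,
    key_field: str,
    default: dict,
) -> Iterator[dict]:
    """Enrich each event with its dimension row, or with `default` if unmatched."""
    for event in events:
        row = cache.get(event[key_field])
        yield {**event, **(row if row is not None else default)}


# Usage: `users` stands in for a user-profile table in an external database.
users = {1: {"name": "Ada"}}
calls = []


def fetch_user(user_id):
    calls.append(user_id)          # record each simulated database query
    return users.get(user_id)


cache = TtlLookupCache(fetch_user, ttl_seconds=60.0)
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 2},
    {"order_id": 12, "user_id": 1},
]
enriched = list(lookup_join(orders, cache, "user_id", {"name": "unknown"}))
```

In this sketch the third order reuses the cached row for user 1, so the simulated source is queried only twice, and the order for the unknown user 2 receives the default row. Expiry-based refresh is the simplest consistency option; a CDC-driven design would instead update or invalidate entries as changes arrive from the source.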
It would be great to see a similar feature in Arroyo to enrich its functionality for real-time data processing.
Thank you for considering this request!
Thanks @boer0924 for the feature request! Lookup joins are now in master (see #821 for details on syntax). Currently it just supports Redis, but MySQL and Postgres will be coming next.