[META] Spark Support - Dashboard Connections/Sources, Materialized Views, and Covering Indexes #2027

brijos · 2023-08-23T16:00:57Z

Background

Users store logs, event data, and other data in object stores such as S3 for analysis with batch-based analytics tooling (e.g., Spark). We see an opportunity to unlock complex querying (compute) and improve OpenSearch's visibility into data at rest. Given the popularity of Spark’s open source community and AWS’ investment in Spark, we are exploring an integration between OpenSearch and Spark. Spark’s complex query engine and indexing will mitigate query latency issues for data resting outside of OpenSearch.

Proposed Solution

In this release, we are building on the API-based Spark work in 2.9. Users in Dashboards will be able to add a new Spark connection to OpenSearch; set up skipping indexes, materialized views, and covering indexes; and search external sources from Observability Logs using PPL. Users in Discover who select a Spark data source will be moved to Observability Logs, where they will be able to query using PPL.

Use Cases

Admins will be able to create a new Spark connection and limit access to the connection using OpenSearch fine-grained access control (FGAC).
Admins can accelerate queries using skipping indexes, materialized views, and covering indexes; limit who can manage indexes using OpenSearch FGAC; and set index trim settings for performance and privacy purposes.
OpenSearch Dashboards users who use Discover will be switched to Observability Logs when they select an external source via Spark and will use OpenSearch PPL as the query language (SQL will be supported through APIs).
Users querying using Observability Logs will be able to direct query external

Resources

vagimeli · 2023-09-25T17:50:53Z

@brijos Please confirm this is on track for release in 2.11. I'll start drafting documentation in the meantime. Thanks :)

brijos · 2023-09-28T13:53:54Z

Confirmed @vagimeli

brijos · 2023-10-13T23:18:01Z

We weren't able to finish everything that we wanted to for the release. We will iteratively launch, and document, what we have, but we will push out formal support into an upcoming release.

anirudha · 2023-12-04T21:17:39Z

This was released as part of the 2.11 release.

brijos added the enhancement and untriaged labels Aug 23, 2023

joshuali925 removed the untriaged label Aug 28, 2023

brijos mentioned this issue Sep 21, 2023

[DOC] Spark Support - Dashboard Sources, Materialized Views, and Covering Indexes opensearch-project/documentation-website#5061

Closed

3 tasks

anirudha closed this as completed Jan 22, 2024

github-project-automation bot added this to OpenSearch Project Roadmap Aug 30, 2024

github-project-automation bot moved this to 2.11.0 - (Launched) in OpenSearch Project Roadmap Aug 30, 2024
