[META] Spark Support - Dashboard Connections/Sources, Materialized Views, and Covering Indexes #2027

Closed
brijos opened this issue Aug 23, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@brijos

brijos commented Aug 23, 2023

Background

Users store logs, event data, and other data in object stores such as S3 for analysis with batch-based analytics tooling (e.g., Spark). We see an opportunity to unlock complex querying (compute) and improve OpenSearch's visibility into data at rest. Given the popularity of Spark’s open source community and AWS’ investment in Spark, we are exploring an integration between OpenSearch and Spark. Spark’s complex query engine and indexing will mitigate query latency issues for data resting outside of OpenSearch.

Proposed Solution

In this release, we are building on the API-based Spark work in 2.9. Users in OpenSearch Dashboards will be able to add a new Spark connection to OpenSearch; set up skipping indexes, materialized views, and covering indexes; and search external sources from Observability Logs using PPL. Users in Discover who select a Spark data source will be moved to Observability Logs, where they will be able to query using PPL.

Use Cases

  • Admins will be able to create a new Spark connection and limit access to the connection using OpenSearch fine-grained access control (FGAC)
  • Admins can accelerate queries using skipping indexes, materialized views, and covering indexes; limit who can manage indexes using OpenSearch FGAC; and set index trim settings for performance and privacy purposes
  • OpenSearch Dashboards users who use Discover will be switched to Observability Logs when they select an external source via Spark and will use OpenSearch PPL as the query language (SQL will be supported through APIs)
  • Users querying using Observability Logs will be able to direct query external

Resources

@vagimeli

@brijos Please confirm this is on track for release in 2.11. I'll start drafting documentation in the meantime. Thanks :)

@brijos
Author

brijos commented Sep 28, 2023

Confirmed @vagimeli

@brijos
Author

brijos commented Oct 13, 2023

We weren't able to finish everything we wanted to for the release. We will launch and document what we have iteratively, but formal support will be pushed to an upcoming release.

@anirudha
Collaborator

anirudha commented Dec 4, 2023

This was released as part of the 2.11 release.

Projects
Status: 2.11.0 - (Launched)